Introduction to DuckDB

Azure Spring Apps is a fully managed service from Microsoft (built in collaboration with VMware), focused on building and deploying Spring Boot applications on Azure Cloud without worrying about Kubernetes.

The Enterprise plan comes with some interesting features, such as commercial Spring runtime support, a 99.95% SLA and some deep discounts (up to 47%) when you are ready for production.

>> Learn more and deploy your first Spring Boot app to Azure.

And, you can participate in a very quick (1 minute) paid user research from the Java on Azure product team.

Slow MySQL query performance is all too common. Of course it is. A good way to go is, naturally, a dedicated profiler that actually understands the ins and outs of MySQL.

The Jet Profiler was built for MySQL only, so it can do things like real-time query performance, focus on most used tables or most frequent queries, quickly identify performance issues and basically help you optimize your queries.

Critically, it has very minimal impact on your server's performance, with most of the profiling work done separately - so it needs no server changes, agents or separate services.

Basically, you install the desktop application, connect to your MySQL server, hit the record button, and you'll have results within minutes:

>> Try out the Profiler

Accelerate Your Jakarta EE Development with Payara Server!

With best-in-class guides and documentation, Payara essentially simplifies deployment to diverse infrastructures.

Beyond that, it provides intelligent insights and actions to optimize Jakarta EE applications.

The goal is to apply an opinionated approach to get to what's essential for mission-critical applications - really solid scalability, availability, security, and long-term support:

>> Download and Explore the Guide (to learn more)

The AI Assistant to boost Boost your productivity writing unit tests - Machinet AI.

AI is all the rage these days, but for very good reason. The highly practical coding companion, you'll get the power of AI-assisted coding and automated unit test generation.
Machinet's Unit Test AI Agent utilizes your own project context to create meaningful unit tests that intelligently aligns with the behavior of the code.
And, the AI Chat crafts code and fixes errors with ease, like a helpful sidekick.

Simplify Your Coding Journey with Machinet AI:

>> Install Machinet AI in your IntelliJ

Looking for the ideal Linux distro for running modern Spring apps in the cloud?

Meet Alpaquita Linux: lightweight, secure, and powerful enough to handle heavy workloads.

This distro is specifically designed for running Java apps. It builds upon Alpine and features significant enhancements to excel in high-density container environments while meeting enterprise-grade security standards.

Specifically, the container image size is ~30% smaller than standard options, and it consumes up to 30% less RAM:

>> Try Alpaquita Containers now.

DbSchema is a super-flexible database designer, which can take you from designing the DB with your team all the way to safely deploying the schema.

The way it does all of that is by using a design model, a database-independent image of the schema, which can be shared in a team using GIT and compared or deployed on to any database.

And, of course, it can be heavily visual, allowing you to interact with the database using diagrams, visually compose queries, explore the data, generate random data, import data or build HTML5 database reports.

>> Take a look at DBSchema

Slow MySQL query performance is all too common. Of course it is. A good way to go is, naturally, a dedicated profiler that actually understands the ins and outs of MySQL.

Critically, it has very minimal impact on your server's performance, with most of the profiling work done separately - so it needs no server changes, agents or separate services.

Basically, you install the desktop application, connect to your MySQL server, hit the record button, and you'll have results within minutes:

>> Try out the Profiler

DbSchema is a super-flexible database designer, which can take you from designing the DB with your team all the way to safely deploying the schema.

The way it does all of that is by using a design model, a database-independent image of the schema, which can be shared in a team using GIT and compared or deployed on to any database.

>> Take a look at DBSchema

1. Overview

In this tutorial, we’ll learn about an analytical relational database called DuckDB. We’ll explore its advantages and see how it’s an efficient solution for analytical tasks. After that, we’ll walk through the installation and some fundamental operations.

2. What Is DuckDB?

DuckDB is an in-memory analytical relational database primarily designed for data analysis. It’s considered an analytical database due to its columnar storage nature, which stores data for each column separately. In contrast, a traditional relational database utilizes a row-based storage that stores data row by row.

Advantages of DuckDB include:

Fast queries – DuckDB utilizes a columnar-vectorized query execution engine that optimizes the data query in large batches.
SQL Compatibility – DuckDB supports standard SQL queries such as aggregation and window functions, which are well-suited for data analysts familiar with SQL.
Quick Deployment – DuckDB has minimal external dependencies and operates within our application process without a separate database instance, making deployment and integration straightforward.
Free – DuckDB is an open-source project which is freely available to all. Its entire source code is accessible on GitHub for exploration and contribution.

3. DuckDB Installation

DuckDB provides various installation options to suit our environment. We’ll demonstrate two common installation methods here.

3.1. Command Line

For Windows users, we can install DuckDB using the WinGet package manager. All we need is to open a command prompt with administrator privileges and execute the following:

winget install DuckDB.cli

On the Mac OS, we can install it using Homebrew:

brew install duckdb

After completing the installation of the DuckDB CLI, brew adds the binary path to the existing environment variable by itself. We can open a new shell session and run the DuckDB CLI by running the following command:

duckdb

3.2. Java

DuckDB can integrate with Java without installing a separate database instance. To get started, we include the following DuckDB JDBC dependency in our pom.xml:

<dependency>
    <groupId>org.duckdb</groupId>
    <artifactId>duckdb_jdbc</artifactId>
    <version>0.10.0</version>
</dependency>

We can load the DuckDB JDBC driver and then create a JDBC connection via the following JDBC URL:

Class.forName("org.duckdb.DuckDBDriver");
Connection conn = DriverManager.getConnection("jdbc:duckdb:");

DuckDB automatically creates an in-memory database instance by default when we connect to it. However, all data persisted in the instance is lost once the DuckDB process finishes. To save our data into the disk, we can append a database name after the colon in the connection URL:

Connection conn = DriverManager.getConnection("jdbc:duckdb:/test_duckdb");

In this example, DuckDB creates a database file called test_duckdb in the root directory. As this is a JDBC library, we may query data by creating an SQL Statement and executing it to obtain the ResultSet. The following is a simple JDBC example to obtain the current date:

Statement stmt = conn.createStatement();
ResultSet rs = stmt.executeQuery("SELECT current_date");
Date currentDate = rs.next() ? rs.getDate(1) : null;

In the subsequent sections of this tutorial, we can execute SQL statements using the same JDBC approach in Java.

4. Data Import

Let’s proceed with importing some data into DuckDB. It can handle a variety of data formats that simplify importing from external data sources.

4.1. CSV Files

CSV is a common data format for storing tabular data. Let’s consider we have the following CSV file containing customer data:

CustomerId,FirstName,LastName,Gender
101,John,Smith,Male
102,Sarah,Jones,Female
...

We could use the SQL function read_csv to import data from a CSV file to the DuckDB table customer:

CREATE TABLE customer AS 
SELECT * FROM read_csv('customer.csv')

DuckDB can figure out the schema from the header row of the CSV file. The header names are considered table column names, while the subsequent rows are treated as data rows.

4.2. JSON Files

Another popular way to store and share data is JSON. For instance, let’s take the following product.json JSON file:

[
  {
    "productId": 1,
    "productName":"EZ Curl Bar",
    "category": "Sports Equipment"
  },
  {
    "productId": 2,
    "productName": "7' Barbell",
    "category": "Sports Equipment"
  }
]

Similar to CSV import, we can execute an SQL statement to import data to the DuckDB table product:

CREATE TABLE product AS 
SELECT * FROM read_json('product.json')

Just like the CSV, DuckDB automatically figures out the schema from the JSON file based on the JSON attribute names.

4.3. INSERT Statements

We may add data to the DuckDB table using insert statements, as it’s a SQL relational database system. The example below illustrates creating a purchase table that defines the relationship between customer and product and populating a few rows of data:

CREATE TABLE purchase(customerId BIGINT, productId BIGINT);
INSERT INTO purchase(customerId, productId) VALUES (101,1);
INSERT INTO purchase(customerId, productId) VALUES (102,1);
INSERT INTO purchase(customerId, productId) VALUES (102,2);

5. Data Query

With data loaded, we’ll now explore querying DuckDB and analysing our data.

5.1. Join Operations

In addition to importing external data to DuckDB, we can use the external data directly. Based on the examples earlier, we’ll utilize the three data sources from the previous section. Now, let’s join these data sources to gather information on customer products.

SELECT C.firstName, C.lastName, P.productName
FROM read_csv('customer.csv') AS C, read_json('product.json') AS P, purchase S 
WHERE S.customerId = C.customerId
AND S.productId = P.productId

Upon the execution, we’ll see the following query result showing the customer names and their corresponding product purchases:

firstName	lastName	productName
John	Smith	EZ Curl Bar
Sarah	Jones	7′ Barbell
Sarah	Jones	EZ Curl Bar

5.2. Aggregation Functions

DuckDB provides a rich set of aggregation functions to perform calculations on groups of rows. Let’s explore an example with these functions:

SELECT P.productName, COUNT(*) AS purchaseCount
FROM customer C, product P, purchase S
WHERE S.customerId = C.customerId
AND S.productId = P.productId 
GROUP BY P.productName
ORDER BY COUNT(*) DESC

The query counts the number of purchases of each product and order them by the purchase count in descending order:

productName	purchaseCount
EZ Curl Bar	2
7′ Barbell	1

6. Data Export

In data analysis tasks, we often require exporting our aggregated data to other applications for further analysis.

Let’s walk through the process of exporting our data from DuckDB in various formats. In our example, we first create a database view for easy illustration of the exporting later:

CREATE VIEW purchase_view AS
SELECT P.productName, COUNT(*) AS purchaseCount
FROM customer C, product P, purchase S
WHERE S.customerId = C.customerId
AND S.productId = P.productId 
GROUP BY P.productName
ORDER BY COUNT(*) DESC;

6.1. CSV Files

It’s straightforward to export data to a CSV file in DuckDB. We can execute the following simple SQL to copy all data from our database view purchase_view to a CSV file located in the root directory:

COPY purchase_view TO '/output.csv'

6.2. JSON Files

To export data to a JSON file, we need to include an additional option array to specify writing the data as a JSON array. This ensures that our exported JSON file is in an appropriate structure:

COPY (SELECT * FROM purchase_view WHERE purchaseCount > 1) TO '/output.json' (array true);

Instead of exporting all data, we can copy partial results based on the condition of the select query.

7. Conclusion

In this article, we learned about the DuckDB database and its advantages. We also looked at some fundamental operations via illustrative examples.

As always, all the code is available over on GitHub.

REST with Spring

Learn Spring Security ▼▲

Learn Spring Security Core

Learn Spring Security OAuth

Learn Spring

Learn Spring Data JPA

Persistence

REST

Security

Full Archive

Baeldung Ebooks

About Baeldung

Write for Baeldung