Introduction to CrateDB Baeldung on SQL

1. Introduction

In this article, we’ll talk about CrateDB. CrateDB is an open-source, distributed SQL database designed for storing and running high-speed analytics on large volumes of data.

CrateDB combines the benefits of a SQL database and the scalability and flexibility that NoSQL databases provide.

In the following sections, we’ll install and run CrateDB. We’ll also delve into the features that make CrateDB a viable and lucrative database option.

2. Installing CrateDB

We can install and run CrateDB locally. It supports different operating systems, including Linux-based distributions, Mac OS, Windows, and even Docker.

Additionally, there is a cloud-based offering called CrateDB Cloud, a fully managed service that does not require any installation.

2.1. Installation in Linux

Let’s install CrateDB on Debian or Ubuntu machines:

# Install prerequisites.
sudo apt update
sudo apt install --yes apt-transport-https apt-utils curl gnupg lsb-release

# Import the public GPG key for verifying the package signatures.
curl -sS https://cdn.crate.io/downloads/debian/DEB-GPG-KEY-crate | \
    sudo tee /etc/apt/trusted.gpg.d/cratedb.asc

# Add CrateDB repository to Apt
echo "deb https://cdn.crate.io/downloads/debian/stable/ default main" | \
    sudo tee /etc/apt/sources.list.d/crate-stable.list


## Install
sudo apt install crate

Once the installation is complete, we need to restart the service:

sudo systemctl start crate

2.2. Installation in Windows

For Windows machines, we need to download the release archive. Then we extract the archive using a tool like 7-Zip. Now, let’s open PowerShell and run:

unzip -o crate-*.zip
cd crate-*
./bin/crate

When the installation is complete, the web Admin UI of CrateDB will be accessible at http://localhost:4200.

3. Running CrateDB

Now that CrateDB is installed, let’s get started with using it. CrateDB has two main components, the Admin UI and the CrateDB Shell CLI(Crash).

3.1. Admin UI

The Admin UI is a great interactive way to use CrateDB and comes with several useful tabs and a status bar.

The Status bar shows information essential to the running CrateDB node, including the Cluster name, the version of the CrateDB that is running, and the total number of nodes in the cluster.

This also contains additional health checks for Data and Cluster status.

On the left panel, there are several tabs of the following:

Overview screen
SQL console
Table browser
Views browser
Shards browser
Cluster browser
Monitoring

Let’s check how the Admin UI looks with zero configurations done:

3.2. Crash(Crate DB Shell)

The Crate DB shell is an interactive CLI application that works with CrateDB, and we can use it like a SQL terminal and node manager.

Crash needs to connect to the CrateDB cluster to work:

crash --hosts node1.example.com \
                node2.example.com \
        -c "SELECT * FROM sys.nodes" \
        --format json \
    > output.json

Crash supports several built-in commands, and a comprehensive list of such is present in the documentation page.

4. Dynamic Data Handling and SQL Support

In this section, let’s talk about how CrateDB handles data internally.

4.1. Data in CrateDB

CrateDB is columnar. It therefore supports dynamic schemas and data structures that change on the fly without downtimes. This allows for storing and querying structured, semi-structured data, and unstructured data, such as raw JSONs, very easily.

While traditional object-relational databases allow us to store and process JSON data only opaquely, CrateDB handles objects as first-level citizens.

With such dynamic objects, when no strict schema is defined, CrateDB indexes all attributes by default, therefore allowing for efficient querying.

CrateDB uses Apache Lucene as its underlying storage layer. Therefore, it inherits its concepts about storage entities and units similar to Elasticsearch.

Let’s consider the following example. We create a table for storing weather information with the following:

CREATE TABLE weather_reading ( "timestamp" TIMESTAMP, "readings" OBJECT(DYNAMIC), "fields" OBJECT(DYNAMIC) );

Let’s perform a DML query on the table:

INSERT INTO weather_reading ( timestamp, readings, fields ) VALUES ( '2025-07-16T00:00:00', { "site_id" = 007 }, { "temperature" = 21.92, "humidity" = 100 } );

Now, we can leverage the power of CrateDB to efficiently query the dynamic data:

SELECT
  *
FROM
  reading
WHERE
  readings['site_id'] = 
  '007';

4.2. SQL Support

There is native support for standard SQL baked into CrateDB. This includes joins, subqueries, indexing, and aggregations. This makes it very easy to integrate with existing SQL-based clients. CrateDB uses the PostgreSQL wire protocol and is therefore compatible with many PostgreSQL tools and clients.

An automatic advantage that comes out of this is that integration with systems using traditional database access interfaces like ODBC or JDBC is much simpler.

4.3. Search in CrateDB

CrateDB offers native BM25 term search and vector search, all while using SQL.

This enables powerful single-query hybrid search over massive volumes of data.

Additionally, there is full-text search functionality with an inverted index and Okapi BM25 search ranking at scale.

CrateDB also natively supports Geo-search and semantic search on Vector embeddings through HNSQ data retrieval.

5. Distributed Architecture

CrateDB is natively distributed. It’s well-suited for large-scale distributed systems and employs a shared-nothing architecture. Therefore, data is automatically sharded across all available nodes, which makes it highly available and fault-tolerant.

With support for horizontal scaling out of the box, new nodes can be easily added and removed.

Additionally, a distributed query execution engine powers CrateDB that supports running ad-hoc SQL queries on billions of records in milliseconds. This makes CrateDB ideal for real-time aggregations and operational analytics.

6. Performance and Scalability

CrateDB provides scalability through partitioning, sharding, and replication.

CrateDB automatically splits tables into shards, which are self-contained chunks that hold a portion of the data. These shards are then distributed across nodes in the cluster, each backed by a dedicated Lucene index.

It should be noted that choosing the right number of shards depends on our data volume, query patterns, and hardware setup.

We can further organise data using partitioning, which splits tables based on defined columns. Each partition is essentially a collection of shards, making queries faster by narrowing the data scope.

To ensure fault tolerance and improve performance, CrateDB lets us replicate table shards across nodes. Replication not only provides redundancy but also boosts read performance by enabling more parallel query execution.

7. Conclusion

In this article, we discussed CrateDB and its powerful features and dynamic data handling capabilities, which make it a great alternative to traditional databases. It combines the benefits of SQL and NoSQL databases. We also discussed the distributed nature of CrateDB, making it suitable for efficiently handling large volumes of data with ease.

Finally, we saw the advanced search capabilities across multiple domains that come out of the box with CrateDB. CrateDB also provides excellent performance and scaling benchmarks.

Full Archive

About Baeldung