Learn through the super-clean Baeldung Pro experience:
>> Membership and Baeldung Pro.
No ads, dark-mode and 6 months free of IntelliJ Idea Ultimate to start with.
Last updated: October 2, 2025
In this article, we’ll talk about CrateDB. CrateDB is an open-source, distributed SQL database designed for storing and running high-speed analytics on large volumes of data.
CrateDB combines the benefits of a SQL database and the scalability and flexibility that NoSQL databases provide.
In the following sections, we’ll install and run CrateDB. We’ll also delve into the features that make CrateDB a viable and lucrative database option.
We can install and run CrateDB locally. It supports different operating systems, including Linux-based distributions, Mac OS, Windows, and even Docker.
Additionally, there is a cloud-based offering called CrateDB Cloud, a fully managed service that does not require any installation.
Let’s install CrateDB on Debian or Ubuntu machines:
# Install prerequisites.
sudo apt update
sudo apt install --yes apt-transport-https apt-utils curl gnupg lsb-release
# Import the public GPG key for verifying the package signatures.
curl -sS https://cdn.crate.io/downloads/debian/DEB-GPG-KEY-crate | \
sudo tee /etc/apt/trusted.gpg.d/cratedb.asc
# Add CrateDB repository to Apt
echo "deb https://cdn.crate.io/downloads/debian/stable/ default main" | \
sudo tee /etc/apt/sources.list.d/crate-stable.list
## Install
sudo apt install crate
Once the installation is complete, we need to restart the service:
sudo systemctl start crate
For Windows machines, we need to download the release archive. Then we extract the archive using a tool like 7-Zip. Now, let’s open PowerShell and run:
unzip -o crate-*.zip
cd crate-*
./bin/crate
When the installation is complete, the web Admin UI of CrateDB will be accessible at http://localhost:4200.
Now that CrateDB is installed, let’s get started with using it. CrateDB has two main components, the Admin UI and the CrateDB Shell CLI(Crash).
The Admin UI is a great interactive way to use CrateDB and comes with several useful tabs and a status bar.
The Status bar shows information essential to the running CrateDB node, including the Cluster name, the version of the CrateDB that is running, and the total number of nodes in the cluster.
This also contains additional health checks for Data and Cluster status.
On the left panel, there are several tabs of the following:
Let’s check how the Admin UI looks with zero configurations done:
The Crate DB shell is an interactive CLI application that works with CrateDB, and we can use it like a SQL terminal and node manager.
Crash needs to connect to the CrateDB cluster to work:
crash --hosts node1.example.com \
node2.example.com \
-c "SELECT * FROM sys.nodes" \
--format json \
> output.json
Crash supports several built-in commands, and a comprehensive list of such is present in the documentation page.
In this section, let’s talk about how CrateDB handles data internally.
CrateDB is columnar. It therefore supports dynamic schemas and data structures that change on the fly without downtimes. This allows for storing and querying structured, semi-structured data, and unstructured data, such as raw JSONs, very easily.
While traditional object-relational databases allow us to store and process JSON data only opaquely, CrateDB handles objects as first-level citizens.
With such dynamic objects, when no strict schema is defined, CrateDB indexes all attributes by default, therefore allowing for efficient querying.
CrateDB uses Apache Lucene as its underlying storage layer. Therefore, it inherits its concepts about storage entities and units similar to Elasticsearch.
Let’s consider the following example. We create a table for storing weather information with the following:
CREATE TABLE weather_reading ( "timestamp" TIMESTAMP, "readings" OBJECT(DYNAMIC), "fields" OBJECT(DYNAMIC) );
Let’s perform a DML query on the table:
INSERT INTO weather_reading ( timestamp, readings, fields ) VALUES ( '2025-07-16T00:00:00', { "site_id" = 007 }, { "temperature" = 21.92, "humidity" = 100 } );
Now, we can leverage the power of CrateDB to efficiently query the dynamic data:
SELECT
*
FROM
reading
WHERE
readings['site_id'] =
'007';
There is native support for standard SQL baked into CrateDB. This includes joins, subqueries, indexing, and aggregations. This makes it very easy to integrate with existing SQL-based clients. CrateDB uses the PostgreSQL wire protocol and is therefore compatible with many PostgreSQL tools and clients.
An automatic advantage that comes out of this is that integration with systems using traditional database access interfaces like ODBC or JDBC is much simpler.
CrateDB offers native BM25 term search and vector search, all while using SQL.
This enables powerful single-query hybrid search over massive volumes of data.
Additionally, there is full-text search functionality with an inverted index and Okapi BM25 search ranking at scale.
CrateDB also natively supports Geo-search and semantic search on Vector embeddings through HNSQ data retrieval.
CrateDB is natively distributed. It’s well-suited for large-scale distributed systems and employs a shared-nothing architecture. Therefore, data is automatically sharded across all available nodes, which makes it highly available and fault-tolerant.
With support for horizontal scaling out of the box, new nodes can be easily added and removed.
Additionally, a distributed query execution engine powers CrateDB that supports running ad-hoc SQL queries on billions of records in milliseconds. This makes CrateDB ideal for real-time aggregations and operational analytics.
CrateDB provides scalability through partitioning, sharding, and replication.
CrateDB automatically splits tables into shards, which are self-contained chunks that hold a portion of the data. These shards are then distributed across nodes in the cluster, each backed by a dedicated Lucene index.
It should be noted that choosing the right number of shards depends on our data volume, query patterns, and hardware setup.
We can further organise data using partitioning, which splits tables based on defined columns. Each partition is essentially a collection of shards, making queries faster by narrowing the data scope.
To ensure fault tolerance and improve performance, CrateDB lets us replicate table shards across nodes. Replication not only provides redundancy but also boosts read performance by enabling more parallel query execution.
In this article, we discussed CrateDB and its powerful features and dynamic data handling capabilities, which make it a great alternative to traditional databases. It combines the benefits of SQL and NoSQL databases. We also discussed the distributed nature of CrateDB, making it suitable for efficiently handling large volumes of data with ease.
Finally, we saw the advanced search capabilities across multiple domains that come out of the box with CrateDB. CrateDB also provides excellent performance and scaling benchmarks.