
1. Overview

In this tutorial, we’ll learn about Apache Calcite, a powerful data management framework that we can use in a variety of data-access scenarios. Calcite focuses on retrieving data from any source, not on storing it. Additionally, its query optimization capabilities enable faster and more efficient data retrieval.

Let’s delve into more detail, starting with the use cases where Apache Calcite is relevant.

2. Apache Calcite Use Cases

Due to its capabilities, Apache Calcite can be leveraged in several use cases.

It takes years to build a query engine for a new database. However, Calcite gets us started immediately with an out-of-the-box extensible SQL parser, validator, and optimizer. Calcite has been used to build databases such as HerdDB, Apache Druid, MapD, and many more.

Because of Calcite’s capability to integrate with multiple databases, it’s widely used in building data warehouses and business intelligence tools, such as Apache Kylin, Apache Wayang, Alibaba MaxCompute, and many more.

Calcite is also an integral component of streaming platforms such as Apache Kafka, Apache Apex, and Apache Flink, helping build tools that can present and analyze live feeds.

3. Any Data, Anywhere

Apache Calcite provides ready-made adapters to integrate with third-party data sources, including Cassandra, Elasticsearch, and MongoDB, to name a few.

Let’s explore this in greater detail.

3.1. High-Level Important Classes

Apache Calcite provides a robust framework for retrieving data. The framework is extensible, so new custom adapters can also be created. Let’s take a look at the important Java classes.

The Apache Calcite adapters provide classes, such as ElasticsearchSchemaFactory, MongoSchemaFactory, and FileSchemaFactory, that implement the SchemaFactory interface. The SchemaFactory connects to the underlying data sources in a unified manner by creating a virtual schema defined in a JSON/YAML model file.
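
To see how an adapter plugs into the framework, here’s a minimal sketch of a hypothetical custom SchemaFactory; the class name and the empty schema are our own illustration, not part of Calcite:

import java.util.Map;

import org.apache.calcite.schema.Schema;
import org.apache.calcite.schema.SchemaFactory;
import org.apache.calcite.schema.SchemaPlus;
import org.apache.calcite.schema.impl.AbstractSchema;

// Hypothetical factory; Calcite instantiates it via the "factory" entry of a model file
public class CustomSchemaFactory implements SchemaFactory {

    @Override
    public Schema create(SchemaPlus parentSchema, String name, Map<String, Object> operand) {
        // "operand" carries the key-value pairs declared in the model file
        return new AbstractSchema() {
            // override getTableMap() to expose the underlying data source as tables
        };
    }
}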

3.2. CSV Adapter

Now, let’s see an example where we read data from a CSV file using a SQL query. Let’s begin by adding the Maven dependencies required for the file adapter to the pom.xml file:

<dependency>
    <groupId>org.apache.calcite</groupId>
    <artifactId>calcite-core</artifactId>
    <version>1.36.0</version>
</dependency>
<dependency>
    <groupId>org.apache.calcite</groupId>
    <artifactId>calcite-file</artifactId>
    <version>1.36.0</version>
</dependency>

Next, let’s define the model in the model.json:

{
  "version": "1.0",
  "defaultSchema": "TRADES",
  "schemas": [
    {
      "name": "TRADES",
      "type": "custom",
      "factory": "org.apache.calcite.adapter.file.FileSchemaFactory",
      "operand": {
        "directory": "trades"
      }
    }
  ]
}

The FileSchemaFactory specified in the model.json looks into the trades directory for the CSV files, and creates a virtual TRADES schema. Subsequently, the CSV files under the trades directory are treated like tables.
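
As a side note, the model doesn’t have to live in a separate file; Calcite also accepts it inline through the same model connection property, prefixed with inline:. A minimal sketch, assuming the same trades directory:

Properties info = new Properties();
info.put("model", "inline:"
  + "{ \"version\": \"1.0\", \"defaultSchema\": \"TRADES\","
  + " \"schemas\": [ { \"name\": \"TRADES\", \"type\": \"custom\","
  + " \"factory\": \"org.apache.calcite.adapter.file.FileSchemaFactory\","
  + " \"operand\": { \"directory\": \"trades\" } } ] }");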

Before we move on to see the file adapter in action, let’s look at the trade.csv file, which we’ll query using the Calcite adapter:

tradeid:int,product:string,qty:int
232312123,"RFTXC",100
232312124,"RFUXC",200
232312125,"RFSXC",1000

The CSV file has three columns: tradeid, product, and qty. The column headers also specify the data types. In total, there are three trade records in the file.

Finally, let’s take a look at how to fetch the records using the Calcite adapter:

@Test
void whenCsvSchema_thenQuerySuccess() throws SQLException {
    Properties info = new Properties();
    info.put("model", getPath("model.json"));
    try (Connection connection = DriverManager.getConnection("jdbc:calcite:", info)) {
        Statement statement = connection.createStatement();
        ResultSet resultSet = statement.executeQuery("select * from trades.trade");

        assertEquals(3, resultSet.getMetaData().getColumnCount());

        List<Integer> tradeIds = new ArrayList<>();
        while (resultSet.next()) {
            tradeIds.add(resultSet.getInt("tradeid"));
        }

        assertEquals(3, tradeIds.size());
    }
}

The Calcite adapter takes the model property to create a virtual schema backed by the file system. Then, using the usual JDBC semantics, it fetches the records from the trade.csv file.
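
Since the virtual schema behaves like a regular database schema, standard SQL constructs such as filters and projections work as well. For instance, a small variation on the test above, reusing the same statement, fetches only the larger trades:

ResultSet filtered = statement.executeQuery(
  "select tradeid, qty from trades.trade where qty > 150");
while (filtered.next()) {
    // with the sample data above, this prints the trades with qty 200 and 1000
    System.out.println(filtered.getInt("tradeid") + " -> " + filtered.getInt("qty"));
}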

The file adapter can read not only CSV files, but also HTML and JSON files. Moreover, for advanced CSV use cases, Apache Calcite provides a dedicated CSV adapter backed by CsvSchemaFactory.
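
For reference, switching to that CSV adapter is mostly a matter of pointing the model at its factory, which ships in the separate calcite-example-csv module; a hedged sketch of such a model, assuming the same trades directory:

{
  "version": "1.0",
  "defaultSchema": "TRADES",
  "schemas": [
    {
      "name": "TRADES",
      "type": "custom",
      "factory": "org.apache.calcite.adapter.csv.CsvSchemaFactory",
      "operand": {
        "directory": "trades",
        "flavor": "TRANSLATABLE"
      }
    }
  ]
}

The flavor operand controls how the tables are implemented; TRANSLATABLE lets queries be translated into relational operators instead of plain row scans.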

3.3. In-Memory SQL Operation on Java Objects

Similar to the CSV adapter example, let’s look at another example where, with the help of Apache Calcite, we’ll run SQL queries on Java objects.

Assume a CompanySchema class holding arrays of Employee and Department objects:

public class CompanySchema {
    public Employee[] employees;
    public Department[] departments;
}

Now let’s take a look at the Employee class:

public class Employee {
    public String name;
    public String id;
    public String deptId;

    public Employee(String name, String id, String deptId) {
        this.name = name;
        this.id = id;
        this.deptId = deptId;
    }
}

Similar to the Employee class, let’s define the Department class:

public class Department {
    public String deptId;
    public String deptName;

    public Department(String deptId, String deptName) {
        this.deptId = deptId;
        this.deptName = deptName;
    }
}

Let’s assume there are three departments: Finance, Marketing, and Human Resources. The test below reads these from a companySchema field, so let’s first populate one; a minimal fixture sketch, with made-up names and IDs:
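
CompanySchema companySchema = new CompanySchema();
companySchema.departments = new Department[] {
    new Department("D1", "Finance"),
    new Department("D2", "Marketing"),
    new Department("D3", "Human Resources")
};
companySchema.employees = new Employee[] {
    new Employee("Alice", "E1", "D1"),
    new Employee("Bob", "E2", "D2"),
    new Employee("Carol", "E3", "D3")
};

We’ll now run a query on the CompanySchema object to find the number of employees in each of the departments: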

@Test
void whenQueryEmployeesObject_thenSuccess() throws SQLException {
    Properties info = new Properties();
    info.setProperty("lex", "JAVA");
    Connection connection = DriverManager.getConnection("jdbc:calcite:", info);
    CalciteConnection calciteConnection = connection.unwrap(CalciteConnection.class);
    SchemaPlus rootSchema = calciteConnection.getRootSchema();
    Schema schema = new ReflectiveSchema(companySchema);
    rootSchema.add("company", schema);
    Statement statement = calciteConnection.createStatement();
    String query = "select dept.deptName, count(emp.id) "
      + "from company.employees as emp "
      + "join company.departments as dept "
      + "on (emp.deptId = dept.deptId) "
      + "group by dept.deptName";

    assertDoesNotThrow(() -> {
        ResultSet resultSet = statement.executeQuery(query);
        while (resultSet.next()) {
            logger.info("Dept Name:" + resultSet.getString(1)
              + " No. of employees:" + resultSet.getInt(2));
        }
    });
}

Interestingly, the method runs fine and fetches the results. In the method, the Apache Calcite class ReflectiveSchema helps create a Schema from the CompanySchema object. Calcite then runs the SQL query and fetches the records using the standard JDBC semantics.

This example demonstrates that, irrespective of the source, Calcite can fetch data using plain SQL statements.

4. Query Processing

Query processing is Apache Calcite’s core functionality.

Standard JDBC drivers or SQL clients execute queries directly against databases. In contrast, Apache Calcite, after parsing and validating a query, intelligently optimizes it for efficient execution, saving resources and boosting performance.

4.1. Decoding Query Processing Steps

Calcite offers a set of standard components that help in query processing.

Interestingly, we can also extend these components to meet the specific requirements of any database. Let’s understand these steps in more detail.

4.2. SQL Parser and Validator

As part of the parsing process, the parser converts the SQL query into a tree-like structure called an AST (Abstract Syntax Tree).

Let’s assume a SQL query on two tables, Teacher and Department:

SELECT Teacher.name, Department.name
FROM Teacher
JOIN Department ON (Department.deptid = Teacher.deptid)
WHERE Department.name = 'Science'

First, the query parser converts the query into an AST and performs basic syntactic validation.
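
We can observe this step directly with Calcite’s SqlParser; here’s a minimal sketch using the default parser configuration:

import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.sql.parser.SqlParser;

// parseQuery() returns the root of the AST and throws SqlParseException on invalid syntax
SqlParser parser = SqlParser.create(
  "select Teacher.name, Department.name "
  + "from Teacher join Department on (Department.deptid = Teacher.deptid) "
  + "where Department.name = 'Science'");
SqlNode ast = parser.parseQuery();

System.out.println(ast.getKind()); // SELECT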

Further, the validator validates the nodes semantically:

  • validating the functions and operators
  • validating the database objects, like tables and columns, against the database’s catalog

4.3. Relational Expression Builder

Subsequently, after the validation step, the relational expression builder converts the validated syntax tree into a relational expression built from common relational operators:

  • LogicalTableScan: Reads data from a table
  • LogicalFilter: Selects rows based on a condition
  • LogicalProject: Chooses specific columns to include
  • LogicalJoin: Combines rows from two tables based on matching values

Considering the AST shown earlier, the corresponding logical relational expression derived from it would be:

LogicalProject(
    projects=[
        $0.name AS name0,
        $1.name AS name1
    ],
    input=LogicalFilter(
        condition=[
            ($1.name = 'Science')
        ],
        input=LogicalJoin(
            condition=[
                ($0.deptid = $1.deptid)
            ],
            left=LogicalTableScan(table=[[Teacher]]),
            right=LogicalTableScan(table=[[Department]])
        )
    )
)

In the relational expression, $0 and $1 represent the tables Teacher and Department. Essentially, it’s a mathematical expression that helps understand what operations will be carried out to get the results. However, it has no information related to execution.
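
For completeness, such a logical expression can also be assembled programmatically with Calcite’s RelBuilder; here’s a rough sketch, assuming a rootSchema that exposes the Teacher and Department tables with the columns (name, deptid), in that order:

import org.apache.calcite.plan.RelOptUtil;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.core.JoinRelType;
import org.apache.calcite.sql.fun.SqlStdOperatorTable;
import org.apache.calcite.tools.FrameworkConfig;
import org.apache.calcite.tools.Frameworks;
import org.apache.calcite.tools.RelBuilder;

FrameworkConfig config = Frameworks.newConfigBuilder()
  .defaultSchema(rootSchema)
  .build();
RelBuilder builder = RelBuilder.create(config);

RelNode plan = builder
  .scan("Teacher")
  .scan("Department")
  // field(2, 0, "deptid") references input 0 of the two join inputs
  .join(JoinRelType.INNER,
    builder.call(SqlStdOperatorTable.EQUALS,
      builder.field(2, 0, "deptid"), builder.field(2, 1, "deptid")))
  // after the join, ordinal 2 is Department.name
  .filter(builder.call(SqlStdOperatorTable.EQUALS,
    builder.field(2), builder.literal("Science")))
  .project(builder.field(0), builder.field(2))
  .build();

// prints a logical plan equivalent to the expression above
System.out.println(RelOptUtil.toString(plan));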

4.4. Query Optimizer

Next, the Calcite optimizer applies optimizations to the relational expression. Some common optimizations include:

  • Predicate Pushdown: Pushing filters as close to the data source as possible to reduce the amount of data fetched (see the sketch after this list)
  • Join Reordering: Rearranging join order to minimize intermediate results and improve efficiency
  • Projection Pushdown: Pushing down projections to avoid unnecessary columns from being processed
  • Index Usage: Identifying and utilizing indexes to speed up data retrieval
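
To make predicate pushdown concrete, here’s a hedged sketch that applies Calcite’s built-in filter-into-join rule with the HepPlanner, assuming plan is the relational expression built earlier:

import org.apache.calcite.plan.hep.HepPlanner;
import org.apache.calcite.plan.hep.HepProgram;
import org.apache.calcite.plan.hep.HepProgramBuilder;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.rules.CoreRules;

// a program with a single predicate-pushdown rule
HepProgram program = new HepProgramBuilder()
  .addRuleInstance(CoreRules.FILTER_INTO_JOIN)
  .build();

HepPlanner planner = new HepPlanner(program);
planner.setRoot(plan);

// the resulting plan has the filter pushed into the join
RelNode optimized = planner.findBestExp();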

4.5. Query Planner, Generator, and Executor

Following optimization, the Calcite Query Planner creates an execution plan for the optimized query. The execution plan specifies the exact steps the query engine takes to fetch and process data; it’s also called a physical plan, and it’s specific to the back-end query engine.

Then the Calcite Query Generator generates code in a language specific to the chosen execution engine.

Finally, the Executor connects to the database to execute the final query.
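
Through plain JDBC, we can also peek at the plan Calcite settles on by prefixing a query with explain plan for; a minimal sketch, reusing the connection from the CSV example:

ResultSet planResult = statement.executeQuery(
  "explain plan for select * from trades.trade where qty > 150");
while (planResult.next()) {
    // each row holds one line of the physical plan over the CSV table
    System.out.println(planResult.getString(1));
}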

5. Conclusion

In this article, we explored the capabilities of Apache Calcite, which rapidly equips databases with a standardized SQL parser, validator, and optimizer. This frees vendors from years of query engine development and empowers them to prioritize back-end storage. Additionally, Calcite’s ready-made adapters simplify connecting to diverse databases, helping to build a unified integration interface.

Moreover, by leveraging Calcite, database developers can accelerate time-to-market, and deliver robust, versatile SQL capabilities.

The code used in this article is available over on GitHub.
