A Guide to Apache Ignite

1. Introduction

Apache Ignite is an open source, memory-centric distributed platform. We can use it as a database, a caching system, or for in-memory data processing.

Because the platform uses memory as its storage layer, it achieves high performance. Simply put, it's one of the fastest atomic data processing platforms currently in production use.

2. Installation and Setup

To begin, check out the getting started page for the initial setup and installation instructions.

Here are the Maven dependencies for the application we're going to build:

<dependency>
    <groupId>org.apache.ignite</groupId>
    <artifactId>ignite-core</artifactId>
    <version>${ignite.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.ignite</groupId>
    <artifactId>ignite-indexing</artifactId>
    <version>${ignite.version}</version>
</dependency>

ignite-core is the only mandatory dependency for the project. Since we also want to work with SQL, we've added ignite-indexing as well. ${ignite.version} is a Maven property that we set to the latest version of Apache Ignite.

As the last step, we start the Ignite node:

Ignite node started OK (id=53c77dea)
Topology snapshot [ver=1, servers=1, clients=0, CPUs=4, offheap=1.2GB, heap=1.0GB]
Data Regions Configured:
^-- default [initSize=256.0 MiB, maxSize=1.2 GiB, persistenceEnabled=false]

The console output above shows that we’re ready to go.

3. Memory Architecture

The platform is based on the Durable Memory Architecture. This lets us store and process data both on disk and in memory. It improves performance by using the RAM resources of the cluster effectively.

The data in memory and on disk has the same binary representation. This means there's no additional conversion when data moves from one layer to the other.

The durable memory architecture splits memory into fixed-size blocks called pages. Pages are stored outside of the Java heap and organized in RAM. Each page has a unique identifier: FullPageId.

Pages interact with the memory using the PageMemory abstraction.

It lets us read and write a page, and also allocate a page ID. Inside the memory, Ignite associates pages with memory buffers.

4. Memory Pages

A Page can have the following states:

Unloaded – no page buffer is loaded in memory
Clear – the page buffer is loaded and synchronized with the data on disk
Dirty – the page buffer holds data that differs from the data on disk
Dirty in checkpoint – another modification starts before the first one has been persisted to disk. Here a checkpoint starts, and PageMemory keeps two memory buffers for each Page.
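
The states above can be pictured as a small state machine. The following is only an illustrative sketch in plain Java: PageState, PageEvent, and PageStateModel are hypothetical names, not Ignite's internal classes, and the transitions are a simplified reading of the list above (in particular, we assume that a page in the "dirty in checkpoint" state stays dirty once the checkpoint has flushed the first buffer, because the second modification is still unsaved).

```java
// Hypothetical model of the page states listed above; not Ignite's internal code.
enum PageState { UNLOADED, CLEAR, DIRTY, DIRTY_IN_CHECKPOINT }

// Hypothetical events that move a page from one state to another.
enum PageEvent { LOAD, MODIFY, MODIFY_DURING_CHECKPOINT, FLUSHED }

final class PageStateModel {

    private PageStateModel() {
    }

    // Returns the state a page ends up in after the given event (simplified assumption).
    static PageState next(PageState current, PageEvent event) {
        switch (current) {
            case UNLOADED:
                // loading the buffer from disk leaves it in sync with the disk
                if (event == PageEvent.LOAD) return PageState.CLEAR;
                break;
            case CLEAR:
                if (event == PageEvent.MODIFY) return PageState.DIRTY;
                break;
            case DIRTY:
                if (event == PageEvent.MODIFY) return PageState.DIRTY;
                // a second change arrives while the first one is being persisted
                if (event == PageEvent.MODIFY_DURING_CHECKPOINT) return PageState.DIRTY_IN_CHECKPOINT;
                if (event == PageEvent.FLUSHED) return PageState.CLEAR;
                break;
            case DIRTY_IN_CHECKPOINT:
                // the first buffer is on disk, the second change is still unsaved
                if (event == PageEvent.FLUSHED) return PageState.DIRTY;
                break;
        }
        throw new IllegalStateException("No transition from " + current + " on " + event);
    }
}
```

For example, PageStateModel.next(PageState.CLEAR, PageEvent.MODIFY) returns DIRTY, while an event that makes no sense for the current state raises an IllegalStateException.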

Durable memory allocates a local memory segment called a Data Region. By default, it has a capacity of 20% of the RAM available on the local node. Configuring multiple regions lets us keep frequently used data in memory.

A region's capacity is made up of Memory Segments. A segment is a contiguous block of physical memory, that is, a contiguous byte array.

To avoid memory fragmentation, a single page holds multiple key-value entries. Every new entry is added to the most suitable page. If the size of a key-value pair exceeds the maximum capacity of a page, Ignite stores the data across more than one page. The same logic applies when the data is updated.
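
To make the placement rule more concrete, here is a minimal sketch in plain Java. It is not Ignite's allocator: the class PageAllocationSketch and both methods are hypothetical, "most suitable page" is interpreted as a best-fit choice (the page with the least free space that still fits the entry), and page headers and other overhead are ignored.

```java
import java.util.List;

// Hypothetical helper illustrating the placement rules described above.
final class PageAllocationSketch {

    private PageAllocationSketch() {
    }

    // Number of pages an entry occupies when it may spill over several pages.
    static int pagesNeeded(int entrySizeBytes, int pageSizeBytes) {
        if (entrySizeBytes <= 0 || pageSizeBytes <= 0) {
            throw new IllegalArgumentException("Sizes must be positive");
        }
        // ceiling division, computed in long to avoid int overflow
        return (int) ((entrySizeBytes + (long) pageSizeBytes - 1) / pageSizeBytes);
    }

    // Index of the page with the least free space that still fits the entry, or -1 if none fits.
    static int bestFitPage(List<Integer> freeBytesPerPage, int entrySizeBytes) {
        int best = -1;
        for (int i = 0; i < freeBytesPerPage.size(); i++) {
            int free = freeBytesPerPage.get(i);
            boolean fits = free >= entrySizeBytes;
            if (fits && (best == -1 || free < freeBytesPerPage.get(best))) {
                best = i;
            }
        }
        return best;
    }
}
```

With a page size of 4096 bytes (used here only as an illustrative value), an entry of 4097 bytes needs two pages, while an entry of 200 bytes goes into whichever existing page has the tightest fit.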

SQL and cache indexes are stored in structures known as B+ Trees. Cache keys are ordered by their key values.

5. Lifecycle

Each Ignite node runs in a single JVM instance. However, it's also possible to configure multiple Ignite nodes to run in a single JVM process.

Let’s go through the lifecycle event types:

BEFORE_NODE_START – fires before the Ignite node starts
AFTER_NODE_START – fires just after the Ignite node starts
BEFORE_NODE_STOP – fires before the node stop is initiated
AFTER_NODE_STOP – fires after the Ignite node stops

To start a default Ignite node:

Ignite ignite = Ignition.start();

Or from a configuration file:

Ignite ignite = Ignition.start("config/example-cache.xml");

If we need more control over the initialization process, we can use the LifecycleBean interface:

public class CustomLifecycleBean implements LifecycleBean {
 
    @Override
    public void onLifecycleEvent(LifecycleEventType lifecycleEventType) 
      throws IgniteException {
 
        if(lifecycleEventType == LifecycleEventType.AFTER_NODE_START) {
            // ...
        }
    }
}

Here, we can use the lifecycle event types to perform actions before or after the node starts/stops.

For that purpose, we pass the configuration instance with the CustomLifecycleBean to the start method:

IgniteConfiguration configuration = new IgniteConfiguration();
configuration.setLifecycleBeans(new CustomLifecycleBean());
Ignite ignite = Ignition.start(configuration);

6. In-Memory Data Grid

The Ignite data grid is a distributed key-value store, very similar to a partitioned HashMap. It scales horizontally: the more cluster nodes we add, the more data we can cache or store in memory.
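
The partitioned HashMap analogy can be sketched in a few lines of plain Java. This is a deliberately simplified model, not how Ignite assigns data: the class PartitionedMapSketch is hypothetical, each "node" is just a local HashMap, and partitions are mapped to nodes with a plain modulo instead of Ignite's affinity function.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, single-process model of a key-value store partitioned across nodes.
final class PartitionedMapSketch<K, V> {

    private final int partitions;
    private final List<Map<K, V>> nodes = new ArrayList<>();

    PartitionedMapSketch(int partitions, int nodeCount) {
        if (partitions <= 0 || nodeCount <= 0) {
            throw new IllegalArgumentException("partitions and nodeCount must be positive");
        }
        this.partitions = partitions;
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new HashMap<>());
        }
    }

    // A key always maps to the same partition, regardless of the number of nodes.
    int partitionOf(K key) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    // Simplified assumption: partitions are spread over the nodes round-robin.
    int nodeOf(K key) {
        return partitionOf(key) % nodes.size();
    }

    void put(K key, V value) {
        nodes.get(nodeOf(key)).put(key, value);
    }

    V get(K key) {
        return nodes.get(nodeOf(key)).get(key);
    }

    int entriesOnNode(int node) {
        return nodes.get(node).size();
    }
}
```

Each node holds only its share of the entries, so adding nodes increases the total amount of data the grid can keep in memory.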

It can significantly improve the performance of third-party software, such as NoSQL and RDBMS databases, by acting as an additional caching layer.

6.1. Caching Support

The data access API is based on the JCache (JSR 107) specification.

As an example, let’s create a cache using a template configuration:

IgniteCache<Integer, Employee> cache = ignite.getOrCreateCache(
  "baeldungCache");

Let’s look at what’s happening here in more detail. First, Ignite finds the memory region where the cache is stored.

Then, the B+ tree index Page is located based on the key hash code. If the index exists, the data Page of the corresponding key is located.

When the index is NULL, the platform creates a new data entry using the given key.

Next, let’s add some Employee objects:

cache.put(1, new Employee(1, "John", true));
cache.put(2, new Employee(2, "Anna", false));
cache.put(3, new Employee(3, "George", true));

Again, the durable memory looks for the memory region the cache belongs to. Based on the cache key, the index page is located in a B+ tree structure.

When the index page doesn’t exist, a new one is requested and added to the tree.

Next, a data page is assigned to the index page.

To read the employee from the cache, we just use the key value:

Employee employee = cache.get(1);

6.2. Streaming Support

In-memory data streaming provides an alternative approach for applications that process data from disk or the file system. The Streaming API splits a high-load data flow into multiple stages and routes them for processing.

We can modify our example and stream the data from the file. First, we define a data streamer:

IgniteDataStreamer<Integer, Employee> streamer = ignite
  .dataStreamer(cache.getName());

Next, we can register a stream transformer to mark the received employees as employed:

streamer.receiver(StreamTransformer.from((e, arg) -> {
    Employee employee = e.getValue();
    employee.setEmployed(true);
    e.setValue(employee);
    return employee;
}));

As a final step, we iterate over the employees.txt file lines and convert them into Java objects:

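Path path = Paths.get(IgniteStream.class.getResource("employees.txt")
  .toURI());
Gson gson = new Gson();
// parse each line first, so that the employee id is available as the key
try (Stream<String> lines = Files.lines(path)) {
    lines.map(l -> gson.fromJson(l, Employee.class))
      .forEach(e -> streamer.addData(e.getId(), e));
}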

We use streamer.addData() to put the employee objects into the stream.

7. SQL Support

The platform provides a memory-centric, fault-tolerant SQL database.

We can connect either with the pure SQL API or with JDBC. The SQL syntax is ANSI-99, so the standard aggregate functions as well as DML and DDL operations are supported.

7.1. JDBC

To get more practical, let’s create a table of employees and add some data to it.

For that purpose, we register the JDBC driver and then open a connection:

Class.forName("org.apache.ignite.IgniteJdbcThinDriver");
Connection conn = DriverManager.getConnection("jdbc:ignite:thin://127.0.0.1/");

With the help of a standard DDL command, we create the Employee table:

Statement statement = conn.createStatement();
statement.executeUpdate("CREATE TABLE Employee (" +
  " id LONG PRIMARY KEY, name VARCHAR, isEmployed tinyint(1)) " +
  " WITH \"template=replicated\"");

After the WITH keyword, we can set the cache configuration template. Here we use REPLICATED. By default, the template mode is PARTITIONED. We can also specify the BACKUPS parameter here to set the number of backup copies of the data, which is 0 by default.
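
As a small illustration of how the template and the BACKUPS parameter combine in the WITH clause, the sketch below builds the DDL string for a partitioned table with one backup copy. The EmployeeDdl class and its createTable method are hypothetical helpers introduced only for this example; the resulting string would be passed to executeUpdate() in the same way as the statement above.

```java
// Hypothetical helper that assembles the CREATE TABLE statement with WITH parameters.
final class EmployeeDdl {

    private EmployeeDdl() {
    }

    static String createTable(String template, int backups) {
        // only the two template modes mentioned in the text are accepted here
        if (!"partitioned".equals(template) && !"replicated".equals(template)) {
            throw new IllegalArgumentException("Unknown template: " + template);
        }
        if (backups < 0) {
            throw new IllegalArgumentException("backups must not be negative");
        }
        // parameters inside the quoted WITH clause are separated by commas
        return "CREATE TABLE Employee ("
          + " id LONG PRIMARY KEY, name VARCHAR, isEmployed tinyint(1)) "
          + " WITH \"template=" + template + ",backups=" + backups + "\"";
    }
}
```

Calling EmployeeDdl.createTable("partitioned", 1) produces a statement that ends with WITH "template=partitioned,backups=1".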

Then, let’s add some data using an INSERT DML statement:

PreparedStatement sql = conn.prepareStatement(
  "INSERT INTO Employee (id, name, isEmployed) VALUES (?, ?, ?)");

sql.setLong(1, 1);
sql.setString(2, "James");
sql.setBoolean(3, true);
sql.executeUpdate();

// add the rest

Afterward, we select the records:

ResultSet rs = conn.createStatement()
  .executeQuery("SELECT e.name, e.isEmployed "
    + " FROM Employee e "
    + " WHERE e.isEmployed = TRUE ");

7.2. Query the Objects

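It’s also possible to query Java objects stored in the cache. Ignite treats each Java object as a separate SQL record. For this to work, the queried fields have to be exposed to the SQL engine, for example by annotating them with @QuerySqlField and registering the key and value types with CacheConfiguration.setIndexedTypes():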

IgniteCache<Integer, Employee> cache = ignite.cache("baeldungCache");

SqlFieldsQuery sql = new SqlFieldsQuery(
  "select name from Employee where isEmployed = 'true'");

QueryCursor<List<?>> cursor = cache.query(sql);

for (List<?> row : cursor) {
    // do something with the row
}

8. Summary

In this tutorial, we had a quick look at the Apache Ignite project. This guide highlights the advantages of the platform over other similar products, such as performance gains, durability, and lightweight APIs.

As a result, we learned how to use the SQL language and the Java API to store, retrieve, and stream data inside the persistence or in-memory grid.

The code backing this article is available on GitHub.