A Quick Guide to Apache Geode

Refactor Java code safely — and automatically — with OpenRewrite.

Refactoring big codebases by hand is slow, risky, and easy to put off. That’s where OpenRewrite comes in. The open-source framework for large-scale, automated code transformations helps teams modernize safely and consistently.

Each month, the creators and maintainers of OpenRewrite at Moderne run live, hands-on training sessions — one for newcomers and one for experienced users. You’ll see how recipes work, how to apply them across projects, and how to modernize code with confidence.

Join the next session, bring your questions, and learn how to automate the kind of work that usually eats your sprint time.

1. Overview

Apache Geode is a distributed in-memory data grid supporting caching and data computation.

In this tutorial, we’ll cover Geode’s key concepts and run through some code samples using its Java client.

2. Setup

First, we need to download and install Apache Geode and set the gfsh environment. To do this, we can follow the instructions in Geode’s official guide.

And second, this tutorial will create some filesystem artifacts. So, we can isolate them by creating a temporary directory and launching things from there.

2.1. Installation and Configuration

From our temporary directory, we need to start a Locator instance:

gfsh> start locator --name=locator --bind-address=localhost

Locators are responsible for the coordination between different members of a Geode Cluster, which we can further administer over JMX.

Next, let’s start a Server instance to host one or more data Regions:

gfsh> start server --name=server1 --server-port=0

We set the –server-port option to 0 so that Geode will pick any available port. Though if we leave it out, the server will use the default port 40404. A server is a configurable member of the Cluster that runs as a long-lived process and is responsible for managing data Regions.

And finally, we need a Region:

gfsh> create region --name=baeldung --type=REPLICATE

The Region is ultimately where we will store our data.

2.2. Verification

Let’s make sure that we have everything working before we go any further.

First, let’s check whether we have our Server and our Locator:

gfsh> list members
 Name   | Id
------- | ----------------------------------------------------------
server1 | 192.168.0.105(server1:6119)<v1>:1024
locator | 127.0.0.1(locator:5996:locator)<ec><v0>:1024 [Coordinator]

And next, that we have our Region:

gfsh> describe region --name=baeldung
..........................................................
Name            : baeldung
Data Policy     : replicate
Hosting Members : server1

Non-Default Attributes Shared By Hosting Members  

 Type  |    Name     | Value
------ | ----------- | ---------------
Region | data-policy | REPLICATE
       | size        | 0
       | scope       | distributed-ack

Also, we should have some directories on the file system under our temporary directory called “locator” and “server1”.

With this output, we know that we’re ready to move on.

3. Maven Dependency

Now that we have a running Geode, let’s start looking at the client code.

To work with Geode in our Java code, we’re going to need to add the Apache Geode Java client library to our pom:

<dependency>
     <groupId>org.apache.geode</groupId>
     <artifactId>geode-core</artifactId>
     <version>1.15.1</version>
</dependency>

Let’s begin by simply storing and retrieving some data in a couple of regions.

4. Simple Storage and Retrieval

Let’s demonstrate how to store single values, batches of values as well as custom objects.

To start storing data in our “baeldung” region, let’s connect to it using the locator:

@Before
public void connect() {
    this.cache = new ClientCacheFactory()
      .addPoolLocator("localhost", 10334)
        .create();
    this.region = cache.<String, String> 
      createClientRegionFactory(ClientRegionShortcut.CACHING_PROXY)
        .create("baeldung");
}

4.1. Saving Single Values

Now, we can simply store and retrieve data in our region:

@Test
public void whenSendMessageToRegion_thenMessageSavedSuccessfully() {

    this.region.put("A", "Hello");
    this.region.put("B", "Baeldung");

    assertEquals("Hello", region.get("A"));
    assertEquals("Baeldung", region.get("B"));
}

4.2. Saving Multiple Values at Once

We can also save multiple values at once, say when trying to reduce network latency:

@Test
public void whenPutMultipleValuesAtOnce_thenValuesSavedSuccessfully() {

    Supplier<Stream<String>> keys = () -> Stream.of("A", "B", "C", "D", "E");
    Map<String, String> values = keys.get()
        .collect(Collectors.toMap(Function.identity(), String::toLowerCase));

    this.region.putAll(values);

    keys.get()
        .forEach(k -> assertEquals(k.toLowerCase(), this.region.get(k)));
}

4.3. Saving Custom Objects

Strings are useful, but sooner rather than later we’ll need to store custom objects.

Let’s imagine that we have a customer record we want to store using the following key type:

public class CustomerKey implements Serializable {
    private long id;
    private String country;
    
    // getters and setters
    // equals and hashcode
}

And the following value type:

public class Customer implements Serializable {
    private CustomerKey key;
    private String firstName;
    private String lastName;
    private Integer age;
    
    // getters and setters 
}

There are a couple of extra steps to be able to store these:

First, they should implement Serializable. While this isn’t a strict requirement, by making them Serializable, Geode can store them more robustly.

Second, they need to be on our application’s classpath as well as the classpath of our Geode Server.

To get them to the server’s classpath, let’s package them up, say using mvn clean package.

And then we can reference the resulting jar in a new start server command:

gfsh> stop server --name=server1
gfsh> start server --name=server1 --classpath=../lib/apache-geode-1.0-SNAPSHOT.jar --server-port=0

Again, we have to run these commands from the temporary directory.

Finally, let’s create a new Region named “baeldung-customers” on the Server using the same command we used for creating the “baeldung” region:

gfsh> create region --name=baeldung-customers --type=REPLICATE

In the code, we’ll reach out to the locator as before, specifying the custom type:

@Before
public void connect() {
    // ... connect through the locator
    this.customerRegion = this.cache.<CustomerKey, Customer> 
      createClientRegionFactory(ClientRegionShortcut.CACHING_PROXY)
        .create("baeldung-customers");
}

And, then, we can store our customer as before:

@Test
public void whenPutCustomKey_thenValuesSavedSuccessfully() {
    CustomerKey key = new CustomerKey(123);
    Customer customer = new Customer(key, "William", "Russell", 35);

    this.customerRegion.put(key, customer);

    Customer storedCustomer = this.customerRegion.get(key);
    assertEquals("William", storedCustomer.getFirstName());
    assertEquals("Russell", storedCustomer.getLastName());
}

5. Region Types

For most environments, we’ll have more than one copy or more than one partition of our region, depending on our read and write throughput requirements.

So far, we’ve used in-memory replicated regions. Let’s take a closer look.

5.1. Replicated Region

As the name suggests, a Replicated Region maintains copies of its data on more than one Server. Let’s test this.

From the gfsh console in the working directory, let’s add one more Server named server2 to the cluster:

gfsh> start server --name=server2 --classpath=../lib/apache-geode-1.0-SNAPSHOT.jar --server-port=0

Remember that when we made “baeldung”, we used –type=REPLICATE. Because of this, Geode will automatically replicate our data to the new server.

Let’s verify this by stopping server1:

gfsh> stop server --name=server1

And, then let’s execute a quick query on the “baeldung” region.

If the data was replicated successfully, we’ll get results back:

gfsh> query --query='select e.key from /baeldung.entries e'
Result : true
Limit  : 100
Rows   : 5

Result
------
C
B
A 
E
D

So, it looks like the replication succeeded!

Adding a replica to our region improves data availability. And, because more than one server can respond to queries, we’ll get higher read throughput as well.

But, what if they both crash? Since these are in-memory regions, the data will be lost. For this, we can instead use –type=REPLICATE_PERSISTENT which also stores the data on disk while replicating.

5.2. Partitioned Region

With larger datasets, we can better scale the system by configuring Geode to split a region up into separate partitions, or buckets.

Let’s create one partitioned Region named “baeldung-partitioned”:

gfsh> create region --name=baeldung-partitioned --type=PARTITION

Add some data:

gfsh> put --region=baeldung-partitioned --key="1" --value="one"
gfsh> put --region=baeldung-partitioned --key="2" --value="two"
gfsh> put --region=baeldung-partitioned --key="3" --value="three"

And quickly verify:

gfsh> query --query='select e.key, e.value from /baeldung-partitioned.entries e'
Result : true
Limit  : 100
Rows   : 3

key | value
--- | -----
2   | two
1   | one
3   | three

Then, to validate that the data got partitioned, let’s stop server1 again and re-query:

gfsh> stop server --name=server1
gfsh> query --query='select e.key, e.value from /baeldung-partitioned.entries e'
Result : true
Limit  : 100
Rows   : 1

key | value
--- | -----
2   | two

We only got some of the data entries back this time because that server only has one partition of the data, so when server1 dropped, its data was lost.

But what if we need both partitioning and redundancy? Geode also supports a number of other types. The following three are handy:

PARTITION_REDUNDANT partitions and replicates our data across different members of the cluster
PARTITION_PERSISTENT partitions the data like PARTITION, but to disk, and
PARTITION_REDUNDANT_PERSISTENT gives us all three behaviors.

6. Object Query Language

Geode also supports Object Query Language, or OQL, which can be more powerful than a simple key lookup. It’s a bit like SQL.

For this example, let’s use the “baeldung-customer” region we built earlier.

If we add a couple more customers:

Map<CustomerKey, Customer> data = new HashMap<>();
data.put(new CustomerKey(1), new Customer("Gheorge", "Manuc", 36));
data.put(new CustomerKey(2), new Customer("Allan", "McDowell", 43));
this.customerRegion.putAll(data);

Then we can use QueryService to find customers whose first name is “Allan”:

QueryService queryService = this.cache.getQueryService();
String query = 
  "select * from /baeldung-customers c where c.firstName = 'Allan'";
SelectResults<Customer> results =
  (SelectResults<Customer>) queryService.newQuery(query).execute();
assertEquals(1, results.size());

7. Function

One of the more powerful notions of in-memory data grids is the idea of “taking the computations to the data”.

Simply put, since Geode is pure Java, it’s easy for us to not only send data but also logic to perform on that data.

This might remind us of the idea of SQL extensions like PL-SQL or Transact-SQL.

7.1. Defining a Function

To define a unit of work for Geode to do, we implement Geode’s Function interface.

For example, let’s imagine we need to change all the customer’s names to upper case.

Instead of querying the data and having our application do the work, we can just implement Function:

public class UpperCaseNames implements Function<Boolean> {
    @Override
    public void execute(FunctionContext<Boolean> context) {
        RegionFunctionContext regionContext = (RegionFunctionContext) context;
        Region<CustomerKey, Customer> region = regionContext.getDataSet();

        for ( Map.Entry<CustomerKey, Customer> entry : region.entrySet() ) {
            Customer customer = entry.getValue();
            customer.setFirstName(customer.getFirstName().toUpperCase());
        }
        context.getResultSender().lastResult(true);   
    }

    @Override
    public String getId() {
        return getClass().getName();
    }
}

Note that getId must return a unique value, so the class name is typically a good pick.

The FunctionContext contains all our region data, and so we can do a more sophisticated query out of it, or, as we’ve done here, mutate it.

And Function has plenty more power than this, so check out the official manual, especially the getResultSender method.

7.2. Deploying Function

We need to make Geode aware of our function to be able to run it. Like we did with our custom data types, we’ll package the jar.

But this time, we can just use the deploy command:

gfsh> deploy --jar=./lib/apache-geode-1.0-SNAPSHOT.jar

7.3. Executing Function

Now, we can execute the Function from the application using the FunctionService:

@Test
public void whenExecuteUppercaseNames_thenCustomerNamesAreUppercased() {
    Execution execution = FunctionService.onRegion(this.customerRegion);
    execution.execute(UpperCaseNames.class.getName());
    Customer customer = this.customerRegion.get(new CustomerKey(1));
    assertEquals("GHEORGE", customer.getFirstName());
}

8. Conclusion

In this article, we learned the basic concepts of the Apache Geode ecosystem. We looked at simple gets and puts with standard and custom types, replicated and partitioned regions, and oql and function support.

The code backing this article is available on GitHub. Once you're logged in as a Baeldung Pro Member, start learning and coding on the project.