Guide to MapDB | Baeldung

1. Introduction

In this article, we’ll look at the MapDB library — an embedded database engine accessed through a collection-like API.

We’ll start by exploring the core classes DB and DBMaker that help configure, open, and manage our databases. Then, we’ll dive into some examples of MapDB data structures that store and retrieve data.

Finally, we’ll look at some of the in-memory modes before comparing MapDB to traditional databases and Java Collections.

2. Storing Data in MapDB

First, let’s introduce the two classes that we’ll be using constantly throughout this tutorial: DB and DBMaker. The DB class represents an open database. Its methods create and close the storage collections that hold our records, and they also handle transactional events such as commits and rollbacks.

DBMaker handles database configuration, creation, and opening. As part of the configuration, we can choose to host our database either in-memory or on our file system.

2.1. A Simple HashMap Example

To understand how this works, let’s create a new in-memory database using the DBMaker class:

DB db = DBMaker.memoryDB().make();

Once our DB object is up and running, we can use it to build an HTreeMap to work with our database records:

String welcomeMessageKey = "Welcome Message";
String welcomeMessageString = "Hello Baeldung!";

HTreeMap myMap = db.hashMap("myMap").createOrOpen();
myMap.put(welcomeMessageKey, welcomeMessageString);

HTreeMap is MapDB’s HashMap implementation. So, now that we have data in our database, we can retrieve it using the get method:

String welcomeMessageFromDB = (String) myMap.get(welcomeMessageKey);
assertEquals(welcomeMessageString, welcomeMessageFromDB);

Finally, now that we’re finished with the database, we should close it to avoid further mutation:

db.close();

To store our data in a file, rather than in memory, all we need to do is change the way that our DB object is instantiated:

DB db = DBMaker.fileDB("file.db").make();

Our example above uses no type parameters. As a result, we’re stuck with casting our results to work with specific types. In our next example, we’ll introduce Serializers to eliminate the need for casting.

2.2. Collections

MapDB includes different collection types. To demonstrate, let’s add and retrieve some data from our database using a NavigableSet, which works as we might expect of a Java Set.

Let’s start with a simple instantiation of our DB object:

DB db = DBMaker.memoryDB().make();

Next, let’s create our NavigableSet:

NavigableSet<String> set = db
  .treeSet("mySet")
  .serializer(Serializer.STRING)
  .createOrOpen();

Here, the serializer tells MapDB that the elements of the set are Strings, so they’re serialized when written to the database and deserialized back into String objects when read.

Next, let’s add some data:

set.add("Baeldung");
set.add("is awesome");

Now, let’s check that our two distinct values have been added to the database correctly:

assertEquals(2, set.size());

Finally, since this is a set, let’s add a duplicate string and verify that our database still contains only two values:

set.add("Baeldung");

assertEquals(2, set.size());

2.3. Transactions

Much like traditional databases, the DB class provides methods to commit and roll back the data we add to our database.

To enable this functionality, we need to initialize our DB with the transactionEnable method:

DB db = DBMaker.memoryDB().transactionEnable().make();

Next, let’s create a simple set, add some data, and commit it to the database:

NavigableSet<String> set = db
  .treeSet("mySet")
  .serializer(Serializer.STRING)
  .createOrOpen();

set.add("One");
set.add("Two");

db.commit();

assertEquals(2, set.size());

Now, let’s add a third, uncommitted string to our database:

set.add("Three");

assertEquals(3, set.size());

If we’re not happy with our data, we can roll back the uncommitted changes using DB’s rollback method:

db.rollback();

assertEquals(2, set.size());
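
One way to picture these semantics is a working copy that commit saves and rollback restores. The class below is a hypothetical, plain-Java mental model (RollbackSketch and its methods are names invented for illustration); it is not how MapDB implements transactions internally:

```java
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical mental model of commit/rollback, not MapDB's implementation.
class RollbackSketch {
    // State as of the last commit.
    private NavigableSet<String> committed = new TreeSet<>();
    // State including uncommitted changes.
    private NavigableSet<String> working = new TreeSet<>();

    void add(String value) {
        working.add(value);
    }

    int size() {
        return working.size();
    }

    // Make the current working state the new committed state.
    void commit() {
        committed = new TreeSet<>(working);
    }

    // Discard everything added since the last commit.
    void rollback() {
        working = new TreeSet<>(committed);
    }

    public static void main(String[] args) {
        RollbackSketch set = new RollbackSketch();
        set.add("One");
        set.add("Two");
        set.commit();
        set.add("Three");
        set.rollback();
        System.out.println(set.size()); // prints 2
    }
}
```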

2.4. Serializers

MapDB offers a large variety of serializers, which handle the data within the collection. The most important construction parameter is the name, which identifies the individual collection within the DB object:

HTreeMap<String, Long> map = db.hashMap("identification_name")
  .keySerializer(Serializer.STRING)
  .valueSerializer(Serializer.LONG)
  .create();

While specifying serializers is recommended, it’s optional and can be skipped. However, it’s worth noting that MapDB then falls back to a slower generic serialization process.
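
To get a feel for why a type-specific serializer tends to be cheaper than a generic one, we can compare the two approaches in plain Java. This is only a rough sketch: it uses standard Java serialization as a stand-in for a generic serializer, which is an assumption, since MapDB’s own fallback may behave differently. The class and method names are hypothetical:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Hypothetical comparison; not MapDB code.
class SerializerCostSketch {

    // Type-specific encoding: a long is always written as exactly 8 bytes.
    static byte[] writeLongDirectly(long value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Generic Java serialization: also writes a stream header and class metadata.
    static byte[] writeLongGenerically(long value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(Long.valueOf(value));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println("direct:  " + writeLongDirectly(42L).length + " bytes");
        System.out.println("generic: " + writeLongGenerically(42L).length + " bytes");
    }
}
```

The generic form has to describe the type of every value it writes, which is the overhead a dedicated serializer avoids.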

3. HTreeMap

MapDB’s HTreeMap provides HashMap and HashSet collections for working with our database. HTreeMap is a segmented hash tree and does not use a fixed-size hash table. Instead, it uses an auto-expanding index tree and does not rehash all of its data as the table grows. To top it off, HTreeMap is thread-safe and supports parallel writes using multiple segments.

To begin, let’s instantiate a simple HashMap that uses String for both keys and values:

DB db = DBMaker.memoryDB().make();

HTreeMap<String, String> hTreeMap = db
  .hashMap("myTreeMap")
  .keySerializer(Serializer.STRING)
  .valueSerializer(Serializer.STRING)
  .create();

Above, we’ve defined separate serializers for the key and the value. Now that our HashMap is created, let’s add data using the put method:

hTreeMap.put("key1", "value1");
hTreeMap.put("key2", "value2");

assertEquals(2, hTreeMap.size());

As with a regular HashMap, each key maps to a single value, so adding data using an existing key causes the previous value to be overwritten:

hTreeMap.put("key1", "value3");

assertEquals(2, hTreeMap.size());
assertEquals("value3", hTreeMap.get("key1"));
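
The segmenting idea mentioned at the start of this section can be pictured with a small plain-Java sketch. It assumes a fixed number of segments, each guarded by its own lock, so writes to different segments don’t block each other. SegmentedMapSketch is a hypothetical name, and the real HTreeMap uses an index tree per segment rather than a HashMap:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of segment-level locking; not MapDB's HTreeMap.
class SegmentedMapSketch<K, V> {
    private final Map<K, V>[] segments;

    @SuppressWarnings("unchecked")
    SegmentedMapSketch(int segmentCount) {
        segments = new Map[segmentCount];
        for (int i = 0; i < segmentCount; i++) {
            segments[i] = new HashMap<>();
        }
    }

    // The key's hash decides which segment owns it; floorMod avoids negative indexes.
    private Map<K, V> segmentFor(K key) {
        return segments[Math.floorMod(key.hashCode(), segments.length)];
    }

    V put(K key, V value) {
        Map<K, V> segment = segmentFor(key);
        synchronized (segment) { // only this segment is locked
            return segment.put(key, value);
        }
    }

    V get(K key) {
        Map<K, V> segment = segmentFor(key);
        synchronized (segment) {
            return segment.get(key);
        }
    }

    int size() {
        int total = 0;
        for (Map<K, V> segment : segments) {
            synchronized (segment) {
                total += segment.size();
            }
        }
        return total;
    }
}
```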

4. SortedTableMap

MapDB’s SortedTableMap stores keys in a fixed-size table and uses binary search for retrieval. It’s worth noting that once prepared, the map is read-only.

Let’s walk through the process of creating and querying a SortedTableMap. We’ll start by creating a memory-mapped volume to hold the data, as well as a sink to add data. When we first create the volume, we’ll set the read-only flag to false so that we can write to it:

String VOLUME_LOCATION = "sortedTableMapVol.db";

Volume vol = MappedFileVol.FACTORY.makeVolume(VOLUME_LOCATION, false);

SortedTableMap.Sink<Integer, String> sink =
  SortedTableMap.create(
    vol,
    Serializer.INTEGER,
    Serializer.STRING)
    .createFromSink();

Next, we’ll add our data and call the create method on the sink to create our map:

for (int i = 0; i < 100; i++) {
  sink.put(i, "Value " + i);
}

sink.create();

Now that our map exists, we can define a read-only volume and open our map using SortedTableMap’s open method:

Volume openVol = MappedFileVol.FACTORY.makeVolume(VOLUME_LOCATION, true);

SortedTableMap<Integer, String> sortedTableMap = SortedTableMap
  .open(
    openVol,
    Serializer.INTEGER,
    Serializer.STRING);

assertEquals(100, sortedTableMap.size());

4.1. Binary Search

Before we move on, let’s understand how the SortedTableMap utilizes binary search in more detail.

SortedTableMap splits the storage into pages, with each page containing several nodes composed of keys and values. Within these nodes are the key-value pairs that we define in our Java code.

SortedTableMap performs three binary searches to retrieve the correct value:

Keys for each page are stored on-heap in an array. The SortedTableMap performs a binary search to find the correct page.
Next, decompression occurs for each key in the node. A binary search establishes the correct node, according to the keys.
Finally, the SortedTableMap searches over the keys within the node to find the correct value.
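
The three searches above can be sketched in plain Java. This is a toy model under stated assumptions: pages and nodes are ordinary int arrays, there is no compression, and values are derived from the key to keep it short. ThreeLevelSearchSketch and its fields are hypothetical names, not part of MapDB:

```java
import java.util.Arrays;

// Toy page -> node -> key layout; not MapDB's storage format.
class ThreeLevelSearchSketch {

    // PAGES[p][n] is the sorted key array of node n on page p (sample data).
    static final int[][][] PAGES = {
        { {1, 3}, {5, 7} },
        { {9, 11}, {13, 15} }
    };

    // First key of each page, kept in a separate on-heap array.
    static final int[] PAGE_FIRST_KEYS = {1, 9};

    // Returns the value stored for key, or null if the key is absent.
    static String find(int key) {
        // Search 1: choose the page whose first key is the largest one <= key.
        int page = floorIndex(PAGE_FIRST_KEYS, key);
        if (page < 0) {
            return null; // key is smaller than every stored key
        }

        // Search 2: choose the node on that page by each node's first key.
        int[][] nodes = PAGES[page];
        int[] nodeFirstKeys = new int[nodes.length];
        for (int n = 0; n < nodes.length; n++) {
            nodeFirstKeys[n] = nodes[n][0];
        }
        int node = floorIndex(nodeFirstKeys, key);

        // Search 3: look for the exact key inside the chosen node.
        int slot = Arrays.binarySearch(nodes[node], key);
        return slot >= 0 ? "Value " + nodes[node][slot] : null;
    }

    // Index of the largest element <= key, or -1 if every element is greater.
    static int floorIndex(int[] sorted, int key) {
        int i = Arrays.binarySearch(sorted, key);
        return i >= 0 ? i : -i - 2;
    }

    public static void main(String[] args) {
        System.out.println(find(7)); // prints Value 7
        System.out.println(find(4)); // prints null
    }
}
```

Each step narrows the range with a binary search, so a lookup never scans a whole page.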

5. In-Memory Mode

MapDB offers three types of in-memory store. Let’s take a quick look at each mode, understand how it works, and study its benefits.

5.1. On-Heap

The on-heap mode stores objects in a simple Java Collection Map. It does not employ serialization and can be very fast for small datasets.

However, since the data is stored on-heap, the dataset is managed by garbage collection (GC). The duration of GC rises with the size of the dataset, resulting in performance drops.

Let’s see an example specifying the on-heap mode:

DB db = DBMaker.heapDB().make();

5.2. Byte[]

The second store type is based on byte arrays. In this mode, data is serialized and stored into arrays up to 1MB in size. While technically on-heap, this method is more efficient for garbage collection.

This mode is the recommended default, and was used in our ‘Hello Baeldung’ example:

DB db = DBMaker.memoryDB().make();

5.3. DirectByteBuffer

The final store is based on DirectByteBuffer. Direct memory, introduced in Java 1.4, lets us keep data in native memory rather than on the Java heap. As a result, the data will be stored completely off-heap.

We can invoke a store of this type with:

DB db = DBMaker.memoryDirectDB().make();
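
To see what storing data off-heap means at the JDK level, here is a small sketch that uses only java.nio. It writes a length-prefixed String into a direct buffer and reads it back. DirectBufferSketch and its layout are hypothetical and unrelated to how MapDB organizes its store:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of direct (off-heap) memory; not MapDB code.
class DirectBufferSketch {

    // Serializes text into a direct buffer, then deserializes it again.
    static String roundTrip(String text) {
        byte[] encoded = text.getBytes(StandardCharsets.UTF_8);

        // allocateDirect requests memory outside the garbage-collected heap.
        ByteBuffer buffer = ByteBuffer.allocateDirect(Integer.BYTES + encoded.length);
        buffer.putInt(encoded.length).put(encoded);

        // Switch the buffer from writing to reading.
        buffer.flip();
        byte[] decoded = new byte[buffer.getInt()];
        buffer.get(decoded);
        return new String(decoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(ByteBuffer.allocateDirect(16).isDirect()); // prints true
        System.out.println(ByteBuffer.allocate(16).isDirect());       // prints false
        System.out.println(roundTrip("Hello Baeldung!"));             // prints Hello Baeldung!
    }
}
```

Because the bytes live outside the heap, every read and write goes through serialization, which is the trade-off for keeping the dataset away from the garbage collector.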

6. Why MapDB?

So, why use MapDB?

6.1. MapDB vs Traditional Database

MapDB offers a large array of database functionality configured with just a few lines of Java code. When we employ MapDB, we can avoid the often time-consuming setup of various services and connections needed to get our program to work.

Beyond this, MapDB allows us to access the complexity of a database with the familiarity of a Java Collection. With MapDB, we do not need SQL, and we can access records with simple get method calls.

6.2. MapDB vs Simple Java Collections

Java Collections will not persist the data of our application once it stops executing. MapDB offers a simple, flexible, pluggable service that allows us to quickly and easily persist the data in our application while maintaining the utility of Java collection types.

7. Conclusion

In this article, we’ve taken a deep dive into MapDB’s embedded database engine and collection framework.

We started by looking at the core classes DB and DBMaker to configure, open and manage our database. Then, we walked through some examples of data structures that MapDB offers to work with our records and looked at its in-memory modes. Finally, we looked at the advantages of MapDB over a traditional database or Java Collection.

The code backing this article is available on GitHub.
