1. Introduction

Organizations produce massive amounts of sensitive data daily and need to manage everything from personal information and financial records to classified documents and cybersecurity logs. Traditional databases often struggle with both the volume of Big Data and the complex security requirements of modern enterprises.

As secure data management becomes more strategic and many organizations require fine-grained access control down to the cell level, we need a database system that can handle massive datasets while enforcing strict security controls, at the scale of petabytes of data and billions of individual access decisions.

In this introductory article, we’ll explore Apache Accumulo, a powerful distributed key-value store with unparalleled cell-level security, high performance, and scalability.

2. What Is Apache Accumulo?

Apache Accumulo, originally developed by the National Security Agency (NSA) based on Google’s Bigtable design, is a distributed key-value store.

Built on top of Apache Hadoop and Apache ZooKeeper, it’s designed to handle massive data volumes across clusters of commodity hardware.

Accumulo enables efficient data ingestion, retrieval, and storage. It also provides server-side programming to allow complex data processing directly within the database, making it a sophisticated solution with fine-grained access control to handle sensitive big data.

The key features of Apache Accumulo are the following:

  • Scalability: can manage petabytes of data across large clusters
  • High Performance: uses in-memory processing and optimizations for efficient data access
  • Cell-Level Security: allows fine-grained access control, where each cell can have a unique visibility label
  • Rich API for Customization: offers features like iterators for in-database processing

Similar to Google’s Bigtable, which is utilized in web indexing, Google Earth, and Google Finance, Apache Accumulo is useful in a variety of applications, including but not limited to:

  • Government and military data systems
  • Healthcare record management
  • Financial services data
  • Cybersecurity analytics
  • Large-scale graph processing

3. Installation and Setup

First, let’s make sure that prerequisites like Java 11, Apache Hadoop, YARN, and Apache ZooKeeper are installed, with the corresponding JAVA_HOME, HADOOP_HOME, and ZOOKEEPER_HOME environment variables set.

Then, we’ll download the latest version of Apache Accumulo and extract it:

$ tar -xzf accumulo-2.1.3-bin.tar.gz

Likewise, we can set ACCUMULO_HOME and add its bin directory to the PATH:

$ export ACCUMULO_HOME=/path/to/accumulo
$ export PATH=$ACCUMULO_HOME/bin:$PATH

Next, we start the ZooKeeper, Hadoop HDFS, and YARN services, in that order:

$ zkServer start
$ start-dfs.sh
$ start-yarn.sh

Also, we need to make sure that HDFS is listening on localhost:8020 and that the ZooKeeper host is set to localhost:2181, since these are the defaults in accumulo.properties.
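
For reference, the relevant entries in conf/accumulo.properties look roughly like this (the exact HDFS path below is an assumption based on a default local setup):

instance.volumes=hdfs://localhost:8020/accumulo
instance.zookeeper.host=localhost:2181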

Let’s confirm everything is running using the jps command, which should show output similar to:

82306 Main
81385 DataNode
81745 ResourceManager
82867 Jps
81846 NodeManager
81530 SecondaryNameNode
68474 ResourceManager
81276 NameNode

Now, we’re ready to initialize Accumulo, which sets up its storage in HDFS and its coordination state in ZooKeeper:

$ accumulo init

The init command is required only once and prompts for instance name and root password.

Then, we’ll create additional configuration files required to start the cluster:

$ accumulo-cluster create-config

Finally, we’re ready to start the cluster:

$ accumulo-cluster start

Once started, we can run the Accumulo shell – a command-line tool for interacting with Apache Accumulo:

$ accumulo shell -u root

Note: This command asks us to set the instance name and password in accumulo-client.properties.
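
As a rough sketch, the resulting accumulo-client.properties contains entries along these lines (the values are illustrative):

instance.name=myAccumuloInstance
instance.zookeepers=localhost:2181
auth.type=password
auth.principal=root
auth.token=<root password>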

Accumulo Shell provides basic commands to manage, query, and perform administrative tasks on tables and instances.

Let’s take a look at a few of the handiest commands:

  • tables: lists all tables in the instance
  • createtable <table>: creates a new table
  • deletetable <table>: deletes a table
  • scan: scans and displays data from the current table
  • insert <row> <colfam> <colqual> <value>: inserts a value into the table
  • delete <row> <colfam> <colqual>: deletes a specific entry from the table
  • setiter -t <table>: sets a table-specific iterator
  • listiter [-scan | -table]: lists the iterators for a scanner or a table
  • createuser <username>: creates a new user
  • info: displays system information about the Accumulo instance
  • config: views or changes configuration settings
  • flush <table>: forces a flush of memory to disk for a table
  • compact <table>: compacts the table’s data
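
For example, a quick session that creates a table, inserts a cell, and scans it back might look like this (the instance name, table name, and values are illustrative):

root@myAccumuloInstance> createtable demo
root@myAccumuloInstance demo> insert row1 colfam1 colqual1 value1
root@myAccumuloInstance demo> scan
row1 colfam1:colqual1 []    value1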

4. Data Model

The Accumulo data model is similar to Google’s Bigtable, providing a sparse, distributed, persistent multi-dimensional sorted map.

Specifically, each key in Accumulo consists of three components, which together uniquely identify each stored value:

  • Row ID: The primary identifier for a row of data, used for lexicographical sorting of data
  • Column:
    • Family: Columns are grouped into families, which act as categories or namespaces for the data. Column families provide a way to organize related data.
    • Qualifier: Within each column family, individual columns are identified by a column qualifier. This allows for fine-grained differentiation of data within a column family.
    • Visibility: Each key-value pair can be associated with a security label or visibility. This allows for cell-level access control, where users must have the appropriate authorizations to read the data.
  • Timestamp: A version number associated with each key-value pair, allowing Accumulo to store multiple versions of the same data
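
To make these components concrete, here’s a minimal sketch of how a single cell maps onto the Key and Value classes of the Java API we’ll use later (the row, family, qualifier, and visibility values are purely illustrative):

// one cell: Row ID, Column Family, Column Qualifier, Visibility, and Timestamp
Key key = new Key(
  new Text("patient-1001"),      // Row ID
  new Text("diagnosis"),         // Column Family
  new Text("2024-visit"),        // Column Qualifier
  new Text("doctor|admin"),      // Visibility expression
  System.currentTimeMillis());   // Timestamp
Value value = new Value("hypertension".getBytes(StandardCharsets.UTF_8));

Here, Text comes from Hadoop (org.apache.hadoop.io.Text), while Key and Value live in org.apache.accumulo.core.data; a user needs the doctor or admin authorization to read this cell.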

Overall, the Accumulo data model provides a flexible and secure framework for managing large-scale, structured datasets with intricate security needs.

Its use of row IDs, column families, and qualifiers enables robust data organization and querying, while cell-level visibility controls ensure the protection of sensitive information.

5. Operations and Features

5.1. Basic Table Operations

Accumulo offers robust capabilities for managing tables. We can create new tables as needed, clone existing tables for testing or development purposes, and split large tables into smaller tablets for performance optimization.

Additionally, tables can be merged to consolidate data and improve query efficiency. Accumulo also supports flexible data import and export operations, enabling seamless data migration and integration with other systems.
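
As a rough sketch of how these operations look against the Java API we’ll set up in section 6 (the table names and split points below are hypothetical):

TableOperations ops = client.tableOperations();

// clone a table, e.g., to experiment without touching the original data
ops.clone("orders", "orders_test", true, Collections.emptyMap(), Collections.emptySet());

// pre-split the table at chosen row boundaries to spread tablets across servers
SortedSet<Text> splits = new TreeSet<>(List.of(new Text("m"), new Text("t")));
ops.addSplits("orders", splits);

// merge the tablets in a row range back together
ops.merge("orders", new Text("a"), new Text("z"));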

5.2. Data Handling

Accumulo provides fundamental data manipulation to create, update, and delete data. For efficient handling of large datasets, Accumulo offers batch operations, allowing for the bulk processing of data.

Furthermore, range-based scans enable efficient retrieval of specific data subsets, optimizing query performance.

5.3. Security Features

Accumulo provides cell-level security by setting security labels for every piece of data. We can set up complex security rules using boolean expressions and manage user access to enforce fine-grained authorization policies.
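
As an illustration (the user name, authorizations, and visibility expression are hypothetical, and we assume an AccumuloClient like the one we build in section 6), we can grant a user a set of authorizations and write a cell guarded by a boolean visibility expression:

// grant the authorizations the user is allowed to scan with
client.securityOperations()
  .changeUserAuthorizations("analyst", new Authorizations("finance", "audit"));

// this cell is readable only by principals holding (finance AND audit), or admin
Mutation mutation = new Mutation("report-2024");
mutation.at()
  .family("summary")
  .qualifier("q4")
  .visibility("(finance&audit)|admin")
  .put("classified totals");

The mutation is then written through a BatchWriter, exactly as we’ll see in section 6.3.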

5.4. Iterator Framework

Accumulo provides powerful Iterators that act as on-the-spot data processors, working directly where the data resides. They handle filtering, aggregating, and transforming data on the server itself, so we don’t need to send large amounts of raw data over the network.

This results in faster query processing, greater efficiency, and reduced network traffic.
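
For instance, we can attach one of the built-in filtering iterators to a table so the filtering happens server-side; here’s a minimal sketch using the RegExFilter (the table name, priority, and pattern are illustrative):

// priority 25, iterator name "rowFilter", backed by the built-in RegExFilter
IteratorSetting setting = new IteratorSetting(25, "rowFilter", RegExFilter.class);

// keep only entries whose row ID starts with "user"
RegExFilter.setRegexs(setting, "user.*", null, null, null, false);

// attach the iterator to the table so scans and compactions apply it on the server
client.tableOperations().attachIterator("profiles", setting);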

5.5. Performance Optimizations

Accumulo incorporates various performance optimizations like write-ahead logging, memory-based writing, Bloom filters, and Locality groups to ensure efficient data storage and retrieval.

Write-ahead logging guarantees data durability, while memory-based writing accelerates data ingestion. Bloom filters enable fast lookups, reducing the need for full table scans. Locality groups optimize data placement, improving read and write performance.
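
Most of these optimizations are exposed as table-level settings; as a rough sketch (the table and column family names are illustrative), we can enable Bloom filters and define a locality group through the same tableOperations() API:

// enable Bloom filters on the table for faster key lookups
client.tableOperations().setProperty("profiles", "table.bloom.enabled", "true");

// store the "meta" and "audit" column families together on disk as one locality group
Map<String, Set<Text>> groups = Map.of("metadata", Set.of(new Text("meta"), new Text("audit")));
client.tableOperations().setLocalityGroups("profiles", groups);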

5.6. Scaling and Distribution

Accumulo automatically splits tablets and balances the load when more data is added, and integrating new machines into the cluster is as simple as pointing them to it. The system manages data distribution smoothly as the data expands.

5.7. Real-Time Insights

Accumulo provides real-time insights by allowing us to monitor performance metrics, track resource usage, and detect issues as they arise.

With its efficient data processing capabilities and integration with monitoring tools, we can quickly respond to changes and ensure optimal system performance.

5.8. Administration

Accumulo offers robust administrative capabilities, including reliable backup and recovery mechanisms, intelligent data compaction strategies, and flexible system configuration options.

It also provides benefits like comprehensive user management and resource control, ensuring secure access and optimal performance.

6. Accumulo Clients

Now that we’ve covered Accumulo’s installation process, data model, operations, and features, let’s explore the Accumulo client for interacting with Accumulo through the Java API.

The Accumulo Client API allows us to perform administrative tasks, query data, and manage tables programmatically.

6.1. Maven Dependency

First, let’s add the latest accumulo-core Maven dependency to our pom.xml:

<dependency>
    <groupId>org.apache.accumulo</groupId>
    <artifactId>accumulo-core</artifactId>
    <version>2.1.3</version>
</dependency>

This dependency adds the necessary classes and methods to work with Accumulo.

6.2. Create the Accumulo Client

Next, let’s create a client to interact with Accumulo:

AccumuloClient client = Accumulo.newClient()
  .to("accumuloInstanceName", "localhost:2181")
  .as("username", "password").build();

We’ve used the builder to initialize the connection by specifying the Accumulo instance name, the ZooKeeper host details, and the username and password of the Accumulo instance.

6.3. Basic Operations

Next, with the client set up, let’s perform the basic operation of creating a table:

client.tableOperations().create(tableName);

Then, to add data to the table, we can use the BatchWriter class that offers high-performance, batch-oriented writes:

try (BatchWriter writer = client.createBatchWriter(tableName, new BatchWriterConfig())) {
    Mutation mutation1 = new Mutation("row1");
    mutation1.at()
      .family("column family 1")
      .qualifier("column family 1 qualifier 1")
      .visibility("public").put("value 1");

    Mutation mutation2 = new Mutation("row2");
    mutation2.at()
      .family("column family 1")
      .qualifier("column family 1 qualifier 2")
      .visibility("private").put("value 2");

    writer.addMutation(mutation1);
    writer.addMutation(mutation2);
}

Here, each entry is represented by a Mutation object that accepts column information such as the family, qualifier, and visibility, as discussed earlier in the data model.

Similarly, let’s retrieve data from the table using the Scanner class:

try (var scanner = client.createScanner(tableName, new Authorizations("public"))) {
    scanner.setRange(new Range());
    for (Map.Entry<Key, Value> entry : scanner) {
        System.out.println(entry.getKey() + " -> " + entry.getValue());
    }
}

Here, we iterate over the entries within the specified range (an empty Range scans the entire table), and we pass the public authorization so that only publicly visible data is fetched.
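
Finally, once we’re done, we can drop the table and release the client’s resources; AccumuloClient is AutoCloseable, so it can also be created in a try-with-resources block:

client.tableOperations().delete(tableName);
client.close();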

7. Conclusion

In this tutorial, we’ve discussed Apache Accumulo, a versatile, scalable database that excels in handling massive datasets with complex access requirements.

Its unique features, such as cell-level security, iterators, and flexible data models, make it an excellent choice for applications requiring secure and efficient data management for real-time analytics, secure data processing, or large-scale data storage.

First, we explored the installation and setup steps. Then, we examined its unique data model. Finally, we familiarized ourselves with the available operations and features.

The code backing this article is available on GitHub. Once you're logged in as a Baeldung Pro Member, start learning and coding on the project.