Commits and NRT Search in SolrCloud

1. Overview

Solr is one of the most popular Lucene-based search platforms. It’s fast, robust, flexible, and has an active developer community behind it. SolrCloud is the distributed deployment mode of Solr.

One of its key features is near real-time (NRT) search, i.e., documents become available for search shortly after they are indexed.

2. Indexing in SolrCloud

A collection in Solr is made up of one or more shards, and each shard has one or more replicas. One of the replicas of a shard is selected as the leader for that shard when a collection is created. Indexing then proceeds as follows:

When a client tries to index a document, the document is first assigned to a shard based on the hash of the document’s id.
The client gets the URL of the leader of that shard from ZooKeeper, and the index request is made to that URL.
The shard leader indexes the document locally before sending it to the replicas.
Once the leader receives an acknowledgment from all active and recovering replicas, it returns a confirmation to the indexing client application.
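To make the first step more concrete, here is a minimal sketch of hash-based routing. It is a simplification and not Solr's actual router, which uses MurmurHash3 and per-shard hash ranges. The class name ShardRouterSketch, the use of CRC32, and the modulo mapping are all assumptions made for illustration:

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Simplified illustration only: Solr's real router hashes ids with MurmurHash3
// and assigns hash ranges to shards. CRC32 and modulo are stand-ins here.
class ShardRouterSketch {

    // Maps a document id to a shard index in the range [0, numShards).
    static int shardFor(String docId, int numShards) {
        CRC32 crc = new CRC32();
        crc.update(docId.getBytes(StandardCharsets.UTF_8));
        // getValue() is a non-negative long, so the remainder is non-negative too.
        return (int) (crc.getValue() % numShards);
    }

    public static void main(String[] args) {
        int shard = shardFor("123abc", 4);
        System.out.println("Document 123abc is routed to shard " + shard);
    }
}
```

The point to take away is that the same id always lands on the same shard, which is what lets any client find the right leader for a document.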

When we index a document in Solr, it doesn’t go to the index directly. It’s first written to what is called the transaction log (tlog). Solr uses the transaction log to ensure that documents are not lost before they are committed, in case of a system crash.

If the system crashes before the documents in the transaction log are committed, i.e., persisted to disk, the transaction log is replayed when the system comes back up, leading to zero loss of documents.

Every index/update request is logged to the transaction log, which continues to grow until we issue a hard commit.
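The role of the transaction log can be sketched with a toy model. This is not how Solr stores data internally; the class TransactionLogSketch and its three lists (an in-memory buffer that is lost on a crash, a durable log, and a durable index) are hypothetical stand-ins that only illustrate the replay idea:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical names): shows why replaying the log avoids data loss.
class TransactionLogSketch {
    private final List<String> memoryBuffer = new ArrayList<>(); // lost on a crash
    private final List<String> tlog = new ArrayList<>();         // survives a crash
    private final List<String> index = new ArrayList<>();        // committed, survives a crash

    // Every update is logged before it is buffered for indexing.
    void add(String doc) {
        tlog.add(doc);
        memoryBuffer.add(doc);
    }

    // A hard commit persists buffered documents and starts a fresh log.
    void hardCommit() {
        index.addAll(memoryBuffer);
        memoryBuffer.clear();
        tlog.clear();
    }

    // A crash wipes everything that was only held in memory.
    void crash() {
        memoryBuffer.clear();
    }

    // On restart, uncommitted updates are replayed from the log.
    void recover() {
        memoryBuffer.clear();
        memoryBuffer.addAll(tlog);
    }

    int tlogSize() {
        return tlog.size();
    }

    List<String> allDocuments() {
        List<String> all = new ArrayList<>(index);
        all.addAll(memoryBuffer);
        return all;
    }

    public static void main(String[] args) {
        TransactionLogSketch solr = new TransactionLogSketch();
        solr.add("A");
        solr.hardCommit();
        solr.add("B");
        solr.crash();
        solr.recover();
        System.out.println(solr.allDocuments());
    }
}
```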

3. Commits in SolrCloud

A commit operation finalizes pending changes to the index. SolrCloud provides two kinds of commit operations: a hard commit, which persists the changes to disk, and a soft commit, which only makes them visible to searches.

3.1. Commit (Hard Commit)

A commit or hard commit is one in which Solr flushes all uncommitted documents to disk. The active transaction log is closed, and a new transaction log file is opened.

By default, it also refreshes a component called a searcher so that the newly committed documents become available for searching. A searcher can be considered a read-only view of all committed documents in the index.

The commit operation can be triggered explicitly by the client by calling the commit API:

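import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

// ZooKeeper ensemble, including the /solr chroot
String zkHostString = "zkServer1:2181,zkServer2:2181,zkServer3:2181/solr";
SolrClient solr = new CloudSolrClient.Builder()
  .withZkHost(zkHostString)
  .build();

SolrInputDocument doc1 = new SolrInputDocument();
doc1.addField("id", "123abc");
doc1.addField("date", "14/10/2017");
doc1.addField("book", "To kill a mockingbird");
doc1.addField("author", "Harper Lee");

// A cloud client needs a target collection; "myCollection" is an example name
solr.add("myCollection", doc1);
// Explicit hard commit on the same collection
solr.commit("myCollection");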

Alternatively, it can be automated as autoCommit by specifying it in the solrconfig.xml file; see section 3.3.

3.2. SoftCommit

Soft commit was added in Solr 4, primarily to support the NRT feature of SolrCloud. It’s a mechanism for making documents searchable in near real-time by skipping the costly aspects of hard commits.

During a soft commit, the transaction log is not truncated; it continues to grow. However, a new searcher is opened, which makes the documents indexed since the last soft commit visible for searching. Also, some of the top-level caches in Solr are invalidated, so it’s not a completely free operation.

For example, when we set the maxTime for soft commits to 1000, a document becomes visible to queries roughly 1 second after it was indexed, at the latest.

This feature gives SolrCloud near real-time search, as new documents can be made searchable without a hard commit. Soft commits are usually configured as autoSoftCommit in the solrconfig.xml file (see section 3.3), although a client can also request one explicitly, for example by passing softCommit=true on an update request.
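The difference between the two commit types can be summarized with a small toy model. The class CommitVisibilitySketch and its fields are hypothetical and ignore segments, caches, and warming; the sketch only tracks which documents a searcher can see and how many entries the transaction log holds:

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Toy model (hypothetical names): tracks search visibility and tlog growth only.
class CommitVisibilitySketch {
    private final Set<String> notYetVisible = new LinkedHashSet<>();
    private final Set<String> visible = new LinkedHashSet<>();
    private int tlogEntries = 0;

    void add(String id) {
        notYetVisible.add(id);
        tlogEntries++;
    }

    // Soft commit: opens a new searcher, leaves the transaction log alone.
    void softCommit() {
        refreshSearcher();
    }

    // Hard commit: rolls over the transaction log; visibility changes only
    // if a new searcher is opened as part of the commit.
    void hardCommit(boolean openNewSearcher) {
        tlogEntries = 0;
        if (openNewSearcher) {
            refreshSearcher();
        }
    }

    private void refreshSearcher() {
        visible.addAll(notYetVisible);
        notYetVisible.clear();
    }

    boolean isSearchable(String id) {
        return visible.contains(id);
    }

    int tlogEntries() {
        return tlogEntries;
    }

    public static void main(String[] args) {
        CommitVisibilitySketch solr = new CommitVisibilitySketch();
        solr.add("123abc");
        solr.softCommit();
        System.out.println("searchable: " + solr.isSearchable("123abc")
            + ", tlog entries: " + solr.tlogEntries());
    }
}
```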

3.3. Autocommit and Autosoftcommit

The solrconfig.xml file is one of the most important configuration files in SolrCloud. It is generated at the time of collection creation. To enable autoCommit or autoSoftCommit, we need to update the following sections in the file:

<autoCommit>
  <maxDocs>10000</maxDocs>
  <maxTime>30000</maxTime>
  <openSearcher>true</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>6000</maxTime>
  <maxDocs>1000</maxDocs>
</autoSoftCommit>

maxTime: The number of milliseconds since the earliest uncommitted update after which the next commit/softcommit should happen.

maxDocs: The number of updates that have occurred since the last commit and after which the next commit/softcommit should happen.

openSearcher: This property tells Solr whether to open a new searcher after a commit operation or not. If it’s true, after a commit, the old searcher is closed and a new searcher is opened, making the committed documents visible for searching. If it’s false, the documents won’t become searchable as a result of this commit; they stay invisible until a later soft commit or a commit that opens a searcher.
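One way to read the maxTime and maxDocs settings is as two independent triggers, where whichever fires first causes the commit. The following sketch models that rule under this assumption. AutoCommitPolicySketch is a hypothetical class, not part of Solr, and timestamps are passed in explicitly so the behavior is easy to follow:

```java
// Hypothetical model of the maxTime / maxDocs rule; not Solr's implementation.
class AutoCommitPolicySketch {
    private final long maxTimeMs;
    private final int maxDocs;
    private long earliestUncommittedAtMs;
    private int pendingUpdates = 0;

    AutoCommitPolicySketch(long maxTimeMs, int maxDocs) {
        this.maxTimeMs = maxTimeMs;
        this.maxDocs = maxDocs;
    }

    // Records an update; the clock starts at the earliest uncommitted update.
    void onUpdate(long nowMs) {
        if (pendingUpdates == 0) {
            earliestUncommittedAtMs = nowMs;
        }
        pendingUpdates++;
    }

    // A commit is due once either threshold has been reached.
    boolean commitDue(long nowMs) {
        if (pendingUpdates == 0) {
            return false;
        }
        return pendingUpdates >= maxDocs
            || nowMs - earliestUncommittedAtMs >= maxTimeMs;
    }

    void onCommit() {
        pendingUpdates = 0;
    }

    public static void main(String[] args) {
        // Same values as the autoCommit block above: 30 seconds or 10000 updates.
        AutoCommitPolicySketch policy = new AutoCommitPolicySketch(30000, 10000);
        policy.onUpdate(0);
        System.out.println("due after 10s: " + policy.commitDue(10_000));
        System.out.println("due after 30s: " + policy.commitDue(30_000));
    }
}
```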

4. Near Real-Time Search

Near real-time search is achieved in Solr using a combination of commit and soft commit. As mentioned before, when a document is added to Solr, it won’t be visible in search results until a hard or soft commit opens a new searcher.

Hard commits are costly, which is why soft commits are useful. But, as a soft commit doesn’t persist the documents, we do need to set the autoCommit maxTime interval (or maxDocs) to a reasonable value, depending upon the load we are expecting.

4.1. Real-Time Gets

There is another feature provided by Solr which is, in fact, real-time: the get API. The get API can return a document that is not even soft committed yet.

It searches directly in the transaction logs if the document is not found in the index. So we can fire a get API call immediately after the index call returns, and we’ll still be able to retrieve the document.

However, there is a catch. We need to pass the id of the document in the get API call. Of course, we can provide other filter queries along with the id, but without the id, the call doesn’t work:

http://localhost:8985/solr/myCollection/get?id=1234&fq=name:baeldung
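The contrast between a regular search and a real-time get can be sketched with another toy model. RealTimeGetSketch is hypothetical and keeps documents in two maps instead of a real transaction log and index. The lookup order, with uncommitted updates checked first so that the latest version wins, is also an assumption of this sketch:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Toy model (hypothetical names): a search sees only committed documents,
// while a get by id also sees updates that are still uncommitted.
class RealTimeGetSketch {
    private final Map<String, String> uncommitted = new HashMap<>(); // stands in for the tlog
    private final Map<String, String> searchable = new HashMap<>();  // stands in for the index

    void add(String id, String doc) {
        uncommitted.put(id, doc);
    }

    void commit() {
        searchable.putAll(uncommitted);
        uncommitted.clear();
    }

    boolean searchFinds(String id) {
        return searchable.containsKey(id);
    }

    // The id is mandatory: the lookup is by key, not by query.
    Optional<String> realTimeGet(String id) {
        if (id == null) {
            throw new IllegalArgumentException("real-time get requires an id");
        }
        String doc = uncommitted.get(id);
        if (doc == null) {
            doc = searchable.get(id);
        }
        return Optional.ofNullable(doc);
    }

    public static void main(String[] args) {
        RealTimeGetSketch solr = new RealTimeGetSketch();
        solr.add("1234", "name:baeldung");
        System.out.println("search finds it: " + solr.searchFinds("1234"));
        System.out.println("get finds it: " + solr.realTimeGet("1234").isPresent());
    }
}
```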

5. Conclusion

Solr gives us a good deal of flexibility in tuning its NRT capability. To get the best performance out of the server, we need to experiment with the commit and soft commit intervals, based upon our use case and expected load.

We shouldn’t keep our hard commit interval too long, or else our transaction log will grow to a considerable size. At the same time, we shouldn’t execute our soft commits too frequently, since each one opens a new searcher and invalidates caches.

It is also advisable to performance-test our system properly before we go to production. We should check whether the documents become searchable within our desired time interval.
