1. Overview

In modern applications that integrate with Large Language Models (LLMs), users often submit similar or rephrased prompts. Each one triggers a fresh call to the LLM, resulting in redundant requests, unnecessary costs, and higher latency.

Semantic caching addresses this challenge by storing the user’s query along with the LLM’s response in a vector store. When a new query arrives, we first check the vector store for semantically similar, previously answered questions. If a close match is found, we return the cached response, bypassing the original LLM call entirely.
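Before wiring in Spring AI and Redis, it helps to see the mechanism in isolation. The sketch below implements the same flow in plain Java: embed each question as a vector, compare new questions against cached ones with cosine similarity, and return the cached answer only when the best score clears a threshold. The embed() method here is a deliberately crude letter-frequency stand-in for a real embedding model, and all names are illustrative:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class InMemorySemanticCache {

    record Entry(double[] embedding, String answer) {}

    private final List<Entry> entries = new ArrayList<>();
    private final double threshold;

    InMemorySemanticCache(double threshold) {
        this.threshold = threshold;
    }

    // Toy stand-in for a real embedding model: a letter-frequency vector
    static double[] embed(String text) {
        double[] vector = new double[26];
        for (char c : text.toLowerCase().toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                vector[c - 'a']++;
            }
        }
        return vector;
    }

    // Cosine similarity: 1.0 for identical directions, lower as vectors diverge
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    void save(String question, String answer) {
        entries.add(new Entry(embed(question), answer));
    }

    // Return the best-scoring cached answer, but only if it clears the threshold
    Optional<String> search(String question) {
        double[] query = embed(question);
        return entries.stream()
          .max(Comparator.comparingDouble(e -> cosine(e.embedding(), query)))
          .filter(e -> cosine(e.embedding(), query) >= threshold)
          .map(Entry::answer);
    }
}
```

A rephrased question produces a nearby vector and hits the cache, while an unrelated one falls below the threshold and misses. Spring AI and Redis replace the toy embedding and the in-memory list with a real embedding model and a persistent, indexed vector store.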

In this tutorial, we’ll build a semantic caching layer using Spring AI and Redis.

2. Setting Up the Project

Before we start implementing our semantic cache layer, we’ll need to include the necessary dependencies and configure our application correctly.

2.1. Configuring an Embedding Model

First, we’ll configure an embedding model that’ll convert natural language text into numeric vectors. For our demonstration, we’ll use an embedding model from OpenAI.

Let’s start by adding the necessary dependency to our project’s pom.xml file:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-openai</artifactId>
    <version>1.0.3</version>
</dependency>

Here, we import Spring AI’s OpenAI starter dependency, which we’ll use to interact with an embedding model.

Next, let’s configure our OpenAI API key and specify the embedding model in our application.yaml file:

spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      embedding:
        options:
          model: text-embedding-3-small
          dimensions: 512

We use the ${} property placeholder to load the value of our API key from an environment variable.

Additionally, we specify text-embedding-3-small as our embedding model with 512 dimensions. Alternatively, we can use a different embedding model, as the specific AI model or provider is irrelevant for this demonstration.

Once we configure these properties, Spring AI automatically creates a bean of type EmbeddingModel for us.

2.2. Configuring Redis as a Vector Store

Next, we’ll need a vector store to save our query embeddings and their corresponding LLM responses. We’ll use Redis for this purpose, but again, we can choose a vector store based on our requirements.

First, let’s add the required dependency:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-vector-store-redis</artifactId>
    <version>1.0.3</version>
</dependency>

The Redis vector store starter dependency enables us to establish a connection with Redis and interact with it as a vector store.

Now, let’s configure the connection URL to enable our application to connect to the provisioned Redis instance:

spring:
  data:
    redis:
      url: ${REDIS_URL}

We’re again using a property placeholder to load the Redis connection URL from an environment variable. It’s important to note that the URL should follow the format redis://username:password@hostname:port.

Next, we’ll need to configure a few custom properties for our semantic cache implementation. We’ll store these properties in our project’s application.yaml file and use @ConfigurationProperties to map the values to a record:

@ConfigurationProperties(prefix = "com.baeldung.semantic.cache")
record SemanticCacheProperties(
    Double similarityThreshold,
    String contentField,
    String embeddingField,
    String metadataField
) {}

Here, the similarityThreshold determines how semantically similar a new query must be (on a scale of 0 to 1) to a cached query for it to be considered a match.

The contentField specifies the field name where we’ll store the original natural language query inside our vector store. The embeddingField stores the vector representation of this natural language query, and the metadataField stores the corresponding LLM’s answer.

Now, let’s define the values of these properties in our application.yaml:

com:
  baeldung:
    semantic:
      cache:
        similarity-threshold: 0.8
        content-field: question
        embedding-field: embedding
        metadata-field: answer

We set a similarity threshold of 0.8, ensuring only highly similar queries trigger cache hits. Next, the three field names we’ve chosen clearly indicate what data each field contains.

With our properties defined, let’s create the beans required to interact with our vector store:

@Configuration
@EnableConfigurationProperties(SemanticCacheProperties.class)
class LLMConfiguration {

    @Bean
    JedisPooled jedisPooled(RedisProperties redisProperties) {
        return new JedisPooled(redisProperties.getUrl());
    }

    @Bean
    RedisVectorStore vectorStore(
      JedisPooled jedisPooled,
      EmbeddingModel embeddingModel,
      SemanticCacheProperties semanticCacheProperties
    ) {
        return RedisVectorStore
          .builder(jedisPooled, embeddingModel)
          .contentFieldName(semanticCacheProperties.contentField())
          .embeddingFieldName(semanticCacheProperties.embeddingField())
          .metadataFields(
            RedisVectorStore.MetadataField.text(semanticCacheProperties.metadataField()))
          .build();
    }
}

First, we create a JedisPooled bean that Spring AI uses to communicate with Redis. We pass the connection URL we’ve configured in our application.yaml file using the auto-configured RedisProperties bean.

Next, we define our RedisVectorStore bean, passing the JedisPooled bean and the auto-configured EmbeddingModel bean. Additionally, we use our semanticCacheProperties bean to define our custom field names. The RedisVectorStore bean is the core class that we’ll use in the upcoming section to interact with our vector store.

3. Implementing Semantic Caching

With our configuration in place, let’s build the service responsible for saving to and searching our semantic cache.

3.1. Saving LLM Response to Cache

Let’s first create a method to save LLM responses:

@Service
@EnableConfigurationProperties(SemanticCacheProperties.class)
class SemanticCachingService {

    private final VectorStore vectorStore;
    private final SemanticCacheProperties semanticCacheProperties;

    // standard constructor

    void save(String question, String answer) {
        Document document = Document
          .builder()
          .text(question)
          .metadata(semanticCacheProperties.metadataField(), answer)
          .build();
        vectorStore.add(List.of(document));
    }
}

Here, in our SemanticCachingService class, we define a save() method that takes a natural language question and its corresponding answer as input.

Inside our method, we create a Document object with the question as the main text content and store the answer in the metadata.

Finally, we use the add() method of the autowired vectorStore bean to save the document. The bean automatically generates an embedding for the document’s text — i.e., the question — and stores it alongside the question and answer in the configured semantic cache.

3.2. Performing Semantic Search on Cache

Now, let’s implement the search functionality to retrieve cached responses:

Optional<String> search(String question) {
    SearchRequest searchRequest = SearchRequest.builder()
      .query(question)
      .similarityThreshold(semanticCacheProperties.similarityThreshold())
      .topK(1)
      .build();
    List<Document> results = vectorStore.similaritySearch(searchRequest);

    if (results.isEmpty()) {
        return Optional.empty();
    }

    Document result = results.getFirst();
    return Optional
      .ofNullable(result.getMetadata().get(semanticCacheProperties.metadataField()))
      .map(String::valueOf);
}

Here, in our search() method, we first build a SearchRequest instance. We pass the question as the query, set the similarityThreshold from our properties, and pass 1 to the topK() method to retrieve only the single best match.

Then, we pass our searchRequest to the similaritySearch() method of the vectorStore bean. Again, the bean automatically generates an embedding for the input question behind the scenes and searches our semantic cache for the most similar entry that meets our threshold.

If no similar entry is found, we simply return an empty Optional.

Alternatively, if a match is found, we extract the first Document from the results, extract the answer from its metadata, and return it wrapped in an Optional.
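Our service exposes save() and search() but leaves the wiring to the caller. A typical way to combine them is the cache-aside pattern: consult the cache first, call the LLM only on a miss, and store the fresh answer before returning it. The sketch below uses plain functions as stand-ins for the SemanticCachingService methods and the LLM client; the getAnswer name and the function parameters are illustrative, not part of the implementation above:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.BiConsumer;
import java.util.function.Function;

class CacheAsideFlow {

    // Consult the semantic cache first; call the LLM only on a miss,
    // then cache the fresh answer before returning it
    static String getAnswer(
        String question,
        Function<String, Optional<String>> search,  // stands in for semanticCachingService::search
        BiConsumer<String, String> save,            // stands in for semanticCachingService::save
        Function<String, String> llmCall            // stands in for the actual LLM client call
    ) {
        return search.apply(question).orElseGet(() -> {
            String fresh = llmCall.apply(question);
            save.accept(question, fresh);
            return fresh;
        });
    }
}
```

In a real application, search and save would be method references on our SemanticCachingService, and llmCall would delegate to a chat client; the shape of the flow stays the same.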

4. Testing Our Implementation

Finally, let’s write a simple test to verify that our semantic cache implementation works correctly:

String question = "How many sick leaves can I take?";
String answer = "No leaves allowed! Get back to work!!";
semanticCachingService.save(question, answer);

String rephrasedQuestion = "How many days sick leave can I take?";
assertThat(semanticCachingService.search(rephrasedQuestion))
    .isPresent()
    .hasValue(answer);

String unrelatedQuestion = "Can I get a raise?";
assertThat(semanticCachingService.search(unrelatedQuestion))
    .isEmpty();

First, we save an original question and answer pair to the vector store using our semanticCachingService.

Then, we search using a rephrased version of the original question. Despite the different wording, our service recognizes the similarity and returns the cached answer.

Finally, we verify that an unrelated question whose semantic meaning is entirely different results in a cache miss.

5. Conclusion

In this article, we’ve explored implementing semantic caching using Spring AI.

We configured an embedding model from OpenAI to convert text into vector representations and set up Redis as a vector store to store and search these embeddings. Then, we built and tested a caching service that saves LLM responses and retrieves them for semantically similar queries, reducing cost and latency.

For our demonstration, we’ve kept things simple. We can find a more advanced example that builds semantic caching on top of a Retrieval-Augmented Generation (RAG) chatbot here.

As always, all the code examples used in this article are available over on GitHub.
