1. Overview

Artificial Intelligence is changing the way we build web applications. Hugging Face is a popular platform that provides a vast collection of open-source and pre-trained LLMs.

We can use Ollama, an open-source tool, to run LLMs on our local machines. It supports running GGUF format models from Hugging Face.

In this tutorial, we’ll explore how to use Hugging Face models with Spring AI and Ollama. We’ll build a simple chatbot using a chat completion model and implement semantic search using an embedding model.

2. Dependencies

Let’s start by adding the necessary dependency to our project’s pom.xml file:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
    <version>1.0.0-M6</version>
</dependency>

The Ollama starter dependency helps us establish a connection to the Ollama service. We’ll use it to pull and run our chat completion and embedding models.

Since the current version, 1.0.0-M6, is a milestone release, we’ll also need to add the Spring Milestones repository to our pom.xml:

<repositories>
    <repository>
        <id>spring-milestones</id>
        <name>Spring Milestones</name>
        <url>https://repo.spring.io/milestone</url>
        <snapshots>
            <enabled>false</enabled>
        </snapshots>
    </repository>
</repositories>

This repository is where milestone versions are published, as opposed to the standard Maven Central repository.

3. Setting up Ollama With Testcontainers

To facilitate local development and testing, we’ll use Testcontainers to set up the Ollama service.

3.1. Test Dependencies

First, let’s add the necessary test dependencies to our pom.xml:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-spring-boot-testcontainers</artifactId>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.testcontainers</groupId>
    <artifactId>ollama</artifactId>
    <scope>test</scope>
</dependency>

We import the Spring AI Testcontainers dependency for Spring Boot and the Ollama module of Testcontainers.

3.2. Defining Testcontainers Bean

Next, let’s create a @TestConfiguration class that defines our Testcontainers beans:

@TestConfiguration(proxyBeanMethods = false)
class TestcontainersConfiguration {
    @Bean
    public OllamaContainer ollamaContainer() {
        return new OllamaContainer("ollama/ollama:0.5.4");
    }

    @Bean
    public DynamicPropertyRegistrar dynamicPropertyRegistrar(OllamaContainer ollamaContainer) {
        return registry -> {
            registry.add("spring.ai.ollama.base-url", ollamaContainer::getEndpoint);
        };
    }
}

We pin the Ollama image to version 0.5.4 when creating the OllamaContainer bean.

Then, we define a DynamicPropertyRegistrar bean to configure the base-url of the Ollama service. This allows our application to connect to the started Ollama container.

3.3. Using Testcontainers During Development

While Testcontainers is primarily used for integration testing, we can use it during local development, too.

To achieve this, we’ll create a separate main class in our src/test/java directory:

public class TestApplication {
    public static void main(String[] args) {
        SpringApplication.from(Application::main)
          .with(TestcontainersConfiguration.class)
          .run(args);
    }
}

We create a TestApplication class and, inside its main() method, start our main Application class with the TestcontainersConfiguration class.

This setup helps us run our Spring Boot application and have it connect to the Ollama service, started via Testcontainers.

4. Using a Chat Completion Model

Now that we’ve got our local Ollama container set up, let’s use a chat completion model to build a simple chatbot.

4.1. Configuring Chat Model and Chatbot Beans

Let’s start by configuring a chat completion model in our application.yaml file:

spring:
  ai:
    ollama:
      init:
        pull-model-strategy: when_missing
      chat:
        options:
          model: hf.co/microsoft/Phi-3-mini-4k-instruct-gguf

To configure a Hugging Face model, we use the format hf.co/{username}/{repository}. Here, we specify the GGUF version of the Phi-3-mini-4k-instruct model provided by Microsoft.

Using this exact model isn’t a strict requirement; we recommend setting up the codebase locally and experimenting with other chat completion models.

Additionally, we set the pull-model-strategy to when_missing. This ensures that Spring AI pulls the specified model if it’s not available locally.

Once we configure a valid model, Spring AI automatically creates a bean of type ChatModel, which allows us to interact with the chat completion model.

Let’s use it to define the additional beans required for our chatbot:

@Configuration
class ChatbotConfiguration {
    @Bean
    public ChatMemory chatMemory() {
        return new InMemoryChatMemory();
    }

    @Bean
    public ChatClient chatClient(ChatModel chatModel, ChatMemory chatMemory) {
        return ChatClient
          .builder(chatModel)
          .defaultAdvisors(new MessageChatMemoryAdvisor(chatMemory))
          .build();
    }
}

First, we define a ChatMemory bean and use the InMemoryChatMemory implementation. This maintains the conversation context by storing the chat history in memory.
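Conceptually, an in-memory chat memory is little more than a map from a conversation id to that conversation’s message history. The following is a hypothetical, simplified sketch of the idea — not the actual InMemoryChatMemory implementation from Spring AI, and the class name NaiveChatMemory is our own:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: conversation histories keyed by conversation id.
// The real InMemoryChatMemory stores typed Message objects; we use plain
// strings here purely to illustrate the mechanism.
class NaiveChatMemory {

    private final Map<String, List<String>> conversations = new ConcurrentHashMap<>();

    // Append a message to the history of the given conversation
    void add(String conversationId, String message) {
        conversations
          .computeIfAbsent(conversationId, id -> new ArrayList<>())
          .add(message);
    }

    // Retrieve the full history, or an empty list for an unknown conversation
    List<String> get(String conversationId) {
        return conversations.getOrDefault(conversationId, List.of());
    }
}
```

On each prompt, an advisor such as MessageChatMemoryAdvisor reads the stored history for the conversation id and prepends it to the outgoing request, which is what gives the model its conversational context.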

Next, using the ChatMemory and ChatModel beans, we create a bean of type ChatClient, which is our main entry point for interacting with our chat completion model.

4.2. Implementing a Chatbot

With our configurations in place, let’s create a ChatbotService class. We’ll inject the ChatClient bean we defined earlier to interact with our model.

But first, let’s define two simple records to represent the chat request and response:

record ChatRequest(@Nullable UUID chatId, String question) {}

record ChatResponse(UUID chatId, String answer) {}

The ChatRequest contains the user’s question and an optional chatId to identify an ongoing conversation.

Similarly, the ChatResponse contains the chatId and the chatbot’s answer.

Now, let’s implement the intended functionality:

public ChatResponse chat(ChatRequest chatRequest) {
    UUID chatId = Optional
      .ofNullable(chatRequest.chatId())
      .orElse(UUID.randomUUID());
    String answer = chatClient
      .prompt()
      .user(chatRequest.question())
      .advisors(advisorSpec ->
          advisorSpec
            .param("chat_memory_conversation_id", chatId))
      .call()
      .content();
    return new ChatResponse(chatId, answer);
}

If the incoming request doesn’t contain a chatId, we generate a new one. This allows the user to start a new conversation or continue an existing one.

We pass the user’s question to the chatClient bean and set the chat_memory_conversation_id parameter to the resolved chatId to maintain conversation history.

Finally, we return the chatbot’s answer along with the chatId.

4.3. Interacting With Our Chatbot

Now that we’ve implemented our service layer, let’s expose a REST API on top of it:

@PostMapping("/chat")
public ResponseEntity<ChatResponse> chat(@RequestBody ChatRequest chatRequest) {
    ChatResponse chatResponse = chatbotService.chat(chatRequest);
    return ResponseEntity.ok(chatResponse);
}

We’ll use the above API endpoint to interact with our chatbot.

Let’s use the HTTPie CLI to start a new conversation:

http POST :8080/chat question="Who wanted to kill Harry Potter?"

We send a simple question to the chatbot and see what we get as a response:

{
    "chatId": "7b8a36c7-2126-4b80-ac8b-f9eedebff28a",
    "answer": "Lord Voldemort, also known as Tom Riddle, wanted to kill Harry Potter because of a prophecy that foretold a boy born at the end of July would have the power to defeat him."
}

The response contains a unique chatId and the chatbot’s answer to our question.

Let’s continue this conversation by sending a follow-up question using the chatId from the above response:

http POST :8080/chat chatId="7b8a36c7-2126-4b80-ac8b-f9eedebff28a" question="Who should he have gone after instead?"

Let’s see if the chatbot can maintain the context of our conversation and provide a relevant response:

{
    "chatId": "7b8a36c7-2126-4b80-ac8b-f9eedebff28a",
    "answer": "Based on the prophecy's criteria, Voldemort could have targeted Neville Longbottom instead, as he was also born at the end of July to parents who had defied Voldemort three times."
}

As we can see, the chatbot does indeed maintain the conversation context as it references the prophecy we discussed in the previous message.

The chatId remains the same, indicating that the follow-up answer is a continuation of the same conversation.

5. Using an Embedding Model

Moving on from the chat completion model, we’ll now use an embedding model to implement semantic search on a small dataset of quotes.

We’ll fetch the quotes from an external API, store them in an in-memory vector store, and perform a semantic search.

5.1. Fetching Quote Records From an External API

For our demonstration, we’ll use the QuoteSlate API to fetch quotes.

Let’s create a QuoteFetcher utility class for this:

class QuoteFetcher {
    private static final String BASE_URL = "https://quoteslate.vercel.app";
    private static final String API_PATH = "/api/quotes/random";
    private static final int DEFAULT_COUNT = 50;

    public static List<Quote> fetch() {
        return RestClient
          .create(BASE_URL)
          .get()
          .uri(uriBuilder ->
              uriBuilder
                .path(API_PATH)
                .queryParam("count", DEFAULT_COUNT)
                .build())
          .retrieve()
          .body(new ParameterizedTypeReference<>() {});
    }
}

record Quote(String quote, String author) {}

Using RestClient, we invoke the QuoteSlate API with the default count of 50 and use ParameterizedTypeReference to deserialize the API response to a list of Quote records.

5.2. Configuring and Populating an In-Memory Vector Store

Now, let’s configure an embedding model in our application.yaml:

spring:
  ai:
    ollama:
      embedding:
        options:
          model: hf.co/nomic-ai/nomic-embed-text-v1.5-GGUF

We use the GGUF version of the nomic-embed-text-v1.5 model provided by nomic-ai. Again, feel free to try this implementation with a different embedding model.

After specifying a valid model, Spring AI automatically creates a bean of type EmbeddingModel for us.

Let’s use it to create a vector store bean:

@Bean
public VectorStore vectorStore(EmbeddingModel embeddingModel) {
    return SimpleVectorStore
      .builder(embeddingModel)
      .build();
}

For our demonstration, we create a bean of SimpleVectorStore class. It’s an in-memory implementation that emulates a vector store using the java.util.Map class.
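To make the idea concrete, here’s a hypothetical, minimal sketch of what an in-memory vector store does under the hood: keep (text, vector) pairs and rank them by cosine similarity against a query vector. This is our own illustration with hand-written toy vectors — the real SimpleVectorStore obtains vectors from the EmbeddingModel — and the names NaiveVectorSearch and Entry are assumptions:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of in-memory similarity search over (text, vector) pairs
class NaiveVectorSearch {

    record Entry(String text, double[] vector) {}

    private final List<Entry> entries = new ArrayList<>();

    void add(String text, double[] vector) {
        entries.add(new Entry(text, vector));
    }

    // Return the k entries whose vectors are most similar to the query
    List<String> topK(double[] query, int k) {
        return entries.stream()
          .sorted(Comparator
            .comparingDouble((Entry e) -> cosine(e.vector(), query))
            .reversed())
          .limit(k)
          .map(Entry::text)
          .toList();
    }

    // Cosine similarity: dot product normalized by the vector magnitudes
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

In a production setup, we’d replace this with a persistent vector database; the SimpleVectorStore is suitable only for demos and small datasets, since every document lives on the heap and search is a linear scan.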

Now, to populate our vector store with quotes during application startup, we’ll create a VectorStoreInitializer class that implements the ApplicationRunner interface:

@Component
class VectorStoreInitializer implements ApplicationRunner {
    private final VectorStore vectorStore;

    // standard constructor

    @Override
    public void run(ApplicationArguments args) {
        List<Document> documents = QuoteFetcher
          .fetch()
          .stream()
          .map(quote -> {
              Map<String, Object> metadata = Map.of("author", quote.author());
              return new Document(quote.quote(), metadata);
          })
          .toList();
        vectorStore.add(documents);
    }
}

In our VectorStoreInitializer, we autowire an instance of VectorStore.

Inside the run() method, we use our QuoteFetcher utility class to retrieve a list of Quote records. Then, we map each quote into a Document and configure the author field as metadata.

Finally, we store all the documents in our vector store. When we invoke the add() method, Spring AI automatically converts our plaintext content into vector representation before storing it in our vector store. We don’t need to explicitly convert it using the EmbeddingModel bean.

With our vector store populated, let’s validate our semantic search functionality:

private static final int MAX_RESULTS = 3;

@ParameterizedTest
@ValueSource(strings = {"Motivation", "Happiness"})
void whenSearchingQuotesByTheme_thenRelevantQuotesReturned(String theme) {
    SearchRequest searchRequest = SearchRequest
      .builder()
      .query(theme)
      .topK(MAX_RESULTS)
      .build();
    List<Document> documents = vectorStore.similaritySearch(searchRequest);

    assertThat(documents)
      .hasSizeBetween(1, MAX_RESULTS)
      .allSatisfy(document -> {
          String author = String.valueOf(document.getMetadata().get("author"));
          assertThat(author)
            .isNotBlank();
      });
}

Here, we pass some common quote themes to our test method using @ValueSource. We then create a SearchRequest object with the theme as the query and MAX_RESULTS as the number of desired results.

Next, we call the similaritySearch() method of our vectorStore bean with the searchRequest. Similar to the add() method of the VectorStore, Spring AI converts our query to its vector representation before querying the vector store.

The returned documents will contain quotes that are semantically related to the given theme, even if they don’t contain the exact keyword.

6. Conclusion

In this article, we’ve explored using Hugging Face models with Spring AI.

Using Testcontainers, we set up the Ollama service, creating a local test environment.

First, we used a chat completion model to build a simple chatbot. Then, we implemented semantic search using an embedding model.

The code backing this article is available on GitHub.