1. Overview

Artificial Intelligence is changing the way we build web applications. Hugging Face is a popular platform that provides a vast collection of open-source and pre-trained LLMs.

We can use Ollama, an open-source tool, to run LLMs on our local machines. It supports running GGUF format models from Hugging Face.

In this tutorial, we’ll explore how to use Hugging Face models with Spring AI and Ollama. We’ll build a simple chatbot using a chat completion model and implement semantic search using an embedding model.

2. Dependencies

Let’s start by adding the necessary dependency to our project’s pom.xml file:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
    <version>1.0.0-M6</version>
</dependency>

The Ollama starter dependency helps us establish a connection to the Ollama service. We’ll use it to pull and run our chat completion and embedding models.

Since the current version, 1.0.0-M6, is a milestone release, we’ll also need to add the Spring Milestones repository to our pom.xml:

<repositories>
    <repository>
        <id>spring-milestones</id>
        <name>Spring Milestones</name>
        <url>https://repo.spring.io/milestone</url>
        <snapshots>
            <enabled>false</enabled>
        </snapshots>
    </repository>
</repositories>

This repository is where milestone versions are published, as opposed to the standard Maven Central repository.

3. Setting up Ollama With Testcontainers

To facilitate local development and testing, we’ll use Testcontainers to set up the Ollama service.

3.1. Test Dependencies

First, let’s add the necessary test dependencies to our pom.xml:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-spring-boot-testcontainers</artifactId>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.testcontainers</groupId>
    <artifactId>ollama</artifactId>
    <scope>test</scope>
</dependency>

We import the Spring AI Testcontainers dependency for Spring Boot and the Ollama module of Testcontainers.

3.2. Defining Testcontainers Bean

Next, let’s create a @TestConfiguration class that defines our Testcontainers beans:

@TestConfiguration(proxyBeanMethods = false)
class TestcontainersConfiguration {
    @Bean
    public OllamaContainer ollamaContainer() {
        return new OllamaContainer("ollama/ollama:0.5.4");
    }

    @Bean
    public DynamicPropertyRegistrar dynamicPropertyRegistrar(OllamaContainer ollamaContainer) {
        return registry -> {
            registry.add("spring.ai.ollama.base-url", ollamaContainer::getEndpoint);
        };
    }
}

We pin the Ollama image to version 0.5.4 when creating the OllamaContainer bean.

Then, we define a DynamicPropertyRegistrar bean to configure the base-url of the Ollama service. This allows our application to connect to the started Ollama container.

3.3. Using Testcontainers During Development

While Testcontainers is primarily used for integration testing, we can use it during local development, too.

To achieve this, we’ll create a separate main class in our src/test/java directory:

public class TestApplication {
    public static void main(String[] args) {
        SpringApplication.from(Application::main)
          .with(TestcontainersConfiguration.class)
          .run(args);
    }
}

We create a TestApplication class and, inside its main() method, start our main Application class with the TestcontainersConfiguration class.

This setup helps us run our Spring Boot application and have it connect to the Ollama service, started via Testcontainers.

4. Using a Chat Completion Model

Now that we’ve got our local Ollama container set up, let’s use a chat completion model to build a simple chatbot.

4.1. Configuring Chat Model and Chatbot Beans

Let’s start by configuring a chat completion model in our application.yaml file:

spring:
  ai:
    ollama:
      init:
        pull-model-strategy: when_missing
      chat:
        options:
          model: hf.co/microsoft/Phi-3-mini-4k-instruct-gguf

To configure a Hugging Face model, we use the format hf.co/{username}/{repository}. Here, we specify the GGUF version of the Phi-3-mini-4k-instruct model provided by Microsoft.

Using this exact model isn’t a strict requirement; we recommend setting up the codebase locally and experimenting with other chat completion models.

Additionally, we set the pull-model-strategy to when_missing. This ensures that Spring AI pulls the specified model if it’s not available locally.

Once we configure a valid model, Spring AI automatically creates a bean of type ChatModel, which allows us to interact with the chat completion model.

Let’s use it to define the additional beans required for our chatbot:

@Configuration
class ChatbotConfiguration {
    @Bean
    public ChatMemory chatMemory() {
        return new InMemoryChatMemory();
    }

    @Bean
    public ChatClient chatClient(ChatModel chatModel, ChatMemory chatMemory) {
        return ChatClient
          .builder(chatModel)
          .defaultAdvisors(new MessageChatMemoryAdvisor(chatMemory))
          .build();
    }
}

First, we define a ChatMemory bean and use the InMemoryChatMemory implementation. This maintains the conversation context by storing the chat history in memory.
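Conceptually, an in-memory chat memory is little more than a map from a conversation id to that conversation’s message history. The following is a hypothetical, simplified sketch of the idea — not the actual InMemoryChatMemory implementation from Spring AI, and the class name NaiveChatMemory is our own:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: conversation histories keyed by conversation id.
// The real InMemoryChatMemory stores typed Message objects; we use plain
// strings here purely to illustrate the mechanism.
class NaiveChatMemory {

    private final Map<String, List<String>> conversations = new ConcurrentHashMap<>();

    // Append a message to the history of the given conversation
    void add(String conversationId, String message) {
        conversations
          .computeIfAbsent(conversationId, id -> new ArrayList<>())
          .add(message);
    }

    // Retrieve the full history, or an empty list for an unknown conversation
    List<String> get(String conversationId) {
        return conversations.getOrDefault(conversationId, List.of());
    }
}
```

On each prompt, an advisor such as MessageChatMemoryAdvisor reads the stored history for the conversation id and prepends it to the outgoing request, which is what gives the model its conversational context.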

Next, using the ChatMemory and ChatModel beans, we create a bean of type ChatClient, which is our main entry point for interacting with our chat completion model.

4.2. Implementing a Chatbot

With our configurations in place, let’s create a ChatbotService class. We’ll inject the ChatClient bean we defined earlier to interact with our model.

But first, let’s define two simple records to represent the chat request and response:

record ChatRequest(@Nullable UUID chatId, String question) {}

record ChatResponse(UUID chatId, String answer) {}

The ChatRequest contains the user’s question and an optional chatId to identify an ongoing conversation.

Similarly, the ChatResponse contains the chatId and the chatbot’s answer.

Now, let’s implement the intended functionality:

public ChatResponse chat(ChatRequest chatRequest) {
    UUID chatId = Optional
      .ofNullable(chatRequest.chatId())
      .orElse(UUID.randomUUID());
    String answer = chatClient
      .prompt()
      .user(chatRequest.question())
      .advisors(advisorSpec ->
          advisorSpec
            .param("chat_memory_conversation_id", chatId))
      .call()
      .content();
    return new ChatResponse(chatId, answer);
}

If the incoming request doesn’t contain a chatId, we generate a new one. This allows the user to start a new conversation or continue an existing one.

We pass the user’s question to the chatClient bean and set the chat_memory_conversation_id parameter to the resolved chatId to maintain conversation history.

Finally, we return the chatbot’s answer along with the chatId.

4.3. Interacting With Our Chatbot

Now that we’ve implemented our service layer, let’s expose a REST API on top of it:

@PostMapping("/chat")
public ResponseEntity<ChatResponse> chat(@RequestBody ChatRequest chatRequest) {
    ChatResponse chatResponse = chatbotService.chat(chatRequest);
    return ResponseEntity.ok(chatResponse);
}

We’ll use the above API endpoint to interact with our chatbot.

Let’s use the HTTPie CLI to start a new conversation:

http POST :8080/chat question="Who wanted to kill Harry Potter?"

We send a simple question to the chatbot and see what we get as a response:

{
    "chatId": "7b8a36c7-2126-4b80-ac8b-f9eedebff28a",
    "answer": "Lord Voldemort, also known as Tom Riddle, wanted to kill Harry Potter because of a prophecy that foretold a boy born at the end of July would have the power to defeat him."
}

The response contains a unique chatId and the chatbot’s answer to our question.

Let’s continue this conversation by sending a follow-up question using the chatId from the above response:

http POST :8080/chat chatId="7b8a36c7-2126-4b80-ac8b-f9eedebff28a" question="Who should he have gone after instead?"

Let’s see if the chatbot can maintain the context of our conversation and provide a relevant response:

{
    "chatId": "7b8a36c7-2126-4b80-ac8b-f9eedebff28a",
    "answer": "Based on the prophecy's criteria, Voldemort could have targeted Neville Longbottom instead, as he was also born at the end of July to parents who had defied Voldemort three times."
}

As we can see, the chatbot does indeed maintain the conversation context as it references the prophecy we discussed in the previous message.

The chatId remains the same, indicating that the follow-up answer is a continuation of the same conversation.

5. Using an Embedding Model

Moving on from the chat completion model, we’ll now use an embedding model to implement semantic search on a small dataset of quotes.

We’ll fetch the quotes from an external API, store them in an in-memory vector store, and perform a semantic search.

5.1. Fetching Quote Records From an External API

For our demonstration, we’ll use the QuoteSlate API to fetch quotes.

Let’s create a QuoteFetcher utility class for this:

class QuoteFetcher {
    private static final String BASE_URL = "https://quoteslate.vercel.app";
    private static final String API_PATH = "/api/quotes/random";
    private static final int DEFAULT_COUNT = 50;

    public static List<Quote> fetch() {
        return RestClient
          .create(BASE_URL)
          .get()
          .uri(uriBuilder ->
              uriBuilder
                .path(API_PATH)
                .queryParam("count", DEFAULT_COUNT)
                .build())
          .retrieve()
          .body(new ParameterizedTypeReference<>() {});
    }
}

record Quote(String quote, String author) {}

Using RestClient, we invoke the QuoteSlate API with the default count of 50 and use ParameterizedTypeReference to deserialize the API response to a list of Quote records.

5.2. Configuring and Populating an In-Memory Vector Store

Now, let’s configure an embedding model in our application.yaml:

spring:
  ai:
    ollama:
      embedding:
        options:
          model: hf.co/nomic-ai/nomic-embed-text-v1.5-GGUF

We use the GGUF version of the nomic-embed-text-v1.5 model provided by nomic-ai. Again, feel free to try this implementation with a different embedding model.

After specifying a valid model, Spring AI automatically creates a bean of type EmbeddingModel for us.

Let’s use it to create a vector store bean:

@Bean
public VectorStore vectorStore(EmbeddingModel embeddingModel) {
    return SimpleVectorStore
      .builder(embeddingModel)
      .build();
}

For our demonstration, we create a bean of SimpleVectorStore class. It’s an in-memory implementation that emulates a vector store using the java.util.Map class.
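To make the idea concrete, here’s a hypothetical, minimal sketch of what an in-memory vector store does under the hood: keep (text, vector) pairs and rank them by cosine similarity against a query vector. This is our own illustration with hand-written toy vectors — the real SimpleVectorStore obtains vectors from the EmbeddingModel — and the names NaiveVectorSearch and Entry are assumptions:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of in-memory similarity search over (text, vector) pairs
class NaiveVectorSearch {

    record Entry(String text, double[] vector) {}

    private final List<Entry> entries = new ArrayList<>();

    void add(String text, double[] vector) {
        entries.add(new Entry(text, vector));
    }

    // Return the k entries whose vectors are most similar to the query
    List<String> topK(double[] query, int k) {
        return entries.stream()
          .sorted(Comparator
            .comparingDouble((Entry e) -> cosine(e.vector(), query))
            .reversed())
          .limit(k)
          .map(Entry::text)
          .toList();
    }

    // Cosine similarity: dot product normalized by the vector magnitudes
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

In a production setup, we’d replace this with a persistent vector database; the SimpleVectorStore is suitable only for demos and small datasets, since every document lives on the heap and search is a linear scan.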

Now, to populate our vector store with quotes during application startup, we’ll create a VectorStoreInitializer class that implements the ApplicationRunner interface:

@Component
class VectorStoreInitializer implements ApplicationRunner {
    private final VectorStore vectorStore;

    // standard constructor

    @Override
    public void run(ApplicationArguments args) {
        List<Document> documents = QuoteFetcher
          .fetch()
          .stream()
          .map(quote -> {
              Map<String, Object> metadata = Map.of("author", quote.author());
              return new Document(quote.quote(), metadata);
          })
          .toList();
        vectorStore.add(documents);
    }
}

In our VectorStoreInitializer, we autowire an instance of VectorStore.

Inside the run() method, we use our QuoteFetcher utility class to retrieve a list of Quote records. Then, we map each quote into a Document and configure the author field as metadata.

Finally, we store all the documents in our vector store. When we invoke the add() method, Spring AI automatically converts our plaintext content into vector representation before storing it in our vector store. We don’t need to explicitly convert it using the EmbeddingModel bean.

With our vector store populated, let’s validate our semantic search functionality:

private static final int MAX_RESULTS = 3;

@ParameterizedTest
@ValueSource(strings = {"Motivation", "Happiness"})
void whenSearchingQuotesByTheme_thenRelevantQuotesReturned(String theme) {
    SearchRequest searchRequest = SearchRequest
      .builder()
      .query(theme)
      .topK(MAX_RESULTS)
      .build();
    List<Document> documents = vectorStore.similaritySearch(searchRequest);

    assertThat(documents)
      .hasSizeBetween(1, MAX_RESULTS)
      .allSatisfy(document -> {
          String author = String.valueOf(document.getMetadata().get("author"));
          assertThat(author)
            .isNotBlank();
      });
}

Here, we pass some common quote themes to our test method using @ValueSource. We then create a SearchRequest object with the theme as the query and MAX_RESULTS as the number of desired results.

Next, we call the similaritySearch() method of our vectorStore bean with the searchRequest. Similar to the add() method of the VectorStore, Spring AI converts our query to its vector representation before querying the vector store.

The returned documents will contain quotes that are semantically related to the given theme, even if they don’t contain the exact keyword.

6. Conclusion

In this article, we’ve explored using Hugging Face models with Spring AI.

Using Testcontainers, we set up the Ollama service, creating a local test environment.

First, we used a chat completion model to build a simple chatbot. Then, we implemented semantic search using an embedding model.

The code backing this article is available on GitHub.