Configuring Multiple LLMs in Spring AI
Last updated: October 4, 2025
1. Overview
Modern applications are increasingly integrating with Large Language Models (LLMs) to build intelligent solutions. While a single LLM can handle multiple tasks, relying on just one model isn’t always the optimal approach.
Different models specialize in different capabilities, with some excelling at technical analysis while others perform better at creative writing. Additionally, we might prefer lighter and cost-effective models for handling simple tasks while reserving powerful models for complex tasks.
In this tutorial, we’ll explore integrating multiple LLMs in a Spring Boot application using Spring AI.
We’ll configure models from different providers as well as multiple models from the same provider. Then, we’ll build upon this configuration to implement a resilient chatbot capable of automatically switching between models when failures occur.
2. Configuring LLMs of Different Providers
Let’s start by configuring two LLMs from different providers in our application.
For our demonstration, we’ll use OpenAI and Anthropic as our AI model providers.
2.1. Configuring a Primary LLM
We’ll begin by configuring an OpenAI model as our primary LLM.
First, let’s add the necessary dependency to our project’s pom.xml file:
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-openai</artifactId>
    <version>1.0.2</version>
</dependency>
The OpenAI starter dependency is a wrapper around OpenAI’s Chat Completions API, enabling us to interact with OpenAI models in our application.
Next, let’s configure our OpenAI API key and chat model in the application.yaml file:
spring:
  ai:
    open-ai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: ${PRIMARY_LLM}
          temperature: 1
We use the ${} property placeholder to load the values of our properties from environment variables. Additionally, we set the temperature to 1 since the newer OpenAI models only accept this default value.
Once we configure the above properties, Spring AI automatically creates a bean of type OpenAiChatModel. Let's use it to define a ChatClient bean, which serves as the main entry point for interacting with our LLM:
@Configuration
class ChatbotConfiguration {

    @Bean
    @Primary
    ChatClient primaryChatClient(OpenAiChatModel chatModel) {
        return ChatClient.create(chatModel);
    }
}
In our ChatbotConfiguration class, we use the OpenAiChatModel bean to create a ChatClient bean for our primary LLM.
We annotate this bean with @Primary, so Spring injects it into our components whenever we autowire a ChatClient bean without specifying a qualifier.
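For instance, in a hypothetical SummaryService component, declaring a plain ChatClient constructor parameter is enough to receive the primary client:

@Service
class SummaryService {

    private final ChatClient chatClient;

    // No qualifier needed: Spring injects the @Primary ChatClient bean
    SummaryService(ChatClient chatClient) {
        this.chatClient = chatClient;
    }

    String summarize(String text) {
        return chatClient
            .prompt("Summarize the following text: " + text)
            .call()
            .content();
    }
}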
2.2. Configuring a Secondary LLM
Now, let’s configure a model from Anthropic to act as our secondary LLM.
First, let’s add the Anthropic starter dependency to our pom.xml file:
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-anthropic</artifactId>
    <version>1.0.2</version>
</dependency>
This dependency is a wrapper around the Anthropic Message API and provides us with the necessary classes to establish a connection and interact with Anthropic models.
Next, let’s define the configuration properties for our secondary model:
spring:
  ai:
    anthropic:
      api-key: ${ANTHROPIC_API_KEY}
      chat:
        options:
          model: ${SECONDARY_LLM}
Similar to our primary LLM configuration, we load the Anthropic API key and the model ID from environment variables.
Finally, let’s create a dedicated ChatClient bean for our secondary model:
@Bean
ChatClient secondaryChatClient(AnthropicChatModel chatModel) {
    return ChatClient.create(chatModel);
}
Here, we create a secondaryChatClient bean using the AnthropicChatModel bean that Spring AI auto-configures for us.
3. Configuring LLMs of the Same Provider
More often than not, the LLMs we need to configure belong to the same AI provider.
Spring AI does not natively support this scenario, since the auto-configuration creates only one ChatModel bean per provider. We’ll need to manually define a ChatModel bean for the additional model(s).
Let’s explore this process and configure a second Anthropic model in our application:
spring:
  ai:
    anthropic:
      chat:
        options:
          tertiary-model: ${TERTIARY_LLM}
In our application.yaml, under the Anthropic configuration, we’ve added a custom property to hold the model name for the tertiary LLM.
Next, let’s define the necessary beans for our tertiary LLM:
@Bean
ChatModel tertiaryChatModel(
    AnthropicApi anthropicApi,
    AnthropicChatModel anthropicChatModel,
    @Value("${spring.ai.anthropic.chat.options.tertiary-model}") String tertiaryModelName
) {
    AnthropicChatOptions chatOptions = anthropicChatModel.getDefaultOptions().copy();
    chatOptions.setModel(tertiaryModelName);

    return AnthropicChatModel.builder()
        .anthropicApi(anthropicApi)
        .defaultOptions(chatOptions)
        .build();
}

@Bean
ChatClient tertiaryChatClient(@Qualifier("tertiaryChatModel") ChatModel tertiaryChatModel) {
    return ChatClient.create(tertiaryChatModel);
}
First, to create our custom ChatModel bean, we inject the auto-configured AnthropicApi bean, the default AnthropicChatModel bean (which we used earlier to create our secondary LLM's ChatClient), and the tertiary model name, which we resolve using @Value.
We copy the default options from the existing AnthropicChatModel bean and simply override the model name.
This setup assumes that both Anthropic models share the same API key and other configurations. If we need to specify different properties, we can customize the AnthropicChatOptions further.
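For example, assuming the tertiary model should also use its own sampling settings, we can override them on the copied options before building the model. The setters mirror the setModel() call above, and the values here are purely illustrative:

AnthropicChatOptions chatOptions = anthropicChatModel.getDefaultOptions().copy();
chatOptions.setModel(tertiaryModelName);
// Illustrative overrides for the tertiary model only
chatOptions.setTemperature(0.2);
chatOptions.setMaxTokens(1024);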
Finally, we use the custom tertiaryChatModel to create a third ChatClient bean in our configuration class.
4. Exploring a Practical Use Case
With our multi-model configuration in place, let’s implement a practical use case. We’ll build a resilient chatbot that automatically falls back to alternative models in sequence when the primary one fails.
4.1. Building a Resilient Chatbot
To implement the fallback logic, we’ll use Spring Retry.
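Spring Retry isn't pulled in by the Spring AI starters, so as a prerequisite we add the spring-retry and spring-aspects dependencies, whose versions Spring Boot manages for us:

<dependency>
    <groupId>org.springframework.retry</groupId>
    <artifactId>spring-retry</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-aspects</artifactId>
</dependency>

We also annotate our ChatbotConfiguration class with @EnableRetry so that the @Retryable and @Recover annotations we use next are processed:

@EnableRetry
@Configuration
class ChatbotConfiguration {
    // ... existing ChatClient and ChatModel beans
}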
Let’s create a new ChatbotService class and autowire the three ChatClient beans we’ve defined. Then, let’s define the entry point for our chatbot that uses the primary LLM:
@Retryable(retryFor = Exception.class, maxAttempts = 3)
String chat(String prompt) {
    logger.debug("Attempting to process prompt '{}' with primary LLM. Attempt #{}",
        prompt, RetrySynchronizationManager.getContext().getRetryCount() + 1);
    return primaryChatClient
        .prompt(prompt)
        .call()
        .content();
}
Here, we create a chat() method that uses the primaryChatClient bean. We annotate this method with @Retryable, configuring it to make up to three attempts in case of any Exception.
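For reference, here's a sketch of how our service wires in the three ChatClient beans, qualifying the non-primary ones by name:

@Service
class ChatbotService {

    private static final Logger logger = LoggerFactory.getLogger(ChatbotService.class);

    private final ChatClient primaryChatClient;
    private final ChatClient secondaryChatClient;
    private final ChatClient tertiaryChatClient;

    ChatbotService(
        ChatClient primaryChatClient,
        @Qualifier("secondaryChatClient") ChatClient secondaryChatClient,
        @Qualifier("tertiaryChatClient") ChatClient tertiaryChatClient
    ) {
        this.primaryChatClient = primaryChatClient;
        this.secondaryChatClient = secondaryChatClient;
        this.tertiaryChatClient = tertiaryChatClient;
    }

    // chat() above and the recovery method below go here
}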
Next, let’s define a recovery method:
@Recover
String chat(Exception exception, String prompt) {
    logger.warn("Primary LLM failure. Error received: {}", exception.getMessage());
    logger.debug("Attempting to process prompt '{}' with secondary LLM", prompt);
    try {
        return secondaryChatClient
            .prompt(prompt)
            .call()
            .content();
    } catch (Exception e) {
        logger.warn("Secondary LLM failure: {}", e.getMessage());
        logger.debug("Attempting to process prompt '{}' with tertiary LLM", prompt);
        return tertiaryChatClient
            .prompt(prompt)
            .call()
            .content();
    }
}
The @Recover annotation marks our overloaded chat() method as the fallback if the original chat() method fails and exhausts the configured retries.
We first attempt to get a response from the secondaryChatClient. If that fails as well, we make a final attempt with the tertiaryChatClient bean.
We use this rudimentary try-catch implementation since Spring Retry only allows one recovery method per method signature. However, in a production application, we should consider a more sophisticated solution like Resilience4j.
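As an illustration only, a Resilience4j-based variant could attach a fallback directly to the retried method through its @Retry annotation, assuming the resilience4j-spring-boot3 starter is on the classpath and a retry instance named primaryLlm is configured (otherwise the default configuration applies):

@Retry(name = "primaryLlm", fallbackMethod = "fallbackToSecondary")
String chat(String prompt) {
    return primaryChatClient
        .prompt(prompt)
        .call()
        .content();
}

// Resilience4j fallback methods take the original arguments plus the exception as the last parameter
String fallbackToSecondary(String prompt, Exception exception) {
    return secondaryChatClient
        .prompt(prompt)
        .call()
        .content();
}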
Now that we’ve implemented our service layer, let’s expose a REST API on top of it:
@PostMapping("/api/chatbot/chat")
ChatResponse chat(@RequestBody ChatRequest request) {
    String response = chatbotService.chat(request.prompt());
    return new ChatResponse(response);
}

record ChatRequest(String prompt) {}

record ChatResponse(String response) {}
Here, we define a POST /api/chatbot/chat endpoint that accepts a prompt, passes it to the service layer, and wraps the response in a ChatResponse record.
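For completeness, a minimal controller hosting this handler simply delegates to our service; the class name here is illustrative:

@RestController
class ChatbotController {

    private final ChatbotService chatbotService;

    ChatbotController(ChatbotService chatbotService) {
        this.chatbotService = chatbotService;
    }

    // chat() handler, ChatRequest, and ChatResponse from above go here
}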
4.2. Testing Our Chatbot
Finally, let’s test our chatbot to verify that the fallback mechanism works correctly.
Let’s start our application with environment variables that specify invalid model names for our primary and secondary LLMs, but a valid one for the tertiary LLM:
OPENAI_API_KEY=.... \
ANTHROPIC_API_KEY=.... \
PRIMARY_LLM=gpt-100 \
SECONDARY_LLM=claude-opus-200 \
TERTIARY_LLM=claude-3-haiku-20240307 \
mvn spring-boot:run
In our command, gpt-100 and claude-opus-200 are invalid model names that will cause API errors, while claude-3-haiku-20240307 is a valid model from Anthropic.
Next, let’s use the HTTPie CLI to invoke the API endpoint and interact with our chatbot:
http POST :8080/api/chatbot/chat prompt="What is the capital of France?"
Here, we send a simple prompt to the chatbot. Let's see what we receive as a response:
{
    "response": "The capital of France is Paris."
}
As we can see, despite configuring invalid primary and secondary LLMs, our chatbot still provides a correct response, confirming that the system falls back to the tertiary LLM.
To see the fallback logic in action, let’s also examine our application logs:
[2025-09-30 12:56:03] [DEBUG] [com.baeldung.multillm.ChatbotService] - Attempting to process prompt 'What is the capital of France?' with primary LLM. Attempt #1
[2025-09-30 12:56:05] [DEBUG] [com.baeldung.multillm.ChatbotService] - Attempting to process prompt 'What is the capital of France?' with primary LLM. Attempt #2
[2025-09-30 12:56:06] [DEBUG] [com.baeldung.multillm.ChatbotService] - Attempting to process prompt 'What is the capital of France?' with primary LLM. Attempt #3
[2025-09-30 12:56:07] [WARN] [com.baeldung.multillm.ChatbotService] - Primary LLM failure. Error received: HTTP 404 - {
"error": {
"message": "The model `gpt-100` does not exist or you do not have access to it.",
"type": "invalid_request_error",
"param": null,
"code": "model_not_found"
}
}
[2025-09-30 12:56:07] [DEBUG] [com.baeldung.multillm.ChatbotService] - Attempting to process prompt 'What is the capital of France?' with secondary LLM
[2025-09-30 12:56:07] [WARN] [com.baeldung.multillm.ChatbotService] - Secondary LLM failure: HTTP 404 - {"type":"error","error":{"type":"not_found_error","message":"model: claude-opus-200"},"request_id":"req_011CTeBrAY8rstsSPiJyv3sj"}
[2025-09-30 12:56:07] [DEBUG] [com.baeldung.multillm.ChatbotService] - Attempting to process prompt 'What is the capital of France?' with tertiary LLM
The logs clearly illustrate our request’s execution flow.
We see three failed attempts with the primary LLM. Then, our service attempts to use the secondary LLM, which also fails. Finally, it invokes the tertiary LLM that processes the prompt and returns the response we saw.
This demonstrates that our fallback mechanism works exactly as designed, ensuring our chatbot remains available even when multiple LLMs fail.
5. Conclusion
In this article, we’ve explored integrating multiple LLMs within a single Spring AI application.
First, we demonstrated how Spring AI’s abstraction layer simplifies configuring models from different providers like OpenAI and Anthropic.
Then, we tackled a more complex scenario of configuring multiple models from the same provider, creating custom beans when Spring AI’s auto-configuration isn’t sufficient.
Finally, we used this multi-model configuration to build a resilient and highly available chatbot. Using Spring Retry, we configured a cascading fallback pattern that automatically switches between LLMs in case of failure.
The code backing this article is available on GitHub.