Introduction to simple-openai
Last updated: January 26, 2026
1. Overview
The simple-openai library is a unified, community-driven Java HTTP client for the OpenAI API and compatible providers. It wraps low-level REST calls in a consistent, type-safe API covering chat completions, tools, vision, structured outputs, embeddings, and several newer OpenAI features.
Using this library instead of provider-specific SDKs keeps almost all of our application code independent of a specific model vendor. The library targets the standard OpenAI API, as well as Google Gemini, Vertex AI Gemini, Mistral, DeepSeek, Azure OpenAI, and Anyscale. However, not all providers expose the same set of features through simple-openai.
In this tutorial, we’ll build a set of small console applications that communicate with a live LLM via simple-openai. For the examples, we’ll use Google Gemini to work with a free API key while still following patterns applicable to OpenAI.
Notably, we’ll use Java 21, although the library’s minimum supported version is Java 11.
2. Project Setup and Library Configuration
To prepare a minimal Java starter project, we’re going to add the dependencies for working with simple-openai and Gemini.
2.1. Maven Dependencies
Let’s start with a plain Maven project. The only required library is simple-openai:
<dependencies>
    <dependency>
        <groupId>io.github.sashirestela</groupId>
        <artifactId>simple-openai</artifactId>
        <version>3.22.2</version>
    </dependency>
</dependencies>
This is the most recent version at the time of writing. However, we should always check for the latest release on Maven Central before adding the dependency to a new project, especially to ensure compatibility with the latest API changes.
2.2. Gemini API Key and curl Tests
To run the examples with Gemini, we need an API key from Google AI Studio. We can get a free one fairly easily. However, the AI model used in the code examples here is capped at 20 requests per day on the free tier.
After creating the key, let’s store it in an environment variable named GEMINI_API_KEY. We can define it either in the run configuration of the IDE or in a shell script that launches the IDE, so that all examples can read it at runtime.
As a quick sanity check before moving on, let’s log whether the variable is visible to the JVM:
Logger logger = System.getLogger("simpleopenai");
logger.log(Level.INFO,
    "GEMINI_API_KEY configured: {0}",
    System.getenv("GEMINI_API_KEY") != null);
The next essential step is to verify that the key and local environment can successfully access the Gemini endpoints. For this purpose, we create the gemini-curl-tests.txt file, which contains a set of curl tests that cover all the Gemini OpenAI-compatible features.
2.3. Configuring the Client
The simple-openai library has one main client class for each provider. For instance, SimpleOpenAI targets the standard OpenAI API, and SimpleOpenAIGeminiGoogle targets the Gemini Google API. Each client knows how to communicate with the corresponding HTTP endpoints and provides the same high-level entry points, such as the chat completion service, depending on what the provider supports.
To avoid repeating the client configuration in each example, let’s centralize it in a small helper class that also holds the model name and a shared logger:
public final class Client {

    public static final Logger LOGGER = System.getLogger("simpleopenai");
    public static final String CHAT_MODEL = "gemini-2.5-flash";

    private Client() {
    }

    public static SimpleOpenAIGeminiGoogle getClient() {
        String apiKey = System.getenv("GEMINI_API_KEY");
        if (apiKey == null || apiKey.isBlank()) {
            throw new IllegalStateException("GEMINI_API_KEY is not set");
        }
        return SimpleOpenAIGeminiGoogle.builder()
          .apiKey(apiKey)
          .build();
    }
}
For all of the upcoming examples, we assume that the client is declared using var to prevent the code from depending on a concrete provider class:
var client = Client.getClient();
This makes sense because the different provider classes share the same public method signatures for the services they have in common, such as chatCompletions().
If we later decide to switch from Gemini to OpenAI or another provider that offers the same services, we only need to update the implementation of getClient() and the model constant in this class, and adjust the environment variable name.
3. A Console Chat Client With simple-openai
To confirm that the Client helper, the environment, and the Gemini chat endpoint all work together correctly, we start with a basic example.
3.1. Single-Turn Chat Completion
Let’s create a small console application that sends a single user question to the model and waits for a response. In this case, the code is quite simple and minimal:
ChatRequest chatRequest = ChatRequest.builder()
  .model(Client.CHAT_MODEL)
  .message(UserMessage.of(
      "Suggest a weekend trip in Japan, no more than 60 words."))
  .build();

CompletableFuture<Chat> chatFuture = client.chatCompletions().create(chatRequest);
Chat chat = chatFuture.join();
Client.LOGGER.log(Level.INFO, "Model reply: {0}", chat.firstContent());
Let’s see an example response:
Model reply: Escape Tokyo to **Hakone** for a rejuvenating weekend! [...]
The model may return Markdown-formatted text, so we can expect console output to include markers like **bold** or lists.
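If we prefer plain console output, we can remove the most common markers before printing. Here’s a minimal sketch of such a helper; it’s our own illustrative code, not part of simple-openai, and it only handles the emphasis markers we typically see:

```java
// Illustrative helper (not part of simple-openai): strips the Markdown
// emphasis markers most often seen in model replies before console output.
final class MarkdownStripper {

    private MarkdownStripper() {
    }

    static String strip(String text) {
        return text
          .replaceAll("\\*\\*(.+?)\\*\\*", "$1")             // **bold** -> bold
          .replaceAll("(?<!\\*)\\*([^*]+)\\*(?!\\*)", "$1"); // *italic* -> italic
    }
}
```

A full Markdown renderer is overkill here; for a console demo, stripping the two emphasis markers is usually enough.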
Now, we’re ready to create a real chatbot.
3.2. Keeping Conversation State in Java
To transform the one-turn example into a basic console chatbot, we just store the conversation history in Java and send it with each request.
To make it easier to understand how it works, let’s replace the code for obtaining and printing the assistant’s responses with a comment for now:
List<ChatMessage> history = new ArrayList<>();
history.add(SystemMessage.of(
    "You are a helpful travel assistant. Answer briefly."));

try (Scanner scanner = new Scanner(System.in)) {
    while (true) {
        System.out.print("You: ");
        String input = scanner.nextLine();
        if (input == null || input.isBlank()) {
            continue;
        }
        if ("exit".equalsIgnoreCase(input.trim())) {
            break;
        }
        history.add(UserMessage.of(input));

        ChatRequest.ChatRequestBuilder chatRequestBuilder = ChatRequest.builder()
          .model(Client.CHAT_MODEL);
        for (ChatMessage message : history) {
            chatRequestBuilder.message(message);
        }
        ChatRequest chatRequest = chatRequestBuilder.build();

        // the next snippet goes here: it shows how to obtain the String "reply"

        history.add(AssistantMessage.of(reply));
    }
}
Although the initial SystemMessage before the while loop is optional, it plays an important role in controlling the assistant’s role, the style of its responses, and their level of detail and personality.
Let’s also see the code for obtaining and printing the assistant’s responses:
CompletableFuture<Chat> chatFuture = client.chatCompletions().create(chatRequest);
Chat chat = chatFuture.join();
String reply = chat.firstContent();
Client.LOGGER.log(Level.INFO, "Assistant: {0}", reply);
We keep this part in a separate snippet because it’s the only piece that changes when we switch to streaming.
Exchanging a few messages may result in an output like this:
You: How can I get from New York to Tokyo?
Assistant: Fly.
You: Where can I stay overnight?
Assistant: Hotels, hostels, motels, or Airbnbs.
You: exit
The responses are very short because we asked for brief answers in the SystemMessage. However, if we had asked for longer replies, streaming would have made more sense.
3.3. Switching to Streaming Responses
For longer answers, it’s often preferable to display the output progressively as it arrives. With simple-openai, we can switch from create() to createStream(), which returns a future that completes with a Stream<Chat> of incremental chunks.
Let’s see the new code for obtaining and printing the assistant’s responses:
CompletableFuture<Stream<Chat>> chatStreamFuture =
    client.chatCompletions().createStream(chatRequest);
Stream<Chat> chatStream = chatStreamFuture.join();

StringBuilder replyBuilder = new StringBuilder();
chatStream.forEach(chunk -> {
    String content = chunk.firstContent();
    if (content != null) {
        replyBuilder.append(content);
        System.out.print(content);
    }
});
String reply = replyBuilder.toString();
Before testing it, let’s change the initial SystemMessage to ask for more detailed responses:
history.add(SystemMessage.of(
    "You are a helpful travel assistant. Answer in at least 150 words."));
With these changes, the console behaves much like the most well-known chatbots, such as ChatGPT, with the answer appearing progressively as it streams in.
This kind of conversation is acceptable for a toy chatbot, but for an actual travel company, it’s almost useless. Real applications need the AI model to talk to internal systems so that it can return concrete options, prices, and availability from the travel agency, instead of generic advice. Thus, we’re going to address this gap.
4. Calling Functions: A Multilingual Hotel Booking Assistant
Tool calling is how an AI model accesses internal systems, requests structured operations from the custom Java code, and continues the conversation using the returned data.
This is also where multilingual support becomes critical for business scenarios. Even if the system instructions and internal data are in English, users can typically interact in their own language and still receive answers in that language, as long as the underlying model supports it.
To make this behavior explicit and more portable across models, we can add a line such as Reply in the same language as the user to the initial SystemMessage.
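For instance, the system prompt for our chat loop could be assembled as follows. The exact wording is our own choice; any similar phrasing works:

```java
// Sketch of a system prompt for the booking assistant. The last sentence
// nudges the model to mirror the user's language; the wording is our own.
final class Prompts {

    static final String TRAVEL_SYSTEM_PROMPT = String.join(" ",
        "You are a helpful travel assistant.",
        "Answer briefly.",
        "Reply in the same language as the user.");

    private Prompts() {
    }
}
```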
At this point, we can create a hotel booking assistant. The full implementation is lengthy and requires skills beyond the scope of simple-openai, but it’s available in the GitHub project. HotelService contains a small in-memory database and two methods, one for searching and one for booking. HotelFunctions exposes those methods as tools via FunctionExecutor. HandlingToolCallsInTheChatLoop implements the chat loop that detects tool calls, executes them, and feeds the results back to the model.
That said, let’s move on to the code parts that are specific to simple-openai.
4.1. Fake Inventory and Pricing
To make it easy to validate the behavior, HotelService starts from a fixed list of hotels:
this.inventory = new ArrayList<>(List.of(
    new Hotel("HTL-001", "Sakura Central Hotel", "Tokyo", 170, 2),
    new Hotel("HTL-002", "Asakusa Riverside Inn", "Tokyo", 130, 3),
    new Hotel("HTL-003", "Shinjuku Business Stay", "Tokyo", 110, 2),
    new Hotel("HTL-004", "Gion Garden Hotel", "Kyoto", 160, 2),
    new Hotel("HTL-005", "Kyoto Station Plaza", "Kyoto", 120, 3),
    new Hotel("HTL-006", "Dotonbori Lights Hotel", "Osaka", 140, 2)
));
The offers returned by searchOffers() use a simple pricing rule: the base price is per night, with a small surcharge for each additional guest. This makes the per-night prices in the console output easy to cross-check against the inventory.
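As a sketch, the rule can be expressed in a few lines. The $25 surcharge per additional guest is an assumption on our part, chosen to be consistent with the prices shown in the example run later in this article:

```java
// Sketch of the pricing rule behind searchOffers(). The $25-per-extra-guest
// surcharge is our assumption, consistent with the example run's prices.
final class Pricing {

    static final int EXTRA_GUEST_SURCHARGE = 25;

    private Pricing() {
    }

    static int pricePerNight(int basePrice, int guests) {
        return basePrice + EXTRA_GUEST_SURCHARGE * (guests - 1);
    }

    static int totalPrice(int basePrice, int guests, int nights) {
        return pricePerNight(basePrice, guests) * nights;
    }
}
```

For example, Sakura Central Hotel (base price $170) for 2 guests yields $195 per night, or $1365 for 7 nights.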
4.2. Exposing Java Methods as Tools
In simple-openai, we define tools by registering FunctionDef entries in a FunctionExecutor. Each tool points to a class that implements Functional, and the fields of that class become the JSON arguments schema.
Let’s see a minimal version of the registration:
executor.enrollFunction(FunctionDef.builder()
  .name("search_hotels")
  .description("Search for available hotels given a city, check-in date, nights, and guests")
  .functionalClass(SearchHotels.class)
  .strict(Boolean.TRUE)
  .build());

executor.enrollFunction(FunctionDef.builder()
  .name("create_booking")
  .description("Create a booking given a hotel id, check-in date, nights, guests, and guest name")
  .functionalClass(CreateBooking.class)
  .strict(Boolean.TRUE)
  .build());
At runtime, the model decides whether to call search_hotels or create_booking. The code remains the source of truth for availability and pricing.
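For illustration, a plain-Java stand-in for such an argument class might look like this. In the real project, the class implements simple-openai’s Functional interface, whose execute() method the FunctionExecutor invokes after filling the fields from the model’s JSON arguments; here we keep it self-contained, with field names taken from the tool-call logs shown later:

```java
// Illustrative stand-in for the SearchHotels tool-argument class. In the real
// project it implements simple-openai's Functional interface; the public
// fields become the JSON arguments schema sent to the model.
class SearchHotels {

    public String city;
    public String checkIn;
    public Integer nights;
    public Integer guests;

    public Object execute() {
        // In the real code this delegates to HotelService; here we just echo.
        return "Searching hotels in " + city + " from " + checkIn
            + " for " + nights + " night(s), " + guests + " guest(s)";
    }
}
```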
4.3. Handling Tool Calls in the Chat Loop
We attach the tools to the request, send it, and then inspect the model response for tool calls. If tool calls exist, we execute them and append ToolMessage entries to the history, then call the model again.
In essence, the loop is just a few lines:
for (ToolCall toolCall : toolCalls) {
    Object result = functionExecutor.execute(toolCall.getFunction());
    history.add(ToolMessage.of(toJson(result), toolCall.getId()));
}
Some OpenAI-compatible endpoints are strict about tool call payloads and may reject requests that contain unexpected null fields or missing tool call identifiers. To make the tool loop more robust across providers, the repository code sanitizes tool calls before sending them back to the model.
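A minimal sketch of such a sanitizing step follows. The record stands in for the library’s tool call type, and the exact filtering rules are our assumptions about what strict endpoints reject:

```java
import java.util.List;

// Sketch of a tool-call sanitizing step: drops calls without an id and
// substitutes an empty JSON object for missing arguments. The record is a
// stand-in for the library's tool call type; the rules are our assumptions.
final class ToolCallSanitizer {

    record SanitizedCall(String id, String name, String arguments) {
    }

    private ToolCallSanitizer() {
    }

    static List<SanitizedCall> sanitize(List<SanitizedCall> calls) {
        return calls.stream()
          .filter(call -> call.id() != null && !call.id().isBlank())
          .map(call -> new SanitizedCall(
              call.id(),
              call.name(),
              call.arguments() == null ? "{}" : call.arguments()))
          .toList();
    }
}
```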
4.4. Example Run
Once everything is wired together, the assistant can ask follow-up questions when data is missing. It can also reuse context from previous messages to fill in implied parameters instead of asking about them again.
Let’s log every toolCall in the output to clarify what happens behind the scenes:
You: Please find a hotel in Tokyo, check-in 2026-01-10, 7 nights, 2 guests.
[...] Tool call: search_hotels with args:
{"checkIn":"2026-01-10","nights":7,"guests":2,"city":"Tokyo"} [...]
Assistant: I found 3 hotels for you:
* Shinjuku Business Stay: $135 per night, total price $945
* Asakusa Riverside Inn: $155 per night, total price $1085
* Sakura Central Hotel: $195 per night, total price $1365
Which one would you like to book?
You: I would like to book a room at the Sakura Central Hotel.
Assistant: What is your full name?
You: Francesco Galgani
[...] Tool call: create_booking with args:
{"hotelId":"HTL-001","guests":2,"checkIn":"2026-01-10","nights":7,"guestName":"Francesco Galgani"} [...]
Assistant: Thank you, your booking for Sakura Central Hotel is confirmed. Booking ID: BK-1DE60AFC.
You: I also need a room in Kyoto for the same guests for two nights immediately following our stay in Tokyo.
[...] Tool call: search_hotels with args:
{"nights":2,"checkIn":"2026-01-17","guests":2,"city":"Kyoto"} [...]
Assistant: I found two hotels for you in Kyoto:
* Kyoto Station Plaza: $145 per night, total price $290
* Gion Garden Hotel: $185 per night, total price $370
Which one would you like to book?
You: The second one is fine.
[...] Tool call: create_booking with args:
{"checkIn":"2026-01-17","hotelId":"HTL-004","nights":2,"guests":2,"guestName":"Francesco Galgani"} [...]
Assistant: Thank you, your booking for Gion Garden Hotel is confirmed. Booking ID: BK-2BD025D3.
We repeated the tests in different languages to verify that multilingual support works as expected, all while keeping the tool calling logic unchanged.
5. Conclusion
In this article, we built a small set of console-based Java examples that call an OpenAI-compatible API through the simple-openai library. We started with a minimal single-turn chat completion, then kept conversation state to implement a basic chatbot, and finally switched to streaming to display longer answers progressively.
We also integrated tool calling to move beyond generic responses and let the assistant execute structured operations in Java, using a simple hotel booking scenario. This pattern is the foundation for real applications, because it keeps business logic and data inside the systems while the model focuses on language understanding and dialogue.
As always, the full code for this article is available over on GitHub.