Introduction to simple-openai
Last updated: January 26, 2026
1. Overview
The simple-openai library is a unified, community-driven Java HTTP client for the OpenAI API and compatible providers. It wraps low-level REST calls in a consistent, type-safe API covering chat completions, tools, vision, structured outputs, embeddings, and several newer OpenAI features.
Using this library instead of provider-specific SDKs keeps almost all of our application code independent of a specific model vendor. The library targets the standard OpenAI API, as well as Google Gemini, Vertex AI Gemini, Mistral, DeepSeek, Azure OpenAI, and Anyscale. However, not all providers expose the same set of features through simple-openai.
In this tutorial, we’ll build a set of small console applications that communicate with a live LLM via simple-openai. For the examples, we’ll use Google Gemini to work with a free API key while still following patterns applicable to OpenAI.
Notably, we’ll use Java 21, although the library’s minimum supported version is Java 11.
2. Project Setup and Library Configuration
To prepare a minimal Java starter project, we’re going to add the dependencies for working with simple-openai and Gemini.
2.1. Maven Dependencies
Let’s start with a plain Maven project. The only required library is simple-openai:
<dependencies>
    <dependency>
        <groupId>io.github.sashirestela</groupId>
        <artifactId>simple-openai</artifactId>
        <version>3.22.2</version>
    </dependency>
</dependencies>
This is the most recent version at the time of writing. However, we should always check for the latest release on Maven Central before adding the dependency to a new project, especially to ensure compatibility with the latest API changes.
2.2. Gemini API Key and curl Tests
To run the examples with Gemini, we need an API key from Google AI Studio. We can get a free one fairly easily. However, the AI model used in the code examples here is capped at 20 requests per day on the free tier.
After creating the key, let’s store it in an environment variable named GEMINI_API_KEY. We can define it either in the run configuration of the IDE or in a shell script that launches the IDE, so that all examples can read it at runtime.
As a quick sanity check before moving on, let’s log whether the variable is visible to the JVM:
Logger logger = System.getLogger("simpleopenai");
logger.log(Level.INFO,
    "GEMINI_API_KEY configured: {0}",
    System.getenv("GEMINI_API_KEY") != null);
The next essential step is to verify that the key and local environment can successfully access the Gemini endpoints. For this purpose, we create the gemini-curl-tests.txt file, which contains a set of curl tests that cover all the Gemini OpenAI-compatible features.
2.3. Configuring the Client
The simple-openai library has one main client class for each provider. For instance, SimpleOpenAI targets the standard OpenAI API, and SimpleOpenAIGeminiGoogle targets the Gemini Google API. Each client knows how to communicate with the corresponding HTTP endpoints and provides the same high-level entry points, such as the chat completion service, depending on what the provider supports.
To avoid repeating the client configuration in each example, let’s centralize it in a small helper class that also holds the model name and a shared logger:
public final class Client {

    public static final Logger LOGGER = System.getLogger("simpleopenai");
    public static final String CHAT_MODEL = "gemini-2.5-flash";

    private Client() {
    }

    public static SimpleOpenAIGeminiGoogle getClient() {
        String apiKey = System.getenv("GEMINI_API_KEY");
        if (apiKey == null || apiKey.isBlank()) {
            throw new IllegalStateException("GEMINI_API_KEY is not set");
        }
        return SimpleOpenAIGeminiGoogle.builder()
          .apiKey(apiKey)
          .build();
    }
}
For all of the upcoming examples, we assume that the client is declared using var to prevent the code from depending on a concrete provider class:
var client = Client.getClient();
This makes sense because the different provider classes share the same public method signatures for the services they have in common, such as chatCompletions().
If we later decide to switch from Gemini to OpenAI or another provider that offers the same services, we only need to update the implementation of getClient() and the model constant in this class, and adjust the environment variable name.
3. A Console Chat Client With simple-openai
To confirm that the Client helper, the environment, and the Gemini chat endpoint all work together correctly, we start with a basic example.
3.1. Single-Turn Chat Completion
Let’s create a small console application that sends a single user question to the model and waits for a response. In this case, the code is quite simple and minimal:
ChatRequest chatRequest = ChatRequest.builder()
  .model(Client.CHAT_MODEL)
  .message(UserMessage.of(
      "Suggest a weekend trip in Japan, no more than 60 words."))
  .build();

CompletableFuture<Chat> chatFuture = client.chatCompletions().create(chatRequest);
Chat chat = chatFuture.join();
Client.LOGGER.log(Level.INFO, "Model reply: {0}", chat.firstContent());
Let’s see an example response:
Model reply: Escape Tokyo to **Hakone** for a rejuvenating weekend! [...]
The model may return Markdown-formatted text, so we can expect console output to include markers like **bold** or lists.
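If we prefer plain console output, we can remove the most common markers before printing. Here’s a minimal sketch of such a helper; it’s our own illustrative code, not part of simple-openai, and it only handles the emphasis markers we typically see:

```java
// Illustrative helper (not part of simple-openai): strips the Markdown
// emphasis markers most often seen in model replies before console output.
final class MarkdownStripper {

    private MarkdownStripper() {
    }

    static String strip(String text) {
        return text
          .replaceAll("\\*\\*(.+?)\\*\\*", "$1")             // **bold** -> bold
          .replaceAll("(?<!\\*)\\*([^*]+)\\*(?!\\*)", "$1"); // *italic* -> italic
    }
}
```

A full Markdown renderer is overkill here; for a console demo, stripping the two emphasis markers is usually enough.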
Now, we’re ready to create a real chatbot.
3.2. Keeping Conversation State in Java
To transform the one-turn example into a basic console chatbot, we just store the conversation history in Java and send it with each request.
To make it easier to understand how it works, let’s replace the code for obtaining and printing the assistant’s responses with a comment for now:
List<ChatMessage> history = new ArrayList<>();
history.add(SystemMessage.of(
    "You are a helpful travel assistant. Answer briefly."));

try (Scanner scanner = new Scanner(System.in)) {
    while (true) {
        System.out.print("You: ");
        String input = scanner.nextLine();
        if (input == null || input.isBlank()) {
            continue;
        }
        if ("exit".equalsIgnoreCase(input.trim())) {
            break;
        }
        history.add(UserMessage.of(input));

        ChatRequest.ChatRequestBuilder chatRequestBuilder = ChatRequest.builder()
          .model(Client.CHAT_MODEL);
        for (ChatMessage message : history) {
            chatRequestBuilder.message(message);
        }
        ChatRequest chatRequest = chatRequestBuilder.build();

        // the next snippet goes here: it shows how to obtain the String "reply"

        history.add(AssistantMessage.of(reply));
    }
}
Although the initial SystemMessage before the while loop is optional, it plays an important role in controlling the assistant’s role, the style of its responses, and their level of detail and personality.
Let’s also see the code for obtaining and printing the assistant’s responses:
CompletableFuture<Chat> chatFuture = client.chatCompletions().create(chatRequest);
Chat chat = chatFuture.join();
String reply = chat.firstContent();
Client.LOGGER.log(Level.INFO, "Assistant: {0}", reply);
We keep this part in a separate snippet because it’s the only piece that changes when we switch to streaming.
Exchanging a few messages may result in an output like this:
You: How can I get from New York to Tokyo?
Assistant: Fly.
You: Where can I stay overnight?
Assistant: Hotels, hostels, motels, or Airbnbs.
You: exit
The responses are very short because we asked for brief answers in the SystemMessage. However, if we had asked for longer replies, streaming would have made more sense.
3.3. Switching to Streaming Responses
For longer answers, it’s often preferable to display the output progressively as it arrives. With simple-openai, we can switch from create() to createStream(), which returns a future that completes with a Stream<Chat> of incremental chunks.
Let’s see the new code for obtaining and printing the assistant’s responses:
CompletableFuture<Stream<Chat>> chatStreamFuture =
    client.chatCompletions().createStream(chatRequest);
Stream<Chat> chatStream = chatStreamFuture.join();

StringBuilder replyBuilder = new StringBuilder();
chatStream.forEach(chunk -> {
    String content = chunk.firstContent();
    if (content != null) {
        replyBuilder.append(content);
        System.out.print(content);
    }
});
String reply = replyBuilder.toString();
Before testing it, let’s change the initial SystemMessage to ask for more detailed responses:
history.add(SystemMessage.of(
    "You are a helpful travel assistant. Answer in at least 150 words."));
With these changes, the console behaves much like the most well-known chatbots, such as ChatGPT, with the answer appearing progressively as it streams in.
This kind of conversation is acceptable for a toy chatbot, but for an actual travel company, it’s almost useless. Real applications need the AI model to talk to internal systems so that it can return concrete options, prices, and availability from the travel agency, instead of generic advice. Thus, we’re going to address this gap.
4. Calling Functions: A Multilingual Hotel Booking Assistant
Tool calling is how an AI model accesses internal systems, requests structured operations from the custom Java code, and continues the conversation using the returned data.
This is also where multilingual support becomes critical for business scenarios. Even if the system instructions and internal data are in English, users can typically interact in their own language and still receive answers in that language, as long as the underlying model supports it.
To make this behavior explicit and more portable across models, we can add a line such as Reply in the same language as the user to the initial SystemMessage.
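For instance, the system prompt for our chat loop could be assembled as follows. The exact wording is our own choice; any similar phrasing works:

```java
// Sketch of a system prompt for the booking assistant. The last sentence
// nudges the model to mirror the user's language; the wording is our own.
final class Prompts {

    static final String TRAVEL_SYSTEM_PROMPT = String.join(" ",
        "You are a helpful travel assistant.",
        "Answer briefly.",
        "Reply in the same language as the user.");

    private Prompts() {
    }
}
```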
At this point, we can create a hotel booking assistant. The full implementation is lengthy and requires skills beyond the scope of simple-openai, but it’s available in the GitHub project. HotelService contains a small in-memory database and two methods, one for searching and one for booking. HotelFunctions exposes those methods as tools via FunctionExecutor. HandlingToolCallsInTheChatLoop implements the chat loop that detects tool calls, executes them, and feeds the results back to the model.
That said, let’s move on to the code parts that are specific to simple-openai.
4.1. Fake Inventory and Pricing
To make it easy to validate the behavior, HotelService starts from a fixed list of hotels:
this.inventory = new ArrayList<>(List.of(
    new Hotel("HTL-001", "Sakura Central Hotel", "Tokyo", 170, 2),
    new Hotel("HTL-002", "Asakusa Riverside Inn", "Tokyo", 130, 3),
    new Hotel("HTL-003", "Shinjuku Business Stay", "Tokyo", 110, 2),
    new Hotel("HTL-004", "Gion Garden Hotel", "Kyoto", 160, 2),
    new Hotel("HTL-005", "Kyoto Station Plaza", "Kyoto", 120, 3),
    new Hotel("HTL-006", "Dotonbori Lights Hotel", "Osaka", 140, 2)
));
The offers returned by searchOffers() use a simple pricing rule: the base price is per night, with a small surcharge for each additional guest. This makes the per-night prices in the console output easy to cross-check against the inventory.
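As a sketch, the rule can be expressed in a few lines. The $25 surcharge per additional guest is an assumption on our part, chosen to be consistent with the prices shown in the example run later in this article:

```java
// Sketch of the pricing rule behind searchOffers(). The $25-per-extra-guest
// surcharge is our assumption, consistent with the example run's prices.
final class Pricing {

    static final int EXTRA_GUEST_SURCHARGE = 25;

    private Pricing() {
    }

    static int pricePerNight(int basePrice, int guests) {
        return basePrice + EXTRA_GUEST_SURCHARGE * (guests - 1);
    }

    static int totalPrice(int basePrice, int guests, int nights) {
        return pricePerNight(basePrice, guests) * nights;
    }
}
```

For example, Sakura Central Hotel (base price $170) for 2 guests yields $195 per night, or $1365 for 7 nights.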
4.2. Exposing Java Methods as Tools
In simple-openai, we define tools by registering FunctionDef entries in a FunctionExecutor. Each tool points to a class that implements Functional, and the fields of that class become the JSON arguments schema.
Let’s see a minimal version of the registration:
executor.enrollFunction(FunctionDef.builder()
  .name("search_hotels")
  .description("Search for available hotels given a city, check-in date, nights, and guests")
  .functionalClass(SearchHotels.class)
  .strict(Boolean.TRUE)
  .build());

executor.enrollFunction(FunctionDef.builder()
  .name("create_booking")
  .description("Create a booking given a hotel id, check-in date, nights, guests, and guest name")
  .functionalClass(CreateBooking.class)
  .strict(Boolean.TRUE)
  .build());
At runtime, the model decides whether to call search_hotels or create_booking. The code remains the source of truth for availability and pricing.
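For illustration, a plain-Java stand-in for such an argument class might look like this. In the real project, the class implements simple-openai’s Functional interface, whose execute() method the FunctionExecutor invokes after filling the fields from the model’s JSON arguments; here we keep it self-contained, with field names taken from the tool-call logs shown later:

```java
// Illustrative stand-in for the SearchHotels tool-argument class. In the real
// project it implements simple-openai's Functional interface; the public
// fields become the JSON arguments schema sent to the model.
class SearchHotels {

    public String city;
    public String checkIn;
    public Integer nights;
    public Integer guests;

    public Object execute() {
        // In the real code this delegates to HotelService; here we just echo.
        return "Searching hotels in " + city + " from " + checkIn
            + " for " + nights + " night(s), " + guests + " guest(s)";
    }
}
```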
4.3. Handling Tool Calls in the Chat Loop
We attach the tools to the request, send it, and then inspect the model response for tool calls. If tool calls exist, we execute them and append ToolMessage entries to the history, then call the model again.
In essence, the loop is just a few lines:
for (ToolCall toolCall : toolCalls) {
    Object result = functionExecutor.execute(toolCall.getFunction());
    history.add(ToolMessage.of(toJson(result), toolCall.getId()));
}
Some OpenAI-compatible endpoints are strict about tool call payloads and may reject requests that contain unexpected null fields or missing tool call identifiers. To make the tool loop more robust across providers, the repository code sanitizes tool calls before sending them back to the model.
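A minimal sketch of such a sanitizing step follows. The record stands in for the library’s tool call type, and the exact filtering rules are our assumptions about what strict endpoints reject:

```java
import java.util.List;

// Sketch of a tool-call sanitizing step: drops calls without an id and
// substitutes an empty JSON object for missing arguments. The record is a
// stand-in for the library's tool call type; the rules are our assumptions.
final class ToolCallSanitizer {

    record SanitizedCall(String id, String name, String arguments) {
    }

    private ToolCallSanitizer() {
    }

    static List<SanitizedCall> sanitize(List<SanitizedCall> calls) {
        return calls.stream()
          .filter(call -> call.id() != null && !call.id().isBlank())
          .map(call -> new SanitizedCall(
              call.id(),
              call.name(),
              call.arguments() == null ? "{}" : call.arguments()))
          .toList();
    }
}
```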
4.4. Example Run
Once everything is wired together, the assistant can ask follow-up questions when data is missing. It can also reuse context from previous messages to fill in implied parameters instead of asking about them again.
Let’s log every toolCall in the output to clarify what happens behind the scenes:
You: Please find a hotel in Tokyo, check-in 2026-01-10, 7 nights, 2 guests.
[...] Tool call: search_hotels with args:
{"checkIn":"2026-01-10","nights":7,"guests":2,"city":"Tokyo"} [...]
Assistant: I found 3 hotels for you:
* Shinjuku Business Stay: $135 per night, total price $945
* Asakusa Riverside Inn: $155 per night, total price $1085
* Sakura Central Hotel: $195 per night, total price $1365
Which one would you like to book?
You: I would like to book a room at the Sakura Central Hotel.
Assistant: What is your full name?
You: Francesco Galgani
[...] Tool call: create_booking with args:
{"hotelId":"HTL-001","guests":2,"checkIn":"2026-01-10","nights":7,"guestName":"Francesco Galgani"} [...]
Assistant: Thank you, your booking for Sakura Central Hotel is confirmed. Booking ID: BK-1DE60AFC.
You: I also need a room in Kyoto for the same guests for two nights immediately following our stay in Tokyo.
[...] Tool call: search_hotels with args:
{"nights":2,"checkIn":"2026-01-17","guests":2,"city":"Kyoto"} [...]
Assistant: I found two hotels for you in Kyoto:
* Kyoto Station Plaza: $145 per night, total price $290
* Gion Garden Hotel: $185 per night, total price $370
Which one would you like to book?
You: The second one is fine.
[...] Tool call: create_booking with args:
{"checkIn":"2026-01-17","hotelId":"HTL-004","nights":2,"guests":2,"guestName":"Francesco Galgani"} [...]
Assistant: Thank you, your booking for Gion Garden Hotel is confirmed. Booking ID: BK-2BD025D3.
We repeated the tests in different languages to verify that multilingual support works as expected, all while keeping the tool calling logic unchanged.
5. Conclusion
In this article, we built a small set of console-based Java examples that call an OpenAI-compatible API through the simple-openai library. We started with a minimal single-turn chat completion, then kept conversation state to implement a basic chatbot, and finally switched to streaming to display longer answers progressively.
We also integrated tool calling to move beyond generic responses and let the assistant execute structured operations in Java, using a simple hotel booking scenario. This pattern is the foundation for real applications, because it keeps business logic and data inside the systems while the model focuses on language understanding and dialogue.
As always, the full code for this article is available over on GitHub.