1. Introduction

In this tutorial, we’ll compare two commonly used server threading models: thread-per-connection and thread-per-request.

First, we’ll define exactly what we mean by “connection” and “request”. Then we’ll implement two socket-based Java web servers following the different paradigms. Finally, we’ll look at some key takeaways.

2. Connection vs Request Threading Model

Let’s start with some concise definitions.

A threading model is a program’s approach to how and when threads are created and synchronized to achieve concurrency and multitasking. To illustrate, we refer to an HTTP connection between a client and a server. We consider a request as a single execution made by the client to the server during an established connection.

When the client needs to communicate with the server, it opens a new TCP connection and sends an HTTP request over it. To avoid the overhead of establishing a new connection, the client can reuse the existing one to send further requests. This mechanism is called Keep-Alive; it was introduced as an option in HTTP 1.0 and became the default behavior in HTTP 1.1.
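For illustration, here's roughly what a keep-alive exchange looks like on the wire (headers trimmed down, with example.com as a placeholder host). Both requests travel over the same TCP connection:

```
GET /first HTTP/1.1
Host: example.com
Connection: keep-alive

HTTP/1.1 200 OK
Content-Length: 2

ok

GET /second HTTP/1.1      <- second request, same TCP connection
Host: example.com
Connection: keep-alive
```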

Understanding this concept, we can now introduce the two threading models compared in this article.

The following image shows the difference: with the thread-per-connection paradigm, the web server dedicates a thread to every connection, while with the thread-per-request model, it dedicates a thread to every request, irrespective of whether the request arrives over an existing connection:

Image comparing the two threading models.

In the following sections, we’ll identify the pros and cons of the two approaches and see some code samples using sockets. The examples will be simplified versions of real case scenarios. To keep the code as simple as possible, we’ll avoid introducing optimizations that are extensively used in real-world server architectures (e.g., thread pools).

3. Understanding Thread per Connection

With the thread-per-connection approach, each client connection gets its dedicated thread. The same thread handles all the requests coming from that connection.

Let’s illustrate how the thread-per-connection model works by building a simple Java socket-based server:

public class ThreadPerConnectionServer {

   private static final Logger logger = LoggerFactory.getLogger(ThreadPerConnectionServer.class);
   private static final int PORT = 8080;

   public static void main(String[] args) {
      try (ServerSocket serverSocket = new ServerSocket(PORT)) {
         logger.info("Server started on port {}", PORT);
         while (!serverSocket.isClosed()) {
            try {
               Socket newClient = serverSocket.accept();
               logger.info("New client connected: {}", newClient.getInetAddress());
               ClientConnection clientConnection = new ClientConnection(newClient);
               new ThreadPerConnection(clientConnection).start();
            } catch (IOException e) {
               logger.error("Error accepting connection", e);
            }
         }
      } catch (IOException e) {
         logger.error("Error starting server", e);
      }
   }
}

ClientConnection is a simple wrapper that implements the Closeable interface and holds the BufferedReader and the PrintWriter that we'll use to read requests and write back responses:

public class ClientConnection implements Closeable {

   // ...

   public ClientConnection(Socket socket) throws IOException {
      this.socket = socket;
      this.reader = new BufferedReader(new InputStreamReader(socket.getInputStream()));
      this.writer = new PrintWriter(socket.getOutputStream(), true);
   }

   @Override
   public void close() throws IOException {
      try (Writer writer = this.writer; Reader reader = this.reader; Socket socket = this.socket) {
         // resources all closed when this block exits
      }
   }
}

ThreadPerConnectionServer creates a ServerSocket on port 8080 and repeatedly calls the accept() method, which blocks the execution until a new connection is received.

When a client connects, the server immediately starts a new ThreadPerConnection thread:

public class ThreadPerConnection extends Thread {
   // ...
   
   @Override
   public void run() {
      try (ClientConnection client = this.clientConnection) {
         String request;
         while ((request = client.getReader().readLine()) != null) {
            Thread.sleep(1000); // simulate server doing work
            logger.info("Processing request: {}", request);
            client.getWriter()
               .println("HTTP/1.1 200 OK - Processed request: " + request);
            logger.info("Processed request: {}", request);
         }
      } catch (Exception e) {
         logger.error("Error processing request", e);
      }
   }
}

This simple implementation reads the input from the client and echoes it back with a response prefix. When no more requests arrive over the connection, the socket is closed automatically thanks to the try-with-resources syntax. Every connection gets its dedicated thread, while the main thread in the while loop remains free to accept new connections.
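To see the model in action end to end, here's a minimal, self-contained sketch. It inlines a tiny echo handler rather than reusing the ClientConnection and ThreadPerConnection classes above, binds to an ephemeral port, and sends two requests over the same connection; the class name EchoRoundTrip and the response prefix are illustrative, not part of the article's project:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

public class EchoRoundTrip {

    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) { // port 0 = any free port
            // one dedicated thread for the single accepted connection
            Thread perConnection = new Thread(() -> {
                try (Socket socket = server.accept();
                     BufferedReader reader = new BufferedReader(
                         new InputStreamReader(socket.getInputStream()));
                     PrintWriter writer = new PrintWriter(socket.getOutputStream(), true)) {
                    String request;
                    // the same thread serves every request of this connection
                    while ((request = reader.readLine()) != null) {
                        writer.println("Processed request: " + request);
                    }
                } catch (IOException ignored) {
                }
            });
            perConnection.start();

            // the client reuses the same connection for two requests
            try (Socket client = new Socket("localhost", server.getLocalPort());
                 PrintWriter out = new PrintWriter(client.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(
                     new InputStreamReader(client.getInputStream()))) {
                out.println("first");
                System.out.println(in.readLine());
                out.println("second");
                System.out.println(in.readLine());
            }
            perConnection.join();
        }
    }
}
```

Running it prints the two echoed responses in order, both served by the same connection thread.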

The most significant advantage of the thread-per-connection model is its neatness and ease of implementation. If 10 clients open 10 concurrent connections, the web server needs 10 threads to serve them all simultaneously. Moreover, since the same thread serves the same client for the entire connection, the application avoids some thread context switching.

4. Understanding Thread per Request

With the thread-per-request model, a different thread is used to handle each request, even if the connection used is persistent.

As with the previous case, let’s see a simplified example of a Java socket-based server adopting the thread-per-request threading model:

public class ThreadPerRequestServer {
   //...
   
   public static void main(String[] args) {
      List<ClientConnection> clientConnections = new ArrayList<>();
      try (ServerSocket serverSocket = new ServerSocket(PORT)) {
         logger.info("Server started on port {}", PORT);
         while (!serverSocket.isClosed()) {
            acceptNewConnections(serverSocket, clientConnections);
            handleRequests(clientConnections);
         }
      } catch (IOException e) {
         logger.error("Server error: {}", e.getMessage());
      } finally {
         closeClientConnections(clientConnections);
      }
   }
}

Here, we maintain a list of clientConnections, rather than just one as we did previously. The server accepts new connections until the server socket is closed, handling all the requests incoming from them. When the server socket is closed, we also need to close every client socket connection still active (if any).

First, let’s define the method to accept new connections:

private static void acceptNewConnections(ServerSocket serverSocket, List<ClientConnection> clientConnections) throws SocketException {
   serverSocket.setSoTimeout(100);
   try {
      Socket newClient = serverSocket.accept();
      ClientConnection clientConnection = new ClientConnection(newClient);
      clientConnections.add(clientConnection);
      logger.info("New client connected: {}", newClient.getInetAddress());
   } catch (IOException ignored) {
      // no new connection arrived within the timeout window
   }
}

Ideally, the method that accepts new connections and the method that handles requests would run on two separate threads. Since this simple example uses a single main thread, we set a short socket timeout so that accept() doesn't block the flow of execution indefinitely: if no connection arrives within 100 ms, we assume none is pending and move on to the method that handles requests:

private static void handleRequests(List<ClientConnection> clientConnections) throws IOException {
   Iterator<ClientConnection> iterator = clientConnections.iterator();
   while (iterator.hasNext()) {
      ClientConnection client = iterator.next();
      if (client.getSocket().isClosed()) {
         logger.info("Client disconnected: {}", client.getSocket().getInetAddress());
         iterator.remove();
         continue;
      }
      try {
         BufferedReader reader = client.getReader();
         if (reader.ready()) {
            String request = reader.readLine();
            if (request != null) {
               new ThreadPerRequest(client.getWriter(), request).start();
            }
         }
      } catch (IOException e) {
         logger.error("Error reading from client {}", client.getSocket()
            .getInetAddress(), e);
      }
   }
}

In this method, for each connection that contains a new valid request to process, we start a new thread that handles only that single request:

public class ThreadPerRequest extends Thread {

   //...

   @Override
   public void run() {
      try {
         Thread.sleep(1000); // simulate server doing work
         logger.info("Processing request: {}", request);
         writer.println("HTTP/1.1 200 OK - Processed request: " + request);
         logger.info("Processed request: {}", request);
      } catch (Exception e) {
         logger.error("Error processing request: {}", e.getMessage());
      }
   }
}

In ThreadPerRequest, we don’t close the client connection, and we handle just one request. The short-lived thread terminates as soon as the request is processed. Note that in real-world application servers, where a thread pool is used, the thread isn’t stopped when the request ends; instead, it’s reused for another request.
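As a rough sketch of that pooling idea (the pool size of 4 and the task payloads are arbitrary, not a real server configuration), an ExecutorService reuses a fixed set of threads across many short-lived request tasks:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PooledRequestHandling {

    public static void main(String[] args) throws Exception {
        // 4 worker threads, reused across all submitted requests
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<String>> results = new ArrayList<>();
        for (int i = 1; i <= 8; i++) {
            int requestId = i;
            // each "request" is a short task; no thread is created or destroyed per request
            results.add(pool.submit(() -> "Processed request: " + requestId));
        }
        for (Future<String> result : results) {
            System.out.println(result.get()); // futures are read back in submission order
        }
        pool.shutdown();
    }
}
```

Eight requests are served by only four threads, which is exactly what saves the per-request thread creation cost in real servers.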

With this threading model, the server might create many threads and incur frequent context switching between them, but it generally scales better: there's no upper limit on the number of concurrent connections.

5. Comparison Table

The following table compares the two approaches across some key aspects of server architectures:

| Feature | Thread per Connection | Thread per Request |
|---|---|---|
| Thread Execution Lifecycle | Long-lived, terminated only when the connection is closed | Short-lived, terminated as soon as the request is processed |
| Context Switching Load | Low, limited by the number of concurrent connections | High, a context switch for every request |
| Scalability | Limited by the number of connections the server can keep open | Higher, scales well with request volume |
| Suitability | Known number of connections | Varying request volumes |

If the maximum number of threads the JVM can provide is N and we adopt thread-per-connection, we can serve at most N concurrent clients. An additional client must wait until an existing client disconnects, which might take a long time. If we adopt thread-per-request instead, we can handle at most N requests simultaneously. An additional request stays enqueued until another request completes, which usually takes a short amount of time.

Finally, the thread-per-connection model works well when the number of connections is known: the simplicity of implementation and the low context-switching overhead pay off. When dealing with a large number of requests arriving in unpredictable bursts, the thread-per-request model is the one to pick.

6. Conclusion

In this article, we compared two commonly used server threading models. The choice between thread-per-connection and thread-per-request models depends on the application’s specific requirements and expected traffic patterns. In general, thread-per-connection offers simplicity and predictability for a known number of clients, while thread-per-request provides greater scalability and flexibility under variable or high-load conditions.

The code backing this article is available on GitHub. Once you're logged in as a Baeldung Pro Member, start learning and coding on the project.

 
