1. Overview
Sockets are endpoints of a two-way channel that processes use to transfer data. Processes use various socket-layer functions to perform socket operations, such as connecting to a socket address or listening for a new connection. In this tutorial, we’ll look at what socket layer options are and discuss the differences between the socket options SO_REUSEADDR and SO_REUSEPORT.
2. Understanding Socket Implementation
Initially, a socket is created with a call to the socket() function. This function returns a socket descriptor, a unique identifier of a socket. The bind() function allows us to assign a source IP address and source port to the socket. The destination IP address and destination port are set with the connect() function.
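The three calls above can be sketched in C as follows. This is a minimal illustration that assumes IPv4 and TCP; the helper names make_addr(), create_bound_socket(), and connect_to() are hypothetical and only serve this example.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: fill an IPv4 socket address from text and a port.
 * Returns 0 on success, -1 if ip is not a valid dotted-quad address. */
static int make_addr(const char *ip, unsigned short port, struct sockaddr_in *out)
{
    assert(ip != NULL && out != NULL);
    memset(out, 0, sizeof(*out));
    out->sin_family = AF_INET;
    out->sin_port = htons(port);            /* ports travel in network byte order */
    return inet_pton(AF_INET, ip, &out->sin_addr) == 1 ? 0 : -1;
}

/* socket() + bind(): create a TCP socket with a chosen source IP and port.
 * Port 0 asks the kernel to pick a free port. Returns the descriptor or -1. */
int create_bound_socket(const char *local_ip, unsigned short local_port)
{
    struct sockaddr_in local;
    if (make_addr(local_ip, local_port, &local) < 0)
        return -1;

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (bind(fd, (struct sockaddr *)&local, sizeof(local)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* connect(): set the destination IP and port. Returns 0 on success, -1 on error. */
int connect_to(int fd, const char *remote_ip, unsigned short remote_port)
{
    struct sockaddr_in remote;
    if (make_addr(remote_ip, remote_port, &remote) < 0)
        return -1;
    return connect(fd, (struct sockaddr *)&remote, sizeof(remote));
}
```

A caller might write create_bound_socket("0.0.0.0", 0) and then connect_to(fd, "192.0.2.10", 443), where 192.0.2.10 is only a documentation placeholder address. In practice, most clients skip bind() and let connect() choose the source address and port.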
Thus, a connected socket is identified by a five-tuple: the protocol, source IP address, source port, destination IP address, and destination port. No two connections can share all five values, as the two endpoints could then no longer tell them apart. Let’s test this by connecting to baeldung.com on our web browser.
Since baeldung.com is our destination address, we need to retrieve our destination IP. We do this using nslookup:
$ nslookup baeldung.com
Server:         172.16.187.2
Address:        172.16.187.2#53
Non-authoritative answer:
Name:    baeldung.com
Address: 220.127.116.11
Name:    baeldung.com
Address: 18.104.22.168
Name:    baeldung.com
Address: 2606:4700:3108::ac42:28f8
Name:    baeldung.com
Address: 2606:4700:3108::ac42:2b08
We retrieve a set of IPv4 and IPv6 addresses. Subsequently, we’ll look for any of these IP addresses when running the ss command. This command helps us investigate the socket further:
$ ss -t
State   Recv-Q   Send-Q   Local Address:Port      Peer Address:Port
ESTAB   0        0        172.16.186.134:52984    22.214.171.124:https
We see that we have a socket with an established connection. Our machine, in this case, is the source IP address (172.16.186.134), and the source port is 52984. The baeldung server is the destination IP address (the peer address in the output), and HTTPS, or port 443, is the destination port.
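The local and peer columns that ss prints can also be read from inside a program with getsockname() and getpeername(). The sketch below assumes an IPv4 TCP socket, so the protocol field is simply hard-coded as "tcp"; format_tuple() is a hypothetical helper name.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper: write "tcp <src ip>:<src port> -> <dst ip>:<dst port>"
 * into buf for a connected IPv4 TCP socket.
 * Returns 0 on success, -1 on error (for example, if fd is not connected). */
int format_tuple(int fd, char *buf, size_t len)
{
    assert(buf != NULL);
    struct sockaddr_in local, peer;
    socklen_t local_len = sizeof(local), peer_len = sizeof(peer);

    /* Source side of the tuple, then destination side. */
    if (getsockname(fd, (struct sockaddr *)&local, &local_len) < 0 ||
        getpeername(fd, (struct sockaddr *)&peer, &peer_len) < 0)
        return -1;

    char local_ip[INET_ADDRSTRLEN], peer_ip[INET_ADDRSTRLEN];
    if (!inet_ntop(AF_INET, &local.sin_addr, local_ip, sizeof(local_ip)) ||
        !inet_ntop(AF_INET, &peer.sin_addr, peer_ip, sizeof(peer_ip)))
        return -1;

    /* The protocol is assumed to be TCP here rather than queried. */
    int n = snprintf(buf, len, "tcp %s:%u -> %s:%u",
                     local_ip, (unsigned)ntohs(local.sin_port),
                     peer_ip, (unsigned)ntohs(peer.sin_port));
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For the connection shown by ss, such a helper would be expected to produce a line with the same local and peer values, with the port printed as a number instead of a service name.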
3. What Are Socket Options?
Processes need a way to control sockets. For example, a process may need to send broadcast messages or enable the recording of debugging information. In this case, the values of SO_BROADCAST and SO_DEBUG would change. A process does this by calling the setsockopt() function. The setsockopt() function takes five parameters, in this order:
- Socket file descriptor
- Protocol level
- Option name
- Option value
- Option length
A socket file descriptor is an identifier for a network socket. It’s important to note that all sockets are file descriptors, but not all file descriptors are sockets. This is because file descriptors can be identifiers for files, pipes, and sockets. The option name indicates which property is being set, such as SO_BROADCAST.
In addition, different options exist for different protocol levels. This is why the protocol level is a required parameter. SOL_SOCKET is the protocol level to use when setting options at the socket level. Also, we can differentiate the options for a level by looking at the prefix of the option name. For instance, we can tell that SO_DEBUG is on the socket level simply from the first two letters of the option name. TCP_NODELAY is on the TCP protocol level, and IP_DONTFRAG is on the IP protocol level.
The option value parameter is a pointer to a buffer that holds the new value. For boolean options such as SO_BROADCAST, this is an int, where a nonzero value enables the option and zero disables it. Other options take different types, such as a buffer size or a structure. Lastly, the option length parameter specifies the size of the buffer that the option value points to. Let’s look at an example of how a process would enable a socket option:
int enable = 1;
setsockopt(socket_fd, SOL_SOCKET, SO_BROADCAST, &enable, sizeof(enable));
4. SO_REUSEADDR vs. SO_REUSEPORT
4.1. What Is SO_REUSEADDR?
The SO_REUSEADDR socket option allows for the reuse of local addresses and ports. It originates in BSD sockets and has long been available on Linux as well. The implementation of this socket option differs across operating systems. We’ll be discussing the behavior of this socket option on a BSD operating system.
From a TCP perspective, the end goal is the reliable transfer of data. An issue arises when a server process bound to a given IP address and port terminates and then restarts. When the process terminates, its sockets are closed, and the side that closes a TCP connection first puts that connection into a state called TIME_WAIT. This state gives delayed packets that are still in the network time to arrive or expire, so they can’t be mistaken for part of a newer connection. During this time, the address/port combination bound to the socket isn’t available for use by default.
When the process restarts, it’ll want to bind to the same address/port combination. To allow this, we need to explicitly ask for this behavior by enabling the SO_REUSEADDR socket option using setsockopt(). The setsockopt() call must happen before the bind() call. Without SO_REUSEADDR, the restarted process’s bind() call fails with EADDRINUSE until the TIME_WAIT period is over.
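The ordering requirement can be sketched as a listener setup routine. This is only an illustration assuming IPv4 and TCP; open_listener() is a hypothetical helper name. A server would typically run it on every start, so that both the previous and the restarted instance have the option set.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a listening TCP socket on ip:port with
 * SO_REUSEADDR enabled. Returns the descriptor, or -1 on failure. */
int open_listener(const char *ip, unsigned short port)
{
    assert(ip != NULL);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return -1;

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* The option must be set before bind(); setting it afterwards
     * has no effect on the address check that bind() performs. */
    int enable = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &enable, sizeof(enable)) < 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, SOMAXCONN) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

If bind() still fails, checking errno for EADDRINUSE tells us whether another socket currently holds the address.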
Furthermore, this socket option changes how a wildcard address conflicts with specific addresses on the same port. Without SO_REUSEADDR, if one socket is bound to 0.0.0.0:80, another socket’s attempt to bind to 10.1.0.3:80 fails. Since 0.0.0.0 stands for all local addresses, it includes 10.1.0.3 as well, so the kernel treats the two bindings as the same local address and port combination. Setting the SO_REUSEADDR socket option changes the handling of wildcard addresses. On BSD, when SO_REUSEADDR is on, there’s no conflict between one socket binding to 0.0.0.0:80 and another socket binding to 10.1.0.3:80. This is because the wildcard address 0.0.0.0 is no longer considered identical to the specific local address 10.1.0.3.
For UDP, this socket option is beneficial for multicasting. A multicast is group communication where packets are delivered to a group of destinations simultaneously. SO_REUSEADDR is necessary when more than one socket on the same host needs to bind to the same UDP port. This ensures that every socket bound to the UDP port receives a copy of each message sent to the group.
4.2. What Is SO_REUSEPORT?
Like SO_REUSEADDR, SO_REUSEPORT allows multiple sockets to bind to the same address and port combination. On Linux, this socket option has been available since kernel version 3.9. The rule is that every socket binding to the address and port must enable SO_REUSEPORT before calling bind().
For example, if socket A doesn’t have SO_REUSEPORT enabled before binding to a local IP and port combination, then no other socket can bind to that same address and port, even if it enables SO_REUSEPORT itself. On the other hand, SO_REUSEADDR doesn’t check whether the other sockets bound to the IP/port combination have set a specific socket option.
As mentioned before, when a socket closes, its connection can enter the TIME_WAIT state. Another socket won’t be able to use the IP address and port combination of the socket in the TIME_WAIT state unless both sockets had SO_REUSEPORT set before binding, or the new socket sets SO_REUSEADDR.
In the case of multicasting, the SO_REUSEPORT socket option behaves similarly to SO_REUSEADDR. The difference is a restriction on the user: with SO_REUSEPORT, all sockets that share the same IP and port must belong to processes with the same effective user ID. This applies to both TCP and UDP, and it prevents one user’s process from hijacking a port that another user’s process is already using.
In short, if process A (which creates socket A) binds to <IP address>:21, then a process running under a different effective user ID can’t bind to the same address and port (21), and its bind() call will fail. Also, SO_REUSEPORT tries to distribute incoming traffic evenly across the sockets that share the port, for both TCP (connections) and UDP (datagrams). Although sockets can have both SO_REUSEADDR and SO_REUSEPORT on, SO_REUSEPORT will always override the behavior of SO_REUSEADDR.
5. Conclusion
In this article, we began with understanding how sockets are implemented and what socket options are. We then explored the socket options SO_REUSEADDR and SO_REUSEPORT in depth by discussing how they differ and affect communication in TCP and UDP.