1. Overview

In this tutorial, we’ll be looking at simulating network failures in Linux. In particular, we’ll achieve the simulation using the tc command-line tool along with the netem queueing discipline.

2. Network Traffic Control

Network traffic control is a way of managing the traffic characteristics of a system’s network. Specifically, we mostly control traffic by manipulating the queueing characteristics of packets in the Linux kernel network stack.

In other words, before packets are sent out through a network interface, a scheduler enqueues them. We can then apply different rules and logic to alter that queueing behavior. As a result, we can influence the network traffic characteristics by configuring these queues.

Generally, there are four aspects when it comes to controlling the network traffic:

  • Shaping
  • Scheduling
  • Classifying
  • Policing

2.1. Shaping

In network traffic control, shaping refers to manipulating the traffic transfer rate to achieve a desired rate. For instance, we could limit the bandwidth of an interface through shaping.

Additionally, delaying packets before they are delivered is also a form of shaping.
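
As a minimal sketch (shaping itself is not the focus of this tutorial, and the interface and rate here are only examples), the tbf (token bucket filter) qdisc can cap an interface at 1Mbit/s:

$ sudo tc qdisc add dev eth0 root tbf rate 1mbit burst 32kb latency 400ms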

2.2. Scheduling

A scheduler can rearrange packets while they are in the queue. In other words, we can give priority to important packets by rearranging the queue. As a result, this gives us a way of implementing quality of service (QoS).
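
As a brief sketch, the prio qdisc is a classic example of a scheduler: it keeps three bands, maps packets into them based on their priority, and always dequeues lower-numbered bands first:

$ sudo tc qdisc add dev eth0 root handle 1: prio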

2.3. Classifying

Classifying in network traffic control refers to grouping packets according to their properties. For example, we can group packets into classes based on their destination or source.

We can then attach different control mechanisms to different classes, giving us finer-grained control over how different kinds of packets are treated.
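
As a hypothetical sketch, assuming a classful qdisc with handle 1: and a class 1:10 are already in place (classes are covered in the Queuing Discipline section below), a u32 filter could classify packets by destination address:

$ sudo tc filter add dev eth0 parent 1: protocol ip u32 match ip dst 203.0.113.7/32 flowid 1:10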

2.4. Policing

Policing is a mechanism in network traffic control that allows or blocks packets from traveling to the next step. One policing example would be to drop packets destined for a certain remote host.
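
As a rough sketch (the address and rate here are made up), policing is commonly applied to incoming traffic by attaching the special ingress qdisc and dropping packets from a given host once they exceed a rate:

$ sudo tc qdisc add dev eth0 handle ffff: ingress
$ sudo tc filter add dev eth0 parent ffff: protocol ip u32 match ip src 203.0.113.7/32 police rate 1mbit burst 10k drop flowid :1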

3. Queuing Discipline

A queueing discipline (or qdisc for short) is a scheduler that manages the queueing and scheduling of packets. It is through qdiscs that we’ll be controlling the network traffic. Qdiscs are further categorized as classful or classless.

A classful qdisc can form a hierarchy of rules through the use of classes. Through these classes, different rules and behaviors can be applied. Thus, it offers the possibility of subjecting different classes of packets to different rules.
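
For instance, here’s a brief hypothetical sketch of an htb hierarchy with two classes (we won’t be using classful qdiscs in the rest of this tutorial):

$ sudo tc qdisc add dev eth0 root handle 1: htb default 20
$ sudo tc class add dev eth0 parent 1: classid 1:10 htb rate 1mbit
$ sudo tc class add dev eth0 parent 1: classid 1:20 htb rate 10mbit

A filter, like the u32 sketch above, would then steer matching packets into class 1:10, while everything else falls into the default class 1:20.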

On the other hand, a classless qdisc has no concept of classes. In other words, the rules of a classless qdisc apply to every packet that goes through it; they are nondiscriminatory.

In this tutorial, we’ll be focusing on the classless qdisc. Particularly, we’ll be looking at the netem classless qdisc, which is sufficient to simulate a wide range of typical network failure modes.

4. tc

tc is a traffic control command-line tool in Linux. Specifically, it is through the tc command that we configure the traffic control settings in the Linux kernel network stack.

4.1. Installation

Most recent Linux distributions already come with the tc command. If the command is not present on the system, we can obtain it by installing the package that contains it.

On Debian-based Linux distributions, we can obtain the tc command by installing the iproute2 package using apt-get:

$ sudo apt-get update 
$ sudo apt-get install -y iproute2

On the other hand, we can install the iproute-tc package using the yum package manager on RHEL-based Linux distributions (such as CentOS):

$ sudo yum update
$ sudo yum install -y iproute-tc

Once the installation is complete, we can verify the tc command is available by running tc -help:

$ sudo tc -help
Usage:  tc [ OPTIONS ] OBJECT { COMMAND | help }
        tc [-force] -batch filename
where  OBJECT := { qdisc | class | filter | chain |
                    action | monitor | exec }
       OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails] | -r[aw] |
                    -o[neline] | -j[son] | -p[retty] | -c[olor]
                    -b[atch] [filename] | -n[etns] name | -N[umeric] |
                     -nm | -nam[es] | { -cf | -conf } path }

If the help page shows up, we can confirm the installation is successful.

5. Listing the Qdiscs

To view the qdiscs applied on every network interface on our system, we can run tc qdisc show:

$ sudo tc qdisc show
qdisc noqueue 0: dev lo root refcnt 2
qdisc noqueue 0: dev eth0 root refcnt 2
qdisc noqueue 0: dev eth1 root refcnt 2

From the output, we can see that there are three network interfaces in the system. Additionally, they all have a noqueue qdisc attached.

A noqueue qdisc is a simple qdisc with no classes, scheduler, policing, or rate-limiting. In other words, it simply sends out any packet as soon as it receives it.

Besides that, we can also display the information for just one interface by specifying the interface name at the end of the command. For instance, to display only the information for interface eth0:

$ sudo tc qdisc show dev eth0
qdisc noqueue 0: dev eth0 root refcnt 2

6. Simulating a Fixed Delay

To add a fixed delay to packets, we can use the delay option on netem. Concretely, we can run the tc qdisc add command with the netem delay option.

For example, we can add a fixed delay of 100ms for any packets going through eth0:

$ sudo tc qdisc add dev eth0 root netem delay 100ms
$ sudo tc qdisc list
qdisc noqueue 0: dev lo root refcnt 2
qdisc netem 8003: dev eth0 root refcnt 2 limit 1000 delay 100.0ms
qdisc noqueue 0: dev eth1 root refcnt 2

The command above specifies the netem delay option with a fixed delay of 100ms.

Then, the tc qdisc add command attaches the delay rule at the root level of the eth0 interface. Since we aren’t covering classful qdiscs in this article, we’ll always add our qdiscs at the root level of the interface.
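
As a side note, if a netem qdisc is already attached at the root and we only want to tweak its parameters, we can use tc qdisc change instead of add. For example, to bump the delay to 200ms:

$ sudo tc qdisc change dev eth0 root netem delay 200ms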

To see the qdisc in action, let’s run a ping experiment. In particular, we’ll ping google.com before and after applying the rule and compare the readings:

$ ping -c 5 google.com
PING google.com (172.217.27.238) 56(84) bytes of data.
64 bytes from 172.217.27.238 (172.217.27.238): icmp_seq=1 ttl=37 time=8.21 ms
64 bytes from 172.217.27.238 (172.217.27.238): icmp_seq=2 ttl=37 time=10.4 ms
64 bytes from 172.217.27.238 (172.217.27.238): icmp_seq=3 ttl=37 time=9.63 ms
64 bytes from 172.217.27.238 (172.217.27.238): icmp_seq=4 ttl=37 time=11.2 ms
64 bytes from 172.217.27.238 (172.217.27.238): icmp_seq=5 ttl=37 time=8.33 ms

--- google.com ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4008ms
rtt min/avg/max/mdev = 8.210/9.560/11.248/1.171 ms

Without the rule applied, each ping request takes roughly 9.56ms on average to complete.

After we add the fixed delay of 100ms using the command shown earlier in this section, we can see that the round-trip time increases by roughly 100ms:

$ ping -c 5 google.com
PING google.com (172.217.27.238) 56(84) bytes of data.
64 bytes from 172.217.27.238 (172.217.27.238): icmp_seq=1 ttl=37 time=127 ms
64 bytes from 172.217.27.238 (172.217.27.238): icmp_seq=2 ttl=37 time=112 ms
64 bytes from 172.217.27.238 (172.217.27.238): icmp_seq=3 ttl=37 time=109 ms
64 bytes from 172.217.27.238 (172.217.27.238): icmp_seq=4 ttl=37 time=111 ms
64 bytes from 172.217.27.238 (172.217.27.238): icmp_seq=5 ttl=37 time=111 ms

--- google.com ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4007ms
rtt min/avg/max/mdev = 109.429/114.275/127.325/6.579 ms

Once we’re done with the experiment, we’ll remove the qdisc from the interface using the tc qdisc delete command.

Let’s delete the qdisc we’ve attached to the interface eth0:

$ sudo tc qdisc delete dev eth0 root
$ sudo tc qdisc show dev eth0
qdisc noqueue 0: dev eth0 root refcnt 2

7. Simulating Normally Distributed Delays

Other than a fixed delay, netem also offers the possibility of simulating delays according to a statistical distribution. In particular, we can simulate normally distributed delays through the netem delay option.

To achieve that, we specify two arguments representing the mean and the standard deviation of the distribution, respectively, along with the distribution name:

netem delay <mean> <standard deviation> distribution <distribution name>

If we specify the standard deviation value but omit the distribution argument, netem simply varies the delay uniformly around the mean rather than following a normal distribution, so we state distribution normal explicitly.

For example, we can simulate normally distributed delays with a mean of 100ms and a standard deviation of 50ms:

$ sudo tc qdisc add dev eth0 root netem delay 100ms 50ms distribution normal
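
Before measuring, we can double-check that the rule is attached by listing the qdisc on the interface; the output should show the configured delay and jitter:

$ sudo tc qdisc show dev eth0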

To see the effect of the new configuration, we can run our ping experiment again. One note about statistical experiments: we should collect more data points per run so that the resulting statistics are more reliable.

Let’s ping google.com 240 times and obtain the round-trip time statistics:

$ ping -c 240 -q google.com
PING google.com (216.58.196.14) 56(84) bytes of data.

--- google.com ping statistics ---
240 packets transmitted, 240 received, 0% packet loss, time 239363ms
rtt min/avg/max/mdev = 9.256/113.478/238.708/50.648 ms

From the summary, we can observe that the 240 ICMP requests take 113.48ms to complete on average. Factoring in the baseline round-trip time of roughly 10ms, the result matches our expectations.

Additionally, the recorded standard deviation of 50.65ms is indeed close to the standard deviation we’ve configured.
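
Since each of the following experiments attaches its own qdisc at the root of eth0, we should remove the current rule before adding the next one; otherwise, the tc qdisc add command will complain that a root qdisc already exists. We’ll assume this cleanup step between each of the remaining experiments:

$ sudo tc qdisc delete dev eth0 root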

8. Simulating Packet Loss

With the loss option in netem, we can simulate random packet drops. For instance, we can simulate a scenario where the queue drops packets randomly with a 30% probability:

$ sudo tc qdisc add dev eth0 root netem loss 30%

Let’s run our ping experiment again to verify the behavior:

$ ping -q -c 60 google.com
PING google.com (172.217.27.238) 56(84) bytes of data.

--- google.com ping statistics ---
60 packets transmitted, 42 received, 30% packet loss, time 63590ms
rtt min/avg/max/mdev = 8.015/11.345/23.986/2.988 ms

From the output, we can see that the packet loss rate in our experiment is 30%, exactly as we’ve configured on the interface.

In reality, packet losses usually occur in bursts rather than as independent random events. To account for this correlation, we can specify a correlation percentage as an additional argument:

$ sudo tc qdisc add dev eth0 root netem loss 30% 50%

In the command above, we’re configuring the qdisc to drop 30% of the packets it receives. Additionally, the 50% correlation makes each drop decision partly dependent on the random value generated for the previous packet.

9. Simulating Packet Duplication

Using the duplicate option in netem, we can configure the qdisc to duplicate packets randomly. For example, we can simulate packet duplication with a 50% chance on eth0:

$ sudo tc qdisc add dev eth0 root netem duplicate 50%

To see it in action, we can ping google.com:

$ ping -c 2 google.com
PING google.com (142.250.199.46) 56(84) bytes of data.
64 bytes from 142.250.199.46 (142.250.199.46): icmp_seq=1 ttl=37 time=7.48 ms
64 bytes from 142.250.199.46 (142.250.199.46): icmp_seq=1 ttl=37 time=7.51 ms (DUP!)
64 bytes from 142.250.199.46 (142.250.199.46): icmp_seq=2 ttl=37 time=8.51 ms

--- google.com ping statistics ---
2 packets transmitted, 2 received, +1 duplicates, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 7.484/7.834/8.506/0.475 ms

As we can see from the output, the request with ICMP sequence 1 was duplicated, and ping reports the extra reply as a DUP.

10. Simulating Packet Corruption

Next, we can randomly inject data corruption into packets using netem. Concretely, we do this using the corrupt option.

When the corrupt option is specified, netem randomly introduces a single-bit error into packets according to the configured percentage.

For instance, we can introduce a 30% chance of packet corruption:

$ sudo tc qdisc add dev eth0 root netem corrupt 30%

Now, when we run the ping command, we should see approximately 30% packet loss, since the corrupted packets end up being discarded:

$ ping -q -c 240 google.com
PING google.com (172.217.174.174) 56(84) bytes of data.

--- google.com ping statistics ---
240 packets transmitted, 165 received, 31.25% packet loss, time 241364ms
rtt min/avg/max/mdev = 7.392/9.126/29.282/1.968 ms
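
It’s also worth noting that netem options can be combined in a single rule. For instance, the following sketch adds both a delay and some random loss at once:

$ sudo tc qdisc add dev eth0 root netem delay 100ms 20ms loss 5%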

11. Limiting the Transfer Rate

The netem qdisc supports limiting the transfer rate through the rate option. For instance, we can limit the transfer rate of interface eth0 to 10Mbit/s:

$ sudo tc qdisc add dev eth0 root netem rate 10Mbit

Let’s conduct an experiment to test the configuration using iperf3, a tool commonly used in Linux for network load testing.
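
For the test to work, the remote host at 172.18.0.3 needs to be running an iperf3 server listening on the same port. A minimal invocation on the remote host looks like this:

$ iperf3 -s -p 8080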

Running iperf3 against the remote host at 172.18.0.3 shows that the baseline bandwidth is about 33.4 Gbits/sec:

$ iperf3 -c 172.18.0.3 -p 8080
Connecting to host 172.18.0.3, port 8080
[  5] local 172.18.0.2 port 50164 connected to 172.18.0.3 port 8080
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  3.36 GBytes  28.9 Gbits/sec    0   1.33 MBytes
[  5]   1.00-2.00   sec  3.86 GBytes  33.1 Gbits/sec    0   1.33 MBytes
[  5]   2.00-3.00   sec  3.94 GBytes  33.9 Gbits/sec    0   1.33 MBytes
[  5]   3.00-4.00   sec  3.97 GBytes  34.1 Gbits/sec    0   1.33 MBytes
[  5]   4.00-5.00   sec  3.92 GBytes  33.7 Gbits/sec    0   1.33 MBytes
[  5]   5.00-6.00   sec  4.01 GBytes  34.5 Gbits/sec    0   1.33 MBytes
[  5]   6.00-7.00   sec  4.02 GBytes  34.5 Gbits/sec    0   1.33 MBytes
^C[  5]   7.00-7.70   sec  2.82 GBytes  34.5 Gbits/sec    0   1.33 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-7.70   sec  29.9 GBytes  33.4 Gbits/sec    0             sender
[  5]   0.00-7.70   sec  0.00 Bytes  0.00 bits/sec                  receiver

Once we activate the configuration, netem limits the transfer rate on the interface to 10Mbit/s. As a result, running the same iperf3 command against the same host gives us a much lower transfer rate:

$ iperf3 -c 172.18.0.3 -p 8080
Connecting to host 172.18.0.3, port 8080
[  5] local 172.18.0.2 port 50172 connected to 172.18.0.3 port 8080
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  2.29 MBytes  19.2 Mbits/sec    0    239 KBytes
[  5]   1.00-2.00   sec  1.06 MBytes  8.86 Mbits/sec    0    297 KBytes
[  5]   2.00-3.00   sec  1.30 MBytes  10.9 Mbits/sec    0    355 KBytes
[  5]   3.00-4.00   sec  1.55 MBytes  13.0 Mbits/sec    0    414 KBytes
[  5]   4.00-5.00   sec  1.80 MBytes  15.1 Mbits/sec    0    472 KBytes
[  5]   5.00-6.00   sec  1018 KBytes  8.34 Mbits/sec    0    530 KBytes
[  5]   6.00-7.00   sec  1.06 MBytes  8.86 Mbits/sec    0    588 KBytes
[  5]   7.00-8.00   sec  2.42 MBytes  20.3 Mbits/sec    0    648 KBytes
[  5]   8.00-9.00   sec  1.25 MBytes  10.5 Mbits/sec    0    706 KBytes
[  5]   9.00-10.00  sec  0.00 Bytes  0.00 bits/sec    0    764 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  13.7 MBytes  11.5 Mbits/sec    0             sender
[  5]   0.00-10.65  sec  12.1 MBytes  9.56 Mbits/sec                  receiver

12. Summary

In this tutorial, we’ve looked at the tc command in Linux.

We’ve started with some introductory text for network traffic control. Additionally, we’ve looked at the difference between classless and classful qdiscs.

Then, we’ve looked exclusively at the netem qdisc. In particular, we’ve learned how to simulate several kinds of network failures with the netem qdisc. For example, we’ve demonstrated simulating packet delays using the delay option. Additionally, we’ve seen how we can simulate normally distributed delays using the distribution argument.

Furthermore, we’ve looked at the loss, duplicate, and corrupt options that are meant to inject faults into the packet queue.

Finally, we’ve learned to limit the bandwidth using the rate option.
