Profiling Processes in Linux | Baeldung on Linux

1. Introduction

In Linux systems, analyzing the behavior and performance of processes can be helpful in gaining a deeper understanding of running programs. For this purpose, we can profile processes to get periodic updates of performance metrics like memory or CPU usage.

Process profiling offers valuable insights into how applications perform, helping us identify performance bottlenecks and optimize our programs' resource usage.

Profiling an application is an extensive topic. In this tutorial, we'll explore three Linux tools, top, ps, and perf, and highlight their capabilities.

2. Summary of Processes With top

top provides real-time information on system activity and processes managed by the operating system.

We can configure top to show either a system-wide overview or details about particular processes. It presents information such as process IDs, thread counts, and the CPU and memory usage of processes.

2.1. Overview

If we want to get a general overview of processes and their details, we can use top directly:

$ top

top - 21:59:11 up 29 min,  2 users,  load average: 0.00, 0.00, 0.00
Tasks: 210 total,   1 running, 209 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.4 us,  0.2 sy,  0.0 ni, 99.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  11843.1 total,   9358.9 free,   1114.4 used,   1369.8 buff/cache
MiB Swap:   2048.0 total,   2048.0 free,      0.0 used.  10461.4 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   1703 baeldung  20   0 3933300 350352 108468 S   1.3   2.9   0:07.01 gnome-shell
   1392 root      20   0 6730536 122060  76748 S   0.3   1.0   0:15.76 Xorg
   3879 baeldung  20   0   14724   4352   3556 R   0.3   0.0   0:00.10 top
      1 root      20   0  168640  11776   8248 S   0.0   0.1   0:04.77 systemd
      2 root      20   0       0      0      0 S   0.0   0.0   0:00.01 kthreadd
      3 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_gp
      4 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_par_gp
      5 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 slub_flushwq
...

Let’s summarize some of the data columns above:

PID: process ID
PR: scheduling priority of the process
NI: nice value, which adjusts the priority: negative values mean higher priority, positive values mean lower priority, and zero means no adjustment
VIRT: total virtual memory used by the task, in KiB
RES: resident memory size, the part of the virtual address space currently held in physical memory, in KiB
SHR: shared memory, the part of RES that can be shared with other processes, in KiB
S: process state, for example, R (running), S (sleeping), or I (idle)

Other columns like %CPU or USER are more or less self-explanatory.
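To see where the memory columns come from, we can read the kernel's per-process accounting directly. The following is a minimal sketch, assuming a Linux /proc filesystem: mem_kib is a hypothetical helper name, and its values only roughly correspond to top's VIRT, RES, and SHR, since the exact accounting varies between procps and kernel versions.

```shell
#!/bin/sh
# mem_kib PID: print approximate VIRT, RES and SHR (in KiB) for a process.
# Hypothetical helper; reads /proc/PID/statm, whose first three fields are
# total program size, resident pages, and resident shared (file-backed) pages.
mem_kib() {
    page_kib=$(( $(getconf PAGESIZE) / 1024 ))   # statm counts pages, not bytes
    read -r size resident shared rest < "/proc/$1/statm" || return 1
    echo "VIRT=$((size * page_kib)) RES=$((resident * page_kib)) SHR=$((shared * page_kib))"
}
```

After sourcing the script, mem_kib 1703 would print the three values for the gnome-shell process, which we can compare against the corresponding row in top.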

2.2. Particular Process Specifics

To narrow the output down, we can use top's -p option to monitor only specific processes.

For example, let's monitor only the gnome-shell process with PID 1703:

$ top -p 1703

top - 22:43:10 up  1:13,  2 users,  load average: 0.00, 0.02, 0.00
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.3 us,  0.2 sy,  0.0 ni, 99.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  11843.1 total,   9356.8 free,   1115.4 used,   1370.8 buff/cache
MiB Swap:   2048.0 total,   2048.0 free,      0.0 used.  10460.2 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   1703 baeldung  20   0 3933300 350620 108528 S   0.7   2.9   0:09.15 gnome-shell

To monitor several processes at once, we can either repeat the -p option or pass a comma-separated list of PIDs, for example, top -p 1703,1392. top accepts at most 20 PIDs this way.
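When we only know a process name, we first need its PIDs. Below is a minimal sketch, assuming a Linux /proc filesystem, of a hypothetical pids_of helper that builds the comma-separated list for top -p and stops at 20 PIDs, the limit mentioned above. If procps is installed, pgrep -d, NAME produces a similar list.

```shell
#!/bin/sh
# pids_of NAME: print a comma-separated list of PIDs whose command name is NAME,
# suitable for top -p. Hypothetical helper that scans /proc/PID/comm directly;
# the kernel truncates comm to 15 characters, so long names must be shortened.
# Stops after 20 PIDs, the maximum top accepts for -p.
pids_of() {
    list=""
    count=0
    for dir in /proc/[0-9]*; do
        # A process may exit between the glob and the read; skip it quietly
        comm=$(cat "$dir/comm" 2>/dev/null) || continue
        [ "$comm" = "$1" ] || continue
        list="${list:+$list,}${dir#/proc/}"
        count=$((count + 1))
        if [ "$count" -ge 20 ]; then
            break
        fi
    done
    echo "$list"
}
```

For example:

$ top -p "$(pids_of gnome-shell)"

If no process matches, the list is empty and top rejects the -p argument, so a real script would check for that first.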

Additionally, we can run top in batch mode with the -b (--batch) option. This mode is handy for piping the output to other programs or saving it to a file. In batch mode, top keeps running until it reaches the iteration limit set with the -n option or until we terminate it manually:

$ top -b -p 1703 -n 1
top - 22:53:17 up  1:23,  2 users,  load average: 0.00, 0.00, 0.00
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  11843.1 total,   9356.8 free,   1115.4 used,   1370.9 buff/cache
MiB Swap:   2048.0 total,   2048.0 free,      0.0 used.  10460.3 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   1703 baeldung  20   0 3933300 350620 108528 S   0.0   2.9   0:09.59 gnome-shell

As we can see above, the output matches the interactive view, but top exits after printing the process information once, as we specified with -n 1.
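Batch mode makes it easy to turn top into a simple time-series logger. The following awk filter is a minimal sketch: top_rows is a hypothetical helper, and it assumes top's default column order (PID USER PR NI VIRT RES SHR S %CPU %MEM ...) and unscaled memory values, so a customized top configuration would need different field numbers.

```shell
#!/bin/sh
# top_rows PID: read `top -b` output on stdin and print one line per iteration:
#   time PID %CPU %MEM RES
# Hypothetical helper; assumes top's default column layout.
top_rows() {
    awk -v pid="$1" '
        $1 == "top" && $2 == "-" { ts = $3 }        # header line carries the time
        $1 == pid { print ts, $1, $9, $10, $6 }      # process row for the PID
    '
}
```

For example, to sample gnome-shell every two seconds for a minute:

$ top -b -d 2 -n 30 -p 1703 | top_rows 1703 > gnome-shell.log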

3. Querying Process Details With ps

Another command-line tool we can use is ps, an essential Linux utility that reports information about currently running processes.

Unlike top, which refreshes its output continuously, ps takes a one-time snapshot, which makes it well suited to querying particular processes on demand.

To begin with, we can use the -p option to select the process we're interested in. Let's see what the command outputs about the same process we used before:

$ ps -p 1703
   PID TTY          TIME CMD
  1703 ?        00:00:51 gnome-shell

Without additional options, the output is very limited. The -F option adds more columns, but it still doesn't let us choose the profiling-related metrics we're interested in.

For more targeted information, the -o option lets us specify exactly which columns to print:

$ ps -o user,pid,thcount,priority,size,vsz,pcpu,pmem,cputime,etime,cmd -p 1703
USER         PID THCNT PRI  SIZE    VSZ %CPU %MEM     TIME     ELAPSED CMD
baeldung    1703     8  20 317756 3933300 0.0  2.9 00:00:51   21:02:42 /usr/bin/gnome-shell

As the output shows, ps prints exactly the columns we request, in the order we request them, which gives us the flexibility to display any available metric.
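Since ps takes a single snapshot, we can call it in a loop to build a small time series. The sketch below is hypothetical (ps_sample is our own name) and assumes the procps version of ps. Appending = to each -o column name suppresses the header line. Keep in mind that ps computes %CPU as CPU time divided by the time the process has been running, so it's a lifetime average rather than top's recent-interval value.

```shell
#!/bin/sh
# ps_sample PID COUNT INTERVAL: take COUNT snapshots of a process with ps,
# INTERVAL seconds apart, and print them as CSV. Hypothetical helper.
# A trailing "=" after each -o column name suppresses the header line.
ps_sample() {
    pid=$1 count=$2 interval=$3
    echo "time,pcpu,pmem,rss_kib,threads"
    i=0
    while [ "$i" -lt "$count" ]; do
        # ps exits with a non-zero status once the process is gone
        row=$(ps -o pcpu=,pmem=,rss=,thcount= -p "$pid") || return 1
        # Unquoted $row collapses ps's padding into single spaces
        echo "$(date +%T),$(echo $row | tr ' ' ',')"
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, ps_sample 1703 10 5 > gnome-shell.csv records ten snapshots of gnome-shell, five seconds apart.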

To list all available output columns, we can use the L option of ps:

$ ps L
%cpu         %CPU
%mem         %MEM
_left        LLLLLLLL
_left2       L2L2L2L2
_right       RRRRRRRR
_right2      R2R2R2R2
_unlimited   U
_unlimited2  U2
alarm        ALARM
args         COMMAND
atime        TIME
...

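Let's look at what some of these columns mean:

PPID: parent process ID
THCNT: thread count
SIZE: approximate amount of swap space the process would need if it dirtied all its writable pages and were then swapped out; a very rough estimate
VSZ: virtual memory size in KiB, the same value top shows as VIRT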
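The kernel also exposes these values in /proc/PID/status, which is handy when ps isn't available, for example, in a minimal container. Here's a minimal sketch with a hypothetical proc_status helper, assuming a Linux /proc filesystem; kernel threads have no VmSize or VmRSS lines, so those fields stay empty for them.

```shell
#!/bin/sh
# proc_status PID: print PPID, thread count, VSZ and RSS for a process,
# read from /proc/PID/status. Hypothetical helper; sizes are in kB.
proc_status() {
    awk '
        $1 == "PPid:"    { ppid = $2 }
        $1 == "Threads:" { threads = $2 }
        $1 == "VmSize:"  { vsz = $2 }    # virtual memory size, like ps VSZ
        $1 == "VmRSS:"   { rss = $2 }    # resident set size, like ps RSS
        END { printf "PPID=%s THCNT=%s VSZ=%s RSS=%s\n", ppid, threads, vsz, rss }
    ' "/proc/$1/status"
}
```

For example, proc_status 1703 would summarize the gnome-shell process in a single line.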

For the complete list of columns and their exact definitions, we can refer to the ps manual page (man ps).

4. Diving Deep Into a Particular Process Using perf

perf is a powerful Linux profiling tool that collects and analyzes detailed performance data for the whole system, individual processes, or single programs.

4.1. Install perf

First, we need to ensure that perf is installed on our system. We can typically install it through our package manager.

For example, on Ubuntu and similar Debian-based systems, we can install perf using apt with sudo privileges:

$ sudo apt install linux-tools-common linux-tools-$(uname -r)

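The linux-tools packages provide perf. Because uname -r prints the release of the running kernel, linux-tools-$(uname -r) installs the perf build that matches it, which matters for a tool that works this closely with the kernel. On Debian itself, perf is usually packaged as linux-perf instead.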
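After installation, a quick sanity check can save some confusion later. The sketch below uses a hypothetical perf_check helper that reports whether perf is on the PATH and prints the kernel's perf_event_paranoid setting, which controls what unprivileged users may profile. The hint threshold reflects the common default of 2; some distributions add stricter levels.

```shell
#!/bin/sh
# perf_check: report whether perf is on the PATH and show the kernel's
# perf_event_paranoid setting. Hypothetical helper.
perf_check() {
    if command -v perf >/dev/null 2>&1; then
        echo "perf: $(command -v perf)"
    else
        echo "perf: not found"
    fi
    f=/proc/sys/kernel/perf_event_paranoid
    if [ -r "$f" ]; then
        level=$(cat "$f")
        echo "perf_event_paranoid: $level"
        # Lower values are more permissive; at 2 and above, unprivileged users
        # can't profile the kernel or the whole system
        if [ "$level" -ge 2 ]; then
            echo "hint: kernel-level and system-wide profiling likely require sudo"
        fi
    fi
}
```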

4.2. Basic Profiling

Once perf is installed, we can profile a running process with the stat subcommand, using the -p option to select it:

$ sudo perf stat -p 1703 sleep 5

 Performance counter stats for process id '1703':

              1.14 msec task-clock                #    0.000 CPUs utilized
                 3      context-switches          #    2.631 K/sec
                 0      cpu-migrations            #    0.000 /sec
                 0      page-faults               #    0.000 /sec
            668248      cycles                    #    0.586 GHz
            275191      instructions              #    0.41  insn per cycle
             62127      branches                  #   54.478 M/sec
              3970      branch-misses             #    6.39% of all branches

       5.003091188 seconds time elapsed

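The output shows that perf works at a lower level than top and ps, reporting hardware and kernel counters such as cycles, instructions, and branch misses. Here, sleep 5 isn't a perf subcommand but a regular command: perf stat counts events for PID 1703 only while sleep runs, which gives us a five-second sampling window. Notably, depending on the system's security settings, we might need sudo privileges.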

Now, let’s interpret some of the output that might seem unclear:

task-clock: CPU time spent running the monitored task, here 1.14 ms within the five-second window
context-switches: number of times the scheduler switched the CPU between this task and another one
cpu-migrations: number of times the task was moved from one CPU core to another
page-faults: number of times the task accessed a page not currently mapped in its address space; minor faults are resolved without disk I/O, while major faults require loading data from disk
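Several values after the # signs are derived from the raw counts. As a worked check, the hypothetical perf_ratios helper below recomputes three of them from our run: the clock rate is cycles divided by task-clock time, insn per cycle is instructions divided by cycles, and the branch-miss percentage is misses divided by branches. Small differences from perf's own figures can appear because task-clock is rounded to 1.14 ms in the output.

```shell
#!/bin/sh
# perf_ratios TASK_MS CYCLES INSTRUCTIONS BRANCHES MISSES: recompute derived
# perf stat metrics from raw counts. Hypothetical helper.
perf_ratios() {
    awk -v t="$1" -v c="$2" -v i="$3" -v b="$4" -v m="$5" 'BEGIN {
        printf "GHz=%.3f\n", c / (t * 1e6)      # cycles per ms / 10^6 = GHz
        printf "IPC=%.2f\n", i / c              # instructions per cycle
        printf "branch-miss=%.2f%%\n", 100 * m / b
    }'
}

# Values from the perf stat output above; prints GHz=0.586, IPC=0.41, branch-miss=6.39%
perf_ratios 1.14 668248 275191 62127 3970
```

An IPC well below 1, as here, means the CPU spent many cycles per instruction, which is one of the first things to look at when a process seems slow.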

Next, let's see how to save profiling data so that we can analyze it later.

4.3. Save Profiling Data

We can use the record subcommand to capture the performance data for a particular process and save it into a file:

$ sudo perf record -g -p 1703 sleep 5
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.073 MB perf.data (104 samples) ]

Here, the -g option enables call-graph recording, so perf also captures the call chain for each sample. This is especially valuable when we need performance details at the function level.

4.4. Printing Saved Profiling Data

By default, perf record writes the data to a file named perf.data in the current directory. We can analyze the results with the report subcommand:

$ sudo perf report

Samples: 104  of event 'cycles', Event count (approx.): 12925696
  Children      Self  Command      Shared Object                 Symbol
+   46.32%     0.00%  gnome-shell  libmozjs-68.so.68.6.0         [.] 0x00007fd99bd16cbd
+   46.32%     0.00%  gnome-shell  libmozjs-68.so.68.6.0         [.] 0x00007fd99bd16700
+   46.32%     0.00%  gnome-shell  libmozjs-68.so.68.6.0         [.] 0x00007fd99bd15c95
+   39.57%     0.00%  gnome-shell  [unknown]                     [k] 0000000000000000
+   33.49%     0.00%  gnome-shell  libgjs.so.0.0.0               [.] 0x00007fd99dd2bcf0
+   33.49%     0.00%  gnome-shell  libmozjs-68.so.68.6.0         [.] JS_CallFunction
+   22.25%     0.00%  gnome-shell  libmozjs-68.so.68.6.0         [.] 0x00007fd99c2bf67e
+   20.25%     0.00%  gnome-shell  libmozjs-68.so.68.6.0         [.] 0x00007fd99bd106d3
+   18.45%     0.37%  gnome-shell  [kernel.kallsyms]             [k] entry_SYSCALL_64_after_hwframe
+   18.08%     0.00%  gnome-shell  [kernel.kallsyms]             [k] do_syscall_64
...

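perf report opens an interactive text interface that lists the sampled symbols along with the command and shared object they belong to. The Children column includes samples from functions called by a symbol, while Self counts only the symbol's own samples. [.] marks user-space symbols, [k] marks kernel symbols, and a leading + means the entry can be expanded to show its call chain.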

Alternatively, we can print the report to standard output with the --stdio option:

$ sudo perf report --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 104  of event 'cycles'
# Event count (approx.): 12925696
#
# Children      Self  Command      Shared Object                 Symbol
# ........  ........  ...........  ............................  ........................................
#
    46.32%     0.00%  gnome-shell  libmozjs-68.so.68.6.0         [.] 0x00007fd99bd16cbd
            |
            ---0x7fd99bd16cbd
               0x7fd99bd16700
               0x7fd99bd15c95
               |
               |--14.09%--0x7fd99bd088f0
               |          0x7fd99bd1651e
               |          0x7fd99bea25a2
               |          0x7fd99bd16cbd
               |          0x7fd99bd16700
               |          0x7fd99bd15c95
               |          |
               |          |--7.12%--0x7fd99bd0cffd
               |          |          0x7fd99c2bfb5f
               |          |          0x7fd99c2bf67e
               |          |          0x7fd99c223798
               |          |          0x7fd99c384356
               |          |          __mprotect
               |          |          entry_SYSCALL_64_after_hwframe
               |          |          do_syscall_64
               |          |          __x64_sys_mprotect
               |          |          do_mprotect_pkey
               |          |          mprotect_fixup
               |          |          vma_merge
               |          |          __vma_adjust
               |          |
...

Here, we can see the call graph along with the symbols. Many frames appear only as raw addresses, typically because symbol information isn't available for them. Profiling a program compiled with the -fno-omit-frame-pointer option usually yields more complete and readable call graphs. However, delving into this topic is beyond the scope of this article.
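The --stdio report can get very long. The following awk filter is a minimal sketch, with report_top as a hypothetical helper name, that keeps only top-level entries whose Children percentage reaches a threshold and drops the call-graph lines. It assumes the two-column Children/Self layout that -g recordings produce. perf report's own --percent-limit option may be a simpler alternative.

```shell
#!/bin/sh
# report_top MIN: read `perf report --stdio` output on stdin and print only
# top-level entries whose Children percentage is at least MIN.
# Hypothetical helper; assumes the "Children Self ..." column layout.
report_top() {
    awk -v min="$1" '
        /^#/ { next }                                # header and comment lines
        $1 ~ /^[0-9.]+%$/ && $2 ~ /^[0-9.]+%$/ {     # entry rows, not call chains
            pct = $1; sub(/%$/, "", pct)
            if (pct + 0 >= min + 0) print
        }'
}
```

For example:

$ sudo perf report --stdio | report_top 20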

5. Conclusion

In this article, we learned how to profile processes in Linux with top, ps, and perf. top and ps give a quick view of resource usage, while perf reveals low-level hardware counters and function-level call graphs.

Equipped with this knowledge, we’re now well-prepared to explore more advanced profiling techniques and further enhance our capabilities.
