In this tutorial, we’ll explore how .so (shared object) files are organized in a Linux filesystem.
To better understand the context of this article, we’ll need an example. As this may vary between operating systems, let’s find some library to work with using the ldconfig tool:
$ ldconfig -p
....
libGLX.so.0 (libc6,x86-64) => /lib/x86_64-linux-gnu/libGLX.so.0
libGLU.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libGLU.so.1
libGLESv2.so.2 (libc6,x86-64) => /lib/x86_64-linux-gnu/libGLESv2.so.2
libGL.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libGL.so.1
libFLAC.so.8 (libc6,x86-64) => /lib/x86_64-linux-gnu/libFLAC.so.8
libEGL_mesa.so.0 (libc6,x86-64) => /lib/x86_64-linux-gnu/libEGL_mesa.so.0
libEGL.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libEGL.so.1
libBrokenLocale.so.1 (libc6,x86-64, OS ABI: Linux 3.2.0) => /lib/x86_64-linux-gnu/libBrokenLocale.so.1
libBrokenLocale.so (libc6,x86-64, OS ABI: Linux 3.2.0) => /lib/x86_64-linux-gnu/libBrokenLocale.so
libBLTlite.2.5.so.8.6 (libc6,x86-64) => /lib/libBLTlite.2.5.so.8.6
libBLT.2.5.so.8.6 (libc6,x86-64) => /lib/libBLT.2.5.so.8.6
ld-linux-x86-64.so.2 (libc6,x86-64) => /lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
With ldconfig -p, we can print the shared libraries recorded in the dynamic linker’s cache on this system.
Let’s go for the zip library:
$ ldconfig -p | grep "libzip.so*"
libzip.so.5 (libc6,x86-64) => /lib/x86_64-linux-gnu/libzip.so.5
libzip.so (libc6,x86-64) => /lib/x86_64-linux-gnu/libzip.so
On this system, the zip library lives at /lib/x86_64-linux-gnu/libzip.so*.
Note that if the zip library isn’t installed, we’d have to pick another libname.so file from the list. Let’s list the files behind the library we chose:
$ ls -l /lib/x86_64-linux-gnu/libzip.so*
lrwxrwxrwx 1 root root 11 Nov 27 2018 /lib/x86_64-linux-gnu/libzip.so -> libzip.so.5
lrwxrwxrwx 1 root root 13 Nov 27 2018 /lib/x86_64-linux-gnu/libzip.so.5 -> libzip.so.5.0
-rw-r--r-- 1 root root 105672 Nov 27 2018 /lib/x86_64-linux-gnu/libzip.so.5.0
So, using the libzip as an example, let’s discuss the three different files and their numbers, such as .so.5 and .so.5.0.
3. SO File Organization on Linux
We can see that there are three files in the example we’ve chosen:
/lib/x86_64-linux-gnu/libzip.so
/lib/x86_64-linux-gnu/libzip.so.5
/lib/x86_64-linux-gnu/libzip.so.5.0
Each of these files has a special name and use:
libzip.so # is called the linker-name used for linking.
libzip.so.5 # is called the soname used by the operating system loader.
libzip.so.5.0 # is called the real-name which is updated by the library maintainer.
where “zip” is the name of the library.
Each file has a different number at the end of it. These numbers represent versions of the library. They’re very important as it’s these versions that determine the role of each file.
There are two popular versioning schemes. One is semantic versioning. The other is the libtool versioning scheme.
Let’s start by discussing these version schemes in more detail and follow it up with the naming convention used. We’ll then discuss in detail the role of each file.
3.1. Linux Versioning
Most libraries on Linux carry their version information in the filename. For example, libzip.so.5.0 follows the generic form libzip.so.X.Y.Z. This mirrors semantic versioning, a convention that many, though not all, libraries on a Linux system follow.
X represents the major version. When the library developer makes a non-backward compatible change, they’ll have to increase the major version.
Y represents the minor version. A backward-compatible change, such as adding an isolated feature like a new function, increments the minor version. Note that when there is no Z number, a minor version bump may also represent bug fixes.
Z is a patch number. This is optional. A bug fix will increment the patch version.
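To make the X.Y.Z fields concrete, here is a minimal shell sketch that splits the version suffix of a shared-object filename into its parts. The helper name parse_so_version is hypothetical and exists only for this illustration; it assumes the version follows the last ".so." in the name.

```shell
# parse_so_version is a hypothetical helper, not a standard tool.
# It prints the major, minor, and patch fields found after ".so.".
parse_so_version() {
    local file="$1"
    # Drop everything up to and including the last ".so." to keep "X.Y.Z".
    local version="${file##*.so.}"
    if [ "$version" = "$file" ]; then
        # No ".so." suffix, for example a bare linker-name such as libzip.so.
        echo "major= minor= patch="
        return 0
    fi
    # Split on dots; fields that are absent stay empty.
    printf '%s\n' "$version" | {
        IFS=. read -r major minor patch
        echo "major=$major minor=$minor patch=$patch"
    }
}

parse_so_version libzip.so.5.0      # prints: major=5 minor=0 patch=
parse_so_version libcrypto.so.1.1   # prints: major=1 minor=1 patch=
parse_so_version libzip.so          # prints: major= minor= patch=
```

For libzip.so.5.0 the patch field is empty, which matches the note that Z is optional.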
One more important aspect is the release version.
Let’s take a quick look at the installed llvm libraries:
$ ldconfig -p | grep LLVM
libLLVM-9.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libLLVM-9.so.1
libLLVM-9.so (libc6,x86-64) => /lib/x86_64-linux-gnu/libLLVM-9.so
libLLVM-8.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libLLVM-8.so.1
libLLVM-8.so (libc6,x86-64) => /lib/x86_64-linux-gnu/libLLVM-8.so
Notice how there are two releases here, libLLVM-9 and libLLVM-8 for the same library, llvm. A divergence in release 8 meant the library was no longer backward compatible. Hence, the library developers decided to create a separate release, libLLVM-9.
3.2. Libtool Versioning
It’s important not to confuse the libtool version scheme with semantic versioning. The goal of the libtool versioning scheme was to standardize versions across multiple platforms.
A developer using libtool could use the libtool version scheme; however, this will then be translated as best as possible by libtool to semantic versioning for a Linux system.
In contrast to semantic versioning, the libtool version scheme uses current, revision, and age as its three number fields instead of major, minor, and patch. It also comes with a set of rules on when to increment each value.
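As an illustration of that translation, the sketch below computes the filename suffix libtool produces on Linux from a current:revision:age triple: the major number is current minus age, followed by age and then revision. The function name libtool_to_linux_suffix is hypothetical, and other platforms map the triple differently.

```shell
# libtool_to_linux_suffix is a hypothetical helper name.
# Arguments: CURRENT REVISION AGE, as passed to libtool's -version-info.
libtool_to_linux_suffix() {
    local current="$1" revision="$2" age="$3"
    # On Linux the major number is CURRENT minus AGE.
    local major=$((current - age))
    echo "$major.$age.$revision"
}

# -version-info 6:2:1 would give libname.so.5.1.2 with soname libname.so.5.
libtool_to_linux_suffix 6 2 1   # prints: 5.1.2
```

This is why the libtool numbers should not be read as major.minor.patch directly: a library built with 6:2:1 still ends up with major version 5 on Linux.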
3.3. Naming Convention
Looking at the various libraries listed by ldconfig, we can notice a convention.
The ldconfig utility and others expect so (shared object) files to follow the naming pattern of lib*.so* or ld-*.so* where the latter is reserved for the dynamic linker only. This convention is hard-coded in many Linux tools. This is why all the libraries listed by ldconfig start with lib.
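The naming pattern can be expressed as a pair of shell globs. The following sketch uses a hypothetical function, follows_so_naming, to check whether a filename would be considered under this convention. Note that the globs are deliberately loose and only look at the name, not at the file contents.

```shell
# follows_so_naming is a hypothetical helper for illustration only.
# It succeeds when the name matches lib*.so* or ld-*.so*.
follows_so_naming() {
    case "$1" in
        lib*.so*|ld-*.so*) return 0 ;;
        *) return 1 ;;
    esac
}

for f in libzip.so.5 ld-linux-x86-64.so.2 zip.so; do
    if follows_so_naming "$f"; then
        echo "$f: matches the convention"
    else
        echo "$f: does not match"
    fi
done
```

Here only zip.so fails the check, since it lacks the lib prefix.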
An example of a tool that expects this convention is GCC. When a developer attempts to compile code using GCC, the library name is passed with the -l flag:
gcc a.c -l zip
The linker prepends the string “lib” to the library name “zip” and appends “.so”. This results in the filename libzip.so, which the linker then attempts to find.
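The lookup can be approximated in a few lines of shell. This is a simplified sketch, not the linker's actual implementation: the function find_link_library is hypothetical, the search directories are passed explicitly, and it only assumes the GNU ld behavior of preferring lib<name>.so over the static lib<name>.a within each directory.

```shell
# find_link_library is a hypothetical helper that imitates the -l lookup.
# Usage: find_link_library NAME DIR...
find_link_library() {
    local name="$1" dir candidate
    shift
    for dir in "$@"; do
        # Prefer the shared object, then fall back to the static archive.
        for candidate in "$dir/lib$name.so" "$dir/lib$name.a"; do
            if [ -e "$candidate" ]; then
                echo "$candidate"
                return 0
            fi
        done
    done
    return 1
}

# Demonstration in a throwaway directory with empty placeholder files.
demo_dir=$(mktemp -d)
touch "$demo_dir/libzip.so" "$demo_dir/libzip.a"
find_link_library zip "$demo_dir"
rm -rf "$demo_dir"
```

The demonstration prints the path ending in libzip.so, because the shared object wins over the archive of the same name.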
The conclusion is that tools in the Linux environment follow the naming convention of libname.so.
4. SO Files
Let’s look at each of the linker files in detail. First, let’s go through the real-name.
The real-name is the actual filename of the physical library on disk. An important point though is if we augment the libzip library as follows:
# Linker-name
/lib/x86_64-linux-gnu/libzip.so
# sonames
/lib/x86_64-linux-gnu/libzip.so.4
/lib/x86_64-linux-gnu/libzip.so.5
# real-names
/lib/x86_64-linux-gnu/libzip.so.4.1
/lib/x86_64-linux-gnu/libzip.so.4.2
/lib/x86_64-linux-gnu/libzip.so.5.0
We can have multiple physical libraries on a Linux system. They’re all living happily in the same folder. By creating the soname files, we can use the appropriate library for executing applications. A developer can even choose which library to link against by changing the linker-name.
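To experiment with this layout safely, we can recreate it in a temporary directory. The sketch below uses empty placeholder files instead of real libraries, and the helper names make_libzip_layout and resolve_real_name are hypothetical. It also assumes that the libzip.so.4 soname points to the newest 4.x real-name, which the listing above doesn't state.

```shell
# make_libzip_layout is a hypothetical helper that mimics the layout above.
make_libzip_layout() {
    local dir="$1"
    # Empty placeholder files stand in for the real ELF libraries.
    touch "$dir/libzip.so.4.1" "$dir/libzip.so.4.2" "$dir/libzip.so.5.0"
    # sonames: one symlink per major version (4 -> 4.2 is an assumption).
    ln -s libzip.so.4.2 "$dir/libzip.so.4"
    ln -s libzip.so.5.0 "$dir/libzip.so.5"
    # Linker-name: points at the soname we want to link against.
    ln -s libzip.so.5 "$dir/libzip.so"
}

# Follow every symlink and print the real-name a given file resolves to.
resolve_real_name() {
    basename "$(readlink -f "$1")"
}

demo_dir=$(mktemp -d)
make_libzip_layout "$demo_dir"
resolve_real_name "$demo_dir/libzip.so"     # prints: libzip.so.5.0
resolve_real_name "$demo_dir/libzip.so.4"   # prints: libzip.so.4.2
rm -rf "$demo_dir"
```

Both major versions coexist in one directory, and each symlink chain ends at exactly one physical file.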
Let’s have a look at the rest of the concepts in detail.
libzip.so is a symlink which does not have a number associated with it. It’s known as the linker-name.
It’s named that due to the search mechanism of the linker. The search mechanism takes in a key which is the library name and returns a file with a pattern of libname.so.
By being a symlink and not having a version, libzip.so can be used to point to any libzip version:
$ ls -lah /lib/x86_64-linux-gnu/libzip.so
lrwxrwxrwx 1 root root 15 Mar 9 12:45 /lib/x86_64-linux-gnu/libzip.so -> libzip.so.5
In this particular example, libzip.so points to the latest major libzip version, libzip.so.5.
By default, an installation script sets up the symlink to point to the version of the library it installs. If we wanted to link against a different version, we could change the symlink.
The linker-name symlink as a concept provides a consistent interface that the linker can depend on. It also gives us the freedom to tell the linker which library file to search and link against.
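Repointing the linker-name is a one-line ln call. The sketch below wraps it in a hypothetical function, point_linker_name, and runs it against placeholder files in a temporary directory rather than a real system library directory, where it would require root privileges and could affect later builds.

```shell
# point_linker_name is a hypothetical helper.
# Usage: point_linker_name DIR NAME TARGET
point_linker_name() {
    local dir="$1" name="$2" target="$3"
    # -s creates a symlink, -f replaces an existing one,
    # -n avoids following the old link if it points to a directory.
    ln -sfn "$target" "$dir/lib$name.so"
}

demo_dir=$(mktemp -d)
touch "$demo_dir/libzip.so.4" "$demo_dir/libzip.so.5"
point_linker_name "$demo_dir" zip libzip.so.5
readlink "$demo_dir/libzip.so"    # prints: libzip.so.5
point_linker_name "$demo_dir" zip libzip.so.4
readlink "$demo_dir/libzip.so"    # prints: libzip.so.4
rm -rf "$demo_dir"
```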
Let’s now look in more detail at the soname. libzip.so.5 is generated automatically by running ldconfig.
Let’s see how this works:
First, let’s remove libzip.so.5 as follows:
$ sudo rm /usr/lib/x86_64-linux-gnu/libzip.so.5
$ ls -l /usr/lib/x86_64-linux-gnu/libzip*
lrwxrwxrwx 1 root root 33 May 21 16:21 /usr/lib/x86_64-linux-gnu/libzip.so -> /lib/x86_64-linux-gnu/libzip.so.5
-rw-r--r-- 1 root root 105672 Nov 27 2018 /usr/lib/x86_64-linux-gnu/libzip.so.5.0
So we’ve successfully removed the symlink of libzip.so.5.
Now, let’s run ldconfig:
$ sudo ldconfig
$ ls -l /usr/lib/x86_64-linux-gnu/libzip*
lrwxrwxrwx 1 root root 33 May 21 16:21 /usr/lib/x86_64-linux-gnu/libzip.so -> /lib/x86_64-linux-gnu/libzip.so.5
lrwxrwxrwx 1 root root 13 May 21 16:26 /usr/lib/x86_64-linux-gnu/libzip.so.5 -> libzip.so.5.0
-rw-r--r-- 1 root root 105672 Nov 27 2018 /usr/lib/x86_64-linux-gnu/libzip.so.5.0
Aha! libzip.so.5 is back. It’s pointing to the actual library libzip.so.5.0.
An important point here is, how does ldconfig know what symlink to create?
Let’s look at the libzip.so.5.0 file in more detail:
$ readelf -d /usr/lib/x86_64-linux-gnu/libzip.so.5.0
Dynamic section at offset 0x18db0 contains 28 entries:
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [libz.so.1]
0x0000000000000001 (NEEDED) Shared library: [libbz2.so.1.0]
0x0000000000000001 (NEEDED) Shared library: [libcrypto.so.1.1]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
0x000000000000000e (SONAME) Library soname: [libzip.so.5]
....
We can see that the soname is written into the library’s metadata. It’s set to libzip.so.5 by the developer. The symlink libzip.so.5 acts as a library group that represents major version 5 of the libzip library.
Therefore the symbolic link is a reflection of the metadata that is actually in the physical library libzip.so.5.0.
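We can pull the soname out of the readelf output with a small filter. This is only a sketch of the idea, not how ldconfig itself is implemented; the function name extract_soname is hypothetical, and it reads readelf -d output on standard input.

```shell
# extract_soname is a hypothetical helper.
# It prints the value between [ and ] on the (SONAME) line, if there is one.
extract_soname() {
    sed -n 's/.*(SONAME).*\[\(.*\)].*/\1/p'
}

# Sample input copied from the readelf output shown earlier.
cat <<'EOF' | extract_soname
 0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
 0x000000000000000e (SONAME) Library soname: [libzip.so.5]
EOF
```

The filter prints libzip.so.5, which is exactly the name of the symlink that reappeared. On a real system, the input would come from a pipeline such as readelf -d /usr/lib/x86_64-linux-gnu/libzip.so.5.0 | extract_soname.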
To better understand how this gets used, let’s take a look at ls:
$ ldd `which ls`
linux-vdso.so.1 (0x00007ffd0319c000)
libselinux.so.1 => /lib/x86_64-linux-gnu/libselinux.so.1 (0x00007f02b281d000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f02b262b000)
libpcre2-8.so.0 => /lib/x86_64-linux-gnu/libpcre2-8.so.0 (0x00007f02b259b000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f02b2595000)
/lib64/ld-linux-x86-64.so.2 (0x00007f02b2883000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f02b2572000)
As we can see, ldd lists the runtime dependencies of ls. The first column contains the sonames of the shared objects. The second column is the path to the shared object, followed by the address at which it’s loaded.
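To focus on the soname-to-path mapping, we can filter the ldd output down to its first and third fields. The sketch below uses a hypothetical function, ldd_soname_paths, which reads ldd output on standard input and keeps only the entries that resolve to an absolute path.

```shell
# ldd_soname_paths is a hypothetical helper.
# It prints "soname path" for each line of the form "soname => /path (addr)".
ldd_soname_paths() {
    awk '$2 == "=>" && $3 ~ /^\// { print $1, $3 }'
}

# Sample input copied from the ldd output shown earlier.
cat <<'EOF' | ldd_soname_paths
linux-vdso.so.1 (0x00007ffd0319c000)
libselinux.so.1 => /lib/x86_64-linux-gnu/libselinux.so.1 (0x00007f02b281d000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f02b262b000)
/lib64/ld-linux-x86-64.so.2 (0x00007f02b2883000)
EOF
```

Only the libselinux.so.1 and libc.so.6 lines survive, because the vdso and the dynamic linker entries have no "=>" mapping. On a live system, the equivalent would be ldd "$(which ls)" | ldd_soname_paths.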
Let’s take libpthread.so.0 as an example and try to think how this entry was added:
$ ls -l /usr/lib/x86_64-linux-gnu/libpthread*
-rwxr-xr-x 1 root root 157224 Apr 14 20:26 /usr/lib/x86_64-linux-gnu/libpthread-2.31.so
-rw-r--r-- 1 root root 6587378 Apr 14 20:26 /usr/lib/x86_64-linux-gnu/libpthread.a
lrwxrwxrwx 1 root root 37 Apr 14 20:26 /usr/lib/x86_64-linux-gnu/libpthread.so -> /lib/x86_64-linux-gnu/libpthread.so.0
lrwxrwxrwx 1 root root 18 Apr 14 20:26 /usr/lib/x86_64-linux-gnu/libpthread.so.0 -> libpthread-2.31.so
When ls was compiled, the linker followed the linker-name symlink libpthread.so. This led it to the real library libpthread-2.31.so, from whose metadata it extracted the soname:
$ readelf -d libpthread-2.31.so | grep soname
0x000000000000000e (SONAME) Library soname: [libpthread.so.0]
This metadata contains the soname libpthread.so.0 which was added to the list of runtime dependencies.
This shows that the primary purpose of an soname is to be used for executing applications. The dynamic linker uses the soname to load these files into memory. This way the developer can update libraries in the background without breaking any runtime dependencies.
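The recorded dependencies appear as NEEDED entries in the dynamic section of an executable or library. As a rough sketch, the hypothetical function list_needed below filters readelf -d output on standard input and prints only those sonames, which are the names the dynamic linker looks up at run time.

```shell
# list_needed is a hypothetical helper.
# It prints the soname from every (NEEDED) line of readelf -d output.
list_needed() {
    sed -n 's/.*(NEEDED).*\[\(.*\)].*/\1/p'
}

# Sample input copied from the readelf output of libzip.so.5.0 shown earlier.
cat <<'EOF' | list_needed
 0x0000000000000001 (NEEDED) Shared library: [libz.so.1]
 0x0000000000000001 (NEEDED) Shared library: [libbz2.so.1.0]
 0x0000000000000001 (NEEDED) Shared library: [libcrypto.so.1.1]
 0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
 0x000000000000000e (SONAME) Library soname: [libzip.so.5]
EOF
```

The output lists libz.so.1, libbz2.so.1.0, libcrypto.so.1.1, and libc.so.6, one per line. Each entry names a soname rather than a real-name, so a compatible update to any of these libraries needs no change to the file that depends on them.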
We started this tutorial by looking at the file naming conventions of a library in Linux.
We then went into detail explaining the three different file names involved with a library: the linker-name and the soname, which are symlinks, and the real-name, which is the actual file on disk.