1. Overview
In this article, we’ll go through the graphics stack used in Linux-based operating systems. We’ll see the different technologies that make graphical applications possible and how they interact with one another. We’ll start from the ground up and work our way up to the high-level GUI toolkits.
Finally, we’ll discuss how these technologies fit together to form a fully-fledged graphical experience.
2. Linux at Its Core
The name “Linux” merely refers to the Linux kernel. It’s not a complete operating system that contains everything out of the box, but rather a kernel around which everything is set up. The kernel is the interface between actual hardware and processes.
If we build and install a Linux kernel on a machine alongside helper tools and utilities, we can get very primitive graphics through Kernel Mode Setting in the virtual terminal but not complex graphics such as program windows, visual effects, and images with fancy gradients. To make Linux work with complex graphics, we’ll need a complete graphical stack including graphics drivers, graphics API wrappers, a window system, a compositor, and more.
Most Linux-based operating systems like Ubuntu, Debian, and openSUSE ship this graphical stack as part of the distribution, so we have access to a graphical environment out of the box. However, if we were to use a more minimal distribution like Arch, LFS, Gentoo, or Alpine, we’d need to set up the graphics stack manually to get a graphical environment.
In summary, Linux doesn’t have a native GUI or a standardized library built into it through which we can develop GUI programs. Nevertheless, we have access to a myriad of libraries, GUI toolkits, drivers, and packages through which we can have a graphical system.
3. Graphics API
A graphics API is responsible for translating a general set of instructions, like drawing a triangle, into more specific code that the GPU can execute. Therefore, the graphics API is a description of how the developers’ code interfaces with the GPU.
There are several popular graphics APIs such as OpenGL, OpenGL ES, Metal, Direct3D, and Vulkan. However, most of the graphics stack on Linux makes heavy use of OpenGL because it’s free and cross-platform.
3.1. The OpenGL API
OpenGL stands for Open Graphics Library. It’s a set of specifications solely concerned with the hardware-accelerated rendering of 2D and 3D graphics, regardless of the platform. The native language for the API is C. However, there are also bindings for other languages like Java, Golang, and Rust.
To be precise, OpenGL is not exactly a library because each vendor has to implement the specification to produce an OpenGL library. Therefore, on Linux-based distributions, the libGL.so library file will be different for each vendor. In addition, there are multiple implementations of the OpenGL specification, including both proprietary and open-source implementations.
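Since the library file differs per vendor, it can be handy to check which OpenGL-related libraries the system linker would actually resolve. The following is a minimal sketch using only Python’s standard ctypes.util.find_library; the helper name locate_gl_library is our own, and the result depends entirely on what’s installed on the machine:

```python
import ctypes.util


def locate_gl_library(name="GL"):
    """Return the file name the linker resolves for lib<name>, or None if absent."""
    return ctypes.util.find_library(name)


if __name__ == "__main__":
    # The short names below correspond to libGL, libEGL, and libGLESv2.
    for lib in ("GL", "EGL", "GLESv2"):
        print(lib, "->", locate_gl_library(lib))
```

On a machine without any OpenGL implementation installed, every lookup simply returns None.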
3.2. Mesa
Mesa is an open-source implementation of the OpenGL and Vulkan APIs. It uses card-specific drivers to translate the API into a hardware-specific form. In addition, Mesa supports the Gallium3D architecture for building 3D graphics drivers, which allows portability to all major operating systems.
3.3. OpenGL ES
OpenGL ES (GLES) stands for OpenGL for Embedded Systems. It’s a subset of the OpenGL API that targets embedded devices such as Android phones and iPhones.
3.4. GLX and WGL
As we saw above, OpenGL is only concerned with drawing 2D and 3D graphics, and it has no concept of window management. For that reason, we need a way to bind an OpenGL scene to a window. GLX is an extension for the X Window System that provides the interface between an OpenGL scene and the X Window System.
Similar to GLX, WGL is the interface for OpenGL and the native window system of Microsoft Windows.
3.5. EGL and GLUT
EGL is a platform-independent API that provides an interface between OpenGL and the native window system of an operating system. It doesn’t depend on GLX or WGL. Instead, vendors implement its specification.
Unlike EGL, GLUT is a wrapper around GLX and WGL, enabling us to write portable graphics applications.
3.6. fglrx and Catalyst
Catalyst, which also went by the name fglrx, is AMD’s proprietary OpenGL driver for the X.Org Server. It has its own implementation of the OpenGL specification.
4. DRM and DRI
Both OpenGL and the window system implement the parts that are related to drawing objects on the screen. Therefore, they produce a set of card-specific instructions, which the Linux kernel handles through Direct Rendering Manager (DRM).
On Linux, we have libdrm, which makes it easy to access the DRM from user space. DRM uses a set of generic ioctls to allocate memory for graphical objects and fill it with the commands and textures the GPU needs. The ioctl call is a special type of system call that deals with device-specific input and output operations. In this case, it deals with the input and output operations of a video card.
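To make the ioctl mechanism more concrete, here’s a rough sketch that asks a DRM device for its driver name and version without going through libdrm. It assumes a device node at /dev/dri/card0, the struct drm_version layout from the kernel’s drm.h, and the common Linux ioctl number encoding used on x86 and ARM; the helper names are our own:

```python
import ctypes
import fcntl
import os


class DrmVersion(ctypes.Structure):
    """Mirrors struct drm_version from the kernel's drm.h."""
    _fields_ = [
        ("version_major", ctypes.c_int),
        ("version_minor", ctypes.c_int),
        ("version_patchlevel", ctypes.c_int),
        ("name_len", ctypes.c_size_t),
        ("name", ctypes.c_void_p),
        ("date_len", ctypes.c_size_t),
        ("date", ctypes.c_void_p),
        ("desc_len", ctypes.c_size_t),
        ("desc", ctypes.c_void_p),
    ]


def drm_ioctl_version_request():
    """Build _IOWR('d', 0x00, struct drm_version); assumes the x86/ARM _IOC layout."""
    read_write = 3  # direction bits: read | write
    return (read_write << 30) | (ctypes.sizeof(DrmVersion) << 16) | (ord("d") << 8) | 0x00


def query_drm_driver(path="/dev/dri/card0"):
    """Return {'name': ..., 'version': (major, minor, patch)} or None on failure."""
    try:
        fd = os.open(path, os.O_RDWR)
    except OSError:
        return None
    try:
        request = drm_ioctl_version_request()
        version = DrmVersion()
        # First call: the kernel fills in the numbers and the string lengths.
        fcntl.ioctl(fd, request, version)
        name_buf = ctypes.create_string_buffer(version.name_len + 1)
        version.name = ctypes.addressof(name_buf)
        version.date_len = 0  # we only ask for the driver name
        version.desc_len = 0
        # Second call: the kernel copies the driver name into our buffer.
        fcntl.ioctl(fd, request, version)
        return {
            "name": name_buf.value.decode("ascii", "replace"),
            "version": (version.version_major, version.version_minor,
                        version.version_patchlevel),
        }
    except OSError:
        return None
    finally:
        os.close(fd)


if __name__ == "__main__":
    print(query_drm_driver())
```

In practice, we’d let libdrm issue these calls for us. The sketch only shows that, underneath, the conversation with the kernel is a handful of ioctls on a device file.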
So, when we run a graphical application, it loads the OpenGL driver, for example, Mesa. The driver, in turn, loads libdrm, which enables talking directly to the kernel through ioctls.
So, this process goes on as long as the graphical application is running. However, we need a way to let the window system, such as the X Server, know what’s happening so it can synchronize and update itself. This synchronization process is known as Direct Rendering Infrastructure (DRI).
Graphical applications work great when we have a running X Server or a compositor. So, what about the graphics that run outside of the X Server, like the virtual terminal and the loading splash screen? This is where the Kernel Mode Setting (KMS) subsystem comes in.
KMS is a subsystem in the Linux kernel, exposed through libdrm, that enables us to directly configure the actual display hardware through ioctls. For that reason, we don’t have to rely on the X Server. However, we should note that KMS is a very low-level subsystem, so applications typically use it directly only when a graphics server or a compositor cannot be run.
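We can get a feel for what KMS knows about the display hardware without issuing any ioctls, because the kernel also exposes connectors under sysfs. Here’s a small sketch that reads /sys/class/drm; the function name list_connectors and its sysfs_root parameter are our own, and the set of connectors naturally varies per machine:

```python
import os


def list_connectors(sysfs_root="/sys/class/drm"):
    """Map connector names (e.g. 'card0-HDMI-A-1') to their status and modes."""
    connectors = {}
    try:
        entries = sorted(os.listdir(sysfs_root))
    except OSError:
        return connectors  # no DRM devices exposed on this system
    for entry in entries:
        status_path = os.path.join(sysfs_root, entry, "status")
        # Plain device entries such as 'card0' have no status file; skip them.
        if not os.path.isfile(status_path):
            continue
        with open(status_path) as f:
            status = f.read().strip()
        modes = []
        modes_path = os.path.join(sysfs_root, entry, "modes")
        if os.path.isfile(modes_path):
            with open(modes_path) as f:
                modes = f.read().split()
        connectors[entry] = {"status": status, "modes": modes}
    return connectors


if __name__ == "__main__":
    for name, info in list_connectors().items():
        print(name, info["status"], info["modes"][:3])
```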
5. The X Window System
The X Window System is an open-source windowing system that is used by most Linux-based distributions. It’s based on the client-server architecture, which provides a network-transparent way to interact with windows that can also be used in remote environments.
Not only does it provide the fundamental framework for GUI environments, but it also carries out event handling and visual decorations.
5.1. X11
Since the X Window System is based on a client-server architecture, the client and the server needn’t be on the same machine. For that reason, we need a protocol that carries messages between the client and the server. The X11 protocol is responsible for message delivery. When the client and the server are on the same machine, the messages are exchanged through UNIX sockets.
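As a rough illustration of the local transport, the sketch below maps a DISPLAY value to the UNIX socket path conventionally used by X servers on Linux and builds the 12-byte connection setup request that opens an X11 conversation. The helper names are our own, and the request carries no authorization data, so a real server would likely refuse it unless access control is disabled:

```python
import struct


def x11_socket_path(display):
    """Map a local DISPLAY value such as ':0' or ':1.0' to its UNIX socket path."""
    host, sep, rest = display.rpartition(":")
    if not sep or host not in ("", "unix"):
        raise ValueError("not a local display: %r" % (display,))
    number = rest.split(".")[0]  # drop the optional screen number
    if not number.isdigit():
        raise ValueError("malformed display: %r" % (display,))
    return "/tmp/.X11-unix/X" + number


def x11_setup_request():
    """Connection setup: little-endian byte order, protocol 11.0, no auth data."""
    return struct.pack("<BxHHHHxx", ord("l"), 11, 0, 0, 0)


if __name__ == "__main__":
    print(x11_socket_path(":0"))
    print(x11_setup_request().hex())
```

A client would connect a socket.AF_UNIX stream socket to that path, send the setup request, and then read the server’s reply.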
Apart from that, X11 is extensible. So, it’s easy to add new features without creating a new protocol or breaking the existing clients. One of the most useful extensions is XRender, which adds support for anti-aliased drawings.
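A client discovers an extension by sending the core QueryExtension request with the extension’s name. As a sketch, this is how that request could be encoded by hand for XRender, whose protocol name is RENDER. The function name is our own, and we assume a little-endian connection:

```python
import struct

QUERY_EXTENSION_OPCODE = 98  # core X11 request "QueryExtension"


def query_extension_request(name):
    """Encode a QueryExtension request for a little-endian X11 connection."""
    encoded = name.encode("latin-1")
    pad = (-len(encoded)) % 4  # requests are padded to a multiple of 4 bytes
    length_in_words = 2 + (len(encoded) + pad) // 4
    header = struct.pack("<BxHHxx", QUERY_EXTENSION_OPCODE,
                         length_in_words, len(encoded))
    return header + encoded + b"\x00" * pad


if __name__ == "__main__":
    print(query_extension_request("RENDER").hex())
```

The server’s reply says whether the extension is present and which major opcode its requests use, which is how new features get added without touching the core protocol.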
5.2. Xlib and XCB
Xlib is the traditional client-side library for talking to an X server. XCB, or X C-language Binding, is also a client-side implementation of the X protocol. However, it’s on a much lower level than Xlib, and parts of Xlib use XCB for some features.
5.3. X.Org Server
X.Org is the server-side implementation of the X Window System. It’s the most commonly used display server on Unix-like systems. The X.Org Server is typically started by a display manager or manually from the virtual terminal.
6. Cairo
Cairo is a drawing library that deals only with vector graphics. Its API closely resembles that of the HTML5 <canvas>. In addition, it has support for drawing to X11 surfaces through the Xlib backend.
While we can use Cairo directly, we primarily use it in drawing toolkits like GTK+. It also has support for rendering through OpenGL.
The X server and Cairo each had their own implementation of pixel-level manipulation, which resulted in duplicated code. To resolve this issue, Pixman was developed. Pixman is a library shared by the X server and Cairo that provides rasterization algorithms, gradient support, and more.
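To give an idea of the kind of pixel-level work involved, here’s a toy linear gradient that interpolates between two RGB colors across a row of pixels. It’s only an illustration of the concept and not how Pixman actually implements gradients; the function name is our own:

```python
def linear_gradient(start, end, width):
    """Return `width` RGB tuples interpolated linearly from start to end."""
    if width < 1:
        raise ValueError("width must be at least 1")
    if width == 1:
        return [tuple(start)]
    row = []
    for x in range(width):
        t = x / (width - 1)  # 0.0 at the first pixel, 1.0 at the last
        row.append(tuple(round(s + (e - s) * t) for s, e in zip(start, end)))
    return row


if __name__ == "__main__":
    print(linear_gradient((0, 0, 0), (200, 100, 50), 3))
```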
7. Compositor
A compositor is a program that works with an off-screen buffer for each window on the screen. It combines these buffers into the final image, which on X is typically drawn onto a special window known as the Composite Overlay Window or COW. Thus, the compositor can apply additional styling such as shadows, transparency, and gradients. Not only that, but it can also provide vertical synchronization and a tear-free experience.
Each frame from each running window goes through the compositor. The compositor grabs the pixmap of the windows from the X server and renders it onto the OpenGL scene.
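Conceptually, the compositing step boils down to blending each window’s pixels onto the frame from back to front. A real compositor does this on the GPU, but the idea can be sketched on the CPU with the standard “over” operator. The data layout below (rows of RGBA tuples with channels between 0 and 1) and the function names are assumptions made for the example:

```python
def blend_over(src, dst):
    """Porter-Duff 'over' for one straight-alpha RGBA pixel, channels in [0, 1]."""
    sr, sg, sb, sa = src
    dr, dg, db, da = dst
    out_a = sa + da * (1.0 - sa)
    if out_a == 0.0:
        return (0.0, 0.0, 0.0, 0.0)

    def channel(s, d):
        return (s * sa + d * da * (1.0 - sa)) / out_a

    return (channel(sr, dr), channel(sg, dg), channel(sb, db), out_a)


def composite(windows, width, height, background=(0.0, 0.0, 0.0, 1.0)):
    """Blend windows, given back to front as (x, y, rows_of_pixels), into a frame."""
    frame = [[background for _ in range(width)] for _ in range(height)]
    for x0, y0, pixels in windows:
        for row_index, row in enumerate(pixels):
            for col_index, pixel in enumerate(row):
                x, y = x0 + col_index, y0 + row_index
                if 0 <= x < width and 0 <= y < height:  # clip to the screen
                    frame[y][x] = blend_over(pixel, frame[y][x])
    return frame


if __name__ == "__main__":
    red = [[(1.0, 0.0, 0.0, 1.0)]]
    glass = [[(1.0, 1.0, 1.0, 0.5)]]
    print(composite([(0, 0, red), (0, 0, glass)], 2, 1))
```

Effects such as transparency fall out naturally: a half-transparent window placed over an opaque one yields a mix of both colors.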
8. Wayland
While X is still functional and stable, it has quite a few problems. First, it’s insecure by design due to its network-transparent nature, so the payload is susceptible to sniffing. Second, it’s a very old windowing system that relies heavily on extensions, and parts of its functionality have been moved into the Linux kernel.
Wayland is the intended replacement for X. Wayland doesn’t rely on a separate display server. Instead, the compositor itself takes on that role: it acts as the window manager and compositor for graphical applications, handles input events through evdev, and displays windows using the same stack we discussed. Clients talk to the compositor through the Wayland protocol, which is also based around UNIX sockets.
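Clients conventionally locate the compositor’s socket through the XDG_RUNTIME_DIR and WAYLAND_DISPLAY environment variables, falling back to the name wayland-0. Here’s a small sketch of that lookup; the function name is our own, and it only computes the path without connecting:

```python
import os


def wayland_socket_path(env=None):
    """Return the compositor socket path implied by the environment, or None."""
    env = os.environ if env is None else env
    display = env.get("WAYLAND_DISPLAY") or "wayland-0"
    if os.path.isabs(display):
        return display  # an absolute WAYLAND_DISPLAY is used as-is
    runtime_dir = env.get("XDG_RUNTIME_DIR")
    if not runtime_dir:
        return None  # a relative name cannot be resolved without the runtime dir
    return os.path.join(runtime_dir, display)


if __name__ == "__main__":
    print(wayland_socket_path())
```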
The Wayland client sets up a buffer that it shares with the compositor and draws into it using OpenGL, Cairo, or any other rendering module. Once the client hands the buffer over, the compositor can easily manipulate it for visual effects before presenting it on the screen. So, in a sense, the compositor is both the display server and the compositor.
XWayland provides an X Server that runs under Wayland. Therefore, it’s a compatibility package for X applications during the transition to Wayland. However, it adds an extra layer between the X client and the X server since the messages are passed through the Wayland compositor.
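From an application’s point of view, a quick way to guess which of these setups it’s running in is to look at the environment. The sketch below is only a heuristic: the function name and the returned labels are made up for this example, and the variables can be set or unset arbitrarily by the user:

```python
import os


def describe_session(env=None):
    """Guess the windowing setup from DISPLAY and WAYLAND_DISPLAY (heuristic)."""
    env = os.environ if env is None else env
    has_wayland = bool(env.get("WAYLAND_DISPLAY"))
    has_x11 = bool(env.get("DISPLAY"))
    if has_wayland and has_x11:
        # Under Wayland, a set DISPLAY usually points at an XWayland server.
        return "wayland+xwayland"
    if has_wayland:
        return "wayland"
    if has_x11:
        return "x11"
    return "none"


if __name__ == "__main__":
    print(describe_session())
```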
9. GUI Toolkits
A GUI toolkit or GUI library contains the functionality needed to create graphical interfaces and elements such as widgets, scenes, and event handlers. Some GUI toolkits are fully-featured frameworks that provide widgets, a graphical designer, and a development environment.
The GUI toolkit library is usually a wrapper around a low-level library such as Xlib or XCB. Therefore, it provides an easier way for us to develop graphical applications with additional styling and behavior.
Moreover, most mature GUI toolkits have an opinionated design. In other words, they implement their own markup languages, event systems, and state machines. The state machine is responsible for the management of complex programs that are reactive in nature.
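To illustrate what an event system paired with a state machine looks like, here’s a toy sketch. The Signal and ToggleButton classes are hypothetical and don’t correspond to any particular toolkit’s API:

```python
class Signal:
    """A minimal event channel: handlers connect once and get called on emit."""

    def __init__(self):
        self._handlers = []

    def connect(self, handler):
        self._handlers.append(handler)

    def emit(self, *args):
        for handler in list(self._handlers):
            handler(*args)


class ToggleButton:
    """Hypothetical widget whose behavior is a two-state machine."""

    TRANSITIONS = {
        ("off", "click"): "on",
        ("on", "click"): "off",
    }

    def __init__(self):
        self.state = "off"
        self.toggled = Signal()

    def handle_event(self, event):
        next_state = self.TRANSITIONS.get((self.state, event))
        if next_state is None:
            return  # events without a transition are ignored
        self.state = next_state
        self.toggled.emit(self.state)


if __name__ == "__main__":
    button = ToggleButton()
    button.toggled.connect(lambda state: print("button is now", state))
    for event in ("click", "hover", "click"):
        button.handle_event(event)
```

Real toolkits layer much more on top, such as event propagation through the widget tree, but the reactive pattern of events driving state transitions is the same.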
9.1. GTK
GTK, formerly GTK+ or the GIMP Toolkit, is the toolkit of choice for Unix-like operating systems that use the X Window System or Wayland. It’s a stable GUI toolkit for building modern, cross-platform GUI applications. Some of the most popular GUI programs are developed with GTK, including GIMP, Mozilla Firefox, the GNOME Desktop Environment, Inkscape, and Pidgin.
The GTK toolkit has subsystems for other backends as well, such as GDI for Windows and Quartz for macOS, which means that we can also develop programs for other platforms. Moreover, there are independent projects that provide additional programs to ease the development of GTK programs. One such project is Glade. Glade provides a graphical interface to easily design the program front-end. Some projects like Firefox have their own customized fork of GTK.
9.2. Qt
Qt is a cross-platform application development framework that provides a widget library and a complete set of additional functionality. Unlike GTK, Qt has its own designer called Qt Designer and an integrated development environment called Qt Creator.
Additionally, it provides its own modules for networking, web sockets, multimedia, SQL, XML, and a web engine. We can easily port Qt programs to other platforms with little to no changes in the source code. Therefore, it’s the toolkit of choice for software that targets both embedded and desktop systems.
The Qt framework is written in C++. However, there are bindings for other languages as well. It defines its own markup language, QML, which is syntactically similar to CSS. For that reason, it gives us a great deal of power to customize the widgets to our liking.
10. The Role of OpenGL
As we saw, there’s no official native toolkit or a GUI library that is a silver bullet for developing a GUI application under Linux. We saw that each module in the graphics stack serves a unique purpose.
For the most part, in the graphics stack, we saw that all the graphics instructions pass through OpenGL. Therefore, it’s safe to say that we can develop a standalone application solely based on OpenGL. So, one can consider OpenGL to be kind of a native tool for developing graphical applications on Linux.
As an example, we can take a look at Blender, which is a cross-platform 3D modeling and animation tool. Blender doesn’t rely on other GUI toolkits and low-level client libraries, but it has its own widget library that is based entirely on OpenGL.
11. Conclusion
In this article, we discussed the graphics stack used on Linux-based operating systems. We started with the low-level components and worked our way up to the high-level GUI toolkits that use the entire stack to render the graphics.
Finally, we briefly discussed the role of OpenGL as a native tool that is responsible for rendering graphical applications.