1. Overview

A high temperature in a video card or GPU can lead to system instability and reduced performance. If we run our system for a long time with a high GPU temperature, it can cause permanent damage to the GPU.

We can periodically check this temperature and take appropriate actions when it reaches a critical level to ensure optimal performance of the GPU, especially when performing resource-intensive tasks. Some effective actions to fix the problem include cleaning the GPU, decreasing its clock speed, increasing airflow, installing external cooling fans, and replacing the thermal paste.

In this tutorial, we’ll discuss two methods to check the temperature of the video card in Debian-based Linux systems.

2. Checking Video Card Details

Before checking the temperature of the video card, let’s explore the details of the video card installed in the system. Knowing the video card details helps troubleshoot graphical issues, optimize performance, and manage resources.

We can view the details of the video card using the lspci tool, which comes pre-installed on most Linux distributions. If the lspci command isn’t available, we can refresh the package index and then install the pciutils package, which provides it:

$ sudo apt update
$ sudo apt install pciutils

Now, let’s run the lspci tool to view the video card details in the system:

$ lspci
00:00.0 Host bridge: Intel Corporation Coffee Lake HOST and DRAM Controller (rev 0b)
00:02.0 VGA compatible controller: Intel Corporation WhiskeyLake-U GT2 [UHD Graphics 620]
00:04.0 Signal processing controller: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem (rev 0b)
00:08.0 System peripheral: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model
...output truncated...

The lspci tool lists all the PCI devices in the system, so we look for the line containing VGA to find the video card. Alternatively, we can filter the output with the grep command so that only the video card line is displayed:

$ lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation WhiskeyLake-U GT2 [UHD Graphics 620]

The output provides us with the PCI address, type, manufacturer, and marketing name of the VGA controller. Additionally, it shows that the system utilizes integrated Intel graphics (UHD). Furthermore, if we want to explore more details, we can use the lshw command:

$ sudo lshw -C display
  *-display                 
       description: VGA compatible controller
       product: WhiskeyLake-U GT2 [UHD Graphics 620]
       vendor: Intel Corporation
       physical id: 2
       bus info: pci@0000:00:02.0
       logical name: /dev/fb0
       version: 00
       width: 64 bits
       clock: 33MHz
       capabilities: pciexpress msi pm vga_controller bus_master cap_list rom fb
       configuration: depth=32 driver=i915 latency=0 resolution=1920,1080
       resources: irq:137 memory:a0000000-a0ffffff memory:80000000-9fffffff ioport:5000(size=64) memory:c0000-dffff
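
The two lookups above can be wrapped in small helpers. The following is a minimal sketch rather than part of the walkthrough: the function names list_display_controllers and display_driver are hypothetical, and both read previously captured command output from standard input. The first helper also matches the 3D controller and Display controller classes, since lspci reports some discrete or secondary GPUs under those names instead of VGA:

```shell
#!/bin/sh
# Hypothetical helpers (the names are assumptions, not part of pciutils or lshw).

# Reads `lspci` output on stdin and keeps only graphics-class devices.
list_display_controllers() {
    grep -E 'VGA compatible controller|3D controller|Display controller'
}

# Reads `sudo lshw -C display` output on stdin and prints the kernel driver
# named in each "configuration:" line (for example, i915).
display_driver() {
    sed -n 's/.*configuration:.* driver=\([^ ]*\).*/\1/p'
}
```

We could then run lspci | list_display_controllers and sudo lshw -C display | display_driver. Knowing the driver name helps when deciding which of the following methods applies to the system.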

Now we know the details about the video card installed in the system and can proceed to check the temperature.

3. Using the sensors Command

We can use the sensors command, which comes as part of the lm-sensors package, to check the temperature of the video card. The sensors command also provides information on the temperature of the CPU, motherboard, and storage devices. Furthermore, regular GPU temperature monitoring can help us to detect potential overheating issues and system errors.

To use the sensors command, first, we need to install the lm-sensors package:

$ sudo apt install lm-sensors

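After installing the lm-sensors package, we need to detect the sensors available in the system. The sensors-detect script asks a series of yes/no questions before probing each kind of hardware, and we can accept the default answers by pressing Enter: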

$ sudo sensors-detect

At this point, we’ve detected all the sensors. Now, we can display the temperature data of all the sensors:

$ sensors
pch_cannonlake-virtual-0
Adapter: Virtual device
temp1:        +36.0°C  

BAT1-acpi-0
Adapter: ACPI interface
in0:          12.21 V  
curr1:            N/A  

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +37.0°C  (high = +100.0°C, crit = +100.0°C)
Core 0:        +37.0°C  (high = +100.0°C, crit = +100.0°C)
Core 1:        +35.0°C  (high = +100.0°C, crit = +100.0°C)
Core 2:        +36.0°C  (high = +100.0°C, crit = +100.0°C)
Core 3:        +37.0°C  (high = +100.0°C, crit = +100.0°C)

nvme-pci-0300
Adapter: PCI adapter
Composite:    +37.9°C  (low  = -273.1°C, high = +81.8°C)
                       (crit = +84.8°C)
Sensor 1:     +37.9°C  (low  = -273.1°C, high = +65261.8°C)

acpitz-acpi-0
Adapter: ACPI interface
temp1:        +27.8°C  (crit = +119.0°C)
temp2:        +29.8°C  (crit = +119.0°C)
temp3:        +10.0°C  

The output displays data from a virtual device, the battery, the CPU, the SSD, and the ACPI interface. There’s no dedicated entry for the video card because this system uses integrated Intel graphics, which sit inside the CPU package. The closest reading is therefore Package id 0 in the coretemp ISA adapter section: it reports the temperature of the CPU package that contains the integrated video card, and at +37.0°C it’s well below the +100.0°C critical level.
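
To make the comparison with the critical level explicit, we can compute the remaining margin from the sensors output. The following is a minimal sketch under stated assumptions: crit_margin is a hypothetical helper name, it reads the sensors output from standard input, and it only handles non-negative readings whose crit value appears on the same line as the reading:

```shell
#!/bin/sh
# Hypothetical helper: reads `sensors` output on stdin and prints how many
# degrees Celsius the reading labeled $1 is below its critical limit.
# Assumes "crit = ..." is on the same line as the reading and that the
# reading is not negative. A label used by several chips yields several lines.
crit_margin() {
    awk -v label="$1" -F: '
        $1 == label {
            cur = $2
            sub(/^[^0-9]*/, "", cur)   # drop the padding and the sign
            cur += 0                   # keep the numeric prefix, e.g. 37.0
            if (match($2, /crit = [+][0-9.]+/)) {
                # Skip the 8 characters of "crit = +" to reach the number.
                crit = substr($2, RSTART + 8, RLENGTH - 8) + 0
                printf "%.1f\n", crit - cur
            }
        }
    '
}
```

For the output above, sensors | crit_margin "Package id 0" would print 63.0, the difference between the +100.0°C limit and the +37.0°C reading.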

If the system uses a discrete NVIDIA GPU driven by the open-source nouveau driver, we should see a GPU-related section in the output:

nouveau-pci-0100
Adapter: PCI adapter
GPU core: +56.0°C

Furthermore, if multiple GPUs are installed in the system, we should see a list of all the GPUs along with their temperature data using the sensors command:

amdgpu-pci-0300
Adapter: PCI adapter
temp1: +43.0°C

nouveau-pci-0100
Adapter: PCI adapter
GPU core: +64.0°C

nouveau-pci-0200
Adapter: PCI adapter
GPU core: +51.0°C

In this case, one AMD GPU and two NVIDIA GPUs are installed in the system.
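
When the sensors output is long, we may want only the GPU readings. The following is a minimal sketch, assuming the GPU chips are named with an amdgpu- or nouveau- prefix as in the output above. The helper name gpu_temps is hypothetical, and it reads the sensors output from standard input:

```shell
#!/bin/sh
# Hypothetical helper: reads `sensors` output on stdin and prints
# "<chip> <temperature>" for chips whose name starts with amdgpu- or nouveau-.
gpu_temps() {
    awk '
        # Indented lines continue the previous reading, so skip them.
        /^[ \t]/ { next }
        # A line without a colon is either a chip name or a blank separator.
        !/:/ {
            chip = ($0 ~ /^(amdgpu|nouveau)-/) ? $0 : ""
            next
        }
        # Inside a GPU block, keep only readings expressed in degrees Celsius.
        chip != "" && index($0, "°C") > 0 && match($0, /[0-9]+\.[0-9]+/) {
            print chip, substr($0, RSTART, RLENGTH)
        }
    '
}
```

Running sensors | gpu_temps on the three-GPU system above would print one line per GPU, such as amdgpu-pci-0300 43.0.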

4. Using NVIDIA System Management Interface

If an NVIDIA video card is installed in a system, we can use the NVIDIA System Management Interface (nvidia-smi) to view the video card temperature data. This command-line tool manages and monitors NVIDIA GPUs. Additionally, it provides access to GPU-related processes and power policies.

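The tool is distributed alongside the proprietary NVIDIA GPU driver, so we first install the driver. The exact package name can differ between distributions and releases: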

$ sudo apt-get update
$ sudo apt-get install nvidia-driver

Now, we can run the NVIDIA System Management Interface tool:

$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.73.01    Driver Version: 460.73.01    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 3080    Off  | 00000000:0A:00.0 Off |                  N/A |
| 70%   68C    P2   321W / 320W |  10038MiB / 10240MiB |     76%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

The output contains a lot of information about the GPU, including the driver version, GPU name, temperature, utilization, power usage, and fan speed. If we only want the GPU temperature, we can use the --query-gpu option together with the mandatory --format option:

$ nvidia-smi --query-gpu=temperature.gpu --format=csv
temperature.gpu
68

Here, we can see the current temperature of the GPU is 68 degrees Celsius.
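
Since the overview suggests acting when the temperature reaches a critical level, a query like this can feed a simple threshold check. The following is a minimal sketch: check_temps is a hypothetical helper name, the limit is a value we choose ourselves, and the input is assumed to be one bare number per line, which is the shape nvidia-smi produces with --format=csv,noheader,nounits:

```shell
#!/bin/sh
# Hypothetical helper: reads one temperature per line on stdin and reports
# every GPU at or above the limit given as $1. Returns 1 if any GPU is at or
# above the limit. Assumes the lines arrive in GPU index order, starting at 0.
check_temps() {
    awk -v limit="$1" '
        $1 + 0 >= limit + 0 {
            printf "GPU %d: %d C (limit %d C)\n", NR - 1, $1, limit
            hot = 1
        }
        END { exit hot }
    '
}
```

For example, nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits | check_temps 80 stays silent and returns 0 for the 68-degree reading above. The value 80 is only an illustrative limit, not a recommendation for any particular card.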

If multiple NVIDIA GPUs are installed in the system, we see the information for each GPU separately:

$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.73.01    Driver Version: 460.73.01    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 3080    Off  | 00000000:0A:00.0 Off |                    0 |
| 70%   68C    P2   321W / 320W |  10038MiB / 10240MiB |     76%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  Off  | 00000000:00:1F.0 Off |                    0 |
| N/A   37C    P0    52W / 300W |  11647MiB / 10240MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  Off  | 00000000:00:20.0 Off |                    0 |
| N/A   32C    P0    50W / 300W |  11590MiB / 10240MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

In this example, there are three NVIDIA GPUs installed in the system.

Again, the nvidia-smi tool gives us the option to display only the temperature data of the GPUs:

$ nvidia-smi --query-gpu=temperature.gpu --format=csv
temperature.gpu
68
37
32

We can also check the temperature of a particular GPU by passing its ID with the --id option:

$ nvidia-smi --query-gpu=temperature.gpu --format=csv --id=2
temperature.gpu
32

The GPU index starts from 0. Hence, if we want to check the temperature of the third GPU, the associated index would be 2.
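
With several GPUs, it can be handy to find the hottest one in a single step. The following is a minimal sketch: hottest_gpu is a hypothetical helper name, and it assumes input lines of the form "index, temperature", as produced by querying the index and temperature.gpu fields with --format=csv,noheader,nounits:

```shell
#!/bin/sh
# Hypothetical helper: reads "index, temperature" lines on stdin and prints
# the index and temperature of the hottest GPU. Prints nothing on empty input.
hottest_gpu() {
    awk -F', *' '
        NR == 1 || $2 + 0 > max { max = $2 + 0; idx = $1 }
        END { if (NR > 0) print idx, max }
    '
}
```

We could run it as nvidia-smi --query-gpu=index,temperature.gpu --format=csv,noheader,nounits | hottest_gpu. For the three-GPU system above, it would print 0 68.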

5. Conclusion

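In this article, we discussed two methods to check the temperature of the video card. The first method, the sensors command, works with any GPU whose kernel driver exposes temperature sensors, including AMD GPUs, NVIDIA GPUs running the nouveau driver, and integrated GPUs that are covered by the CPU package reading. The second method, the nvidia-smi tool, is specific to NVIDIA GPUs running the NVIDIA driver.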
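
To choose between the two methods in a script, we can check which tool is present. The following is a minimal sketch, where first_available is a hypothetical helper name that prints the first of its arguments found on the PATH:

```shell
#!/bin/sh
# Hypothetical helper: prints the first of the given commands found on PATH
# and returns 0, or returns 1 if none of them is available.
first_available() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            return 0
        fi
    done
    return 1
}
```

For example, first_available nvidia-smi sensors prints nvidia-smi on a system with the NVIDIA driver installed and falls back to sensors when only the lm-sensors package is present.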
