1. Overview

Working with vectors is necessary in many fields, such as signal processing, machine learning, and graphics. Processing an entire vector with a single instruction, instead of handling its scalar elements one by one, boosts the performance of vector processing applications. The Advanced Vector Extensions (AVX) and their successor, AVX2, provide this vector processing capability in Intel and AMD processors.

In this tutorial, we’ll discuss how to tell if a Linux machine supports AVX/AVX2 instructions.

2. What Are AVX/AVX2 Instructions?

AVX is a set of features and instructions added to the x86 instruction set architecture for Intel and AMD processors. This SIMD (Single Instruction Multiple Data) extension consists of instructions that allow us to perform vector processing with single instructions. For example, the AVX instruction, VMOVDQU, moves unaligned packed integer values from memory to SIMD registers.

AVX introduces 256-bit SIMD registers and supports floating-point operations on 256-bit vectors. The first Intel microarchitecture supporting AVX was Sandy Bridge.

AVX2 is an expansion of AVX. It extends most integer operations to 256-bit vectors and introduces new instructions. For example, the VPADDD AVX2 instruction adds two vectors of packed 32-bit integers in SIMD registers.

There’s also a newer extension to AVX, namely AVX-512. It provides 512-bit vector processing capability in addition to new operations.

3. Using lscpu

One option to check the support for AVX/AVX2 instructions is the lscpu command. lscpu gives information about the CPU architecture:

$ lscpu 
Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          39 bits physical, 48 bits virtual
  Byte Order:             Little Endian
CPU(s):                   2
  On-line CPU(s) list:    0-1
...
    Flags:                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge m
                          ...
                          pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx 
                          ...
                          se bmi1 avx2 bmi2 invpcid rdseed clflushopt md_clear f
                          ...
...

There are two CPUs on our host. lscpu prints a single Flags line rather than one line per CPU. This line lists the CPU features. For example, the first flag, fpu, shows the presence of a floating-point unit (FPU).

As the list shows, this CPU supports AVX/AVX2 instructions since there are both avx and avx2 flags.

We can eliminate the other flags and show only the avx flags by filtering the output of lscpu using grep:

$ lscpu | grep -o "avx[^ ]*"
avx
avx2

Now, we have only the avx and avx2 flags in the output. grep -o prints only the matching parts of a line, each matching part on a separate line. We search for flags starting with avx. The [^ ]* part of the "avx[^ ]*" regular expression matches any characters other than a blank, so each match stops at the blank that separates two flags.

4. Using /proc/cpuinfo

Alternatively, we can use /proc/cpuinfo. This virtual file identifies the processors within our system. Indeed, the lscpu command displays the information it gathers from /proc/cpuinfo and sysfs.

Let’s now filter the content of /proc/cpuinfo using cat and grep:

$ cat /proc/cpuinfo | grep -o "avx[^ ]*"
avx
avx2
avx
avx2

Since our host has two CPUs, the avx and avx2 flags appear twice in the output. Printing the content of /proc/cpuinfo displays the flags of all CPUs. We’ve filtered the output using grep as before.

Since the avx and avx2 flags are present for both processors, both of them support AVX/AVX2 instructions.

5. Using AVX/AVX2 Instructions

We can also check the support for AVX/AVX2 instructions using compiler built-ins in a C program. If the processor supports AVX/AVX2 instructions, we should be able to compile and run an application using those built-ins.

5.1. Example Code

We’ll use the following C program, avx_example.c, that uses AVX/AVX2 instructions under the hood:

$ cat avx_example.c
#include <immintrin.h>
#include <stdio.h>

int main() {

    int int_array[8] = {-1, -2, -3, -4, -5, -6, -7, -8};
    __m256i vec = _mm256_loadu_si256((const __m256i *) int_array);
 
    __m256i addition_result = _mm256_add_epi32(vec, _mm256_abs_epi32(vec));
 
    _mm256_storeu_si256((__m256i *) int_array, addition_result);

    int* result = (int*)&addition_result;
    printf("%d %d %d %d %d %d %d %d\n",
      result[0], result[1], result[2], result[3],
      result[4], result[5], result[6], result[7]);

    return 0;
}

We’ll analyze the source code in the subsequent section.

5.2. Explanation of the Source Code

Let’s break down the code to understand it better:

#include <immintrin.h>

We start by including the immintrin.h header file since gcc declares its built-ins for AVX/AVX2 in this header file. Compiler built-ins, also named intrinsics, look like library functions, but the compiler implements them itself instead of a library. Each of them maps to the corresponding AVX/AVX2 instructions.

Next, within the main() function, we define an array of eight integers and load it into a vector:

int int_array[8] = {-1, -2, -3, -4, -5, -6, -7, -8};
__m256i vec = _mm256_loadu_si256((const __m256i *) int_array);

We load the integer array into a vector of type __m256i using the _mm256_loadu_si256() built-in, which uses the VMOVDQU AVX instruction internally. The __m256i type holds 256 bits of integer data, in our case, eight 32-bit integers. The vector’s name is vec.

Then, we perform vector addition:

__m256i addition_result = _mm256_add_epi32(vec, _mm256_abs_epi32(vec));

The _mm256_add_epi32() built-in adds the two vectors of packed 32-bit integers passed as parameters, element by element. The first argument is vec, whereas the second argument is _mm256_abs_epi32(vec) in our example. The _mm256_abs_epi32(vec) built-in computes the absolute value of each of the 32-bit integers in vec and returns the result as a vector. Since the sum of a negative integer and its absolute value is zero, we expect the result to be a vector consisting of eight zeros. We store the result in the addition_result variable. The _mm256_add_epi32() and _mm256_abs_epi32() built-ins use AVX2 instructions under the hood.

Then, we store the vector, addition_result, back into the integer array:

_mm256_storeu_si256((__m256i *) int_array, addition_result);

The _mm256_storeu_si256() built-in is the inverse of _mm256_loadu_si256(). The result is stored back in the int_array integer array. _mm256_storeu_si256() uses an AVX instruction internally.

Finally, we print each element of the addition_result vector by reading it through an int pointer:

int* result = (int*)&addition_result;
printf("%d %d %d %d %d %d %d %d\n",
  result[0], result[1], result[2], result[3],
  result[4], result[5], result[6], result[7]);

5.3. Building and Running the Example

Let’s now compile avx_example.c using gcc:

$ gcc -mavx2 -o avx_example avx_example.c

The -o option of gcc specifies the executable’s name, avx_example. The -mavx2 option enables the compiler’s AVX2 support in addition to the AVX support. We can use the -mavx option if we only use AVX instructions.

Having generated the avx_example executable, let’s run it:

$ ./avx_example 
0 0 0 0 0 0 0 0

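The resulting vector consists of eight zeros, as expected. Compiling and running an application using AVX/AVX2 instructions successfully implies that the processor supports AVX/AVX2 instructions. On a processor without this support, the program would typically terminate with an illegal instruction error instead of printing the result.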

6. Conclusion

In this article, we discussed how to tell if a Linux machine supports AVX/AVX2 instructions.

First, we learned that AVX/AVX2 instructions are SIMD instructions added to the x86 instruction set in Intel and AMD processors to gain vector processing capability.

Then, we saw that the lscpu command and the /proc/cpuinfo virtual file provide information about the support of AVX/AVX2 instructions.

Finally, we learned that we can use the compiler built-ins, which use the AVX/AVX2 instructions internally, to check for AVX/AVX2 support. We compiled and ran an example C program using AVX/AVX2 instructions.