Extracting the Raw Contents of an ELF Section

1. Overview

ELF (Executable and Linkable Format) is the standard binary file format in Linux. It’s used for executables, shared libraries, object files, and core dumps.

In this tutorial, we’ll discuss extracting the raw contents of an ELF section, specifically the .text section. Although we extract the .text section in our examples, the methods are also applicable to other sections of an ELF file.

2. Example Setup

We’ll use the following C program, main.c, in our examples:

#include <stdio.h>

int main(void)
{
    printf("Hello World\n");
    return 0;
}

The program prints just Hello World and exits. Let’s build it using gcc:

$ gcc -o main main.c

We’ll extract the .text section of main using different methods.

3. Using objdump

The objdump command displays information about object files and binaries. Its -h option displays section headers.

Let’s use objdump to analyze the executable main:

$ objdump -h main | grep .text
 13 .text         000000f7  0000000000401040  0000000000401040  00001040  2**4

We filter the output of objdump -h main using grep .text to print only the information about the .text section. The important columns for us in the output are the third and sixth columns. The hexadecimal value in the third column, 000000f7, is the size of the .text section, while the hexadecimal value in the sixth column, 00001040, is the offset of the .text section in the file.
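The two columns can also be picked out programmatically. The following is a minimal sketch rather than part of the original workflow: section_size_offset is a hypothetical helper name, and it assumes the objdump -h column layout shown above (size in the third column, file offset in the sixth). It compares the name column exactly, whereas in a grep pattern the dot matches any character:

```shell
# Hypothetical helper: reads `objdump -h` output on stdin and prints
# "<size> <file offset>" in decimal for the section named in $1.
section_size_offset() {
    # Compare the name column exactly instead of pattern-matching the line
    set -- $(awk -v name="$1" '$2 == name { print $3, $6; exit }')
    # Fail if the section was not found
    [ "$#" -eq 2 ] || return 1
    # The 0x prefix makes shell arithmetic read the digits as hexadecimal
    echo "$((0x$1)) $((0x$2))"
}

# Example with the header line from this tutorial:
echo ' 13 .text         000000f7  0000000000401040  0000000000401040  00001040  2**4' |
    section_size_offset .text    # prints: 247 4160
```

With a real binary, the call would be objdump -h main | section_size_offset .text.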

Having obtained the offset and the size of the .text section, we can extract it using the dd command:

$ dd if=main of=out_objdump bs=1 count=$((0xf7)) skip=$((0x1040))
247+0 records in
247+0 records out
247 bytes copied, 0.000308316 s, 801 kB/s

The if and of options of dd specify the input and output files, which are main and out_objdump in our case. The bs option corresponds to the block size. We copy the data in one-byte chunks due to bs=1.

The count=$((0xf7)) option specifies the number of input blocks to read, i.e., the size of the .text section. count accepts only decimal values. So, we convert the hexadecimal value 0xf7 to decimal using $((0xf7)), which is 247.

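The skip=$((0x1040)) option tells dd to skip that many input blocks before it starts copying. Since bs=1, reading starts at the file offset 0x1040, which is where the .text section begins in the file in our example. Note that this is the file offset, not the address at which the section is loaded in memory. Like count, skip accepts only decimal values.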
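To see how bs, skip, and count interact without involving an ELF file, here’s a small sketch on a throwaway file. The helper name slice_bytes and the file names are made up for illustration:

```shell
# Hypothetical helper: slice_bytes IN OUT HEXOFFSET HEXSIZE
# Copies HEXSIZE bytes starting at byte HEXOFFSET of IN to OUT.
# Both numbers are hexadecimal digits without the 0x prefix.
slice_bytes() {
    # dd expects decimal numbers, so convert with shell arithmetic first
    dd if="$1" of="$2" bs=1 skip="$((0x$3))" count="$((0x$4))" 2>/dev/null
}

tmpdir=$(mktemp -d)
# A 20-byte sample file: the digits 0-9 followed by the letters a-j
printf '0123456789abcdefghij' > "$tmpdir/sample.bin"
# Offset 0xa is byte 10 (the letter a); size 0x4 is four bytes
slice_bytes "$tmpdir/sample.bin" "$tmpdir/slice.bin" a 4
cat "$tmpdir/slice.bin"; echo    # prints: abcd
rm -r "$tmpdir"
```

Copying one byte per block is slow for large sections. GNU dd also offers iflag=skip_bytes,count_bytes, which keeps skip and count in bytes while allowing a larger block size. That flag is GNU-specific, so it may not be available in other dd implementations.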

To create the dd statement automatically, we use awk:

$ objdump -h main | grep .text | awk '{print "dd if=main of=out_objdump bs=1 count=$((0x"$3")) skip=$((0x"$6"))"}'
dd if=main of=out_objdump bs=1 count=$((0xf7)) skip=$((0x1040))

We use the print statement of awk to create the dd statement. We pass the third column of the output of objdump -h main | grep .text to count using count=$((0x"$3")). Similarly, we pass the sixth column to skip using skip=$((0x"$6")).

Finally, we run the generated dd command by feeding it to bash:

$ objdump -h main | grep .text | awk '{print "dd if=main of=out_objdump bs=1 count=$((0x"$3")) skip=$((0x"$6"))"}' | bash
247+0 records in
247+0 records out
247 bytes copied, 0.000256086 s, 965 kB/s
$ ls -l out_objdump
-rw-rw-r-- 1 centos centos 247 Jul  1 08:50 out_objdump

Therefore, we successfully extracted the .text section of main.
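Piping generated text into bash works, but we can get the same result by calling dd directly with the parsed values. The following is a sketch under assumptions: extract_section is a hypothetical function name, and it relies on the objdump -h column layout shown earlier:

```shell
# Hypothetical helper: extract_section BINARY SECTION OUTFILE
# The ( ) body runs in a subshell, so its variables don't leak to the caller.
extract_section() (
    bin=$1 section=$2 out=$3
    # Size is column 3 and file offset is column 6 of the matching header line
    set -- $(objdump -h "$bin" | awk -v name="$section" '$2 == name { print $3, $6; exit }')
    if [ "$#" -ne 2 ]; then
        echo "section $section not found in $bin" >&2
        exit 1
    fi
    # Convert the hexadecimal fields to decimal before handing them to dd
    dd if="$bin" of="$out" bs=1 count="$((0x$1))" skip="$((0x$2))" 2>/dev/null
)

# Usage, assuming the main binary built earlier:
#   extract_section main .text out_objdump
```

Besides avoiding a second shell, this version fails with a message when the section doesn’t exist instead of silently running nothing.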

4. Using objcopy

Another option to extract the .text section of an ELF file is the objcopy utility. objcopy copies the contents of an object file to another file. We can use its --dump-section option to extract a section and write it to a file:

$ objcopy --dump-section .text=out_objcopy_dumpsection main

The --dump-section .text=out_objcopy_dumpsection part of the command tells objcopy to extract the .text section and write it to the output file named out_objcopy_dumpsection. The general format of this option is --dump-section sectionname=filename. The input object file, whose .text section is extracted, is specified as main at the end of the command. Since we don’t name a separate output file, objcopy also writes its copy of main back over main itself, using a temporary file.

Let’s check whether the output file, out_objcopy_dumpsection, exists:

$ ls -l out_objcopy_dumpsection
-rw-rw-r-- 1 centos centos 247 Jul  1 08:51 out_objcopy_dumpsection

We successfully extracted the .text section of main using the --dump-section option.
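The 247 bytes reported by ls match the section size 0xf7 from objdump -h. That comparison can be automated with a short check. This is a minimal sketch, and has_size is a hypothetical helper name:

```shell
# Hypothetical helper: has_size FILE HEXSIZE
# Succeeds when FILE is exactly HEXSIZE bytes long (hex digits, no 0x prefix).
has_size() {
    # wc -c counts the bytes; the 0x prefix converts the hex size to decimal
    [ "$(wc -c < "$1")" -eq "$((0x$2))" ]
}

# Usage, assuming the output file and the size column from this tutorial:
#   has_size out_objcopy_dumpsection 000000f7 && echo "size matches"
```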

We can also use the --only-section option of objcopy for the same goal:

$ objcopy -O binary --only-section=.text main out_objcopy_onlysection

The -O option of objcopy specifies the format of the output, and -O binary produces a raw binary file. The --only-section=.text part copies only the named section, which is .text in our example, from the input file to the output file. The input and output files are main and out_objcopy_onlysection, respectively.

Let’s check whether the output file, out_objcopy_onlysection, exists:

$ ls -l out_objcopy_onlysection
-rwxrwxr-x 1 centos centos 247 Jul  1 08:52 out_objcopy_onlysection

Therefore, we successfully extracted the .text section of main using the --only-section option.
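One way to confirm that the output is raw section data rather than a small ELF file is to look for the ELF magic number, the four bytes 0x7f, E, L, F that start every ELF file. Here’s a minimal sketch, with has_elf_magic as a hypothetical helper name:

```shell
# Hypothetical helper: has_elf_magic FILE
# Succeeds when FILE starts with the ELF magic bytes 7f 45 4c 46.
has_elf_magic() {
    # od prints the first four bytes in hex; tr strips the whitespace around them
    [ "$(od -An -tx1 -N4 "$1" | tr -d '[:space:]')" = "7f454c46" ]
}

# Usage, assuming the files from this tutorial:
#   has_elf_magic main && echo "main is an ELF file"
#   has_elf_magic out_objcopy_onlysection || echo "raw data, no ELF header"
```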

5. Using the libelf Library

Another alternative is the libelf library. It provides an API to read, modify and create ELF files. We must include the libelf.h header file to use this API.

We’ll use the following C program, elf_analyzer.c, to extract the .text section of main:

#include <stdio.h>
#include <libelf.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>

int main(void) {
    Elf *elf = NULL;
    Elf_Scn *scn = NULL;
    Elf_Data *data = NULL;
    Elf64_Shdr *shdr = NULL;
    char *name = NULL;
    int fd_in = 0, fd_out = 0;
    size_t shstrndx = 0, n = 0;

    // Necessary to call other libelf library functions
    elf_version(EV_CURRENT);

    // Open main
    fd_in = open("main", O_RDONLY, 0);

    // Convert the file descriptor to an ELF handle
    elf = elf_begin(fd_in, ELF_C_READ, NULL);

    // Retrieve the index of the section name string table
    elf_getshdrstrndx(elf, &shstrndx);

    // Iterate through the sections
    while ((scn = elf_nextscn(elf, scn)) != NULL) {
        // Retrieve the section header
        shdr = elf64_getshdr(scn);

        // Get the name of the section
        name = elf_strptr(elf, shstrndx, shdr->sh_name);

        // Check if the section is .text
        if (strcmp(name, ".text") == 0) {
            // Open the binary output file
            fd_out = open("out_libelf", O_CREAT | O_WRONLY, S_IRUSR | S_IWUSR);

            // Read the section and write it to the file
            while ((data = elf_getdata(scn, data)) != NULL) {
                write(fd_out, data->d_buf, data->d_size);
            }

            // Close the file descriptor of output file
            close(fd_out);
        }
    }
    // Release the ELF descriptor
    elf_end(elf);
    // Close main
    close(fd_in);   
}

For brevity, the code omits error handling. It also uses Elf64_Shdr and elf64_getshdr(), so it assumes that main is a 64-bit ELF file.

5.1. Explanation of the Source Code

We’ll break down the code to understand it:

// Necessary to call other libelf library functions
elf_version(EV_CURRENT);

In the code snippet above, we call elf_version() with EV_CURRENT to tell the libelf library which ELF version our program expects. It’s mandatory to call this function before calling other functions in the libelf library.

Then, we obtain an ELF descriptor for main:

// Open main
fd_in = open("main", O_RDONLY, 0);

// Convert the file descriptor to an ELF handle
elf = elf_begin(fd_in, ELF_C_READ, NULL);

In the code snippet above, we open the executable main using the open() function. Then, we pass the file descriptor returned by open() to the elf_begin() function to obtain an ELF handle, elf. ELF_C_READ means that we only want to read the ELF file.

Then, we get the index of the string table section:

// Retrieve the index of the section name string table
elf_getshdrstrndx(elf, &shstrndx);

String table sections hold symbol and section names. The one that holds the section names is the section header string table, usually named .shstrtab. The elf_getshdrstrndx() function stores the index of that section in the shstrndx variable.

Then, we iterate through the sections until we find the .text section:

while ((scn = elf_nextscn(elf, scn)) != NULL)

In the code snippet above, we use the elf_nextscn() function to loop over the sections. This function returns NULL when we finish looping over all sections.

Within the while loop, we first get the name of each section:

// Retrieve the section header
shdr = elf64_getshdr(scn);

// Get the name of the section
name = elf_strptr(elf, shstrndx, shdr->sh_name);

In the code snippet above, we get the section header using the elf64_getshdr() function. Then, we find the name of the section using the elf_strptr() function.
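Conceptually, sh_name is a byte offset into the section header string table, which is a sequence of NUL-terminated names, and elf_strptr() returns the string that starts at that offset. The following shell sketch mimics that lookup on a hand-made table. It’s an illustration only, not how libelf is implemented, and strtab_name is a hypothetical helper name:

```shell
# Hypothetical helper: strtab_name FILE OFFSET
# Prints the NUL-terminated string that starts at byte OFFSET of FILE.
strtab_name() {
    # tail -c +N starts at the Nth byte (1-based), hence the +1;
    # tr turns the NUL terminators into newlines so head can cut at the first one
    tail -c "+$(($2 + 1))" "$1" | tr '\0' '\n' | head -n 1
}

tmpfile=$(mktemp)
# A tiny string table: an empty name at offset 0, then two section names
printf '\0.text\0.data\0' > "$tmpfile"
strtab_name "$tmpfile" 1    # prints: .text
strtab_name "$tmpfile" 7    # prints: .data
rm "$tmpfile"
```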

If we encounter the .text section, we write it to a file:

// Check if the section is .text
if (strcmp(name, ".text") == 0) {
    // Open the binary output file
    fd_out = open("out_libelf", O_CREAT | O_WRONLY, S_IRUSR | S_IWUSR);

    // Read the section and write it to the file
    while ((data = elf_getdata(scn, data)) != NULL) {
        write(fd_out, data->d_buf, data->d_size);
    }

    // Close the file descriptor of output file
    close(fd_out);
}

In the code snippet above, strcmp() compares the name of the current section with “.text”. If they match, we get the contents of .text using the elf_getdata() function and write it to the binary file out_libelf using the write() function. We close the output file out_libelf using close() at the end.

Finally, we do the necessary cleanup:

// Release the ELF descriptor
elf_end(elf);
// Close main
close(fd_in);

In this code snippet, elf_end() releases the resources of the ELF descriptor. Then, we close the input file main using close().

5.2. Building and Running the Executable

We’ll build the C code, elf_analyzer.c, using gcc:

$ gcc -o elf_analyzer elf_analyzer.c -lelf

The -o option of gcc names the resulting executable elf_analyzer. We must link the executable with the libelf library using -lelf.

Let’s run elf_analyzer and check the existence of the output file, out_libelf:

$ ./elf_analyzer
$ ls -l out_libelf
-rw------- 1 centos centos 247 Jul  1 08:55 out_libelf

out_libelf exists as expected.

6. Comparison of the Results

We’ve extracted the .text section of main in four different ways up to this point, but we haven’t compared the output files. Let’s compare them using diff:

$ diff out_objdump out_objcopy_dumpsection
$ diff out_objdump out_objcopy_onlysection
$ diff out_objdump out_libelf

diff prints nothing when two files are identical, so the empty output shows that all four files are the same.
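Since the files are binary, cmp is a natural alternative to diff for this check. The following sketch compares every file against the first one. all_identical is a hypothetical helper name:

```shell
# Hypothetical helper: all_identical REFERENCE FILE...
# Succeeds only if every FILE is byte-for-byte identical to REFERENCE.
# The ( ) body runs in a subshell, so its variables don't leak to the caller.
all_identical() (
    ref=$1
    shift
    for f in "$@"; do
        # cmp -s prints nothing and reports the result through its exit status
        if ! cmp -s "$ref" "$f"; then
            echo "$f differs from $ref" >&2
            exit 1
        fi
    done
)

# Usage, assuming the four output files from this tutorial:
#   all_identical out_objdump out_objcopy_dumpsection \
#       out_objcopy_onlysection out_libelf && echo "all outputs are identical"
```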

7. Conclusion

In this article, we discussed extracting the raw contents of an ELF section, namely .text.

Firstly, we used objdump to extract the .text section of the built executable main. Secondly, we learned that objcopy is another alternative. We used its --dump-section and --only-section options. Thirdly, we saw that we could use the API provided by the libelf library for extracting the .text section.

Finally, we compared the result of each method and saw that they were the same.
