Disassembling Machine Code in Linux

1. Overview

All executable files contain machine code that the processor executes directly. However, machine code is raw binary data, so a human can’t meaningfully read it without first converting it to another format, such as assembly language.

In this tutorial, we’ll look at how to disassemble machine code into readable assembly in Linux.

2. The Problem

Let’s look at two scenarios: the machine code is stored in a binary file, or we have it as a string of bytes.

We’ll first prepare an input for each scenario and then disassemble it using different tools.

2.1. Reading From a File

Let’s create a binary file from a simple C program. Later, we’ll convert the machine code in that binary to assembly language:

$ cat test.c
void main() {
    int i = 0;
    i += 20;
    return;
}
$ gcc test.c -o test
$ ls
test  test.c
$

As shown above, we have a C program that adds 20 to the variable i. We compiled it with gcc to produce the executable test. If we compile with the -c flag instead, gcc stops before linking and outputs an object file with the .o extension:

$ gcc -c test.c
$ ls
test  test.c  test.o
$

Now we have both an executable binary and an object file to work with.

2.2. Reading From a String

There are times when we want to analyze a piece of shellcode to see what it does.

Let’s look at a few x86 opcode bytes and the instructions they encode:

54: push esp
55: push ebp
90: nop

Now, let’s store this into a file that can be later read and disassembled:

$ echo -ne '\x54\x55\x90' > code
$ ls
code  test  test.c  test.o
$

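With the above command, we wrote the raw bytes of the shellcode to a binary file named code. The -n option suppresses the trailing newline, and -e enables interpretation of the \x escapes.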
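Since the behavior of echo -ne varies between shells, here’s a minimal sketch of an alternative that writes the same three bytes with printf and octal escapes, then checks the result with od. We assume a POSIX shell and the standard od utility:

```shell
# Write the bytes 0x54, 0x55, 0x90 using octal escapes (124, 125, 220),
# which printf supports in every POSIX shell.
printf '\124\125\220' > code

# Dump the file as hex bytes to confirm what was written.
od -An -tx1 code
```

The od output should list the bytes 54 55 90, matching the opcodes above.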

Next, we’ll check how we can read these files.

3. Using the objdump Command

The objdump command is generally used to inspect object files and executable binaries. It can print the sections in an object file, their virtual memory address (VMA) and load memory address (LMA), debug information, the symbol table, and other details.

The general usage is:

objdump OPTIONS objfile ...

Here we’ll see how we can use this tool to disassemble the files.

3.1. Reading From a File

Using the -d option, we can see the assembly code for the binary:

$ objdump -d test

test:     file format elf64-x86-64

..
00000000000005fa <main>:
 5fa:	55                   	push   %rbp
 5fb:	48 89 e5             	mov    %rsp,%rbp
 5fe:	c7 45 fc 00 00 00 00 	movl   $0x0,-0x4(%rbp)
 605:	83 45 fc 14          	addl   $0x14,-0x4(%rbp)
 609:	90                   	nop
 60a:	5d                   	pop    %rbp
 60b:	c3                   	retq   
 60c:	0f 1f 40 00          	nopl   0x0(%rax)

0000000000000610 <__libc_csu_init>:
..
$

An ELF binary contains many sections, along with the addresses and metadata needed to load the executable properly when it is launched. The -d flag disassembles the sections that are expected to contain instructions. Here, we’ve trimmed the output to show only the relevant main function.
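Rather than trimming the output by hand, we could filter it. The following is a rough sketch, assuming awk is available; only_func is a hypothetical helper name, and it relies on objdump separating functions with a blank line:

```shell
# Print only the disassembly block of one function from `objdump -d` output.
# only_func is a hypothetical helper, not part of objdump.
# Usage: objdump -d test | only_func main
only_func() {
    awk -v name="$1" '
        $0 ~ ("<" name ">:$") { printing = 1 }  # header line, e.g. "... <main>:"
        printing && /^$/      { exit }          # a blank line ends the block
        printing              { print }
    '
}
```

The helper reads the listing from standard input, so it works on any saved or piped objdump output.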

At offset 0x605, we see the addl instruction that adds 20 (0x14) to the variable i, which is stored on the stack at -0x4(%rbp).
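The disassembly prints immediates in hexadecimal, so it helps to convert them when comparing against the source. A quick check with the shell’s printf:

```shell
# Convert the hexadecimal immediate from the listing to decimal.
printf '%d\n' 0x14

# And back: the decimal value from the C source as hexadecimal.
printf '%x\n' 20
```

The first command prints 20 and the second prints 14, confirming that 0x14 is the constant from our C program.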

To confirm that this disassembly corresponds to our source, we can modify the C program, recompile it, and run objdump on it again to see the changes.
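Here’s one way this check might look. It’s only a sketch: test2.c is a hypothetical copy of our program that adds 42 instead of 20, and the compile step runs only if gcc and objdump are installed:

```shell
# Hypothetical variant of test.c that adds 42 instead of 20.
cat > test2.c <<'EOF'
void main() {
    int i = 0;
    i += 42;
    return;
}
EOF

# Compile to an object file and disassemble it, if the tools are available.
if command -v gcc >/dev/null 2>&1 && command -v objdump >/dev/null 2>&1; then
    gcc -c test2.c -o test2.o
    objdump -d test2.o
fi
```

On x86-64 without optimization, we’d expect the immediate in the add instruction to change from 0x14 to 0x2a, although the exact listing depends on the compiler version and flags.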

Similarly, we can run the same command on the object file to disassemble the code:

$ objdump -d test.o

test.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <main>:
   0:	55                   	push   %rbp
   1:	48 89 e5             	mov    %rsp,%rbp
   4:	c7 45 fc 00 00 00 00 	movl   $0x0,-0x4(%rbp)
   b:	83 45 fc 14          	addl   $0x14,-0x4(%rbp)
   f:	90                   	nop
  10:	5d                   	pop    %rbp
  11:	c3                   	retq   
$

As we can see above, unlike the linked binary, the object file contains only the main function, since the startup code hasn’t been linked in yet.

By default, objdump shows the disassembly in AT&T syntax. To switch to Intel syntax, we can use the -M option:

$ objdump -d test.o -M intel

test.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <main>:
   0:	55                   	push   rbp
   1:	48 89 e5             	mov    rbp,rsp
   4:	c7 45 fc 00 00 00 00 	mov    DWORD PTR [rbp-0x4],0x0
   b:	83 45 fc 14          	add    DWORD PTR [rbp-0x4],0x14
   f:	90                   	nop
  10:	5d                   	pop    rbp
  11:	c3                   	ret    

$

3.2. Reading From a String

Once we’ve saved the string to a file, we can use the command below to show the disassembly:

$ objdump -D -b binary -m i386 code

code:     file format binary

Disassembly of section .data:

00000000 <.data>:
   0:	54                   	push   %esp
   1:	55                   	push   %ebp
   2:	90                   	nop
$

Since this is a raw file with no headers, we need to give objdump more information to disassemble it properly.

The options used in the above command are:

-D: disassemble the contents of all sections, not only those expected to contain instructions
-b: the object code format of the input; binary means raw bytes
-m: the architecture to disassemble for; here, it is i386

From the result, we can see that the bytes in the file are disassembled into the expected instructions.
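The same approach applies to 64-bit code if we change the architecture. As a sketch, we write the first four bytes of main from the earlier listing (55 48 89 e5) to a hypothetical file named code64 and pass i386:x86-64 to -m, with -M intel for Intel syntax. The objdump call is skipped if the tool isn’t installed:

```shell
# Bytes 55 48 89 e5 (push rbp; mov rbp,rsp) written with octal escapes.
# code64 is a hypothetical file name used only for this example.
printf '\125\110\211\345' > code64

# Disassemble the raw bytes as x86-64 code in Intel syntax.
if command -v objdump >/dev/null 2>&1; then
    objdump -D -b binary -m i386:x86-64 -M intel code64
fi
```

Disassembling these bytes with -m i386 instead would decode them differently, since the 0x48 prefix has another meaning in 32-bit mode.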

4. Using the gdb Command

If we need to debug something, gdb is the go-to tool. Using gdb, we can also disassemble code:

$ gdb test
(gdb) disassemble main
Dump of assembler code for function main:
   0x00000000000005fa <+0>:	push   %rbp
   0x00000000000005fb <+1>:	mov    %rsp,%rbp
   0x00000000000005fe <+4>:	movl   $0x0,-0x4(%rbp)
   0x0000000000000605 <+11>:	addl   $0x14,-0x4(%rbp)
   0x0000000000000609 <+15>:	nop
   0x000000000000060a <+16>:	pop    %rbp
   0x000000000000060b <+17>:	retq   
End of assembler dump.
(gdb) q
$

As shown above, we loaded the binary into gdb and executed the disassemble command on the main function to see the assembly code.
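The <+N> values in the gdb output are byte offsets from the start of the function. We can verify this with shell arithmetic; offset_in_main is a hypothetical helper, and 0x5fa is the address of main from the listing:

```shell
# Offset of an address from the start of main (0x5fa in the gdb listing).
# offset_in_main is a hypothetical helper for this example.
offset_in_main() {
    echo $(( $1 - 0x5fa ))
}

offset_in_main 0x605   # address of the addl instruction
offset_in_main 0x60b   # address of the retq instruction
```

This prints 11 and 17, matching the <+11> and <+17> labels in the dump.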

5. Using the ndisasm Command

The ndisasm utility comes with the nasm package. It is mainly used to disassemble raw machine code such as shellcode. We can also run it on binary files, but it doesn’t understand object file formats such as ELF, so it treats headers and data as if they were code, which makes the output very difficult to interpret.

The typical usage is:

ndisasm [-b16 | -b32 | -b64] filename

Let’s see an example of how to use it to disassemble a string of machine code that we earlier saved to a file:

$ ndisasm -b32 code 
00000000  54                push esp
00000001  55                push ebp
00000002  90                nop
$

As shown above, we’ve set the processor mode to 32-bit, and ndisasm has generated the assembly code for it.

6. Conclusion

In this tutorial, we’ve seen how to disassemble machine code from a file or from a string using objdump, gdb, and ndisasm.