How to Find File Encoding in Linux

1. Overview

Computers store text as binary numbers, and a character encoding defines how each character maps to those numbers. Common character encodings include ASCII, UTF-8, UTF-16, and UTF-32.
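To make the mapping concrete, here is a small sketch that prints the bytes of a single character under different encodings. It assumes od and iconv are available, and hex_bytes is a hypothetical helper name introduced only for this illustration:

```shell
# hex_bytes is a hypothetical helper: it dumps stdin as hexadecimal bytes.
hex_bytes() {
    od -An -tx1 | tr -d ' \n'
    echo
}

# The letter "é" (U+00E9), written here as its UTF-8 bytes in octal
printf '\303\251' | hex_bytes                                  # c3a9
printf '\303\251' | iconv -f UTF-8 -t UTF-16BE | hex_bytes     # 00e9
printf '\303\251' | iconv -f UTF-8 -t ISO-8859-1 | hex_bytes   # e9
```

The same character takes two bytes in UTF-8, two different bytes in UTF-16, and a single byte in ISO-8859-1, which is why a program needs to know a file's encoding to read it correctly.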

In this tutorial, we’ll learn how to find the encoding of a file in Linux.

2. Using file

One way to find the encoding of a file is to use the file command:

$ file -bi text1.txt
text/plain; charset=us-ascii
$ file -bi text2.txt
text/plain; charset=utf-8

In the above snippet:

-b tells file to exclude the file name from the output; so the output is brief
-i tells file to include MIME-type information in the output; this information includes the media type and the character encoding of the file

The output shows that text1.txt is a plain text file encoded in US-ASCII, while text2.txt is a plain text file encoded in UTF-8.
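When we need to check several files at once, we can wrap file in a small function. The following is a minimal sketch, not a definitive implementation: print_encodings is a hypothetical name, and it uses the --mime-encoding option of file, which prints only the charset part of the MIME information:

```shell
# print_encodings is a hypothetical helper: for each path given,
# it prints the detected encoding, a tab, and the path.
print_encodings() {
    for path in "$@"; do
        # -b drops the file name; --mime-encoding prints only the charset
        printf '%s\t%s\n' "$(file -b --mime-encoding "$path")" "$path"
    done
}

# Example call: print_encodings text1.txt text2.txt
```

Printing the encoding first and separating the fields with a tab keeps the output easy to sort or to filter with tools such as cut and grep.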

3. Using enca

Another way to find a file's encoding is to use enca. Since enca usually isn't installed by default, we need to install it first. On Debian-based distributions such as Ubuntu, we can use apt:

$ sudo apt update
$ sudo apt install enca

Now we can use enca:

$ enca -L none text1.txt
7bit ASCII characters
$ enca -L none text2.txt
Universal transformation format 8 bits; UTF-8

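In the snippet above, -L sets the language of the input file. The language-specific detection in enca targets a limited set of languages that doesn't include English, so for English text we set it to none. With none, enca doesn't apply any language-specific analysis and recognizes only language-independent encodings such as ASCII and UTF-8.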

The output shows that text1.txt uses 7-bit ASCII, also known as US-ASCII, while text2.txt uses UTF-8.
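In a script, the descriptive output of enca is harder to parse than a short encoding name. The sketch below assumes the -i option of enca, which prints the iconv-style name of the detected encoding, and falls back to file on systems where enca is missing. detect_encoding is a hypothetical name used only for this illustration:

```shell
# detect_encoding is a hypothetical helper, not part of enca or file.
detect_encoding() {
    if command -v enca >/dev/null 2>&1; then
        # -i asks enca for the iconv-style name of the detected encoding
        enca -L none -i "$1"
    else
        # Fallback when enca is not installed: use file instead
        file -b --mime-encoding "$1"
    fi
}

# Example call: detect_encoding text1.txt
```

The two tools name encodings differently, so a script that compares the result against a fixed string should account for both spellings.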

4. Conclusion

In this brief article, we discussed two methods to find the character encoding of a file in Linux.
