1. Overview

Computers store text as bytes, so they use a character encoding to map each character to a binary number. Common character encodings include ASCII, UTF-8, UTF-16, and UTF-32.
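
To make the mapping concrete, here is a minimal sketch that writes the word "café" as raw bytes in two encodings and compares the results. The byte values are typed by hand as octal escapes, and the variable names are only illustrative:

```shell
# "café" in UTF-8: "é" takes two bytes, 0xC3 0xA9 (octal 303 251).
utf8_len=$(printf 'caf\303\251' | wc -c | tr -d ' ')

# "café" in ISO-8859-1: "é" takes one byte, 0xE9 (octal 351).
latin1_len=$(printf 'caf\351' | wc -c | tr -d ' ')

echo "UTF-8 length:      $utf8_len bytes"    # 5
echo "ISO-8859-1 length: $latin1_len bytes"  # 4

# Dump the individual bytes in hexadecimal to see where the two differ.
printf 'caf\303\251' | od -An -tx1
printf 'caf\351' | od -An -tx1
```

The same four characters produce different byte sequences, which is why a program has to know (or guess) the encoding before it can read the text back correctly.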

In this tutorial, we’ll learn how to find the encoding of a file in Linux.

2. Using file

One way to find the encoding of a file is to use the file command:

$ file -bi text1.txt
text/plain; charset=us-ascii
$ file -bi text2.txt
text/plain; charset=utf-8

In the above snippet:

  • -b (--brief) tells file to omit the file name from the output, which keeps the output brief
  • -i (--mime) tells file to print MIME-type information, which includes both the media type and the character encoding of the file

The output shows that text1.txt is a plain text file encoded in US-ASCII, while text2.txt is a plain text file encoded in UTF-8. Note that file infers the encoding by examining the file's contents, so the result is a best guess rather than a guarantee.
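
When only the character set is needed, for example inside a script, file also offers the --mime-encoding option, which prints the charset without the media type. The following is a minimal sketch; the sample file names and the temporary directory are made up for illustration:

```shell
# Create two small sample files in a temporary directory (hypothetical names).
sample_dir=$(mktemp -d)
printf 'hello\n' > "$sample_dir/sample_ascii.txt"
printf 'caf\303\251\n' > "$sample_dir/sample_utf8.txt"   # contains "é" as UTF-8 bytes

# -b drops the file name; --mime-encoding prints only the charset.
ascii_charset=$(file -b --mime-encoding "$sample_dir/sample_ascii.txt")
utf8_charset=$(file -b --mime-encoding "$sample_dir/sample_utf8.txt")

echo "sample_ascii.txt: $ascii_charset"
echo "sample_utf8.txt:  $utf8_charset"

# Remove the sample files again.
rm -rf "$sample_dir"
```

With a typical Linux build of file, the first line should report us-ascii and the second utf-8, matching the charset values in the transcript above.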

3. Using enca

Another way to find the encoding of a file is to use enca. However, enca usually isn't installed by default, so we need to install it first. On Debian or Ubuntu, we can use apt:

$ sudo apt update
$ sudo apt install enca

Now we can use enca:

$ enca -L none text1.txt
7bit ASCII characters
$ enca -L none text2.txt
Universal transformation format 8 bits; UTF-8

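In the snippet above, -L sets the language of the input file. enca's language-specific detection covers only a limited set of languages, and English isn't one of them, so we pass none. With -L none, enca skips the language-specific analysis and can recognize only ASCII and multibyte encodings such as UTF-8.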

The output shows that text1.txt uses 7-bit ASCII, also known as US-ASCII, while text2.txt uses UTF-8.
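
Because enca may not be available on every machine, a script can prefer it when present and fall back to file otherwise. The sketch below is one possible way to do that, not a definitive implementation: detect_encoding is a hypothetical helper name, and it assumes enca's -i option (print the iconv name of the detected charset) and file's --mime-encoding option. The output is lowercased so that the two tools report names in the same case:

```shell
# detect_encoding FILE
# Hypothetical helper: prints a best-guess encoding name for FILE in lowercase.
detect_encoding() {
    if command -v enca >/dev/null 2>&1; then
        # Assumption: -i makes enca print the iconv-style charset name.
        enca -L none -i "$1" | tr '[:upper:]' '[:lower:]'
    else
        # Fall back to file, which prints names such as us-ascii or utf-8.
        file -b --mime-encoding "$1" | tr '[:upper:]' '[:lower:]'
    fi
}

# Report the encoding of every file passed on the command line.
for path in "$@"; do
    printf '%s: %s\n' "$path" "$(detect_encoding "$path")"
done
```

Saved as a script, it could be run as sh detect.sh text1.txt text2.txt. The two tools don't always use the same name for an encoding, so the exact strings may differ depending on which branch runs.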

4. Conclusion

In this brief article, we discussed two ways to find the character encoding of a file in Linux: the file command and enca.
