1. Overview

In this tutorial, we’ll look at some Linux tools for converting common text formats into PNG raster images or, if possible, SVG vector images.

We won’t consider other image formats because converting from one to another is quite simple.

2. Use Cases

Text-to-image conversion may be appropriate for different applications and scenarios. For example, it may be a requirement to embed a document or part of it into a web page. Or it may be the only way to send a document to a device, such as a smartphone, that can display images but has very limited or no support for plain text, PDF, or DOCX formats.

Other uses may involve security issues, such as making manual changes to declassified official documents to hide portions of text that are not authorized for release. In fact, if we delete text from a PDF or even a Word file, someone can recover it under certain conditions, as some research shows. On the other hand, if we delete it from a raster image, there’s no chance of recovery.

We must keep in mind, however, that using images instead of text that can be copied and pasted creates serious difficulties for people with visual impairments, as it completely prevents the use of speech synthesizers.

3. Plain Text

Plain text is a human-readable text format that doesn’t contain formatting or markup elements such as fonts, colors, images, XML tags, or encoding information. Characters may be unreadable if the encoding used to open the file is different from the encoding used to save it. UTF-8, or its ASCII subset, is widely compatible with almost all Linux applications.

For a generic text example, let’s use a Lorem Ipsum text generator:

$ pip install lorem-text
$ lorem_text > genericText.txt
$ file genericText.txt 
genericText.txt: ASCII text, with very long lines (566)
$ cat genericText.txt 
Rerum quibusdam error aperiam, tenetur cumque iusto perferendis ab nemo [...]

It’s important to note that this paragraph is on a single line of 566 characters, as indicated by file. In such cases, fmt comes to the rescue by automatically inserting line breaks:

$ fmt genericText.txt > genericText2.txt
$ cat genericText2.txt 
Rerum quibusdam error aperiam, tenetur cumque iusto perferendis ab
nemo quisquam autem, porro corrupti voluptatum sapiente dignissimos
[...]
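
By default, GNU fmt wraps at 75 columns. If the image needs narrower or wider lines, the -w option sets the maximum width. Here's a minimal sketch, in which the wrap_text helper name, the 60-column value, and the genericText60.txt file name are our own illustrative choices:

```shell
# wrap_text WIDTH FILE
# Re-flow FILE so that lines are at most WIDTH columns wide.
# (A single word longer than WIDTH still ends up on a line of its own.)
wrap_text() {
    fmt -w "$1" "$2"
}

# Example: wrap the Lorem Ipsum paragraph at 60 columns, if the file exists
if [ -f genericText.txt ]; then
    wrap_text 60 genericText.txt > genericText60.txt
fi
```

A narrower width produces a taller, slimmer image once the text is rendered.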

To convert text to a raster or vector image, Pango’s pango-view has several options for selecting font, colors, text alignment, size, language, margins, and more. Let’s try converting the text to a PNG image, keeping the defaults except for the dpi:

$ pango-view --dpi=120 -qo genericText2.png genericText2.txt

The result is as expected:

[Image: Pango's pango-view conversion of text to PNG image]

Converting to SVG allows us to get a consistently well-defined image regardless of size:

$ pango-view -qo genericText2.svg genericText2.txt

Let’s check the quality of the resulting SVG by zooming in:

[Image: Pango's pango-view conversion of text to SVG image]

However, in the case of ASCII art, source code, or terminal output, a monospaced font is required for proper display. All we have to do is add the --font="mono" option to pango-view. Let’s try it with some random ASCII art generated by fortune and cowsay:

$ fortune | cowsay > asciiArt.txt
$ pango-view --dpi=120 --font="mono" -qo asciiArt.png asciiArt.txt

The resulting PNG is graphically correct:

[Image: Conversion of ASCII-art to image]

In other cases, we may want to choose a different font. If so, we can use fc-list to list all the available ones:

$ fc-list : family | sort
10.15 Saturday Night BRK
10.15 Saturday Night R BRK
18 Holes BRK
[...]

In this case, to use the first font, the option would be --font="10.15 Saturday Night BRK".
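
The pango-view commands from this section also work well in a loop when there are many text files to render. The following is only a sketch: the txt_to_images function name is hypothetical, and it assumes that every file in the directory should be rendered with the monospaced font:

```shell
# txt_to_images DIR
# Render every .txt file in DIR to a PNG and an SVG placed next to it,
# reusing the pango-view options shown in this section.
txt_to_images() {
    for f in "$1"/*.txt; do
        [ -e "$f" ] || continue      # no .txt files: skip the unexpanded pattern
        base=${f%.txt}               # strip the extension to build output names
        pango-view --dpi=120 --font="mono" -qo "$base.png" "$f"
        pango-view --font="mono" -qo "$base.svg" "$f"
    done
}
```

For instance, txt_to_images . would process the current directory. To use a different font, we can replace "mono" with one of the family names reported by fc-list.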

4. Formatted Text Document

In this section, we’ll look at common document formats that typically contain both text and images. In addition, we’ll use some freely available test files whenever possible.

4.1. LibreOffice-Compatible Documents

LibreOffice can open a considerable number of file formats, including those of the major office suites. It provides some terminal commands, including soffice, which can convert any supported document to PNG:

$ wget -O 'MSWord.doc' 'https://getsamplefiles.com/download/word/sample-1.doc'
[...]
$ soffice --convert-to png "MSWord.doc"

This way, we get the MSWord.png image of 1140×1475 pixels, which is an exact copy of the source document. However, we can’t choose the resolution at which to save the image, nor can we save it as a vector image to preserve the original quality.

A better approach is to convert the source document to PDF to preserve the quality and then convert the PDF to PNG or SVG.

If we use the PDF export from the LibreOffice GUI, we have many options. Otherwise, we can settle for the previous terminal command, which is useful for batch processing:

$ soffice --convert-to pdf "MSWord.doc"
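
For batch processing, it can help to keep the LibreOffice GUI from starting and to collect the results in a separate directory. Here's a sketch that uses the --headless and --outdir options of soffice. The docs_to_pdf name and the directory layout are our own assumptions:

```shell
# docs_to_pdf OUTDIR FILE...
# Convert each given document to PDF inside OUTDIR without opening the GUI.
docs_to_pdf() {
    outdir=$1
    shift                            # the remaining arguments are the input files
    mkdir -p "$outdir"
    soffice --headless --convert-to pdf --outdir "$outdir" "$@"
}
```

As an example, docs_to_pdf pdf-out *.doc *.odt would place one PDF per source document in the pdf-out directory.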

Finally, before we move on to PDF conversion, let’s note that soffice and abiword have similar file conversion options, and both support wildcard characters in file names.

4.2. PDF Documents

PDF is a versatile file format that can present and share documents across software, hardware, and operating systems. Both GIMP and Inkscape support opening a PDF and selecting the page we want. Let’s start with a sample PDF:

$ wget -O 'example.pdf' 'https://getsamplefiles.com/download/pdf/sample-1.pdf'

GIMP is great for converting a PDF to a raster image at the desired resolution and then exporting it to PNG or another raster format:

[Image: GIMP importing a PDF]

Inkscape, on the other hand, allows us to import a PDF as a vector image, preserving its quality:

[Image: Inkscape imported PDF]

Inkscape’s native format is SVG, so all we have to do is save the page in that format.

We can also do PDF to PNG or SVG conversions from a terminal using the inkscape command:

$ inkscape --pdf-page=1 --export-dpi=300 --export-filename=example.png example.pdf
Background RRGGBBAA: ffffff00
Area 0:0:816:1056 exported to 2550 x 3300 pixels (300 dpi)
$ inkscape --pdf-page=1 --export-filename=example.svg example.pdf
$ file example.png 
example.png: PNG image data, 2550 x 3300, 8-bit/color RGBA, non-interlaced
$ file example.svg
example.svg: SVG Scalable Vector Graphics image
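
The pixel size reported above follows from the page area and the requested resolution. Inkscape expresses the area in user units of 1/96 inch, so 816 × 1056 units correspond to 8.5 × 11 inches. We can sketch the arithmetic with a small helper. The px_at_dpi name is hypothetical, and the integer division simply truncates, which may differ from Inkscape's own rounding for other page sizes:

```shell
# px_at_dpi UNITS DPI
# Convert a length in 1/96-inch user units to pixels at the given DPI.
px_at_dpi() {
    echo $(( $1 * $2 / 96 ))
}

px_at_dpi 816 300      # page width  -> 2550
px_at_dpi 1056 300     # page height -> 3300
```

The same calculation lets us pick a DPI value in advance when we need an image of a given pixel width.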

The gimp command also allows us to convert files using the terminal, but its use is overly complex for such a simple task, as we can see in the GIMP Batch Mode documentation.
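
Staying with inkscape, a multi-page PDF can be handled by repeating the export for each page. Here's a minimal sketch: the pdf_pages_to_png name and the output naming scheme are our own choices, the page range has to be supplied by hand, and the --pdf-page option is the one used above, which other Inkscape releases may name differently:

```shell
# pdf_pages_to_png PDF FIRST LAST [DPI]
# Export pages FIRST..LAST of PDF to PNG files named <pdf-name>-<page>.png.
pdf_pages_to_png() {
    pdf=$1
    first=$2
    last=$3
    dpi=${4:-300}                    # default to 300 dpi, as in the example above
    page=$first
    while [ "$page" -le "$last" ]; do
        inkscape --pdf-page="$page" --export-dpi="$dpi" \
                 --export-filename="${pdf%.pdf}-$page.png" "$pdf"
        page=$((page + 1))
    done
}
```

For example, pdf_pages_to_png example.pdf 1 3 would create example-1.png, example-2.png, and example-3.png.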

4.3. HTML Documents

HTML stands for HyperText Markup Language. It’s a standard document format that defines the structure and content of web pages. Converting it to a PNG image is made very convenient by Firefox’s Take Screenshot feature, which allows us to choose whether to save the entire page or just the visible part. We can also convert any web page or local HTML file to PNG from a terminal using Chrome’s headless mode:

$ google-chrome --headless --screenshot="html2image.png" https://www.informatica-libera.net/

Unfortunately, at the time of writing this tutorial, Firefox’s headless functionality for taking screenshots has stopped working.

Let’s remember that taking screenshots from the terminal requires special precautions. However, both Firefox and Chrome allow us to save HTML pages to PDF via the print function. In this case, we can convert the PDF to an image, as we saw earlier.
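
Both approaches can be scripted with Chrome's headless mode. The following sketch wraps the screenshot command in a function that also sets the window size, and adds a second function that prints the page to PDF. The function names and the 1280×800 default are our own assumptions, while --window-size and --print-to-pdf are Chrome command-line switches:

```shell
# page_to_png URL OUT [WIDTH] [HEIGHT]
# Save a screenshot of URL to OUT using a window of WIDTH x HEIGHT pixels.
page_to_png() {
    google-chrome --headless --window-size="${3:-1280},${4:-800}" \
                  --screenshot="$2" "$1"
}

# page_to_pdf URL OUT
# Print URL to the PDF file OUT, ready for a later PDF-to-image conversion.
page_to_pdf() {
    google-chrome --headless --print-to-pdf="$2" "$1"
}
```

For example, page_to_png https://www.informatica-libera.net/ html2image.png 1920 1080 would capture the page in a 1920×1080 window.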

At this point, it’s clear that any file we can convert to HTML or PDF can then be converted to an image in a second step, as in the case of the Markdown and ePub formats.

4.4. Markdown Documents

Markdown is a lightweight markup language, used by many web platforms such as GitHub, that allows us to create formatted documents. Markdown files usually have the .md extension, and we can easily convert them to HTML using pandoc:

$ wget -O 'Markdown.md' 'https://raw.githubusercontent.com/markdown-it/markdown-it/master/support/demo_template/sample.md'
$ pandoc -s --metadata title="Example of conversion from MD to HTML" -o Markdown.html Markdown.md

Comparing the original Markdown test document with the produced HTML file, we can see that most, but not all, of the formatting has been preserved. If this solution doesn’t satisfy us, we can try the same conversion with the free online tool markdown-it demo.
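
Once the HTML file exists, the headless Chrome screenshot from the HTML section can be applied to it, so the two steps chain naturally. Here's a sketch under a few assumptions: the md_to_png name is hypothetical, the title is simply taken from the file name, and the output files are written next to the source:

```shell
# md_to_png FILE.md
# Convert FILE.md to FILE.html with pandoc, then screenshot it to FILE.png.
md_to_png() {
    md=$1
    html="${md%.md}.html"
    png="${md%.md}.png"
    # Absolute directory of the HTML file, needed to build a file:// URL
    dir=$(cd "$(dirname "$html")" && pwd)
    pandoc -s --metadata title="$(basename "$md" .md)" -o "$html" "$md" &&
    google-chrome --headless --screenshot="$png" "file://$dir/$(basename "$html")"
}
```

For example, md_to_png Markdown.md would leave Markdown.html and Markdown.png in the same directory as the source file.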

4.5. ePub Documents

ePub is a popular e-book format. We can easily convert it to HTML with the ebook-convert utility provided by Calibre:

$ wget -O 'eBook.epub' 'https://getsamplefiles.com/download/epub/sample-5.epub'
$ ebook-convert eBook.epub eBook.zip
[...]
InputFormatPlugin: EPUB Input running
[...]
Creating HTML Output...
[...]
ZIP output written to /home/francesco/eBook.zip
Output saved to   /home/francesco/eBook.zip
$ unzip eBook.zip
Archive:  eBook.zip
   creating: eBook_files/
  inflating: eBook_files/stylesheet.css  
  inflating: eBook_files/page_styles.css  
  [...]
  inflating: eBook.html

In this case, the last file listed in the previous output, eBook.html, contains the book’s table of contents.
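
The two steps above can be combined into one helper that also keeps the extracted files out of the working directory. This is just a sketch: the epub_to_html name and the idea of a separate output directory are our own, and it relies on the -q, -o, and -d options of unzip:

```shell
# epub_to_html FILE.epub OUTDIR
# Convert FILE.epub to a zipped HTML book, then extract it into OUTDIR.
epub_to_html() {
    epub=$1
    outdir=$2
    zip="${epub%.epub}.zip"
    ebook-convert "$epub" "$zip" &&
    mkdir -p "$outdir" &&
    unzip -q -o "$zip" -d "$outdir"   # quiet, overwrite existing files
}
```

For example, epub_to_html eBook.epub eBook_html would extract the HTML file and its resources into the eBook_html directory.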

4.6. Other Cases

LaTeX and LyX are two popular tools in academia for writing scientific papers. They produce high-quality PDF files as their default output and can alternatively produce HTML. In both cases, we already know how to convert the result to an image.

It’s quite unlikely that a text document won’t fall into one of the cases we’ve seen in this tutorial or that it can’t be easily exported to PDF or HTML. Even in such a scenario, we can add PDF export to any program that can print using CUPS-PDF. Alternatively, there are many other ways to convert text to PDF.

As a last resort, there is always the option of taking a screenshot.

5. Conclusion

In this article, we’ve seen how to convert the following text formats into images:

  • plain text
  • LibreOffice-compatible formats
  • PDF
  • HTML
  • Markdown
  • ePub
  • LaTeX and LyX

In general, any document that can be exported to PDF or HTML can then be converted to an image.
