1. Introduction

Pure text files have been and remain a backbone of computing. They contain only characters in a given encoding, such as ASCII or UTF-8 (Unicode), without any binary formatting. Still, as hardware and software evolved, so did file types. Thus, the Portable Document Format (PDF) filled the gap between plain text files and heavily specialized graphics formats, providing a fairly universal and widely accepted structure for data transfer.

In this tutorial, we’ll explore ways to convert simple text files to PDF from the command line. First, we create some sample data and demonstrate our method for simple verification. After that, we continue with the usage of office suites for our purpose. Then, we show and discuss different tools and utilities to convert a text file to PDF, outlining some features and drawbacks of each.

We tested the code in this tutorial on Debian 11 (Bullseye) with GNU Bash 5.1.4. It should work in most POSIX-compliant environments.

2. Sample Data and Verification

In all cases below, we use a simple UTF-8 text file /file.txt with Latin ASCII characters on the first line and Cyrillic Unicode characters on the second:

$ cat /file.txt
Text.
Жрец.

For brevity, we omit the common Poppler pdftotext code we use to verify the final PDF files contain our original text:

$ pdftotext /file.pdf /extracted.txt
$ cat /extracted.txt
[...]

Naturally, with most tools, we don’t have much control over fonts, margins, paper sizes, and other such features. Still, there are exceptions: for example, tools built on the TeX document production system offer more control.

Taking this into account, let’s continue with different methods to convert text to PDF.

3. Using an Office Suite (LibreOffice, OpenOffice)

Office suites, perhaps among the most universal document converters, have long been a staple for formatted text, spreadsheets, and presentations. As PDF became widely adopted, most office suites also enabled users to export many file types as PDF.

While suites can have a somewhat heavy graphical user interface (GUI), the main Linux open-source options like LibreOffice and Apache OpenOffice also provide a command-line interface (CLI) and listener.

Let’s install LibreOffice on Debian:

$ apt-get install libreoffice

Now, we can leverage a common command to perform a conversion:

$ soffice --convert-to pdf /file.txt
convert /file.txt -> /file.pdf using filter : writer_pdf_Export

In essence, we convert a text file to PDF via the --convert-to switch of soffice, a binary available with either office suite. Unicode is supported with this method.

This is probably one of the most stable and robust ways to perform a document conversion, preserve the text layer and whitespace, and have a standard PDF.
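If we want to avoid launching any GUI components and control where the output lands, we can also add the --headless and --outdir switches of soffice; a minimal sketch, assuming /tmp as the target directory:

$ soffice --headless --convert-to pdf --outdir /tmp /file.txt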

4. Common UNIX Printing System (CUPS)

The Common UNIX Printing System (CUPS) is an open-source abstraction for easier printing. Its main goal is to simplify the print interface by using standards and bridges.

Although it’s commonly preinstalled, let’s install the cups package on Debian:

$ apt-get install cups

To achieve its goal, CUPS might require several conversions until the source data reaches the printer in a suitable format. One way we can do a manual conversion from text to PDF is by supplying our text file to the (deprecated) cupsfilter command:

$ cupsfilter /file.txt > /file.pdf

This solution supports Unicode. While it’s useful, the cupsfilter command is deprecated and may be phased out at any moment.
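In case the input type isn't detected correctly, cupsfilter also accepts explicit MIME types; a hedged sketch, assuming the -i (source type) and -m (destination type) options of our CUPS version:

$ cupsfilter -i text/plain -m application/pdf /file.txt > /file.pdf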

5. Using ImageMagick convert

The ImageMagick convert command can convert between many formats, including PDF.

Let’s install the ImageMagick package on Debian:

$ apt-get install imagemagick

Now, we can use convert to make a PDF from our text file:

$ convert TEXT:/file.txt /file.pdf

While this code should work, it might result in an error:

convert-im6.q16: attempt to perform an operation not allowed by the security policy `PDF' @ error/constitute.c/IsCoderAuthorized/421.

This is due to a security vulnerability in Ghostscript versions older than 9.24. After ensuring ours is at least 9.24, we can correct this by modifying /etc/ImageMagick-#/policy.xml.

First, we check our version of Ghostscript:

$ gs --version
9.53.3

Since it’s more recent than 9.24, we can remove a section from /etc/ImageMagick-#/policy.xml:

<!-- disable ghostscript format types -->
<policy domain="coder" rights="none" pattern="PS" />
<policy domain="coder" rights="none" pattern="PS2" />
<policy domain="coder" rights="none" pattern="PS3" />
<policy domain="coder" rights="none" pattern="EPS" />
<policy domain="coder" rights="none" pattern="PDF" />
<policy domain="coder" rights="none" pattern="XPS" />

To do this efficiently, we can use sed:

$ sed -i '/disable ghostscript format types/,+6d' /etc/ImageMagick-#/policy.xml

Finally, as ImageMagick handles raster image formats, there is no text layer in the resulting PDF. Still, convert supports Unicode.
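Since the text is rasterized, its appearance is controlled via regular ImageMagick settings such as -font and -pointsize; a brief sketch, assuming a DejaVu-Sans-Mono font is installed and known to ImageMagick:

$ convert -font DejaVu-Sans-Mono -pointsize 14 TEXT:/file.txt /file.pdf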

6. Using Ghostscript ps2pdf in a Pipeline

Since PostScript is closely related to PDF, we can use Ghostscript ps2pdf with several tools that don’t output PDF directly but can produce a PostScript file.

The ps2pdf utility can directly handle stdin or use a file path.
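For instance, assuming we already have a PostScript file /file.ps, both of these invocations produce a PDF:

$ ps2pdf /file.ps /file.pdf
$ cat /file.ps | ps2pdf - /file.pdf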

6.1. GNU enscript

The GNU enscript command converts ASCII text files to PostScript (among other formats) for printing, similar to cupsfilter. In addition, it supports syntax highlighting for some programming languages.

Let’s install it on Debian:

$ apt-get install enscript

Now, we can use it with --output set to stdout:

$ enscript --output=- /file.txt | ps2pdf - /file.pdf

By default, a header is added to the resulting PDF. This method only supports ASCII, so the output of our Unicode characters is undefined.
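To suppress the header, we can pass the --no-header option of enscript; a brief sketch under that assumption:

$ enscript --no-header --output=- /file.txt | ps2pdf - /file.pdf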

6.2. Pango Paps

The Paps converter can convert UTF-8 text to PostScript using the Pango text layout and rendering library.

Let’s install paps on Debian:

$ apt-get install paps

Now, we can use the paps command to convert our text file to PostScript and pipe the result to ps2pdf:

$ paps /file.txt | ps2pdf - /file.pdf

This method supports Unicode but, similar to ImageMagick, doesn’t produce a selectable text layer. However, the latest version of paps, currently only available via GitHub for Debian systems, does support text selection.

6.3. Vi Editor

In fact, the ubiquitous Vi editor can produce a PostScript file, as long as it was compiled with the +postscript feature.

Let’s use Vi for our purposes via the -c switch for startup commands and the :ha[rdcopy] command:

$ vi -c ':hardcopy > /file.ps | q' /file.txt; ps2pdf /file.ps /file.pdf

At this point, we have a /file.pdf file with our content. There is a header by default, but there are many printoptions we can apply as long as Vi is compiled with the +printer feature.
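For example, assuming Vim’s printoptions header flag, we can drop the default header by setting it to zero before printing:

$ vi -c ':set printoptions=header:0' -c ':hardcopy > /file.ps | q' /file.txt; ps2pdf /file.ps /file.pdf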

Despite its versatility, this method only supports ASCII.

7. Using the Pandoc Universal Document Converter

Although fairly heavy, the versatile pandoc document converter is indispensable when it comes to converting markup language formats.

Let’s install Pandoc on Debian:

$ apt-get install pandoc

To handle PDF files, pandoc leverages a TeX Live package that we may also have to install:

$ apt-get install texlive-xetex

At this point, we can convert from text to PDF with the xelatex --pdf-engine via pandoc:

$ pandoc --pdf-engine=xelatex --variable='mainfont:DejaVuSansMono.ttf' /file.txt --output=/file.pdf

The default PDF engine is pdflatex, but it only handles ASCII. Because of this, we use xelatex and set the mainfont key with the --variable switch.

While this method works for Unicode, it removes newlines and doesn’t preserve text formatting. Since it’s also heavily dependent on big packages, it might not be the best choice.
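One possible workaround for the lost newlines, assuming the content is safe to parse as Markdown, is to read the input with pandoc’s hard_line_breaks extension enabled:

$ pandoc --from=markdown+hard_line_breaks --pdf-engine=xelatex --variable='mainfont:DejaVuSansMono.ttf' /file.txt --output=/file.pdf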

8. Using unoserver and unoconv Universal Office Converter

The deprecated Universal Office Converter unoconv package and its successor, unoserver, can convert text files to PDF, among other formats.

Based on LibreOffice, they function much like the --convert-to switch of soffice. The unoserver package lowers resource usage with the listener mode of the office suite.

Let’s install both unoconv and unoserver on Debian:

$ apt-get install unoconv
$ pip install unoserver

Notably, unoserver is a pip module and might have other dependencies.

After installing, we can run a listener with unoserver or directly use unoconv with the --format=pdf option and our text file:

$ unoconv --format=pdf /file.txt
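Alternatively, for the listener route, a minimal sketch, assuming the unoconvert client command that ships with the unoserver package and its default local port:

$ unoserver &
$ unoconvert --convert-to pdf /file.txt /file.pdf

Here, unoserver keeps a LibreOffice instance running in the background, so repeated conversions avoid the startup cost.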

Essentially, either method has the same result as the LibreOffice conversion but uses fewer resources.

9. Using the AsciiDoc a2x

The a2x utility, part of the AsciiDoc toolchain, can convert text files to several formats.

We can install it via the asciidoc-base package; to keep the installation small, we skip the recommended packages:

$ apt-get --no-install-recommends install asciidoc-base -y

Now, we can convert our text file to PDF via the --format switch of a2x:

$ a2x --format=pdf /file.txt

Critically, this method only works for ASCII text, not Unicode. Still, a2x offers a number of options to control the conversion.
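For instance, assuming the --destination-dir and --verbose switches of a2x, we can redirect the output and follow the toolchain steps:

$ a2x --format=pdf --destination-dir=/tmp --verbose /file.txt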

10. Summary

In this article, we explored many ways to convert text files, ASCII and Unicode, to PDF files.

In conclusion, only some methods, like office suites and tools that leverage them, fully support conversions of Unicode files along with their formatting while providing options to change the output.
