1. Overview

PDF and EPUB are both popular formats for digital publications. However, converting a PDF file to EPUB can improve readability and accessibility, because EPUB text reflows to fit the screen.

In this tutorial, we’ll explore two methods to convert PDF files to the EPUB format in Linux.

2. EPUB Over PDF

Let’s discuss some cases in which the EPUB format is better suited than the PDF format.

The PDF format is useful when we have a document with a fixed layout. However, in the case of eBooks or digital documents with dynamic layouts, the EPUB format performs better than the PDF format. The EPUB format’s reflowable nature enables text to fit onto screens, usually making the information easier to read.

Moreover, EPUBs can adapt to the various dimensions of reading devices. Hence, readers don’t need to constantly zoom in and out to read text. Furthermore, reflowing ensures that users see the same content no matter what screen size they have. Therefore, the EPUB format provides a better reading experience on smartphones, tablets, and e-readers.

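Editing EPUB files is often easier than editing PDF files. An EPUB is essentially a ZIP archive of XHTML, CSS, and image files, so we can change its contents with an ordinary text editor or a free EPUB editor instead of specialized software. This ease of editing enables us to make revisions and update published documents quickly.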

Finally, due to their file structure, EPUB files are generally smaller than PDF files. Hence, they’re easy to download, share, and store on devices with limited internal storage capacity.

3. Using Calibre

Calibre is a powerful e-book management tool available on Linux. We can use it to organize our digital documents using metadata, convert between different formats, and edit documents.

3.1. Installation

First, let’s discuss how to install Calibre in Linux.

To install Calibre on Debian-based systems, we can use the apt command:

$ sudo apt install calibre

On Arch and its derivatives, we use the pacman command to install Calibre:

$ sudo pacman -S calibre

After the installation, let’s verify the version of the Calibre tool. We use the ebook-viewer command to view the current version of Calibre:

$ ebook-viewer --version
ebook-viewer (calibre 5.37)

As the output shows, the installation is complete.
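The Calibre package also provides command-line tools such as ebook-convert and ebook-meta, so it can help to confirm that they are on the PATH as well. The following is a minimal sketch; the check_tools function name is our own and not part of Calibre:

```shell
# check_tools is a hypothetical helper: it reports whether each named
# command can be found on the PATH
check_tools() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: found"
        else
            echo "$tool: missing"
        fi
    done
}

# ebook-convert and ebook-meta ship with Calibre alongside ebook-viewer
check_tools calibre ebook-viewer ebook-convert ebook-meta
```

Any tool reported as missing usually points to an incomplete installation or a PATH issue.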

3.2. Usage

Now, we open the Calibre tool:

$ calibre

Let’s take a look at the main screen:

The main page of the Calibre tool

The next step is to go to the Add books option and add the target PDF file:

Add book option of the Calibre tool

As soon as we add the PDF file, it appears on the main list of the Calibre tool:

Added file in the Calibre tool

In this case, we added a PDF file named sample.pdf. Now, we right-click on the PDF file and go to the Convert option:

Convert option in the Calibre tool

Before we start the conversion process, the Calibre tool shows us a preview which includes details about the added file and output format:

Preview before the conversion in the Calibre tool

After verification, we confirm the conversion. Finally, when the conversion is done, the Calibre tool gives us a notification.
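The same conversion can also be run without the GUI, since Calibre ships the ebook-convert command, which infers the input and output formats from the file extensions. The sketch below assumes that sample.pdf is in the current directory; the epub_name helper is our own illustration:

```shell
# epub_name is a hypothetical helper: swap a trailing .pdf for .epub
epub_name() {
    printf '%s\n' "${1%.pdf}.epub"
}

input="sample.pdf"
output="$(epub_name "$input")"

# Run the conversion only if the input file and the tool are available
if [ -f "$input" ] && command -v ebook-convert >/dev/null 2>&1; then
    ebook-convert "$input" "$output"
fi
```

Because PDF is a fixed-layout format, the quality of the result can vary from file to file, so it's worth opening the generated EPUB to check it.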

4. Using pdftohtml

An alternative approach is to convert the PDF file to HTML format using the pdftohtml tool and then convert the HTML file to EPUB format.

We mainly utilize the pdftohtml tool in Linux to convert PDF documents to HTML format. Often, the pdftohtml tool is used for Web publishing and text extraction.

4.1. Installation

The pdftohtml tool is a part of the Poppler utilities. On Debian-based systems, we can install it through the poppler-utils package using the apt-get command:

$ sudo apt-get install poppler-utils

Alternatively, on Arch and its derivatives, we can install pdftohtml through the poppler package using the pacman command:

$ sudo pacman -S poppler

Now, let’s verify the installation status of pdftohtml:

$ pdftohtml -v
pdftohtml version 22.02.0
Copyright 2005-2022 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1999-2003 Gueorgui Ovtcharov and Rainer Dorsch
Copyright 1996-2011 Glyph & Cog, LLC

Thus, we’ve successfully installed pdftohtml on our system.

4.2. Usage

First, let’s take a look at the metadata of the input PDF file using the pdfinfo command, which is a part of the Poppler utilities:

$ pdfinfo sample.pdf
Creator:         Writer
Producer:        LibreOffice 4.2
CreationDate:    Wed Aug 16 08:42:28 2017 EDT
Custom Metadata: no
Metadata Stream: no
Tagged:          no
UserProperties:  no
Suspects:        no
Form:            none
JavaScript:      no
Pages:           5
Encrypted:       no
Page size:       595 x 842 pts (A4)
Page rot:        0
File size:       469513 bytes
Optimized:       no
PDF version:     1.4
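When scripting, we may only need a single field from this output, such as the page count, to compare against the pages reported during conversion. As a minimal sketch, we can filter the pdfinfo output with awk; page_count is a hypothetical helper name:

```shell
# page_count is a hypothetical helper: read pdfinfo output on stdin
# and print the value of the Pages field
page_count() {
    awk '/^Pages:/ { print $2 }'
}

# Query the sample file only if it and pdfinfo are available
if [ -f sample.pdf ] && command -v pdfinfo >/dev/null 2>&1; then
    pdfinfo sample.pdf | page_count
fi
```

With the metadata shown above, this prints 5.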

The next step is to use pdftohtml to convert the PDF file into the HTML format:

$ pdftohtml sample.pdf output1.html
Page-1
Page-2
Page-3
Page-4
Page-5

The output shows that all five pages of the PDF file are converted to HTML.
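By default, pdftohtml generates a frameset, so the HTML is typically split across several files next to output1.html. If a single self-contained HTML file is preferable as input for a later step, the -noframes option is one way to request it. The sketch below is an assumption on our part rather than a required step, and to_single_html is a hypothetical wrapper name:

```shell
# to_single_html is a hypothetical wrapper around pdftohtml:
#   $1 = input PDF, $2 = output HTML name
to_single_html() {
    # -noframes requests one HTML document instead of a frameset
    pdftohtml -noframes "$1" "$2"
}

# Convert the sample file only if it and pdftohtml are available
if [ -f sample.pdf ] && command -v pdftohtml >/dev/null 2>&1; then
    to_single_html sample.pdf output1.html
    ls output1*   # list the generated HTML and any extracted images
fi
```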

The final step is to convert the HTML file into the EPUB format. To make this conversion, we use the ebook-convert command, which is part of the Calibre tool:

$ ebook-convert output1.html final_output.epub
1% Converting input to HTML...
InputFormatPlugin: HTML Input running
on /home/sam/Downloads/output1.html
Building file list...
Normalizing filename cases
Rewriting HTML links
Forcing output1.html into XHTML namespace
...output truncated...

At this point, we’ve successfully converted the PDF file into the EPUB format.
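For several files, both steps can be chained in a small script. The following is a sketch under a few assumptions: pdftohtml, ebook-convert, and ebook-meta are installed, the PDF files sit in the current directory, and a single HTML file per PDF is wanted (hence the -noframes option). The pdf_to_epub_via_html function name is hypothetical:

```shell
# pdf_to_epub_via_html is a hypothetical helper chaining both steps:
#   $1 = input PDF; the HTML and EPUB files are written next to it
pdf_to_epub_via_html() {
    base="${1%.pdf}"
    pdftohtml -noframes "$1" "$base.html" &&
        ebook-convert "$base.html" "$base.epub" &&
        ebook-meta "$base.epub"   # print the metadata of the result
}

for pdf in *.pdf; do
    [ -f "$pdf" ] || continue   # skip the literal pattern if no PDF matches
    pdf_to_epub_via_html "$pdf"
done
```

Each step runs only if the previous one succeeded, so a failed HTML conversion doesn't leave a broken EPUB behind.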

5. Conclusion

In this article, we discussed two methods to convert PDF files to the EPUB format in Linux.

Calibre is a comprehensive e-book management tool that offers a direct option for converting documents between formats, including PDF to EPUB.

On the other hand, we can use the pdftohtml tool to convert the PDF file to the HTML format and then use Calibre to convert the HTML file to the EPUB format.
