1. Overview

In this tutorial, we’ll get to know different ways of getting information about a PDF file using the iText and PDFBox libraries in Java.

2. Using the iText Library

iText is a library for creating and manipulating PDF documents. It also provides an easy way to read information about existing documents.

2.1. Maven Dependency

Let’s start by declaring the itextpdf dependency in our pom.xml:

<dependency>
    <groupId>com.itextpdf</groupId>
    <artifactId>itextpdf</artifactId>
    <version>5.5.13.3</version>
</dependency>

2.2. Getting the Number of Pages

Let’s create a PdfInfoIText class with a getNumberOfPages() method that returns the number of pages in a PDF document:

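import java.io.IOException;

import com.itextpdf.text.pdf.PdfReader;

public class PdfInfoIText {

    public static int getNumberOfPages(final String pdfFile) throws IOException {
        PdfReader reader = new PdfReader(pdfFile);
        try {
            // Count the pages in the document's page tree
            return reader.getNumberOfPages();
        } finally {
            // Release the underlying file, even if reading fails
            reader.close();
        }
    }
}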

In our example, we first use the PdfReader class to open the PDF at the given file path. Then we call its getNumberOfPages() method, and finally we close the reader to release the file. Let’s declare a test case for it:

@Test
public void givenPdf_whenGetNumberOfPages_thenOK() throws IOException {
    Assert.assertEquals(4, PdfInfoIText.getNumberOfPages(PDF_FILE));
}

In our test case, we validate the number of pages in a given PDF file stored in the test resources folder.

2.3. Getting the PDF Metadata

Let’s now look at how to read the document’s metadata. PdfReader’s getInfo() method returns the entries of the document information dictionary, such as the title, author, creation date, creator, and producer, as a map of strings. Let’s add a getInfo() method to our PdfInfoIText class:

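public static Map<String, String> getInfo(final String pdfFile) throws IOException {
    PdfReader reader = new PdfReader(pdfFile);
    try {
        // Copies the document information dictionary into a map keyed by
        // entry name, e.g. "Title", "Author", "Creator", "Producer"
        return reader.getInfo();
    } finally {
        reader.close();
    }
}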

Now, let’s write a test case for fetching the creator and producer of the document:

@Test
public void givenPdf_whenGetInfo_thenOK() throws IOException {
    Map<String, String> info = PdfInfoIText.getInfo(PDF_FILE);
    Assert.assertEquals("LibreOffice 4.2", info.get("Producer"));
    Assert.assertEquals("Writer", info.get("Creator"));
}
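One detail worth knowing: since getInfo() converts every entry to a string, date entries such as CreationDate come back in the raw PDF date format, for example D:20230415103000+02'00'. If we need a real date object, a minimal plain-JDK sketch could parse that format as follows. The PdfDates class and its parse() method are hypothetical helpers, and treating a date without a time zone as UTC is our own assumption, since the PDF specification leaves the relation of such dates to UTC unknown.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: parses PDF date strings of the form D:YYYYMMDDHHmmSSOHH'mm'
final class PdfDates {

    // Every field after the year is optional; the "D:" prefix is made optional
    // here because some producers omit it
    private static final Pattern PDF_DATE = Pattern.compile(
            "(?:D:)?(\\d{4})(\\d{2})?(\\d{2})?(\\d{2})?(\\d{2})?(\\d{2})?"
            + "(?:([+\\-Z])(?:(\\d{2})'?(?:(\\d{2})'?)?)?)?");

    private PdfDates() {
    }

    static OffsetDateTime parse(String value) {
        Matcher m = PDF_DATE.matcher(value.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a PDF date: " + value);
        }
        int year = Integer.parseInt(m.group(1));
        int month = intOr(m.group(2), 1);
        int day = intOr(m.group(3), 1);
        int hour = intOr(m.group(4), 0);
        int minute = intOr(m.group(5), 0);
        int second = intOr(m.group(6), 0);

        // Assumption: no offset (or "Z") is treated as UTC
        ZoneOffset offset = ZoneOffset.UTC;
        String sign = m.group(7);
        if (sign != null && !sign.equals("Z")) {
            int total = intOr(m.group(8), 0) * 3600 + intOr(m.group(9), 0) * 60;
            offset = ZoneOffset.ofTotalSeconds(sign.equals("-") ? -total : total);
        }
        return OffsetDateTime.of(year, month, day, hour, minute, second, 0, offset);
    }

    // Missing optional fields fall back to their default value
    private static int intOr(String group, int fallback) {
        return group == null ? fallback : Integer.parseInt(group);
    }
}
```

For instance, parsing the sample string above yields 2023-04-15T10:30+02:00.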

2.4. Knowing the PDF Password Protection

Next, we want to know whether the document is password protected. For this, let’s add an isPasswordRequired() method to the PdfInfoIText class that delegates to PdfReader’s isEncrypted() method:

public static boolean isPasswordRequired(final String pdfFile) throws IOException {
    PdfReader reader = new PdfReader(pdfFile);
    try {
        // True if the document contains an encryption dictionary
        return reader.isEncrypted();
    } finally {
        reader.close();
    }
}

Now, let’s create a test case to see how this method behaves:

@Test
public void givenPdf_whenIsPasswordRequired_thenOK() throws IOException {
    Assert.assertFalse(PdfInfoIText.isPasswordRequired(PDF_FILE));
}

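Keep in mind that isEncrypted() only tells us whether the document is encrypted. A file can be encrypted with just an owner password, which restricts permissions but still opens without a password, and the method returns true in that case as well. A document that does need a user password typically fails earlier, because the PdfReader constructor throws a BadPasswordException.

In the next section, we’ll do the same work using the PDFBox library.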

3. Using the PDFBox Library

Another way of getting information about a PDF file is by using the Apache PDFBox library.

3.1. Maven Dependency

We need to include the pdfbox Maven dependency in our project:

<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>3.0.0</version>
</dependency>

3.2. Getting the Number of Pages

To get the number of pages, we load the document with the static loadPDF() method of the Loader class, which PDFBox 3.x uses in place of the older PDDocument.load(). Then we call the getNumberOfPages() method on the resulting PDDocument:

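import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;

public class PdfInfoPdfBox {

    public static int getNumberOfPages(final String pdfFile) throws IOException {
        // try-with-resources closes the document even if reading fails
        try (PDDocument document = Loader.loadPDF(new File(pdfFile))) {
            return document.getNumberOfPages();
        }
    }
}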

Let’s create a test case for it:

@Test
public void givenPdf_whenGetNumberOfPages_thenOK() throws IOException {
    Assert.assertEquals(4, PdfInfoPdfBox.getNumberOfPages(PDF_FILE));
}

3.3. Getting the PDF Metadata

Getting the PDF metadata is also straightforward. We need to use the getDocumentInformation() method. This method returns document metadata (such as the author of the document or its creation date) as a PDDocumentInformation object:

public static PDDocumentInformation getInfo(final String pdfFile) throws IOException {
    try (PDDocument document = Loader.loadPDF(new File(pdfFile))) {
        // Wrapper around the document information dictionary
        return document.getDocumentInformation();
    }
}

Let’s write a test case for it:

@Test
public void givenPdf_whenGetInfo_thenOK() throws IOException {
    PDDocumentInformation info = PdfInfoPdfBox.getInfo(PDF_FILE);
    Assert.assertEquals("LibreOffice 4.2", info.getProducer());
    Assert.assertEquals("Writer", info.getCreator());
}

In this test case, we just validate the producer and creator of the document.
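Unlike iText’s string map, PDDocumentInformation already parses date entries: getters such as getCreationDate() return a java.util.Calendar, or null when the entry is missing. If the rest of our code uses java.time, a small conversion helper is enough. The PdfCalendars class below is a hypothetical name, and this is just one way to do the conversion.

```java
import java.time.ZonedDateTime;
import java.util.Calendar;

// Hypothetical helper for turning PDFBox date getters into java.time values
final class PdfCalendars {

    private PdfCalendars() {
    }

    // Converts a Calendar (e.g. from getCreationDate()) to a ZonedDateTime,
    // keeping the calendar's own time zone
    static ZonedDateTime toZonedDateTime(Calendar calendar) {
        if (calendar == null) {
            // The information dictionary may simply omit the date entry
            return null;
        }
        return calendar.toInstant().atZone(calendar.getTimeZone().toZoneId());
    }
}
```

A call such as PdfCalendars.toZonedDateTime(info.getCreationDate()) then gives us a value we can format or compare with the usual java.time API.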

3.4. Knowing the PDF Password Protection

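We can check whether the PDF is encrypted using the isEncrypted() method of the PDDocument class. Note that a file encrypted with only an owner password still opens without a password, so isEncrypted() returns true for it, while a file that needs a user password makes Loader.loadPDF() throw an InvalidPasswordException before we ever reach isEncrypted():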

public static boolean isPasswordRequired(final String pdfFile) throws IOException {
    try (PDDocument document = Loader.loadPDF(new File(pdfFile))) {
        // True if the document contains an encryption dictionary
        return document.isEncrypted();
    }
}

Let’s create a test case for the validation of password protection:

@Test
public void givenPdf_whenIsPasswordRequired_thenOK() throws IOException {
    Assert.assertFalse(PdfInfoPdfBox.isPasswordRequired(PDF_FILE));
}

4. Conclusion

In this article, we learned how to get information about a PDF file, namely its page count, its metadata, and whether it is encrypted, using two popular Java libraries: iText and PDFBox. A working version of the code shown in this article is available over on GitHub.
