1. Overview

In this tutorial, we'll look at how to read basic information about a PDF file in Java, namely its page count, metadata, and encryption status, using the iText and PDFBox libraries.

2. Using the iText Library

iText is a library for creating and manipulating PDF documents. It also provides an easy way to get information about a document.

2.1. Maven Dependency

Let's start by declaring the itextpdf dependency in our pom.xml:

<dependency>
    <groupId>com.itextpdf</groupId>
    <artifactId>itextpdf</artifactId>
    <version>5.5.13.3</version>
</dependency>

2.2. Getting the Number of Pages

Let's create a PdfInfoIText class with a getNumberOfPages() method that returns the number of pages in a PDF document:

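import com.itextpdf.text.pdf.PdfReader;
import java.io.IOException;

public class PdfInfoIText {

    public static int getNumberOfPages(final String pdfFile) throws IOException {
        PdfReader reader = new PdfReader(pdfFile);
        try {
            return reader.getNumberOfPages();
        } finally {
            // close the reader even if reading the page count fails
            reader.close();
        }
    }
}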

In our example, we first use the PdfReader class to load the PDF from the given file path. Then, we call the getNumberOfPages() method. Finally, we close the PdfReader object. Let's write a test case for it:

@Test
public void givenPdf_whenGetNumberOfPages_thenOK() throws IOException {
    Assert.assertEquals(4, PdfInfoIText.getNumberOfPages(PDF_FILE));
}

In our test case, we validate the number of pages in a given PDF file stored in the test resources folder.
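
The PDF_FILE constant isn't shown in the test. One way to define it, sketched below with only the JDK, is to resolve the file under the test resources folder and fail fast when it's missing. The TestPdfFiles class, its method names, and the input.pdf file name are hypothetical, and the src/test/resources location assumes the default Maven layout with tests running from the module root:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for locating PDF files used by the tests
final class TestPdfFiles {

    // Assumption: default Maven layout, with tests started from the module root
    private static final Path RESOURCES = Paths.get("src", "test", "resources");

    private TestPdfFiles() {
    }

    // Builds the path of a test PDF without touching the file system
    static Path pathOf(String fileName) {
        return RESOURCES.resolve(fileName);
    }

    // Returns the path as a String, or fails with a clear message if the file is missing
    static String require(String fileName) {
        Path path = pathOf(fileName);
        if (!Files.isRegularFile(path)) {
            throw new IllegalStateException("Test PDF not found: " + path.toAbsolutePath());
        }
        return path.toString();
    }
}
```

With this in place, the test class could declare PDF_FILE as TestPdfFiles.require("input.pdf"), so a misplaced file is reported by name rather than as a less obvious IOException from the PDF library.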

2.3. Getting the PDF Metadata

Next, let's see how to read the document's metadata. We'll use the getInfo() method of PdfReader, which returns the document information, such as the title, author, creation date, creator, and producer, as a map of strings. Let's add a getInfo() method to our PdfInfoIText class:

public static Map<String, String> getInfo(final String pdfFile) throws IOException {
    PdfReader reader = new PdfReader(pdfFile);
    try {
        return reader.getInfo();
    } finally {
        // close the reader even if reading the metadata fails
        reader.close();
    }
}

Now, let's write a test case for fetching the creator and producer of the document:

@Test
public void givenPdf_whenGetInfo_thenOK() throws IOException {
    Map<String, String> info = PdfInfoIText.getInfo(PDF_FILE);
    Assert.assertEquals("LibreOffice 4.2", info.get("Producer"));
    Assert.assertEquals("Writer", info.get("Creator"));
}
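
Since the map holds every value as a plain string, a date entry such as CreationDate arrives in the raw PDF date format, for example D:20140612103045+02'00' (an illustrative value, not taken from our test file). A minimal sketch of converting such a string with only the JDK might look like this. The PdfDateParser class and its parse() method are hypothetical names, and treating a value without a time zone as UTC is an assumption:

```java
import java.time.DateTimeException;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of iText: parses PDF date strings like D:20140612103045+02'00'
final class PdfDateParser {

    // The year is mandatory; month, day, time, and offset are optional
    private static final Pattern PDF_DATE = Pattern.compile(
        "D:(\\d{4})(\\d{2})?(\\d{2})?(\\d{2})?(\\d{2})?(\\d{2})?"
            + "(?:(Z)|([+-])(\\d{2})'?(\\d{2})?'?)?");

    private PdfDateParser() {
    }

    // Returns an empty Optional for null, malformed, or out-of-range input
    static Optional<OffsetDateTime> parse(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        Matcher m = PDF_DATE.matcher(raw.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        try {
            // Assumption: a value without an explicit offset is treated as UTC
            ZoneOffset offset = ZoneOffset.UTC;
            if (m.group(8) != null) {
                int sign = "-".equals(m.group(8)) ? -1 : 1;
                offset = ZoneOffset.ofHoursMinutes(sign * number(m, 9, 0), sign * number(m, 10, 0));
            }
            return Optional.of(OffsetDateTime.of(
                Integer.parseInt(m.group(1)), number(m, 2, 1), number(m, 3, 1),
                number(m, 4, 0), number(m, 5, 0), number(m, 6, 0), 0, offset));
        } catch (DateTimeException e) {
            // Out-of-range fields, for example month 13
            return Optional.empty();
        }
    }

    // Numeric value of a regex group, or the fallback when the group did not match
    private static int number(Matcher m, int group, int fallback) {
        String value = m.group(group);
        return value == null ? fallback : Integer.parseInt(value);
    }
}
```

We could then call PdfDateParser.parse(info.get("CreationDate")) on the map returned by getInfo(). A missing or malformed entry simply yields an empty Optional.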

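2.4. Checking for Password Protection

Now, let's check whether the document is password protected. We'll rely on the isEncrypted() method of PdfReader. Note that a document that can't be opened without a password makes the PdfReader constructor throw a BadPasswordException before isEncrypted() is reached. Let's add an isPasswordRequired() method to the PdfInfoIText class: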

public static boolean isPasswordRequired(final String pdfFile) throws IOException {
    PdfReader reader = new PdfReader(pdfFile);
    try {
        return reader.isEncrypted();
    } finally {
        // close the reader even if the check fails
        reader.close();
    }
}

Now, let's create a test case to see how this method behaves:

@Test
public void givenPdf_whenIsPasswordRequired_thenOK() throws IOException {
    Assert.assertFalse(PdfInfoIText.isPasswordRequired(PDF_FILE));
}

In the next section, we'll retrieve the same information using the PDFBox library.

3. Using the PDFBox Library

Another way of getting information about a PDF file is by using the Apache PDFBox library.

3.1. Maven Dependency

We need to include the pdfbox Maven dependency in our project:

<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>3.0.0-RC1</version>
</dependency>

3.2. Getting the Number of Pages

To get the number of pages, we use the Loader class and its loadPDF() method to load the document from a File object. Then, we call the getNumberOfPages() method of the PDDocument class:

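import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;

public class PdfInfoPdfBox {

    public static int getNumberOfPages(final String pdfFile) throws IOException {
        // try-with-resources closes the document even if reading fails
        try (PDDocument document = Loader.loadPDF(new File(pdfFile))) {
            return document.getNumberOfPages();
        }
    }
}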

Let's create a test case for it:

@Test
public void givenPdf_whenGetNumberOfPages_thenOK() throws IOException {
    Assert.assertEquals(4, PdfInfoPdfBox.getNumberOfPages(PDF_FILE));
}

3.3. Getting the PDF Metadata

Getting the PDF metadata is also straightforward. We need to use the getDocumentInformation() method. This method returns document metadata (such as the author of the document or its creation date) as a PDDocumentInformation object:

public static PDDocumentInformation getInfo(final String pdfFile) throws IOException {
    // try-with-resources closes the document even if reading fails
    try (PDDocument document = Loader.loadPDF(new File(pdfFile))) {
        return document.getDocumentInformation();
    }
}

Let's write a test case for it:

@Test
public void givenPdf_whenGetInfo_thenOK() throws IOException {
    PDDocumentInformation info = PdfInfoPdfBox.getInfo(PDF_FILE);
    Assert.assertEquals("LibreOffice 4.2", info.getProducer());
    Assert.assertEquals("Writer", info.getCreator());
}

In this test case, we just validate the producer and creator of the document.

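3.4. Checking for Password Protection

Loader.loadPDF() throws an InvalidPasswordException for a document that requires a password to open, so this check applies to documents that load without one. For those, we can check whether the PDF is encrypted using the isEncrypted() method of the PDDocument class: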

public static boolean isPasswordRequired(final String pdfFile) throws IOException {
    // try-with-resources closes the document even if the check fails
    try (PDDocument document = Loader.loadPDF(new File(pdfFile))) {
        return document.isEncrypted();
    }
}

Let's create a test case for the validation of password protection:

@Test
public void givenPdf_whenIsPasswordRequired_thenOK() throws IOException {
    Assert.assertFalse(PdfInfoPdfBox.isPasswordRequired(PDF_FILE));
}

4. Conclusion

In this article, we learned how to get the page count, metadata, and encryption status of a PDF file using two popular Java libraries, iText and PDFBox. A working version of the code shown in this article is available over on GitHub.
