eBook – Guide Spring Cloud – NPI EA (cat=Spring Cloud)
announcement - icon

Let's get started with a Microservice Architecture with Spring Cloud:

>> Join Pro and download the eBook

eBook – Mockito – NPI EA (tag = Mockito)
announcement - icon

Mocking is an essential part of unit testing, and the Mockito library makes it easy to write clean and intuitive unit tests for your Java code.

Get started with mocking and improve your application tests using our Mockito guide:

Download the eBook

eBook – Java Concurrency – NPI EA (cat=Java Concurrency)
announcement - icon

Handling concurrency in an application can be a tricky process with many potential pitfalls. A solid grasp of the fundamentals will go a long way to help minimize these issues.

Get started with understanding multi-threaded applications with our Java Concurrency guide:

>> Download the eBook

eBook – Reactive – NPI EA (cat=Reactive)
announcement - icon

Spring 5 added support for reactive programming with the Spring WebFlux module, which has been improved upon ever since. Get started with the Reactor project basics and reactive programming in Spring Boot:

>> Join Pro and download the eBook

eBook – Java Streams – NPI EA (cat=Java Streams)
announcement - icon

Since its introduction in Java 8, the Stream API has become a staple of Java development. The basic operations like iterating, filtering, mapping sequences of elements are deceptively simple to use.

But these can also be overused and fall into some common pitfalls.

To get a better understanding on how Streams work and how to combine them with other language features, check out our guide to Java Streams:

>> Join Pro and download the eBook

eBook – Jackson – NPI EA (cat=Jackson)
announcement - icon

Do JSON right with Jackson

Download the E-book

eBook – HTTP Client – NPI EA (cat=Http Client-Side)
announcement - icon

Get the most out of the Apache HTTP Client

Download the E-book

eBook – Maven – NPI EA (cat = Maven)
announcement - icon

Get Started with Apache Maven:

Download the E-book

eBook – Persistence – NPI EA (cat=Persistence)
announcement - icon

Working on getting your persistence layer right with Spring?

Explore the eBook

eBook – RwS – NPI EA (cat=Spring MVC)
announcement - icon

Building a REST API with Spring?

Download the E-book

Course – LS – NPI EA (cat=Jackson)
announcement - icon

Get started with Spring and Spring Boot, through the Learn Spring course:

>> LEARN SPRING
Course – RWSB – NPI EA (cat=REST)
announcement - icon

Explore Spring Boot 3 and Spring 6 in-depth through building a full REST API with the framework:

>> The New “REST With Spring Boot”

Course – LSS – NPI EA (cat=Spring Security)
announcement - icon

Yes, Spring Security can be complex, from the more advanced functionality within the Core to the deep OAuth support in the framework.

I built the security material as two full courses - Core and OAuth, to get practical with these more complex scenarios. We explore when and how to use each feature and code through it on the backing project.

You can explore the course here:

>> Learn Spring Security

Course – LSD – NPI EA (tag=Spring Data JPA)
announcement - icon

Spring Data JPA is a great way to handle the complexity of JPA with the powerful simplicity of Spring Boot.

Get started with Spring Data JPA through the guided reference course:

>> CHECK OUT THE COURSE

Partner – Moderne – NPI EA (cat=Spring Boot)
announcement - icon

Refactor Java code safely — and automatically — with OpenRewrite.

Refactoring big codebases by hand is slow, risky, and easy to put off. That’s where OpenRewrite comes in. The open-source framework for large-scale, automated code transformations helps teams modernize safely and consistently.

Each month, the creators and maintainers of OpenRewrite at Moderne run live, hands-on training sessions — one for newcomers and one for experienced users. You’ll see how recipes work, how to apply them across projects, and how to modernize code with confidence.

Join the next session, bring your questions, and learn how to automate the kind of work that usually eats your sprint time.

Course – LJB – NPI EA (cat = Core Java)
announcement - icon

Code your way through and build up a solid, practical foundation of Java:

>> Learn Java Basics

Partner – LambdaTest – NPI EA (cat= Testing)
announcement - icon

Distributed systems often come with complex challenges such as service-to-service communication, state management, asynchronous messaging, security, and more.

Dapr (Distributed Application Runtime) provides a set of APIs and building blocks to address these challenges, abstracting away infrastructure so we can focus on business logic.

In this tutorial, we'll focus on Dapr's pub/sub API for message brokering. Using its Spring Boot integration, we'll simplify the creation of a loosely coupled, portable, and easily testable pub/sub messaging system:

>> Flexible Pub/Sub Messaging With Spring Boot and Dapr

1. Introduction

In this quick article, we’ll focus on doing programmatic conversion between PDF files and other formats in Java.

More specifically, we’ll describe how to save PDFs as image files, such as PNG or JPEG, convert PDFs to Microsoft Word documents, export as an HTML, and extract the texts, by using multiple Java open-source libraries.

2. Maven Dependencies

The first library we’ll look at is Pdf2Dom. Let’s start with the Maven dependencies we need to add to our project:

<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox-tools</artifactId>
    <version>3.0.0</version>
</dependency>
<dependency>
    <groupId>net.sf.cssbox</groupId>
    <artifactId>pdf2dom</artifactId>
    <version>2.0.1</version>
</dependency>

We’re going to use the first dependency to load the selected PDF file. The second dependency is responsible for the conversion itself. The latest versions can be found here: pdfbox-tools and pdf2dom.

What’s more, we’ll use iText to extract the text from a PDF file and POI to create the .docx document.

Let’s take a look at Maven dependencies that we need to include in our project:

<dependency>
    <groupId>com.itextpdf</groupId>
    <artifactId>itextpdf</artifactId>
    <version>5.5.10</version>
</dependency>
<dependency>
    <groupId>com.itextpdf.tool</groupId>
    <artifactId>xmlworker</artifactId>
    <version>5.5.10</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>3.15</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-scratchpad</artifactId>
    <version>3.15</version>
</dependency>

The latest version of iText can be found here and you can look for Apache POI here.

3. PDF and HTML Conversions

To work with HTML files we’ll use Pdf2Dom – a PDF parser that converts the documents to an HTML DOM representation. The obtained DOM tree can then be then serialized to an HTML file or further processed.

To convert PDF to HTML, we need to use XMLWorker, library that is provided by iText.

3.1. PDF to HTML

Let’s have a look at a simple conversion from PDF to HTML:

private void generateHTMLFromPDF(String filename) {
    PDDocument pdf = PDDocument.load(new File(filename));
    Writer output = new PrintWriter("src/output/pdf.html", "utf-8");
    new PDFDomTree().writeText(pdf, output);
    
    output.close();
}

In the code snippet above we load the PDF file, using the load API from PDFBox. With the PDF loaded, we use the parser to parse the file and write to output specified by java.io.Writer.

Note that converting PDF to HTML is never a 100%, pixel-to-pixel result. The results depend on the complexity and the structure of the particular PDF file.

3.2. HTML to PDF

Now, let’s have a look at conversion from HTML to PDF:

private static void generatePDFFromHTML(String filename) {
    Document document = new Document();
    PdfWriter writer = PdfWriter.getInstance(document,
      new FileOutputStream("src/output/html.pdf"));
    document.open();
    XMLWorkerHelper.getInstance().parseXHtml(writer, document,
      new FileInputStream(filename));
    document.close();
}

Note that converting HTML to PDF, you need to ensure that HTML has all tags properly started and closed, otherwise the PDF will be not created. The positive aspect of this approach is that PDF will be created exactly the same as it was in HTML file.

4. PDF to Image Conversions

There are many ways of converting PDF files to an image. One of the most popular solutions is named Apache PDFBox. This library is an open source Java tool for working with PDF documents. For image to PDF conversion, we’ll use iText again.

4.1. PDF to Image

To start converting PDFs to images, we need to use dependency mentioned in the previous section – pdfbox-tools.

Let’s take a look at the code example:

private void generateImageFromPDF(String filename, String extension) {
    PDDocument document = PDDocument.load(new File(filename));
    PDFRenderer pdfRenderer = new PDFRenderer(document);
    for (int page = 0; page < document.getNumberOfPages(); ++page) {
        BufferedImage bim = pdfRenderer.renderImageWithDPI(
          page, 300, ImageType.RGB);
        ImageIOUtil.writeImage(
          bim, String.format("src/output/pdf-%d.%s", page + 1, extension), 300);
    }
    document.close();
}

There are few important parts in the above-mentioned code. We need to use PDFRenderer, in order to render PDF as a BufferedImage. Also, each page of the PDF file needs to be rendered separately.

Finally, we use ImageIOUtil, from Apache PDFBox Tools, to write an image, with the extension that we specify. Possible file formats are jpeg, jpg, gif, tiff or png.

Note that Apache PDFBox is an advanced tool – we can create our own PDF files from scratch, fill forms inside PDF file, sign and/or encrypt the PDF file.

4.2. Image to PDF

Let’s take a look at the code example:

private static void generatePDFFromImage(String filename, String extension) {
    Document document = new Document();
    String input = filename + "." + extension;
    String output = "src/output/" + extension + ".pdf";
    FileOutputStream fos = new FileOutputStream(output);

    PdfWriter writer = PdfWriter.getInstance(document, fos);
    writer.open();
    document.open();
    document.add(Image.getInstance((new URL(input))));
    document.close();
    writer.close();
}

Please note, that we can provide an image as a file, or load it from URL, as it is shown in the example above. Moreover, the extensions of the output file that we can use are jpeg, jpg, gif, tiff or png.

5. PDF to Text Conversions

To extract the raw text out of a PDF file, we’ll also use Apache PDFBox again. For text to PDF conversion, we are going to use iText.

5.1. PDF to Text

We created a method named generateTxtFromPDF(…) and divided it into three main parts: loading of the PDF file, extraction of text, and final file creation.

Let’s start with loading part:

File f = new File(filename);
String parsedText;
PDFParser parser = new PDFParser(new RandomAccessFile(f, "r"));
parser.parse();

In order to read a PDF file, we use PDFParser, with an “r” (read) option. Moreover, we need to use the parser.parse() method that will cause the PDF to be parsed as a stream and populated into the COSDocument object.

Let’s take a look at the extracting text part:

COSDocument cosDoc = parser.getDocument();
PDFTextStripper pdfStripper = new PDFTextStripper();
PDDocument pdDoc = new PDDocument(cosDoc);
parsedText = pdfStripper.getText(pdDoc);

In the first line, we’ll save COSDocument inside the cosDoc variable. It will be then used to construct PDocument, which is the in-memory representation of the PDF document. Finally, we will use PDFTextStripper to return the raw text of a document. After all of those operations, we’ll need to use close() method to close all the used streams.

In the last part, we’ll save text into the newly created file using the simple Java PrintWriter:

PrintWriter pw = new PrintWriter("src/output/pdf.txt");
pw.print(parsedText);
pw.close();

Please note that you cannot preserve formatting in a plain text file because it contains text only.

5.2. Text to PDF

Converting text files to PDF is bit tricky. In order to maintain the file formatting, you’ll need to apply additional rules.

In the following example, we are not taking into consideration the formatting of the file.

First, we need to define the size of the PDF file, version and output file. Let’s have a look at the code example:

Document pdfDoc = new Document(PageSize.A4);
PdfWriter.getInstance(pdfDoc, new FileOutputStream("src/output/txt.pdf"))
  .setPdfVersion(PdfWriter.PDF_VERSION_1_7);
pdfDoc.open();

In the next step, we’ll define the font and also the command that is used to generate new paragraph:

Font myfont = new Font();
myfont.setStyle(Font.NORMAL);
myfont.setSize(11);
pdfDoc.add(new Paragraph("\n"));

Finally, we are going to add paragraphs into newly created PDF file:

BufferedReader br = new BufferedReader(new FileReader(filename));
String strLine;
while ((strLine = br.readLine()) != null) {
    Paragraph para = new Paragraph(strLine + "\n", myfont);
    para.setAlignment(Element.ALIGN_JUSTIFIED);
    pdfDoc.add(para);
}	
pdfDoc.close();
br.close();

6. PDF to Docx Conversions

Creating PDF file from Word document is not easy, and we’ll not cover this topic here. We recommend 3rd party libraries to do it, like jWordConvert.

To create Microsoft Word file from a PDF, we’ll need two libraries. Both libraries are open source. The first one is iText and it is used to extract the text from a PDF file. The second one is POI and is used to create the .docx document.

Let’s take a look at the code snippet for the PDF loading part:

XWPFDocument doc = new XWPFDocument();
String pdf = filename;
PdfReader reader = new PdfReader(pdf);
PdfReaderContentParser parser = new PdfReaderContentParser(reader);

After loading of the PDF, we need to read and render each page separately in the loop, and then write to the output file:

for (int i = 1; i <= reader.getNumberOfPages(); i++) {
    TextExtractionStrategy strategy =
      parser.processContent(i, new SimpleTextExtractionStrategy());
    String text = strategy.getResultantText();
    XWPFParagraph p = doc.createParagraph();
    XWPFRun run = p.createRun();
    run.setText(text);
    run.addBreak(BreakType.PAGE);
}
FileOutputStream out = new FileOutputStream("src/output/pdf.docx");
doc.write(out);
// Close all open files

Please note, that with the SimpleTextExtractionStrategy() extraction strategy, we’ll lose all formatting rules. In order to fix it, play with extraction strategies described here, to achieve a more complex solution.

7. PDF to X Commercial Libraries

In previous sections, we described open source libraries. There are few more libraries worth notice, but they are paid:

  • jPDFImages – jPDFImages can create images from pages in a PDF document and export them as JPEG, TIFF, or PNG images.
  • JPEDAL – JPedal is an actively developed and very capable native Java PDF library SDK used for printing, viewing and conversion of files
  • pdfcrowd – it’s another Web/HTML to PDF and PDF to Web/HTML conversion library, with advanced GUI

8. Docx to PDF Conversion

To convert a Docx file to a PDF document, we’ll need the Apache POI library to read the Word document and the iText library to generate the PDF.

Here’s some simple code that reads a Docx file and writes its content to a PDF file:

InputStream docxInputStream = new FileInputStream("input.docx");
try (XWPFDocument document = new XWPFDocument(docxInputStream); 
    OutputStream pdfOutputStream = new FileOutputStream("output.pdf");) {
    Document pdfDocument = new Document();
    PdfWriter.getInstance(pdfDocument, pdfOutputStream);
    pdfDocument.open();
            
    List<XWPFParagraph> paragraphs = document.getParagraphs();
    for (XWPFParagraph paragraph : paragraphs) {
        pdfDocument.add(new Paragraph(paragraph.getText()));
    }
    pdfDocument.close();
}

First, we create an InputStream object to read the Docx file. The XWPFDocument class helps to represent the Docx file in memory. Next, we create an instance of OutputStream to write to the PDF file. The PdfWriter and Document classes help prepare the PDF file for writing. Also, we get the list of paragraphs from the Docx file by invoking the getParagraphs() method on document.

Then, we loop through each paragraph, create a new Paragraph object and add it to the PDF document using the add() method.

Finally, we close the PDF document using the close() method. This finalizes the PDF file and ensures that all changes are written to disk.

9. Conclusion

In this article, we discussed the ways to convert PDF files to and from various formats.

The code backing this article is available on GitHub. Once you're logged in as a Baeldung Pro Member, start learning and coding on the project.
Baeldung Pro – NPI EA (cat = Baeldung)
announcement - icon

Baeldung Pro comes with both absolutely No-Ads as well as finally with Dark Mode, for a clean learning experience:

>> Explore a clean Baeldung

Once the early-adopter seats are all used, the price will go up and stay at $33/year.

eBook – HTTP Client – NPI EA (cat=HTTP Client-Side)
announcement - icon

The Apache HTTP Client is a very robust library, suitable for both simple and advanced use cases when testing HTTP endpoints. Check out our guide covering basic request and response handling, as well as security, cookies, timeouts, and more:

>> Download the eBook

eBook – Java Concurrency – NPI EA (cat=Java Concurrency)
announcement - icon

Handling concurrency in an application can be a tricky process with many potential pitfalls. A solid grasp of the fundamentals will go a long way to help minimize these issues.

Get started with understanding multi-threaded applications with our Java Concurrency guide:

>> Download the eBook

eBook – Java Streams – NPI EA (cat=Java Streams)
announcement - icon

Since its introduction in Java 8, the Stream API has become a staple of Java development. The basic operations like iterating, filtering, mapping sequences of elements are deceptively simple to use.

But these can also be overused and fall into some common pitfalls.

To get a better understanding on how Streams work and how to combine them with other language features, check out our guide to Java Streams:

>> Join Pro and download the eBook

eBook – Persistence – NPI EA (cat=Persistence)
announcement - icon

Working on getting your persistence layer right with Spring?

Explore the eBook

Course – LS – NPI EA (cat=REST)

announcement - icon

Get started with Spring Boot and with core Spring, through the Learn Spring course:

>> CHECK OUT THE COURSE

Partner – Moderne – NPI EA (tag=Refactoring)
announcement - icon

Modern Java teams move fast — but codebases don’t always keep up. Frameworks change, dependencies drift, and tech debt builds until it starts to drag on delivery. OpenRewrite was built to fix that: an open-source refactoring engine that automates repetitive code changes while keeping developer intent intact.

The monthly training series, led by the creators and maintainers of OpenRewrite at Moderne, walks through real-world migrations and modernization patterns. Whether you’re new to recipes or ready to write your own, you’ll learn practical ways to refactor safely and at scale.

If you’ve ever wished refactoring felt as natural — and as fast — as writing code, this is a good place to start.

eBook Jackson – NPI EA – 3 (cat = Jackson)