
1. Overview

In this article, we will explore various methods to turn XML files into CSV format using Java.

XML (Extensible Markup Language) and CSV (Comma-Separated Values) are both popular choices for data exchange. While XML is a powerful option that allows for a structured, layered approach to complicated data sets, CSV is more straightforward and designed primarily for tabular data. 

Sometimes, there might be situations where we need to convert an XML to a CSV to make data import or analysis easier.

2. Introduction to XML Data Layout

Imagine we run a bunch of bookstores, and we’ve stored our inventory data in an XML format similar to the example below:

<?xml version="1.0"?>
<Bookstores>
    <Bookstore id="S001">
        <Books>
            <Book id="B001" category="Fiction">
                <Title>Death and the Penguin</Title>
                <Author id="A001">Andrey Kurkov</Author>
                <Price>10.99</Price>
            </Book>
            <Book id="B002" category="Poetry">
                <Title>Kobzar</Title>
                <Author id="A002">Taras Shevchenko</Author>
                <Price>8.50</Price>
            </Book>
        </Books>
    </Bookstore>
    <Bookstore id="S002">
        <Books>
            <Book id="B003" category="Novel">
                <Title>Voroshilovgrad</Title>
                <Author id="A003">Serhiy Zhadan</Author>
                <Price>12.99</Price>
            </Book>
        </Books>
    </Bookstore>
</Bookstores>

This XML organizes the attributes ‘id’ and ‘category’, along with the text elements ‘Title’, ‘Author’, and ‘Price’, into a neat hierarchy. A well-structured XML document simplifies the conversion process, making it more straightforward and less error-prone.

The goal is to convert this data into a CSV format for easier handling in tabular form. To illustrate, let’s take a look at how the bookstores from our XML data would be represented in the CSV format:

bookstore_id,book_id,category,title,author_id,author_name,price
S001,B001,Fiction,Death and the Penguin,A001,Andrey Kurkov,10.99
S001,B002,Poetry,Kobzar,A002,Taras Shevchenko,8.50
S002,B003,Novel,Voroshilovgrad,A003,Serhiy Zhadan,12.99

Moving forward, we’ll discuss the methods to achieve this conversion.

3. Converting using XSLT

3.1. Introduction to XSLT

XSLT (Extensible Stylesheet Language Transformations) is a tool that changes XML files into various other formats like HTML, plain text, or even CSV.

It operates by following rules set in a special stylesheet, usually an XSL file. This becomes especially useful when we aim to convert XML to CSV for easier use.

3.2. XSLT Conversion Process

To get started, we’ll need to create an XSLT stylesheet that uses XPath to navigate the XML tree structure and specifies how to convert the XML elements into CSV rows and columns.

Below is an example of such an XSLT file:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text" omit-xml-declaration="yes" indent="no"/>
    <xsl:template match="/">
        <xsl:text>bookstore_id,book_id,category,title,author_id,author_name,price</xsl:text>
        <xsl:text>&#xA;</xsl:text>
        <xsl:for-each select="//Bookstore">
            <xsl:variable name="bookstore_id" select="@id"/>
            <xsl:for-each select="./Books/Book">
                <xsl:variable name="book_id" select="@id"/>
                <xsl:variable name="category" select="@category"/>
                <xsl:variable name="title" select="Title"/>
                <xsl:variable name="author_id" select="Author/@id"/>
                <xsl:variable name="author_name" select="Author"/>
                <xsl:variable name="price" select="Price"/>
                <xsl:value-of select="concat($bookstore_id, ',', $book_id, ',', $category, ',', $title, ',', $author_id, ',', $author_name, ',', $price)"/>
                <xsl:text>&#xA;</xsl:text>
            </xsl:for-each>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>

This stylesheet first matches the root element and then iterates over each ‘Bookstore’ node, gathering its attributes and child elements, such as the book’s id and category, into variables. These variables are then used to build each row of the CSV file, which has columns for bookstore ID, book ID, category, title, author ID, author name, and price.

The <xsl:template> element sets the transformation rules. It targets the XML root with <xsl:template match="/"> and then defines the CSV header.

The instruction <xsl:for-each select="//Bookstore"> processes each ‘Bookstore’ node and captures its attributes. Another, inner instruction, <xsl:for-each select="./Books/Book">, processes each ‘Book’ within the current ‘Bookstore’.

The concat() function combines these values into a CSV row.

The <xsl:text>&#xA;</xsl:text> instruction adds a line feed (LF) character, corresponding to the ASCII value 0xA in hexadecimal notation.

Here’s how we can use the Java-based XSLT processor:

void convertXml2CsvXslt(String xslPath, String xmlPath, String csvPath) throws IOException, TransformerException {
    StreamSource styleSource = new StreamSource(new File(xslPath));
    Transformer transformer = TransformerFactory.newInstance()
      .newTransformer(styleSource);
    Source source = new StreamSource(new File(xmlPath));
    Result outputTarget = new StreamResult(new File(csvPath));
    transformer.transform(source, outputTarget);
}

We use TransformerFactory to compile our XSLT stylesheet. Then, we create a Transformer object, which takes care of applying this stylesheet to our XML data, turning it into a CSV file. Once the code runs successfully, a new file will appear in the specified directory.

Using XSLT for XML to CSV conversion is convenient and flexible, offering a standardized and powerful approach for most use cases. However, it requires loading the whole XML file into memory, which can be a drawback for large files. While it’s well-suited for small and medium-sized data sets, for larger ones we might want to consider StAX, which we’ll get into next.

4. Using StAX

4.1. Introduction to StAX

StAX (Streaming API for XML) is designed to read and write XML files in a more memory-efficient way. It allows us to process XML documents on the fly, making it ideal for handling large files.

Converting using StAX involves three main steps:

  • Initializing the StAX parser
  • Reading the XML elements
  • Writing to CSV

4.2. StAX Conversion Process

Here’s a full example, encapsulated in a method named convertXml2CsvStax():

void convertXml2CsvStax(String xmlFilePath, String csvFilePath) {
    XMLInputFactory inputFactory = XMLInputFactory.newInstance();

    try (InputStream in = Files.newInputStream(Paths.get(xmlFilePath)); BufferedWriter writer = new BufferedWriter(new FileWriter(csvFilePath))) {
        writer.write("bookstore_id,book_id,category,title,author_id,author_name,price\n");

        XMLStreamReader reader = inputFactory.createXMLStreamReader(in);

        String currentElement;
        StringBuilder csvRow = new StringBuilder();
        StringBuilder bookstoreInfo = new StringBuilder();

        while (reader.hasNext()) {
            int eventType = reader.next();

            switch (eventType) {
                case XMLStreamConstants.START_ELEMENT:
                    currentElement = reader.getLocalName();
                    if ("Bookstore".equals(currentElement)) {
                        bookstoreInfo.setLength(0);
                        bookstoreInfo.append(reader.getAttributeValue(null, "id"))
                          .append(",");
                    }
                    if ("Book".equals(currentElement)) {
                        csvRow.append(bookstoreInfo)
                          .append(reader.getAttributeValue(null, "id"))
                          .append(",")
                          .append(reader.getAttributeValue(null, "category"))
                          .append(",");
                    }
                    if ("Author".equals(currentElement)) {
                        csvRow.append(reader.getAttributeValue(null, "id"))
                          .append(",");
                    }
                    break;

                case XMLStreamConstants.CHARACTERS:
                    if (!reader.isWhiteSpace()) {
                        csvRow.append(reader.getText()
                          .trim())
                          .append(",");
                    }
                    break;

                case XMLStreamConstants.END_ELEMENT:
                    if ("Book".equals(reader.getLocalName())) {
                        csvRow.setLength(csvRow.length() - 1);
                        csvRow.append("\n");
                        writer.write(csvRow.toString());
                        csvRow.setLength(0);
                    }
                    break;
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}

To begin, we initialize the StAX parser by creating an instance of XMLInputFactory. We then use this factory object to generate an XMLStreamReader:

XMLInputFactory inputFactory = XMLInputFactory.newInstance();
InputStream in = Files.newInputStream(Paths.get(xmlFilePath));
XMLStreamReader reader = inputFactory.createXMLStreamReader(in);

We use the XMLStreamReader to iterate through the XML file, and based on the event type, such as START_ELEMENT, CHARACTERS, and END_ELEMENT, we build our CSV rows.

As we read the XML data, we build up CSV rows and write them to the output file using a BufferedWriter.

So, in a nutshell, StAX offers a memory-efficient solution that’s well-suited for processing large or streaming XML files. While it requires more manual effort and lacks the transformation features of XSLT, it excels in scenarios where resource utilization is a concern. With the foundational knowledge and example above, we’re now prepared to use StAX for our XML to CSV conversion needs when those conditions apply.

5. Additional Methods

We’ve primarily focused on XSLT and StAX as XML to CSV conversion methods. However, other options like DOM (Document Object Model) parsers, SAX (Simple API for XML) parsers, and Apache Commons CSV also exist.

Yet, there are some factors to consider. DOM parsers load the whole XML file into memory, which gives us the flexibility to traverse and manipulate the XML tree freely. On the other hand, they make us work a bit harder when transforming that XML data into CSV format.
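As an illustration of the DOM approach, here’s a minimal sketch, not part of the article’s reference code, that parses our bookstore XML with the standard javax.xml.parsers API and builds the same CSV rows:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DomXmlToCsv {

    // Converts bookstore XML (shaped like the example above) to CSV using a DOM
    // parser. The whole document is held in memory, so this suits smaller files.
    static String convert(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        StringBuilder csv = new StringBuilder("bookstore_id,book_id,category,title,author_id,author_name,price\n");
        NodeList stores = doc.getElementsByTagName("Bookstore");
        for (int i = 0; i < stores.getLength(); i++) {
            Element store = (Element) stores.item(i);
            NodeList books = store.getElementsByTagName("Book");
            for (int j = 0; j < books.getLength(); j++) {
                Element book = (Element) books.item(j);
                Element author = (Element) book.getElementsByTagName("Author").item(0);
                csv.append(String.join(",",
                    store.getAttribute("id"),
                    book.getAttribute("id"),
                    book.getAttribute("category"),
                    book.getElementsByTagName("Title").item(0).getTextContent(),
                    author.getAttribute("id"),
                    author.getTextContent(),
                    book.getElementsByTagName("Price").item(0).getTextContent()))
                  .append("\n");
            }
        }
        return csv.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Bookstores><Bookstore id=\"S001\"><Books>"
          + "<Book id=\"B001\" category=\"Fiction\"><Title>Death and the Penguin</Title>"
          + "<Author id=\"A001\">Andrey Kurkov</Author><Price>10.99</Price></Book>"
          + "</Books></Bookstore></Bookstores>";
        System.out.print(convert(xml));
    }
}
```

Notice how the nested loops mirror the XML hierarchy directly, which is convenient, but every traversal detail is our responsibility, unlike with an XSLT stylesheet.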

When it comes to SAX parsers, they are more memory-efficient but can present challenges for complex manipulations. Their event-driven nature requires us to manage state manually, and they offer no way to look ahead or behind in the XML document, making certain transformations cumbersome.

Apache Commons CSV shines when writing CSV files but expects us to handle the XML parsing part ourselves.

In summary, while each alternative has its own advantages, for this example, XSLT and StAX provide a more balanced solution for most XML to CSV conversion tasks.

6. Best Practices

To convert XML to CSV, several factors, such as data integrity, performance, and error handling, need to be considered. Validating the XML against its schema is crucial for confirming the data structure. In addition, proper mapping of XML elements to CSV columns is a fundamental step.

For large files, using streaming techniques like StAX can be advantageous for memory efficiency. Also, consider breaking down large files into smaller batches for easier processing.

It’s important to mention that the code examples provided may not handle special characters found in XML data, including but not limited to commas, newlines, and double quotes. For example, a comma within a field value can conflict with the comma used to delimit fields in the CSV. Similarly, a newline character could disrupt the logical structure of the file.

Addressing such issues can be complex and varies depending on specific project requirements. To work around commas, you can enclose fields in double quotes in the resulting CSV file. That said, to keep the code examples in this article easy to follow, these special cases have not been addressed. Therefore, this aspect should be taken into account for a more accurate conversion.
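For completeness, a small helper like the following, a hypothetical addition following RFC 4180 conventions rather than part of the article’s examples, could be applied to each field before it’s appended to a row:

```java
public class CsvEscaper {

    // Hypothetical helper: quotes a field per RFC 4180 when it contains a
    // comma, double quote, or line break, doubling any embedded double quotes.
    static String escapeCsvField(String field) {
        if (field.contains(",") || field.contains("\"")
          || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    public static void main(String[] args) {
        System.out.println(escapeCsvField("Death and the Penguin")); // unchanged
        System.out.println(escapeCsvField("War, and Peace"));        // "War, and Peace"
        System.out.println(escapeCsvField("He said \"hi\""));        // "He said ""hi"""
    }
}
```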

7. Conclusion

In this article, we explored various methods for converting XML to CSV, specifically diving into the XSLT and StAX methods. Regardless of the method chosen, having a well-suited XML structure for CSV, implementing data validation, and knowing which special characters to handle are essential for a smooth and successful conversion.

The code backing this article is available on GitHub.