Log4j1XmlLayout: replace invalid XML characters with U+FFFD #4078

Merged
ppkarwasz merged 7 commits into 2.25.x from fix/2.25.x/log4j1-xml-control-characters
Mar 24, 2026

Conversation

@ppkarwasz
Contributor

This change sanitizes the output of `Log4j1XmlLayout` by replacing characters that are not permitted in XML 1.0 with the Unicode replacement character (`U+FFFD`).

This guarantees that the generated log output is always well-formed XML and can be parsed by any XML 1.0–compliant parser, even when log data contains control characters or other invalid code points.

Related to #4077, which makes the same change for `XmlLayout`.
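
For readers unfamiliar with the rule, the replacement described above can be sketched roughly as follows. This is not the code from this PR: the class and method names (`XmlSanitizerSketch`, `isValidXml10`, `sanitize`) are hypothetical, and the valid ranges are taken from the `Char` production of the XML 1.0 specification.

```java
// Illustrative sketch only; names are hypothetical and not part of Log4j.
final class XmlSanitizerSketch {

    private static final int REPLACEMENT = 0xFFFD;

    // XML 1.0 "Char" production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Walks the input by code point so that valid surrogate pairs are kept
    // intact, while unpaired surrogates and control characters are replaced.
    static String sanitize(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            sb.appendCodePoint(isValidXml10(cp) ? cp : REPLACEMENT);
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String dirty = "a" + (char) 0x00 + "b";
        System.out.println(sanitize(dirty).equals("a\uFFFDb")); // prints true
    }
}
```

Iterating by code point rather than by `char` matters here: a supplementary character such as an emoji is two `char` values, each of which falls in the surrogate range and would look invalid on its own.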

Contributor

Copilot AI left a comment

Pull request overview

This PR hardens Log4j1XmlLayout XML output by sanitizing characters that are not permitted by XML 1.0, replacing them with U+FFFD to ensure the emitted logs remain well‑formed and parseable (mirroring the earlier XmlLayout change in #4077).

Changes:

  • Update Transform.escapeHtmlTags and Transform.appendEscapingCData to replace XML 1.0–invalid code points with U+FFFD (and to avoid allocations when no escaping is needed).
  • Add focused unit tests for Transform sanitization/escaping behavior.
  • Add a regression test ensuring Log4j1XmlLayout output remains well‑formed when event data contains invalid XML characters.
  • Add a 2.x changelog entry documenting the fix.
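
The first bullet combines escaping, replacement, and allocation avoidance. A minimal sketch of how those three can fit together is shown below. It is an assumption about the general technique, not the actual `Transform` implementation: the class `EscapeSketch`, its methods, and the exact set of escaped characters (`<`, `>`, `&`, `"`) are hypothetical.

```java
// Illustrative sketch only; not the Log4j Transform class.
final class EscapeSketch {

    private static final int REPLACEMENT = 0xFFFD;

    // XML 1.0 "Char" production.
    static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Assumed set of escaped characters; returns null when no entity applies.
    private static String entityFor(int cp) {
        switch (cp) {
            case '<': return "&lt;";
            case '>': return "&gt;";
            case '&': return "&amp;";
            case '"': return "&quot;";
            default: return null;
        }
    }

    static String escape(String input) {
        if (input == null || input.isEmpty()) {
            return input;
        }
        final int length = input.length();
        // First pass: find the first code point that needs escaping or replacing.
        int i = 0;
        while (i < length) {
            int cp = input.codePointAt(i);
            if (entityFor(cp) != null || !isValidXml10(cp)) {
                break;
            }
            i += Character.charCount(cp);
        }
        if (i >= length) {
            return input; // fast path: nothing to do, no allocation
        }
        // Slow path: copy the clean prefix, then transform the rest.
        StringBuilder sb = new StringBuilder(length + 16);
        sb.append(input, 0, i);
        while (i < length) {
            int cp = input.codePointAt(i);
            String entity = entityFor(cp);
            if (entity != null) {
                sb.append(entity);
            } else {
                sb.appendCodePoint(isValidXml10(cp) ? cp : REPLACEMENT);
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }
}
```

In this sketch the common case of a clean message returns the same `String` instance, so a `StringBuilder` is only allocated when at least one character actually has to change.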

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `src/changelog/.2.x.x/4078_log4j1-xml-control-characters.xml` | Changelog entry documenting the XML sanitization fix for `Log4j1XmlLayout`. |
| `log4j-core/src/main/java/org/apache/logging/log4j/core/util/Transform.java` | Implements XML 1.0 invalid character replacement and refactors escaping to be codepoint-aware and allocation-avoiding. |
| `log4j-core-test/src/test/java/org/apache/logging/log4j/core/util/TransformTest.java` | Adds parameterized tests for escaping and XML 1.0 sanitization behavior. |
| `log4j-1.2-api/src/test/java/org/apache/log4j/layout/Log4j1XmlLayoutTest.java` | Adds a regression test verifying `Log4j1XmlLayout` sanitizes invalid XML characters across fields. |
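
The well-formedness property that the regression test targets can be checked with the JDK's built-in XML parser. The sketch below is a hypothetical helper (`WellFormedCheckSketch.isWellFormed`), not the test added in this PR; it only shows that a raw control character makes a document unparseable, while the same document with `U+FFFD` in its place parses.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative sketch only; not the test code from this PR.
final class WellFormedCheckSketch {

    // Returns true if the JAXP parser accepts the string as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of logging them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String raw = "<message>a" + (char) 0x01 + "b</message>";
        String sanitized = "<message>a\uFFFDb</message>";
        System.out.println(isWellFormed(raw));
        System.out.println(isWellFormed(sanitized));
    }
}
```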

ppkarwasz and others added 3 commits March 24, 2026 21:45
The `&apos;` entity is not defined in HTML 4.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@vy added the bug (incorrect, unexpected, or unintended behavior of existing code) and layouts (affects one or more Layout plugins) labels Mar 24, 2026
@vy added this to the 2.25.4 milestone Mar 24, 2026
@ppkarwasz merged commit 25043cf into 2.25.x Mar 24, 2026
7 checks passed
@ppkarwasz deleted the fix/2.25.x/log4j1-xml-control-characters branch March 24, 2026 22:57

3 participants