Log4j1XmlLayout: replace invalid XML characters with U+FFFD #4078
Merged
This change sanitizes the output of `Log4j1XmlLayout` by replacing characters that are not permitted in XML 1.0 with the Unicode replacement character (`U+FFFD`). This guarantees that the generated log output is always well-formed XML and can be parsed by any XML 1.0–compliant parser, even when log data contains control characters or other invalid code points.
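To make the rule concrete, here is a minimal sketch of the replacement described above. It is not the code from this PR: the class `Xml10Sanitizer` and its methods are hypothetical names chosen for illustration. The validity check follows the `Char` production of the XML 1.0 specification, and the loop walks code points rather than `char`s so that valid surrogate pairs are kept while unpaired surrogates are replaced.

```java
// Illustrative sketch only; class and method names are hypothetical, not the PR's code.
final class Xml10Sanitizer {

    // XML 1.0 "Char" production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Replaces every code point that XML 1.0 does not allow with U+FFFD.
    static String sanitize(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            // For an unpaired surrogate, codePointAt returns the surrogate itself,
            // which falls outside the allowed ranges and is therefore replaced.
            int cp = input.codePointAt(i);
            out.appendCodePoint(isXml10Char(cp) ? cp : 0xFFFD);
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String sanitized = sanitize("bell\u0007 and null\u0000");
        // Print code points in hex so the result does not depend on console encoding.
        sanitized.codePoints().forEach(cp -> System.out.printf("U+%04X ", cp));
        System.out.println();
    }
}
```

Tab, line feed, and carriage return pass through unchanged because XML 1.0 permits them; all other C0 control characters are replaced.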
Pull request overview
This PR hardens Log4j1XmlLayout XML output by sanitizing characters that are not permitted by XML 1.0, replacing them with U+FFFD to ensure the emitted logs remain well‑formed and parseable (mirroring the earlier XmlLayout change in #4077).
Changes:
- Update `Transform.escapeHtmlTags` and `Transform.appendEscapingCData` to replace XML 1.0–invalid code points with `U+FFFD` (and to avoid allocations when no escaping is needed).
- Add focused unit tests for `Transform` sanitization/escaping behavior.
- Add a regression test ensuring `Log4j1XmlLayout` output remains well‑formed when event data contains invalid XML characters.
- Add a 2.x changelog entry documenting the fix.
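The first item combines two ideas: escaping plus sanitization, and skipping allocation when the input needs neither. The sketch below shows one way this could look. It is an assumption-laden illustration, not the actual `Transform` implementation: `EscapeSketch` and its methods are hypothetical, and the set of escaped characters (`<`, `>`, `&`, `"`) is assumed here rather than taken from the PR.

```java
// Illustrative sketch only; names and the escaped character set are assumptions.
final class EscapeSketch {

    // Returns the input itself when nothing needs escaping or replacing,
    // so the common case allocates nothing.
    static String escapeAndSanitize(String input) {
        StringBuilder out = null; // created lazily on the first change
        int i = 0;
        while (i < input.length()) {
            int cp = input.codePointAt(i);
            String replacement = replacementFor(cp);
            if (replacement != null) {
                if (out == null) {
                    // Copy the untouched prefix once, then continue appending.
                    out = new StringBuilder(input.length() + 16).append(input, 0, i);
                }
                out.append(replacement);
            } else if (out != null) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out == null ? input : out.toString();
    }

    // Returns null when the code point can be emitted unchanged.
    private static String replacementFor(int cp) {
        switch (cp) {
            case '<': return "&lt;";
            case '>': return "&gt;";
            case '&': return "&amp;";
            case '"': return "&quot;";
            default:  return isXml10Char(cp) ? null : "\uFFFD";
        }
    }

    // XML 1.0 "Char" production.
    private static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }
}
```

Returning the original `String` instance in the unchanged case is what keeps the fast path allocation-free; callers that need a fresh copy would have to make one themselves.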
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `src/changelog/.2.x.x/4078_log4j1-xml-control-characters.xml` | Changelog entry documenting the XML sanitization fix for Log4j1XmlLayout. |
| `log4j-core/src/main/java/org/apache/logging/log4j/core/util/Transform.java` | Implements XML 1.0 invalid character replacement and refactors escaping to be codepoint-aware and allocation-avoiding. |
| `log4j-core-test/src/test/java/org/apache/logging/log4j/core/util/TransformTest.java` | Adds parameterized tests for escaping and XML 1.0 sanitization behavior. |
| `log4j-1.2-api/src/test/java/org/apache/log4j/layout/Log4j1XmlLayoutTest.java` | Adds a regression test verifying Log4j1XmlLayout sanitizes invalid XML characters across fields. |
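
The regression test in the last row checks well-formedness. As a rough sketch of that idea (not the test added in this PR), the snippet below uses the JDK's built-in JAXP parser to show that a raw control character makes a document unparseable while `U+FFFD` in its place does not. The class `WellFormedCheck` and the sample documents are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative sketch only; not the regression test from the PR.
final class WellFormedCheck {

    // Returns true if the JDK's XML parser accepts the document.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the document
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // U+0001 is not an allowed XML 1.0 character; U+FFFD is.
        System.out.println(isWellFormed("<message>a\u0001b</message>"));
        System.out.println(isWellFormed("<message>a\uFFFDb</message>"));
    }
}
```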
The `&apos;` entity is not defined in HTML 4.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Related to #4077, which performs the same changes for `XmlLayout`.