Fix XmlReader ArgumentOutOfRangeException for malformed UTF-8 in XML …#124428
Fix XmlReader ArgumentOutOfRangeException for malformed UTF-8 in XML …#124428danmoseley merged 2 commits intodotnet:mainfrom
Conversation
…declaration When SafeAsciiDecoder is used during initial XML declaration parsing, it maps each byte to a char 1:1, including bytes >0x7F. UnDecodeChars() previously used Encoding.GetByteCount() on these chars to compute bytePos, but non-ASCII chars like U+00BF (from byte 0xBF) produce 2 bytes in UTF-8, inflating bytePos beyond bytesUsed and causing ArgumentOutOfRangeException in BlockCopy. Fix: detect SafeAsciiDecoder and use charPos directly (correct 1:1 mapping), then validate that the target encoding byte count matches. If mismatched, throw XmlException with Xml_InvalidCharInThisEncoding, detecting invalid bytes before they corrupt ParsingState. Fixes dotnet#113061 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Tagging subscribers to this area: @dotnet/area-system-xml |
There was a problem hiding this comment.
Pull request overview
This PR fixes a bug where malformed UTF-8 bytes in XML declarations caused ArgumentOutOfRangeException instead of the expected XmlException. The issue occurred because SafeAsciiDecoder (used during initial XML parsing) maps each byte to a char 1:1, including bytes >0x7F. When UnDecodeChars() used Encoding.GetByteCount() to compute byte positions, non-ASCII chars like U+00BF (from byte 0xBF) would produce 2 bytes in UTF-8, inflating the byte position beyond valid bounds.
Changes:
- Added detection for
SafeAsciiDecoderinUnDecodeChars()to use the correct 1:1 byte-to-char mapping - Added validation that target encoding byte count matches character count, throwing
XmlExceptionfor invalid bytes - Added regression test to verify malformed UTF-8 in XML declarations throws
XmlExceptioninstead ofArgumentOutOfRangeException
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/libraries/System.Private.Xml/src/System/Xml/Core/XmlTextReaderImpl.cs | Added SafeAsciiDecoder-specific handling in UnDecodeChars() to correctly compute byte positions and validate encoding compatibility |
| src/libraries/System.Private.Xml/tests/XmlReader/Tests/ReaderEncodingTests.cs | Added regression test for #113061 verifying malformed UTF-8 byte 0xBF in XML declaration throws XmlException |
🤖 Copilot Code Review — PR #124428Holistic AssessmentMotivation: Legitimate bug fix for a confirmed regression (#113061) found by fuzzing. When Approach: The fix correctly identifies the mismatch between Summary: Detailed Findings✅ Core Fix — Correct byte position computationThe central fix is sound:
✅ Async path coverageThe async code ( ✅ Exception handling —
|
Add ReadWithMultiByteUtf8CharInXmlDeclaration test covering the original issue repro where Encoding.UTF8 produces 0xC2 0xBF for \xbf. SafeAsciiDecoder maps these two bytes to two >0x7F chars, inflating GetByteCount by 2 extra bytes. Also tighten the existing test's comment. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
3ce4a70 to
d12bd8c
Compare
There was a problem hiding this comment.
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
You can also share your feedback on Copilot code review. Take the survey.
Fix XmlReader ArgumentOutOfRangeException for malformed UTF-8 in XML declaration
When SafeAsciiDecoder is used during initial XML declaration parsing, it maps
each byte to a char 1:1, including bytes >0x7F. UnDecodeChars() previously
used Encoding.GetByteCount() on these chars to compute bytePos, but non-ASCII
chars like U+00BF (from byte 0xBF) produce 2 bytes in UTF-8, inflating bytePos
beyond bytesUsed and causing ArgumentOutOfRangeException in BlockCopy.
Fix: detect SafeAsciiDecoder and use charPos directly (correct 1:1 mapping),
then validate that the target encoding byte count matches. If mismatched,
throw XmlException with Xml_InvalidCharInThisEncoding, detecting invalid bytes
before they corrupt ParsingState.
Fixes #113061