Utf16.IsValid and Utf8/16.IndexOfInvalidSubsequence#120326
Utf16.IsValid and Utf8/16.IndexOfInvalidSubsequence#120326lilinus wants to merge 20 commits intodotnet:mainfrom
Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR adds new UTF-16 validation APIs to System.Text.Unicode, introducing Utf16.IsValid and Utf16.IndexOfInvalidSubsequence methods alongside similar functionality for UTF-8. It also refactors existing code to use these new standardized validation APIs instead of custom validation logic.
- Adds new
Utf16class with validation methodsIsValidandIndexOfInvalidSubsequence - Adds
Utf8.IndexOfInvalidSubsequencemethod to complement existingUtf8.IsValid - Refactors multiple codebases to use the new standardized validation APIs
Reviewed Changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| System.Private.CoreLib.Shared.projitems | Adds the new Utf16.cs file to the build |
| Utf16.cs | New class implementing UTF-16 validation methods |
| Utf8.cs | Adds IndexOfInvalidSubsequence method and updates class documentation |
| System.Runtime.cs | Adds public API surface for new UTF-16 and UTF-8 validation methods |
| HttpEncoder.cs | Refactors surrogate validation to use new Utf16.IndexOfInvalidSubsequence |
| Normalization.Icu.cs | Replaces custom validation logic with Utf16.IsValid |
| StringSearchValues.cs | Simplifies surrogate validation using Utf16.IsValid |
| Utf8UtilityTests.ValidateBytes.cs | Adds test coverage for new Utf8.IndexOfInvalidSubsequence method |
| Utf16UtilityTests.ValidateChars.cs | Adds test coverage for new Utf16 validation methods |
src/libraries/System.Private.CoreLib/src/System/Text/Unicode/Utf8.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Text/Unicode/Utf16.cs
Outdated
Show resolved
Hide resolved
|
Tagging subscribers to this area: @dotnet/area-system-runtime |
|
@lilinus Thanks for your patience on this; we are planning to review it this month. I've updated the branch and CI will re-run, but there was an ApiCompat error in the build previously indicating |
src/libraries/System.Private.CoreLib/src/System/Text/Unicode/Utf16.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Text/Unicode/Utf16.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Text/Unicode/Utf8.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Text/Unicode/Utf16.cs
Outdated
Show resolved
Hide resolved
Update XML doc comments and remove unneeded using statements. Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
src/libraries/System.Private.CoreLib/src/System/Text/Unicode/Utf16.cs
Outdated
Show resolved
Hide resolved
...tem.Runtime/tests/System.Runtime.Tests/System/Text/Unicode/Utf8UtilityTests.ValidateBytes.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Text/Unicode/Utf16Utility.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Text/Unicode/Utf16.cs
Outdated
Show resolved
Hide resolved
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
I'm getting some compat errors in Microsoft.Bcl.Memory. Not sure how to approach those unfortunately. Should the new methods be excluded or included in that package? |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
src/libraries/System.Private.CoreLib/src/System/Text/Unicode/Utf8.cs:830
- The XML docs for
Utf8.IsValidimmediately below this new API still describe the input as a "ReadOnlySpan string". Now thatIndexOfInvalidSubsequenceis added, please updateUtf8.IsValid's<param name="value">documentation to describe validating a UTF-8 byte sequence / input text, for consistency and correctness.
Utf8Utility.GetIndexOfFirstInvalidUtf8Sequence(value, out _);
#endif
/// <summary>
/// Validates that the value is well-formed UTF-8.
src/libraries/System.Private.CoreLib/src/System/Text/Unicode/Utf16.cs
Outdated
Show resolved
Hide resolved
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (1)
src/libraries/System.Private.CoreLib/src/System/Text/Unicode/Utf8.cs:834
- The XML docs for Utf8.IsValid still describe the input parameter as a generic "ReadOnlySpan string", but this API validates a UTF-8 byte sequence (ReadOnlySpan). Please update the text so public docs accurately describe the expected input, ideally matching the wording used for IndexOfInvalidSubsequence just above.
/// <summary>
/// Validates that the value is well-formed UTF-8.
/// </summary>
/// <param name="value">The <see cref="ReadOnlySpan{T}"/> string.</param>
/// <returns><c>true</c> if value is well-formed UTF-8, <c>false</c> otherwise.</returns>
public static bool IsValid(ReadOnlySpan<byte> value) =>
Fixes #118018
Fixes #118113