Fix BigInteger char parsing with Ukrainian culture NBSP handling (bidirectional)#123783
Fix BigInteger char parsing with Ukrainian culture NBSP handling (bidirectional)#123783
Conversation
Co-authored-by: artl93 <9855795+artl93@users.noreply.github.com>
Co-authored-by: artl93 <9855795+artl93@users.noreply.github.com>
Co-authored-by: artl93 <9855795+artl93@users.noreply.github.com>
Co-authored-by: artl93 <9855795+artl93@users.noreply.github.com>
|
Consider your design. Why is it correct and the simplest form of the solution? Are there other possibilities? I believe you consisdered the following. Why did you pick something different? private static unsafe TChar* MatchChars<TChar>(TChar* p, TChar* pEnd, ReadOnlySpan<TChar> value)
where TChar : unmanaged, IUtfChar<TChar>
{
Debug.Assert((p != null) && (pEnd != null) && (p <= pEnd));
fixed (TChar* stringPointer = &MemoryMarshal.GetReference(value))
{
TChar* str = stringPointer;
if (TChar.CastToUInt32(*str) != '\0')
{
// We only hurt the failure case
// This fix is for French, Kazakh, and Ukrainian cultures. Since a user cannot type 0xA0 or 0x202F as a
// space character we use 0x20 space character instead to mean the same.
// We also need to handle the reverse case where the input has 0xA0 or 0x202F and the format string has 0x20.
while (true)
{
uint cp = (p < pEnd) ? TChar.CastToUInt32(*p) : '\0';
uint val = TChar.CastToUInt32(*str);
if ((cp != val) && !((IsSpaceReplacingChar(val) && (cp == '\u0020')) || (IsSpaceReplacingChar(cp) && (val == '\u0020'))))
{
break;
}
p++;
str++;
if (TChar.CastToUInt32(*str) == '\0')
{
return p;
}
}
}
}
return null;
}
}
``` |
Thank you for the question! I tested your simpler solution (commit 9f5c3ae5) and you're right that it's more elegant. However, both approaches have the same limitation: The core issue:
My complex UTF-8 handling attempted to fix this but introduced bugs with pointer advancement. Your simpler bidirectional check works perfectly for char (UTF-16) but not for UTF-8, which is why the original outer loop test still fails with both approaches. The simpler solution is correct and I've adopted it (removes ~70 lines of buggy code). The remaining UTF-8 issue needs a different approach - likely at the point where |
…-8 handling) Co-authored-by: artl93 <9855795+artl93@users.noreply.github.com>
Co-authored-by: artl93 <9855795+artl93@users.noreply.github.com>
Done! I've moved the two Ukrainian culture tests into the existing |
…llowTrailingWhite
|
/azp run runtime-libraries-coreclr outerloop |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
@copilot is this related failure to the change here? |
... This is the original issue that the PR partially addresses. The failure is in UTF-8 parsing (line 828: |
|
Forgot to check results - yes. It's related.
…________________________________
From: Tarek Mahmoud Sayed ***@***.***>
Sent: Tuesday, February 10, 2026 10:44 AM
To: dotnet/runtime ***@***.***>
Cc: Art Leonard ***@***.***>; Assign ***@***.***>
Subject: Re: [dotnet/runtime] Fix BigInteger char parsing with Ukrainian culture NBSP handling (bidirectional) (PR #123783)
[https://avatars.githubusercontent.com/u/10833894?s=20&v=4]tarekgh left a comment (dotnet/runtime#123783)<#123783 (comment)>
@copilot<https://gh.io/copilot-coding-agent-docs> is this related failure to the change here?
{ "HelixJobId": "bd8a8479-7771-47b4-bfb6-9b29c8662936", "HelixWorkItemName": "System.Runtime.Numerics.Tests" }
Error message
System.FormatException : The value could not be parsed.
Stack trace
at System.Number.ThrowOverflowOrFormatException(ParsingStatus status) in /_/src/libraries/System.Runtime.Numerics/src/System/Number.BigInteger.cs:line 29
at System.Number.ParseBigInteger[TChar](ReadOnlySpan`1 value, NumberStyles style, NumberFormatInfo info) in /_/src/libraries/System.Runtime.Numerics/src/System/Number.BigInteger.cs:line 136
at System.Numerics.BigInteger.Parse(ReadOnlySpan`1 utf8Text, NumberStyles style, IFormatProvider provider) in /_/src/libraries/System.Runtime.Numerics/src/System/Numerics/BigInteger.cs:line 707
at System.Numerics.Tests.parseTest.VerifyParseSpanToString(String num1, NumberStyles ns, Boolean failureNotExpected, String expected) in /_/src/libraries/System.Runtime.Numerics/tests/BigInteger/parse.cs:line 828
at System.Numerics.Tests.parseTest.VerifyParseToString(String num1, NumberStyles ns, Boolean failureNotExpected, String expected) in /_/src/libraries/System.Runtime.Numerics/tests/BigInteger/parse.cs:line 884
at System.Numerics.Tests.parseTest.VerifyParseToString(String num1, NumberStyles ns, Boolean failureNotExpected) in /_/src/libraries/System.Runtime.Numerics/tests/BigInteger/parse.cs:line 818
at System.Numerics.Tests.parseTest.VerifyNumberStyles(NumberStyles ns, Random random) in /_/src/libraries/System.Runtime.Numerics/tests/BigInteger/parse.cs:line 689
at System.Numerics.Tests.parseTest.RunParseToStringTests(CultureInfo culture) in /_/src/libraries/System.Runtime.Numerics/tests/BigInteger/parse.cs:line 52
at InvokeStub_parseTest.RunParseToStringTests(Object, Span`1)
at System.Reflection.MethodBaseInvoker.InvokeWithOneArg(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture) in /_/src/libraries/System.Private.CoreLib/src/System/Reflection/MethodBaseInvoker.cs:line 95
—
Reply to this email directly, view it on GitHub<#123783 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/ACLGGM2BA66I6ITKWB2ISFD4LIRH7AVCNFSM6AAAAACTMH5XXSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZTQNZZHE4TCMBRGQ>.
You are receiving this because you were assigned.Message ID: ***@***.***>
|
Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
🤖 Copilot Code Review — PR #123783Holistic AssessmentMotivation: The PR addresses a real test failure with Ukrainian culture (uk-UA) parsing in BigInteger where the test generates numbers with regular spaces but the culture's NumberGroupSeparator is NBSP (U+00A0). The linked issue #120283 confirms this is a genuine problem blocking CI. Approach: The fix normalizes both input characters and format characters to a common space (U+0020) before comparison, making the equivalence bidirectional. This is a reasonable approach that aligns with how Summary: Detailed Findings
|
|
/azp run runtime-libraries-coreclr outerloop |
|
Azure Pipelines successfully started running 1 pipeline(s). |
…irectional) (dotnet#123783) ## Fix for BigInteger char parsing with Ukrainian culture NBSP handling (bidirectional) Fixes dotnet#120283 ### Issue BigInteger parsing fails with cultures that use NBSP (U+00A0) as `NumberGroupSeparator` (like Ukrainian uk-UA) when users type regular space (U+0020). ### Solution Modified the `MatchChars` function in `Number.Parsing.Common.cs` to handle bidirectional space-replacing character equivalence: - Added `NormalizeSpaceReplacingChar` helper that converts NBSP/narrow NBSP to regular space - Simplified comparison: `NormalizeSpaceReplacingChar(cp) != NormalizeSpaceReplacingChar(val)` - This handles both NBSP→space and space→NBSP equivalence for char (UTF-16) parsing - Much more readable than the previous complex boolean expression - Updated comments to reference Ukrainian culture alongside French and Kazakh ### Testing - Added `ParseWithNBSPAsGroupSeparator()` test in existing `parse.cs` file - Test creates a custom culture (en-US base) with NBSP explicitly set as NumberGroupSeparator - Test validates that regular spaces ("1 234 567") are accepted when culture expects NBSP - Also tests trailing space handling - This approach is portable and doesn't depend on Ukrainian culture data which may vary across systems/ICU versions - Tests validate the bidirectional space equivalence fix for char (UTF-16) parsing ✓ ### Impact This fix only impacts **BigInteger** parsing. Other numeric types (Complex, double, float, decimal) use different parsing infrastructure in System.Private.CoreLib that doesn't include the common parsing file. ### Known Limitations This fix addresses char (UTF-16) parsing only. UTF-8 parsing issues are tracked separately in dotnet#124016. ### Progress - [x] Implement bidirectional space equivalence for char (UTF-16) - [x] Add portable tests using custom culture in existing parse.cs file - [x] Simplify design based on code review feedback - [x] Optimize condition ordering for performance - [x] Refactor to helper function for better readability - [x] Make tests portable across different ICU versions - [x] Fix test to actually include spaces to test space equivalence <!-- START COPILOT CODING AGENT TIPS --> --- ✨ Let Copilot coding agent [set things up for you](https://github.com/dotnet/runtime/issues/new?title=✨+Set+up+Copilot+instructions&body=Configure%20instructions%20for%20this%20repository%20as%20documented%20in%20%5BBest%20practices%20for%20Copilot%20coding%20agent%20in%20your%20repository%5D%28https://gh.io/copilot-coding-agent-tips%29%2E%0A%0A%3COnboard%20this%20repo%3E&assignees=copilot) — coding agent works faster and does higher quality work when set up for your repo. --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: artl93 <9855795+artl93@users.noreply.github.com> Co-authored-by: Stephen Toub <stoub@microsoft.com> Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com> Co-authored-by: tarekgh <10833894+tarekgh@users.noreply.github.com> Co-authored-by: artl <artl@microsoft.com>
Fix for BigInteger char parsing with Ukrainian culture NBSP handling (bidirectional)
Fixes #120283
Issue
BigInteger parsing fails with cultures that use NBSP (U+00A0) as
NumberGroupSeparator(like Ukrainian uk-UA) when users type regular space (U+0020).Solution
Modified the
MatchCharsfunction inNumber.Parsing.Common.csto handle bidirectional space-replacing character equivalence:NormalizeSpaceReplacingCharhelper that converts NBSP/narrow NBSP to regular spaceNormalizeSpaceReplacingChar(cp) != NormalizeSpaceReplacingChar(val)Testing
ParseWithNBSPAsGroupSeparator()test in existingparse.csfileImpact
This fix only impacts BigInteger parsing. Other numeric types (Complex, double, float, decimal) use different parsing infrastructure in System.Private.CoreLib that doesn't include the common parsing file.
Known Limitations
This fix addresses char (UTF-16) parsing only. UTF-8 parsing issues are tracked separately in #124016.
Progress
✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.