Rationale
For sorting purposes it's common to need portions of strings containing numbers to be treated like numbers. Consider the list of strings "Windows 7", "Windows 10".
Using the Ordinal StringComparer to sort the list, one would get
"Windows 10", "Windows 7"
but the desired ascending logical sort would be
"Windows 7", "Windows 10"
Proposed API
namespace System {
    public class StringComparer {
+       public static StringComparer Create(CultureInfo culture, CompareOptions options);
    }
}
namespace System.Globalization {
    public enum CompareOptions {
+       NumericOrdering = 0x00000020
    }
}
Usage
var list = new List<string> { "Windows 10", "Windows 7" };
list.Sort(StringComparer.Logical); // List is now "Windows 7", "Windows 10"
This would also be good for sorting strings containing IP addresses.
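To make the IP address case concrete, the following sketch contrasts an ordinal sort with a numeric, segment-by-segment comparison. The class `IpSortDemo` and its methods are hypothetical names invented for this illustration, and `CompareDotted` is only a hand-rolled stand-in for the proposed comparer: it assumes every dot-separated segment is a plain non-negative integer.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical demo type; not part of the proposal.
public static class IpSortDemo
{
    // Stand-in for the proposed comparer: compares dotted strings
    // segment by segment, treating each segment as a number.
    // Assumes every segment parses as a non-negative integer.
    public static int CompareDotted(string x, string y)
    {
        ulong[] a = x.Split('.').Select(s => ulong.Parse(s, CultureInfo.InvariantCulture)).ToArray();
        ulong[] b = y.Split('.').Select(s => ulong.Parse(s, CultureInfo.InvariantCulture)).ToArray();

        for (int i = 0; i < Math.Min(a.Length, b.Length); i++)
        {
            int c = a[i].CompareTo(b[i]);
            if (c != 0) return c;
        }
        // All shared segments are equal: the shorter string sorts first.
        return a.Length.CompareTo(b.Length);
    }

    // Sorts a copy of the input by raw UTF-16 code unit values.
    public static List<string> SortOrdinal(IEnumerable<string> items)
    {
        var list = items.ToList();
        list.Sort(StringComparer.Ordinal);
        return list;
    }

    // Sorts a copy of the input with the numeric stand-in comparison.
    public static List<string> SortNumeric(IEnumerable<string> items)
    {
        var list = items.ToList();
        list.Sort(CompareDotted);
        return list;
    }
}
```

For the input "10.0.0.9", "10.0.0.10", `SortOrdinal` places "10.0.0.10" first because the character '1' precedes '9', while `SortNumeric` places "10.0.0.9" first because 9 is less than 10.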
Details
- Logical is a convenience property equivalent to the result of Create(CultureInfo.CurrentCulture, CompareOptions.Logical).
- LogicalIgnoreCase is a convenience property equivalent to the result of Create(CultureInfo.CurrentCulture, CompareOptions.Logical | CompareOptions.IgnoreCase).
- Non-numeric sequences will be evaluated with the culture provided.
- Numeric sequences will be determined by the result of Char.IsDigit.
- All UTF-16 digits will be supported and are manually parsed using Char.GetNumericValue.
- Only positive integral values without digit separators will be supported directly.
- Numbers will be treated as ulongs. Logic for overflows will have to be considered.
- The string Windows 8.1 would be considered 4 sequences: Windows would be a string sequence, 8 a numeric sequence, . another string sequence, and 1 another numeric sequence.
- This API could later be expanded to support signs, decimals, and digit separators through overloads accepting a NumberStyles parameter.
- When a numeric sequence and a string sequence are compared at the same position, the numeric sequence always comes first, so when sorting the list "a", "7" the number 7 will be sorted before the letter a.
- Existing methods that take a CompareOptions parameter as input will need to be updated to support the new Logical member.
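The rules above can be sketched as a small comparer. This is only an illustration under stated assumptions, not the proposed implementation: the type name `LogicalComparerSketch` and its helpers are hypothetical, overflow is handled by saturating at `ulong.MaxValue` (the proposal leaves overflow behavior open), and numeric sequences that differ only in leading zeros compare as equal.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical sketch of the sequence-by-sequence comparison described above.
public sealed class LogicalComparerSketch : IComparer<string>
{
    private readonly CompareInfo _compareInfo;
    private readonly CompareOptions _options;

    public LogicalComparerSketch(CultureInfo culture, CompareOptions options = CompareOptions.None)
    {
        _compareInfo = culture.CompareInfo;
        _options = options;
    }

    public int Compare(string x, string y)
    {
        if (ReferenceEquals(x, y)) return 0;
        if (x is null) return -1;
        if (y is null) return 1;

        int i = 0, j = 0;
        while (i < x.Length && j < y.Length)
        {
            // Classify the sequence starting at each position via Char.IsDigit.
            bool xNum = char.IsDigit(x[i]);
            bool yNum = char.IsDigit(y[j]);
            int iEnd = EndOfRun(x, i, xNum);
            int jEnd = EndOfRun(y, j, yNum);

            int result;
            if (xNum && yNum)
                result = ParseDigits(x, i, iEnd).CompareTo(ParseDigits(y, j, jEnd));
            else if (xNum != yNum)
                result = xNum ? -1 : 1; // a numeric sequence sorts before a string sequence
            else
                result = _compareInfo.Compare(x, i, iEnd - i, y, j, jEnd - j, _options);

            if (result != 0) return result;
            i = iEnd;
            j = jEnd;
        }

        // One string is exhausted: the one with nothing left sorts first.
        return (x.Length - i).CompareTo(y.Length - j);
    }

    // Returns the index just past the run of digits (or non-digits) starting at 'start'.
    private static int EndOfRun(string s, int start, bool numeric)
    {
        int k = start;
        while (k < s.Length && char.IsDigit(s[k]) == numeric) k++;
        return k;
    }

    // Parses a run of digits using Char.GetNumericValue.
    // Assumption: values too large for a ulong saturate at ulong.MaxValue.
    private static ulong ParseDigits(string s, int start, int end)
    {
        ulong value = 0;
        for (int k = start; k < end; k++)
        {
            ulong digit = (ulong)char.GetNumericValue(s[k]);
            if (value > (ulong.MaxValue - digit) / 10) return ulong.MaxValue;
            value = value * 10 + digit;
        }
        return value;
    }
}
```

With `new LogicalComparerSketch(CultureInfo.InvariantCulture)`, "Windows 7" compares less than "Windows 10", "Windows 8.1" compares less than "Windows 10", and "7" compares less than "a". A real implementation would still need to define overflow behavior and decide how to order values such as "07" and "7".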
Open Questions
- Should CompareOptions.Logical be implemented as the flag option SORT_DIGITSASNUMBERS to the dwCmpFlags parameter of CompareStringEx? Using its implementation should be more efficient, but later expanding support for NumberStyles would require a re-implementation with matching behavior.
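For reference, calling CompareStringEx with SORT_DIGITSASNUMBERS directly might look like the sketch below. It is Windows-only and purely illustrative: the wrapper type `NativeNumericCompare` is a hypothetical name, the P/Invoke declaration is written by hand for this example, and nothing here reflects how the runtime would actually wire up the flag.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

// Hypothetical wrapper; Windows-only sketch.
public static class NativeNumericCompare
{
    // Flag value from the Windows SDK: treat digit sequences as numbers.
    private const uint SORT_DIGITSASNUMBERS = 0x00000008;

    // CompareStringEx returns 1 (less), 2 (equal), 3 (greater), or 0 on failure.
    private const int CSTR_EQUAL = 2;

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern int CompareStringEx(
        string lpLocaleName,
        uint dwCmpFlags,
        string lpString1,
        int cchCount1,
        string lpString2,
        int cchCount2,
        IntPtr lpVersionInformation,
        IntPtr lpReserved,
        IntPtr lParam);

    // Returns a negative value, zero, or a positive value, like IComparer<string>.Compare.
    public static int Compare(string localeName, string x, string y)
    {
        if (!RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
            throw new PlatformNotSupportedException("CompareStringEx is only available on Windows.");

        int result = CompareStringEx(
            localeName, SORT_DIGITSASNUMBERS,
            x, x.Length,
            y, y.Length,
            IntPtr.Zero, IntPtr.Zero, IntPtr.Zero);

        if (result == 0)
            throw new Win32Exception(Marshal.GetLastWin32Error());

        // Subtracting 2 maps the native result onto the usual -1 / 0 / 1 convention.
        return result - CSTR_EQUAL;
    }
}
```

On Windows, `NativeNumericCompare.Compare("en-US", "Windows 7", "Windows 10")` should return a negative value. Whether the native behavior matches every rule listed under Details (for example, the ordering of numeric versus string sequences) would need to be verified separately.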
Updates
- Added Logical and LogicalIgnoreCase properties.
- Added support for all UTF-16 digits.
- Added more CreateLogical overloads to match the Create method.
- Added retrieval of the NumberFormatInfo from the StringComparer parameter when it is not explicitly provided and the comparer is a CultureAwareComparer.
- Removed CreateLogical overloads that matched the Create method.
- Switched to only supporting positive integral values without digit separators.
- Added consideration of comparing a numeric sequence with a string sequence.
- Added the flag member CompareOptions.Logical and changed CreateLogical to be just an overload of Create.