Skip to content

Fix parsing of standard format string#47353

Merged
pgovind merged 13 commits intodotnet:masterfrom
pgovind:numerics_format_bug
Jan 27, 2021
Merged

Fix parsing of standard format string#47353
pgovind merged 13 commits intodotnet:masterfrom
pgovind:numerics_format_bug

Conversation

@pgovind
Copy link

@pgovind pgovind commented Jan 22, 2021

Fixes #46874

Essentially fixes the max 99 digits of precision limitation. New rule: A format modifier with any number of digits is interpreted as a standard format + precision. If they value doesn't fit into an int, we throw.

Affects all numeric types, so the unit tests have been augmented for all the affected types.

Breaking change doc: dotnet/docs#22458

@ghost
Copy link

ghost commented Jan 22, 2021

Tagging subscribers to this area: @tannergooding, @pgovind
See info in area-owners.md if you want to be subscribed.

Issue Details

Fixes #46874

Essentially fixes the max 99 digits of precision limitation. New rule: A format modifier with any number of digits is interpreted as a standard format + precision. If they value doesn't fit into an int, we throw.

Affects all numeric types, so the unit tests have been augmented for all the affected types.

Author: pgovind
Assignees: -
Labels:

area-System.Numerics

Milestone: -

while (i < format.Length && (((uint)format[i] - '0') < 10))
{
n = (n * 10) + format[i++] - '0';
n = checked((n * 10) + format[i++] - '0');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could now be throwing an OverflowException where we weren't before? Should this be translated somewhere into a FormatException?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it and decided to leave it as an OverflowException. Do you think it affects usability? The plan is to update the documentation here to state that a precision value > int.Max will throw an OverflowException.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be better to keep it as FormatException for simplicity, that way users don't need to update even more paths.

We can always do something like:

var tmp = (n * 10)+ format[i++] - '0'

if (tmp < n)
{
    ThrowFormatException();
}

n = tmp;

int temp = ((n * 10) + format[i++] - '0');
if (temp < n)
{
ThrowHelper.ThrowFormatException_BadFormatSpecifier();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not look performance critical to warrant ThrowHelper.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was just done for consistency, to ensure the relevant exception message was included.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consistency with what? This file does not use throw helper anywhere else.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consistency with how formatting, in general, handles invalid formats. It doesn't seem ideal to just directly throw new FormatException(SR.Argument_BadFormatSpecifier); when we have an existing throw helper that does exactly this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I do not think this consistency has any value. I think we should be using ThrowHelper only in places where it is measurable performance improvement.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

helper methods

I agree with having helper methods like BitOperations. They are complex enough that it is definitely worth it to have them in helper methods. At the same time, they are rare enough that the overheads from doing that does not matter. Simple exception throwing does not have this property: It is trivial, canonical C# is to have it inline - not in a helper method, and it is used everywhere so it is worth thinking about the best shape.

Copy link
Member

@stephentoub stephentoub Jan 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be interesting problem to decide what the autogenerated null-checking should look like so that it works well for all situations that we care about. Have any discussions happened on this?

Various implementations have been discussed. I believe the one settled on for at least the initial implementation is translating code like:

static void Foo(string arg1!!, object arg2!!)
{
    ...
}

into code along the lines of:

static void Foo(string arg1, object arg2)
{
    if (arg1 is null) <>ModuleLevelThrowHelper.ThrowArgumentNullException(nameof(arg1));
    if (arg2 is null) <>ModuleLevelThrowHelper.ThrowArgumentNullException(nameof(arg2));
    ...
}
...
static class <>ModuleLevelThrowHelper
{
    [DoesNotReturn]
    internal static void ThrowArgumentNullException(string name) => throw new ArgumentNullException(name);
}

This wouldn't be sufficient for us to remove some of our existing ThrowHelper.ThrowArgumentNullException(ExceptionArgument)-based usage, but it should be fine for us to use almost anywhere we're doing:

if (argument == null)
    throw new ArgumentNullException(nameof(argument));

at the beginning of a method, which amounts to ~1000 places across corlib, and ~5500 places across dotnet/runtime libraries.

Running a quick back-of-the-napkin check over dotnet/runtime libraries:

using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

var r = new Regex("throw new ArgumentNullException", RegexOptions.Compiled);

var results =
    (from lib in Directory.EnumerateDirectories(@"d:\repos\runtime\src\libraries")
     let src = Path.Combine(lib, "src")
     where Directory.Exists(src)
     let count = Directory.EnumerateFiles(src, "*.cs", SearchOption.AllDirectories).Sum(file => r.Matches(File.ReadAllText(file)).Count())
     where count > 0
     select (Lib: Path.GetFileName(lib), Count: count)).ToArray();

foreach (var result in results.OrderByDescending(i => i.Count))
{
    Console.WriteLine($"{result.Count}: {result.Lib}");
}

Console.WriteLine($"Total: {results.Sum(r => r.Count)}");
Console.WriteLine($"Reduction: {results.Sum(r => r.Count - 80):N0}b");

produces:

722: System.Private.CoreLib
359: System.Private.Xml
292: System.Drawing.Common
245: System.DirectoryServices
198: System.Linq.Parallel
158: System.ComponentModel.Composition
152: System.ComponentModel.TypeConverter
142: System.Security.Cryptography.Algorithms
140: Common
140: System.ServiceModel.Syndication
126: Microsoft.Extensions.DependencyInjection.Abstractions
115: System.Private.Xml.Linq
108: System.Threading.Tasks.Parallel
100: Microsoft.Extensions.Http
96: System.CodeDom
84: System.Security.Cryptography.Xml
84: System.Threading.Tasks.Dataflow
80: System.Net.Sockets
78: System.Security.Cryptography.X509Certificates
74: System.IO.FileSystem
70: System.Collections
64: System.Security.Cryptography.Pkcs
61: System.Net.Mail
56: System.Text.Json
50: System.Security.AccessControl
49: System.Composition.Convention
48: System.Security.Cryptography.Csp
47: System.IO.FileSystem.AccessControl
44: System.DirectoryServices.AccountManagement
44: System.Management
44: System.Net.Http
42: Microsoft.Extensions.DependencyModel
42: Microsoft.Extensions.Options
40: System.DirectoryServices.Protocols
39: System.Security.Claims
38: System.IO.Packaging
36: System.Security.Cryptography.Cng
31: System.Runtime.Serialization.Formatters
30: System.Diagnostics.PerformanceCounter
30: System.Reflection.Metadata
28: System.Private.DataContractSerialization
27: System.Net.Primitives
27: System.Text.Encodings.Web
26: System.Collections.Specialized
26: System.Reflection.MetadataLoadContext
25: Microsoft.Extensions.Caching.Memory
25: System.Runtime.Caching
25: System.Security.Principal.Windows
25: System.Transactions.Local
23: System.Text.Encoding.CodePages
22: System.Configuration.ConfigurationManager
22: System.Net.HttpListener
22: System.ObjectModel
21: Microsoft.Extensions.Hosting
21: System.Collections.NonGeneric
20: System.ComponentModel.Composition.Registration
19: System.Collections.Concurrent
19: System.Composition.Hosting
19: System.IO.IsolatedStorage
18: System.Net.Requests
17: System.ComponentModel.Primitives
17: System.Data.Common
16: System.Memory
15: Microsoft.Extensions.Caching.Abstractions
15: System.ComponentModel.Annotations
15: System.IO.Pipes
15: System.Net.Security
15: System.Private.Uri
15: System.Security.Cryptography.Encoding
14: System.Diagnostics.EventLog
14: System.IO.Ports
13: System.Composition.TypedParts
13: System.Net.Http.Json
13: System.Resources.Extensions
13: System.Speech
12: System.Diagnostics.TraceSource
12: System.Net.NameResolution
11: Microsoft.Extensions.Configuration
11: Microsoft.Extensions.Logging.TraceSource
10: System.IO.Compression.ZipFile
10: System.IO.Pipelines
9: System.Diagnostics.DiagnosticSource
9: System.Formats.Asn1
9: System.Linq.Expressions
9: System.Security.Cryptography.Primitives
8: Microsoft.Extensions.Configuration.FileExtensions
8: Microsoft.Extensions.Hosting.Abstractions
8: Microsoft.Extensions.Options.ConfigurationExtensions
7: Microsoft.Extensions.Logging.Abstractions
7: Microsoft.Extensions.Logging.Console
7: Microsoft.Extensions.Logging.EventLog
7: Microsoft.Extensions.Primitives
7: System.IO.Compression
7: System.Reflection.Context
6: Microsoft.Extensions.DependencyInjection
6: Microsoft.Win32.Registry
6: System.Composition.Runtime
6: System.Console
6: System.Net.Connections
6: System.Numerics.Tensors
6: System.Text.RegularExpressions
6: System.Threading.AccessControl
6: System.Web.HttpUtility
5: System.Diagnostics.Process
4: Microsoft.Extensions.Configuration.UserSecrets
4: Microsoft.Extensions.FileSystemGlobbing
4: System.IO.MemoryMappedFiles
4: System.Memory.Data
4: System.Net.Ping
4: System.Net.WebHeaderCollection
4: System.Resources.Writer
4: System.Security.Cryptography.OpenSsl
4: System.ServiceProcess.ServiceController
3: Microsoft.Extensions.Configuration.Abstractions
3: Microsoft.Extensions.Configuration.Binder
3: Microsoft.Extensions.FileProviders.Physical
3: Microsoft.Win32.Registry.AccessControl
3: System.Diagnostics.TextWriterTraceListener
3: System.Net.Http.WinHttpHandler
3: System.Runtime.Numerics
3: System.Windows.Extensions
2: Microsoft.Extensions.Configuration.Ini
2: Microsoft.Extensions.Configuration.Json
2: Microsoft.Extensions.Configuration.Xml
2: Microsoft.Extensions.FileProviders.Composite
2: Microsoft.Extensions.Logging
2: Microsoft.Extensions.Logging.EventSource
2: System.Composition.AttributedModel
2: System.Formats.Cbor
2: System.IO.FileSystem.Watcher
2: System.Net.WebProxy
2: System.Net.WebSockets
2: System.Net.WebSockets.Client
2: System.Security.Cryptography.ProtectedData
2: System.Threading.Channels
1: Microsoft.Bcl.AsyncInterfaces
1: Microsoft.CSharp
1: Microsoft.Extensions.Configuration.CommandLine
1: Microsoft.Extensions.Logging.Configuration
1: Microsoft.Extensions.Logging.Debug
1: Microsoft.IO.Redist
1: Microsoft.XmlSerializer.Generator
1: System.Collections.Immutable
1: System.ComponentModel.EventBasedAsync
1: System.Data.OleDb
1: System.IO.Compression.Brotli
1: System.IO.FileSystem.DriveInfo
1: System.Linq
1: System.Net.Quic
1: System.Net.ServicePoint
1: System.Net.WebClient
1: System.Net.WebSockets.WebSocketProtocol
1: System.Private.Runtime.InteropServices.JavaScript
1: System.Reflection.TypeExtensions
1: System.Runtime.InteropServices.RuntimeInformation
1: System.Security.Permissions
Total: 5414
Reduction: -7,066b

(There are of course other factors not addressed here, like the source in Common being compiled into multiple assemblies.)

So, based on your 80 byte per ThrowHelper estimate and a 1 byte saving per ThrowHelper site, this will cost us 7K across the whole framework. But size isn't why I want this... rather it's to be able to delete between ~5500 and ~22,000 lines of boilerplate code, depending on code style, and not have to write any more such code in the future. From that perspective, as long as it's not a significant regression in size, it's a win in my book. If we can find ways to reduce the size further, bonus (for example, there are 21 libs listed above with just 1 throw new ArgumentNullException... if the compiler were changed to only use a ThrowHelper when there were multiple found, the size regression is much reduced.) And of course, there are other perf benefits to this, such as asm benefits when those arg validation checks are in generic methods that might be stamped multiple times.

Of course, we could also just say the compiler should emit exactly what the dev would have written by hand. Then there's no math, no possibility for regression comparatively, but no win comparatively either.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ModuleLevelThrowHelper.ThrowArgumentNullException(nameof(arg1))

The downside of this pattern is that the cold string optimization does not kick in for it today.

For throw new ArgumentNullException(nameof(arg1)), the JIT is able to see that "arg1" is on a cold path, avoids materializing the string object upfront, and instead generates code to call a helper to allocate the string lazily. It costs us a bit of extra code size, but it has been a net win. string objects, GC handles to keep them alive, and the hashtable of all interned strings is not free either.

This optimization does not kick in for ThrowArgumentNullException(nameof(arg1)) pattern today. If C# ends up generating this pattern, we should make sure to get the JIT fixed to get the cold string optimization kick in for it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaredpar, maybe we should just start with the naive translation/codegen and explore ThrowHelper subsequently.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay.

@pgovind pgovind force-pushed the numerics_format_bug branch from bd5fd20 to da3be23 Compare January 25, 2021 18:04
@pgovind
Copy link
Author

pgovind commented Jan 25, 2021

I believe the CI failures are unrelated here. @Anipik, can this still make it into P1?

@tannergooding
Copy link
Member

CC. @jeffhandley on the above, since the goal was to get this in for P1

@jeffhandley
Copy link
Member

@Anipik I'm assuming the snap occurred on schedule earlier today and that if we want this included in Preview 1, we'd need to bring this to Tactics for consideration.

@Anipik
Copy link
Contributor

Anipik commented Jan 26, 2021

@Anipik I'm assuming the snap occurred on schedule earlier today and that if we want this included in Preview 1, we'd need to bring this to Tactics for consideration.

We have not finalized the policy for preview1 changes yet, but during 5.0 early previews it was M2 approval for first few days.
cc @danmosemsft

@Anipik
Copy link
Contributor

Anipik commented Jan 26, 2021

we can use backport feature to directly port this to p1

@jeffhandley
Copy link
Member

I chatted with @terrajobst about this. Despite the potentially-breaking nature of this and wanting to get it out to folks ASAP, we decided to just wait for Preview 2.

@pgovind
Copy link
Author

pgovind commented Jan 27, 2021

Merging this after checking in with @jeffhandley and @tannergooding. Thanks for the reviews everyone!

@pgovind pgovind merged commit ad7a50e into dotnet:master Jan 27, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Feb 27, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Update the parsing/formatting logic to allow more than 99 digits

8 participants