Added fast path to BinaryWriter.Write(string)#37705
Merged
GrabYourPitchforks merged 5 commits intodotnet:masterfrom Jun 10, 2020
Merged
Added fast path to BinaryWriter.Write(string)#37705GrabYourPitchforks merged 5 commits intodotnet:masterfrom
GrabYourPitchforks merged 5 commits intodotnet:masterfrom
Conversation
GrabYourPitchforks
approved these changes
Jun 10, 2020
| // implementation saw some performance regressions, therefore in such scenarios (ASCIIEncoding) | ||
| // work will be delegated to the previous implementation. | ||
| Type encodingType = _encoding.GetType(); | ||
| if (encodingType == typeof(UTF32Encoding) || encodingType == typeof(UTF8Encoding) || |
Member
There was a problem hiding this comment.
There's no need to special case anything other than UTF-8 here. Usage of other Encoding types is vanishingly small as to be not worthwhile.
Contributor
Author
There was a problem hiding this comment.
I've just checked apisof.net for usage of UTF-16 and UTF-32 and at least UTF-16 seems to have a somewhat sizable usage.
src/libraries/System.Private.CoreLib/src/System/IO/BinaryWriter.cs
Outdated
Show resolved
Hide resolved
GrabYourPitchforks
approved these changes
Jun 10, 2020
Member
|
Thanks so much! :D |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR improves the code path for
BinaryWriter.Write(string)especially in scenarios where the encoding specified is variable in length. The previous implementation made a blanket assumption that each character within the input string is always the maximum size of any given character in the output encoding; leading to inefficient packing of the byte buffer.Due to this, the previous implementation performs poorly when writing characters that take up less space than the maximum size of a character in an encoding i.e. strings constrained to ASCII with a UTF-8 encoder.
If however an encoding is used, in which the input characters end up filling the byte buffer, the new implementation is quite a bit slower. This is why in such cases this PR delegates to the old implementation to not introduce any performance regressions.
The only thing left is to decide in which specific cases the previous path should be taken. As you can see on the benchmarks (below) the only instances in which regressions appear seem to be pertaining to ASCIIEncoding. A simple predicate could simply whitelist the encodings which do not experience any performance loss, but then the optimisation will not carry over to any custom encodings. Maybe someone has a better idea!
Also fixes: #37158
I ran numerous benchmarks in different scenarios, so the full table of benchmarks can be found here: https://gist.github.com/bbartels/7cc70209a64e01d4fa663546fe035e0a
The Benchmarks were run with a MemoryStream backing the BinaryWriter.
The Benchmarks were also run without the old implementation in place for ASCIIEncoding. If the current PR would be benched, the ASCIIEncoding benchmarks would have equivalent performance.
CharSize = Average char size of output encoding
Size = The Number of input chars
Below are a couple notable excerpts
Improvements:
Regressions: