Skip to content

Added fast path to BinaryWriter.Write(string)#37705

Merged
GrabYourPitchforks merged 5 commits intodotnet:masterfrom
bbartels:BinaryWriterString
Jun 10, 2020
Merged

Added fast path to BinaryWriter.Write(string)#37705
GrabYourPitchforks merged 5 commits intodotnet:masterfrom
bbartels:BinaryWriterString

Conversation

@bbartels
Copy link
Contributor

@bbartels bbartels commented Jun 10, 2020

This PR improves the code path for BinaryWriter.Write(string) especially in scenarios where the encoding specified is variable in length. The previous implementation made a blanket assumption that each character within the input string is always the maximum size of any given character in the output encoding; leading to inefficient packing of the byte buffer.

Due to this, the previous implementation performs poorly when writing characters that take up less space than the maximum size of a character in an encoding i.e. strings constrained to ASCII with a UTF-8 encoder.

If however an encoding is used, in which the input characters end up filling the byte buffer, the new implementation is quite a bit slower. This is why in such cases this PR delegates to the old implementation to not introduce any performance regressions.

The only thing left is to decide in which specific cases the previous path should be taken. As you can see on the benchmarks (below) the only instances in which regressions appear seem to be pertaining to ASCIIEncoding. A simple predicate could simply whitelist the encodings which do not experience any performance loss, but then the optimisation will not carry over to any custom encodings. Maybe someone has a better idea!

Also fixes: #37158

I ran numerous benchmarks in different scenarios, so the full table of benchmarks can be found here: https://gist.github.com/bbartels/7cc70209a64e01d4fa663546fe035e0a

The Benchmarks were run with a MemoryStream backing the BinaryWriter.
The Benchmarks were also run without the old implementation in place for ASCIIEncoding. If the current PR would be benched, the ASCIIEncoding benchmarks would have equivalent performance.

CharSize = Average char size of output encoding
Size = The Number of input chars

Below are a couple notable excerpts

Improvements:

Method Impl CharSize Size Encoding Mean Ratio
Write Old 1 Byte 500 UTF-8 545.4 ns 1.00
Write New 1 Byte 500 UTF-8 350.5 ns 0.64
Write Old 1 Byte 10000 UTF-8 7,022.4 ns 1.00
Write New 1 Byte 10000 UTF-8 3,706.2 ns 0.53
Write Old 1.1 Byte 500 UTF-8 754.0 ns 1.00
Write New 1.1 Byte 500 UTF-8 628.5 ns 0.83
Write Old 1.1 Byte 10000 UTF-8 10,878.3 ns 1.00
Write New 1.1 Byte 10000 UTF-8 8,280.9 ns 0.76
Write Old 4 Byte 1000000 UTF-16 4,926.7 ns 1.00
Write New 4 Byte 1000000 UTF-16 4,005.1 ns 0.81

Regressions:

Method Impl CharSize Size Encoding Mean Ratio
Write Old 1 Byte 10000 ASCII 2,557.0 ns 1.00
Write New 1 Byte 10000 ASCII 3,178.9 ns 1.24
Write Old 1 Byte 1000000 ASCII 240,853.0 ns 1.00
Write New 1 Byte 1000000 ASCII 299,498.3 ns 1.24

// implementation saw some performance regressions, therefore in such scenarios (ASCIIEncoding)
// work will be delegated to the previous implementation.
Type encodingType = _encoding.GetType();
if (encodingType == typeof(UTF32Encoding) || encodingType == typeof(UTF8Encoding) ||
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no need to special case anything other than UTF-8 here. Usage of other Encoding types is vanishingly small as to be not worthwhile.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adjusted in 4999bc2

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just checked apisof.net for usage of UTF-16 and UTF-32 and at least UTF-16 seems to have a somewhat sizable usage.

https://apisof.net/catalog/System.Text.UnicodeEncoding

@GrabYourPitchforks
Copy link
Member

Thanks so much! :D

@GrabYourPitchforks GrabYourPitchforks merged commit c215a3b into dotnet:master Jun 10, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 8, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Incorrect comment on BinaryWriter.Write(string)

3 participants