Address misc feedback and issues from recent perf changes by steveharter · Pull Request #41414 · dotnet/corefx

steveharter · 2019-09-27T21:50:08Z

Address late feedback from previous perf PR #41098 and other misc perf-related deserialization changes.

Results in a deserialization perf increase of ~1% to ~2% for a simple object.

ahsonkhan · 2019-09-28T04:37:48Z

src/System.Text.Json/src/System/Text/Json/Serialization/WriteStackFrame.cs

            {
-                int len = JsonClassInfo.PropertyCacheArray.Length;
-                if (PropertyEnumeratorIndex < len)
+                if (PropertyEnumeratorIndex > len)


I am not fully understanding this logic. Can you explain what we are trying to do here?

The extension property is always the last property in PropertyCacheArray, so this checks whether the extension property is currently being written or has already finished.

The extension property differs from other properties: although it is a dictionary property, we want to serialize each element in the dictionary as a property of the POCO instead of as a dictionary element, so we need some extra state to track whether it is being serialized.
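For readers unfamiliar with extension data, here is a minimal sketch of the behavior described above, using the public `[JsonExtensionData]` attribute. The `Poco` type, its `Extra` property, and the `ExtensionDataDemo` helper are hypothetical names chosen for illustration; they are not part of this PR.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical POCO used only for this illustration.
public class Poco
{
    public int Id { get; set; }

    // Extension data: each dictionary entry is written as a top-level
    // property of the POCO rather than nested under an "Extra" property.
    [JsonExtensionData]
    public Dictionary<string, object> Extra { get; set; }
}

public static class ExtensionDataDemo
{
    public static string Run()
    {
        var poco = new Poco
        {
            Id = 1,
            Extra = new Dictionary<string, object> { ["color"] = "red" }
        };

        // The serializer flattens the dictionary entries into the object.
        return JsonSerializer.Serialize(poco);
    }
}
```

Because the dictionary entries are flattened into the enclosing object, the writer cannot treat this property like an ordinary dictionary, which is why the stack frame tracks a separate status for it.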

ahsonkhan · 2019-09-28T04:55:16Z

src/System.Text.Json/src/System/Text/Json/Serialization/JsonSerializer.Read.String.cs

+            if (maxBytes > options.DefaultBufferSize)
            {
                // Get the actual byte count in order to handle large input.
                maxBytes = JsonReaderHelper.GetUtf8ByteCount(json.AsSpan());


nit: Given this throws for certain large string lengths (when the resulting byte count would exceed int.MaxValue), should this be documented as an exception on the method?

If so, we should document every other Deserialize method as well. Perhaps create a doc issue?

Do the other deserialize methods have this transcoding-based exception though? If so, then yes, we should. Will create an issue on https://github.com/dotnet/dotnet-api-docs.

stephentoub · 2019-10-04T02:10:46Z

src/System.Text.Json/src/System/Text/Json/Serialization/JsonSerializer.Read.String.cs

+            // higher than the threshold which is options.DefaultBufferSize.
+            Span<byte> utf8 = json.Length <= (ArrayPoolMaxSizeBeforeUsingNormalAlloc / JsonConstants.MaxExpansionFactorWhileTranscoding) ?
+                tempArray = ArrayPool<byte>.Shared.Rent(json.Length * JsonConstants.MaxExpansionFactorWhileTranscoding) :
+                new byte[JsonReaderHelper.GetUtf8ByteCount(json.AsSpan())];


Why not just always use Rent? If it's too large for the pool, it'll just allocate it itself.

From the previous commit (which removed the comment):
https://github.com/dotnet/corefx/pull/41414/files/a8cd02e62d6bd2d4686ff86124fdc31e233ca42d..218e1c0d306dde4441927fdcecacd071295a145c

// and because we can avoid calling Clear().

And from above:

// For performance, avoid obtaining the actual byte count unless the memory usage may be
// higher than the threshold

"avoid obtaining the actual byte count unless the memory usage may be higher than the threshold"

Computing an accurate byte count doesn't prevent you from still using ArrayPool.

"and because we can avoid calling Clear()"

The benefit analysis of clearing all of these arrays is still not obvious to me.

"Computing an accurate byte count doesn't prevent you from still using ArrayPool."

I believe the intention here is to skip the extra work of computing an accurate byte count when we are using the ArrayPool anyway (asking the pool for 12k instead of 4k isn't much different, so asking for the worst case is good enough). However, when allocating a regular array, over-asking by up to 3x is probably too expensive, so paying the cost of computing the exact count is worth the trade-off.

With your suggestion, we'd always have to get the accurate byte count, regardless of whether the pool ends up allocating the array itself (because it is exhausted or we need over 1 MB). Maybe that approach is good enough, but there are certainly some trade-offs here. @steveharter, what are your thoughts on this?

Yes, there are two things going on here:

1. Avoid asking for a transcoded byte count unless a threshold is surpassed.

2. The threshold selected happens to match the point at which ArrayPool's current implementation just does a normal alloc, so a call to Clear() can be avoided.

Given that, yes the call to Clear() is insignificant so we could just use ArrayPool always.
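The outcome of this thread can be sketched roughly as follows: always rent from the pool, use the cheap worst-case size for small inputs, and compute the exact UTF-8 byte count only above a threshold. This is an illustration under stated assumptions, not the code merged in the PR: `TranscodeSketch`, the constant names, and their values (3x expansion, 1 MB threshold, taken from the figures mentioned in the discussion) are hypothetical, and `Encoding.UTF8` stands in for the internal `JsonReaderHelper` helpers.

```csharp
using System;
using System.Buffers;
using System.Text;

public static class TranscodeSketch
{
    // Assumed values for illustration; the thread mentions "3x" and "1 MB".
    private const int MaxExpansionFactor = 3;
    private const int PoolThreshold = 1024 * 1024;

    // True when the worst-case estimate is cheap enough to use as-is.
    public static bool UsesWorstCaseEstimate(string json) =>
        json.Length <= PoolThreshold / MaxExpansionFactor;

    public static byte[] ToUtf8(string json)
    {
        // Small input: over-ask by up to 3x and skip the counting pass.
        // Large input: pay for an exact count to avoid a huge over-allocation.
        int size = UsesWorstCaseEstimate(json)
            ? json.Length * MaxExpansionFactor
            : Encoding.UTF8.GetByteCount(json);

        // Rent in both cases; the pool may allocate a fresh array itself
        // when the request is too large for it.
        byte[] rented = ArrayPool<byte>.Shared.Rent(size);
        int written = 0;
        try
        {
            written = Encoding.UTF8.GetBytes(json, 0, json.Length, rented, 0);

            // Copy out only for this demo; a real reader would consume
            // the rented span directly before returning it.
            return rented.AsSpan(0, written).ToArray();
        }
        finally
        {
            // Clear only the bytes that were written, then return the array.
            rented.AsSpan(0, written).Clear();
            ArrayPool<byte>.Shared.Return(rented);
        }
    }
}
```

The trade-off is the one described above: the counting pass is skipped when over-asking is cheap, and the exact count is computed only when a 3x over-estimate would be large.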

ahsonkhan · 2019-10-07T22:53:50Z

It is hard to review just the latest changes. Can you avoid force pushing and keep the individual commits going forward (otherwise, I end up re-reviewing the exact same lines of code)?

ahsonkhan

Other than the threshold typo in tests and some test nits/questions, looks good.

steveharter · 2019-10-08T19:59:43Z

Test failures not related:

System.Data.OleDb.Tests
System.Diagnostics.EventLog.Tests

…efx#41414) Commit migrated from dotnet/corefx@8655ef9

Address misc feedback and issues from recent perf changes

7adf0d4

steveharter added the tenet-performance and area-System.Text.Json labels Sep 27, 2019

steveharter added this to the 5.0 milestone Sep 27, 2019

steveharter requested review from ahsonkhan and layomia September 27, 2019 21:50

steveharter self-assigned this Sep 27, 2019