This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Serializer perf improvements #41238

Merged
steveharter merged 4 commits into dotnet:master from steveharter:Perf2 on Sep 25, 2019

Conversation

@steveharter steveharter (Contributor) commented Sep 20, 2019

Misc changes to gain ~7% on deserialize and ~3% on serialize for a simple, flat object (as in other recent PRs, the MicroBenchmarks.Serializers.Json_FromString&lt;Location&gt; and MicroBenchmarks.Serializers.Json_ToString&lt;Location&gt; benchmarks were used).

There is also a gain for deserializing collections (~6%). There is no gain for serializing collections.

There is also a larger gain for deserializing small payloads which are under 85 bytes (255 / 3) since allocations are stack-based instead of pool-based.

Changes include:

  • Avoid the byte[] alloc when deserializing strings (the buffer for transcoding string to byte[]). Now a pool or stack alloc is made.
  • Removing several parameters due to exception helper changes. Added support to set an internal AppendPathInformation property to true, which causes the try/catch logic to add the path automatically. This avoids having to pass the reader and/or "stack state" to lower-level methods that may need to throw an exception whose message includes path information.
  • Remove a temporary struct assignment WriteStackFrame current = state.Current that caused a copy-by-value.
  • Avoid verifying internal converters in release build for correctness (reading too much or too little).
  • Improve the "IsProcessing" logic by using bit flags which helps when comparing more than one ClassType.
  • Improve SkipProperty() by avoiding the direct comparison to the "missing property" singleton.
  • Misc other one-line changes
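The stackalloc-versus-pool idea in the first bullet can be sketched roughly as follows. This is an illustrative sketch only — the names, threshold, and helper are hypothetical and not the actual System.Text.Json internals:

```csharp
using System;
using System.Buffers;
using System.Text;

static class TranscodeSketch
{
    // Keep stack allocations small; larger inputs rent from the shared pool.
    // The threshold value here is illustrative.
    private const int StackallocThreshold = 256;

    public static byte[] ToUtf8(string json)
    {
        byte[]? rented = null;
        // Worst case: one UTF-16 char can expand to 3 UTF-8 bytes
        // (hence the "255 / 3" small-payload boundary mentioned above).
        int maxByteCount = json.Length * 3;
        Span<byte> utf8 = maxByteCount <= StackallocThreshold
            ? stackalloc byte[StackallocThreshold]
            : (rented = ArrayPool<byte>.Shared.Rent(maxByteCount));
        try
        {
            int written = Encoding.UTF8.GetBytes(json, utf8);
            return utf8.Slice(0, written).ToArray();
        }
        finally
        {
            if (rented != null)
                ArrayPool<byte>.Shared.Return(rented);
        }
    }
}
```

For payloads under the threshold this avoids touching the pool entirely, which is where the extra gain for small payloads comes from.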

Deserialize Before

| Method | Mean | Error | StdDev | Median | Min | Max | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
|---|---|---|---|---|---|---|---|---|---|---|
| JSON.NET | 2.554 us | 0.0091 us | 0.0081 us | 2.553 us | 2.540 us | 2.567 us | 0.4771 | - | - | 3056 B |
| SystemTextJson | 1.465 us | 0.0088 us | 0.0078 us | 1.464 us | 1.450 us | 1.477 us | 0.1053 | - | - | 680 B |

Deserialize After

| Method | Mean | Error | StdDev | Median | Min | Max | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
|---|---|---|---|---|---|---|---|---|---|---|
| JSON.NET | 2.565 us | 0.0151 us | 0.0134 us | 2.564 us | 2.541 us | 2.590 us | 0.4824 | - | - | 3056 B |
| SystemTextJson | 1.358 us | 0.0097 us | 0.0081 us | 1.357 us | 1.341 us | 1.375 us | 0.0701 | - | - | 448 B |

Serialize Before

| Method | Mean | Error | StdDev | Median | Min | Max | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
|---|---|---|---|---|---|---|---|---|---|---|
| JSON.NET | 1.438 us | 0.0106 us | 0.0099 us | 1.438 us | 1.414 us | 1.450 us | 0.2751 | - | - | 1736 B |
| SystemTextJson | 1.011 us | 0.0084 us | 0.0079 us | 1.007 us | 1.000 us | 1.024 us | 0.0891 | - | - | 584 B |

Serialize After

| Method | Mean | Error | StdDev | Median | Min | Max | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
|---|---|---|---|---|---|---|---|---|---|---|
| JSON.NET | 1,424.7 ns | 7.607 ns | 7.115 ns | 1,423.7 ns | 1,412.6 ns | 1,439.2 ns | 0.2746 | - | - | 1736 B |
| SystemTextJson | 970.9 ns | 7.285 ns | 6.458 ns | 968.5 ns | 964.4 ns | 982.3 ns | 0.0928 | - | - | 584 B |

@steveharter steveharter added this to the 5.0 milestone Sep 20, 2019
@steveharter steveharter self-assigned this Sep 20, 2019
@ahsonkhan ahsonkhan left a comment

In general, I am wondering if the code churn/complexity in this PR justifies the performance gains. We might be making the wrong trade-off here around maintainability and fragility. Can we distill this change down to just the biggest perf wins and leave smaller refactorings/changes to a subsequent PR (that will be for 5.0 only)?

propertyInfo.ReadEnumerable(tokenType, ref state, ref reader);
}
// For performance on release build, don't verify converter correctness for internal converters.
else if (HasInternalConverter)


Is this special-casing/complexity worth the perf gain? I don't know if we should treat the internal converters this way. We are saving a couple of branches and introducing one. If this change doesn't make a big difference, I would revert it.

Contributor Author

I ran the deserialize benchmark 4 times over each of the following factorings and picked the fastest time (the Min column) in the benchmark. Doing that seems to produce the most accurate result. Here's the data:

  • Current code in PR: 1331 ns
  • Previous code: 1377 ns
  • Current code + AggressiveInlining on VerifyRead: 1361 ns

so (1361-1331) / 1361 = ~2.2% faster.

I propose to keep the changes in the PR since ~2% is significant and because it affects every property read (or write).

@ahsonkhan ahsonkhan Sep 25, 2019

Doing that seems to produce the most accurate result

@adamsitnik - is that the approach you suggest we follow?

I would have thought that a big value-prop of using BDN was that the "Median"/actual result is what we would use for perf comparison, and the error column quantifies the variation. If the variation/error is ~2%, then I wouldn't take the results you shared as a strong indication of a perf improvement. However, I don't think the error is that high at all (from the results you shared). Why not use the "Median" column?

I propose to keep the changes in the PR since ~2% is significant and because it affects every property read (or write).

Fair enough. Sounds good to me. Maybe there is a way to restructure the code into helper methods so that the #if DEBUG is isolated and we have less code duplication.

Contributor Author

Note the profiler also shows ~2% for OnRead() which is mostly composed of calling VerifyRead() and its overhead of gathering parameters.

Using "Median" on my local machine is OK but doesn't give me a best-case "non-interference" value, which I need for very small gains that are close to the margin of error. I may have apps open, including VS, which will be doing background compilation etc. A small blip shifts the median. Also, by using "Min" a few times I can see whether there was random interference and I should run again or close some offending apps (such as a browser or VS).

Member

is that the approach you suggest we follow?

I would recommend comparing the entire distribution. Performance is never a single value. "Statistics for Performance Engineers" from Pro .NET Benchmarking contains a great answer to this question.

When it comes to getting super stable results for validating 2% gain you can tell BDN to affinitize the benchmark process to a given CPU. It works great for single-threaded benchmarks.

--affinity $value

$value is an integer representing the bit mask sent to the OS API; for example, if you send 8 it's translated to binary 1000, which affinitizes the benchmark to the 4th processor.

btw you can also tell it to run the benchmark processes many times

--launchCount $times

Member

Is this special-casing/complexity worth the perf gain?

BTW @sebastienros is going to give a modern BCL talk on the 1st of November, where he is going to show us how to use the ASP.NET perf lab to run TechEmpower benchmarks using local builds of CoreFX, to validate whether a change is worth it from a TechEmpower scenario perspective. I personally use it anytime I am not sure whether a small gain in microbenchmarks is worth the extra complexity.

message = $"{SR.Format(SR.DeserializeUnableToConvertValue, propertyType)}";
}

if (message.Contains(EmbedPathInfoFlag))


This seems like a lot of ceremony/complexity. Is it really worth it? I also don't know if it's a good idea to embed canaries like $Path in the exception message (within the Strings resource), especially when it could be affected by localization.

Contributor Author

Another option is to add an internal (for now) bool flag AppendPathInformation to JsonException. This wouldn't have an issue with localization (plus we don't localize currently).

I'll make this change in the next commit.
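The flag-based pattern being proposed can be sketched as follows. The types and members here are hypothetical stand-ins to illustrate the mechanism, not the real System.Text.Json internals:

```csharp
using System;

// Hypothetical stand-in for JsonException with the proposed internal flag.
class PathAwareException : Exception
{
    public bool AppendPathInformation { get; set; }
    public PathAwareException(string message) : base(message) { }
    public PathAwareException(string message, Exception inner) : base(message, inner) { }
}

static class Demo
{
    static void LowLevelRead()
    {
        // Deep in the serializer: no reader or "stack state" parameters
        // need to be threaded down just for error messages.
        throw new PathAwareException("Unable to convert value.")
        {
            AppendPathInformation = true
        };
    }

    public static void Deserialize(string currentPath)
    {
        try
        {
            LowLevelRead();
        }
        catch (PathAwareException ex) when (ex.AppendPathInformation)
        {
            // Only the outer frame knows the JSON path; append it exactly once.
            throw new PathAwareException($"{ex.Message} Path: {currentPath}.", ex);
        }
    }
}
```

Because the flag lives on the exception rather than in the message text, localization of the resource strings is unaffected.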


public bool IsCollectionForClass => IsEnumerable || IsDictionary || IsIDictionaryConstructible;
public bool IsCollectionForProperty => IsEnumerableProperty || IsDictionaryProperty || IsIDictionaryConstructibleProperty;
public bool IsProcessingCollectionForProperty()


A lot of these checks are doing redundant work (at least where they are called), so I am wondering if a cleaner solution would be to make those checks less redundant rather than turning the enums into a bit flag (or maybe we do both).

Contributor Author

I'm open for other ways to do this, but using bit flags was the fastest and cleanest solution I came up with.
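The bit-flag approach can be sketched like this. The enum values and extension method are illustrative; the actual ClassType definition in System.Text.Json may differ:

```csharp
using System;

// Hypothetical sketch of a [Flags] ClassType enabling one-branch membership tests.
[Flags]
enum ClassType
{
    Object                   = 0x01,
    Value                    = 0x02,
    Enumerable               = 0x04,
    Dictionary               = 0x08,
    IDictionaryConstructible = 0x10,
}

static class ClassTypeExtensions
{
    // One mask test replaces a chain of equality comparisons such as
    // (t == Enumerable || t == Dictionary || t == IDictionaryConstructible).
    public static bool IsProcessingCollection(this ClassType type) =>
        (type & (ClassType.Enumerable |
                 ClassType.Dictionary |
                 ClassType.IDictionaryConstructible)) != 0;
}
```

The win is that comparing against several ClassType values collapses into a single AND plus a zero test.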


try
{
int actualByteCount = JsonReaderHelper.GetUtf8FromText(json, utf8);
@ahsonkhan ahsonkhan Sep 20, 2019

If we are going to use these helpers outside of the Utf8JsonReader, let's move them to the JsonHelpers class (which contains helpers used across multiple components). This clean-up can/should be done separately to avoid unnecessary code churn in a PR that is meant for 3.1.

Member

@gfoidl gfoidl left a comment

Nits.

@ahsonkhan ahsonkhan left a comment

Otherwise, LGTM.

<value>The attribute '{0}' cannot exist more than once on '{1}'.</value>
</data>
<data name="SerializeUnableToSerialize" xml:space="preserve">
<value>The object or value could not be serialized. Path: {0}.</value>


Why are we removing the path info here?

Contributor Author

It is automatically added now.

state.Current.JsonPropertyInfo = state.Current.JsonClassInfo.PolicyProperty;
}

Debug.Assert(


Why remove this Debug.Assert? Is it no longer valid?

Contributor Author

Still valid, but the code just above it has an if statement that performs the same check.

// The maximum length of a byte array before we ask for the actual transcoded byte size.
// The "int.MaxValue / 2" is used because max byte[] length is a bit less than int.MaxValue
// and because we don't want to allocate such a large buffer if not necessary.
private const int MaxArrayLengthBeforeCalculatingSize = int.MaxValue / 2 / JsonConstants.MaxExpansionFactorWhileTranscoding;


Why are we dividing by 2 first?

Contributor Author

The reason for dividing by 2 is explained in the comment. Are you suggesting it should come after the / 3? (Same result as / 6.)

We could divide by 4 or some other number instead of 2. It's a trade-off: asking for the transcoded size is CPU expensive, but allocating unnecessary memory can be wasteful (but not so much if coming from a pool).


I was still confused by why we divide by 2. Re-reading the comment is starting to make some sense, but let me verify:
The "int.MaxValue / 2" is used because max byte[] length is a bit less than int.MaxValue

We could divide by 4 or some other number instead of 2. It's a trade-off: asking for the transcoded size is CPU expensive, but allocating unnecessary memory can be wasteful (but not so much if coming from a pool).

Ah yes, now that makes sense. So, the division by 2 is just an arbitrary value then, correct? Rather than say 2 billion, we chose 1 billion because we might as well get the actual byte count once we are above 1 billion. It could have been 500 million (or 250 million) as well.

In that case, maybe we should set the threshold at 1 million, since beyond that the array pool allocates an array anyway, so it's probably not worth going above that. In fact, maybe this change should only apply for lengths < 1 million, and for anything above that we keep using the old code path.
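The sizing trade-off under discussion can be sketched as follows. The constant names mirror the quoted snippet, but the logic is a simplification, not the actual corefx code:

```csharp
using System.Text;

static class BufferSizing
{
    // UTF-16 -> UTF-8 worst-case expansion factor.
    private const int MaxExpansionFactorWhileTranscoding = 3;

    // Below this length, a worst-case estimate cannot overflow int or
    // approach the max array length, so we skip the exact count.
    private const int MaxArrayLengthBeforeCalculatingSize =
        int.MaxValue / 2 / MaxExpansionFactorWhileTranscoding;

    public static int RequiredBytes(string json)
    {
        // Small inputs: cheap worst-case estimate (may over-allocate up to 3x,
        // which is tolerable when the buffer comes from a pool).
        if (json.Length <= MaxArrayLengthBeforeCalculatingSize)
            return json.Length * MaxExpansionFactorWhileTranscoding;

        // Huge inputs: pay the CPU cost of an exact count to avoid a
        // multi-gigabyte over-allocation.
        return Encoding.UTF8.GetByteCount(json);
    }
}
```

Lowering the threshold (e.g. to ~1 million, as suggested) shifts the balance toward exact counting sooner, at the cost of extra CPU on mid-sized payloads.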

}

public bool SkipProperty => Drain ||
ReferenceEquals(JsonPropertyInfo, JsonPropertyInfo.s_missingProperty) ||


So, we changed this condition. Can you explain why?

Contributor Author

JsonPropertyInfo.s_missingProperty now has its ShouldDeserialize property set to false.

ex.AppendPathInformation = true;
}

if (ex.AppendPathInformation)


Question: Why do we do it this way, rather than always appending the path information (like we had before)?

Contributor Author

It mirrors the other code for ReadStack which allows a caller to set AppendPathInformation=true (although currently no code does that here for WriteStack).

string json = $"\"{repeated}\"";
Assert.Equal(length, json.Length);

string str = JsonSerializer.Deserialize<string>(json);


nit: Can we add calls to other overloads of Deserialize for this test? Like the one that takes ROS<byte> as UTF-8 text input?

Contributor Author

This is testing a very specific boundary case / code branch for string only (not byte).

Are you concerned about large input?


Are you concerned about large input?

Yes, and primarily the async overload that reads from a stream.


OnRead(ref state, ref reader);

#if DEBUG


nit: The HasInternalConverter code paths is introducing code duplication. Extract into a helper/avoid the duplication.

Contributor Author

It is not possible to avoid the code duplication here, since the DEBUG statements apply only when HasInternalConverter==true. Even if it were possible, this also avoids an extra function call which may not be inlined (the code is on a hot path).

@steveharter steveharter merged commit 527595f into dotnet:master Sep 25, 2019
@steveharter steveharter deleted the Perf2 branch September 25, 2019 15:46
steveharter added a commit to steveharter/dotnet_corefx that referenced this pull request Oct 14, 2019
steveharter added a commit to steveharter/dotnet_corefx that referenced this pull request Oct 14, 2019
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022