This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Serializer perf improvements #41238

Merged
steveharter merged 4 commits into dotnet:master from steveharter:Perf2 on Sep 25, 2019

Conversation

@steveharter steveharter (Contributor) commented Sep 20, 2019

Misc changes to gain ~7% on deserialize and ~3% on serialize for a simple, flat object (as in other recent PRs, the MicroBenchmarks.Serializers.Json_FromString&lt;Location&gt; and MicroBenchmarks.Serializers.Json_ToString&lt;Location&gt; benchmarks were used).

There is also a gain for deserializing collections (~6%). There is no gain for serializing collections.

There is also a larger gain for deserializing small payloads which are under 85 bytes (255 / 3) since allocations are stack-based instead of pool-based.

Changes include:

  • Avoid the byte[] alloc when deserializing strings (the buffer for transcoding string to byte[]). Now a pool or stack alloc is made.
  • Removing several parameters due to exception helper changes. Added support to set an internal AppendPathInformation property to true, which causes the try/catch logic to add the path automatically. This avoids having to pass the reader and/or "stack state" to lower-level methods that may need to throw an exception whose message includes path information.
  • Remove a temporary struct assignment WriteStackFrame current = state.Current that caused a copy-by-value.
  • Avoid verifying internal converters in release build for correctness (reading too much or too little).
  • Improve the "IsProcessing" logic by using bit flags which helps when comparing more than one ClassType.
  • Improve SkipProperty() by avoiding the direct comparison to the "missing property" singleton.
  • Misc other one-line changes
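The stackalloc-versus-pool idea in the first bullet can be sketched roughly as follows. This is an illustrative sketch only — the names, threshold, and helper are hypothetical and not the actual System.Text.Json internals:

```csharp
using System;
using System.Buffers;
using System.Text;

static class TranscodeSketch
{
    // Keep stack allocations small; larger inputs rent from the shared pool.
    // The threshold value here is illustrative.
    private const int StackallocThreshold = 256;

    public static byte[] ToUtf8(string json)
    {
        byte[]? rented = null;
        // Worst case: one UTF-16 char can expand to 3 UTF-8 bytes
        // (hence the "255 / 3" small-payload boundary mentioned above).
        int maxByteCount = json.Length * 3;
        Span<byte> utf8 = maxByteCount <= StackallocThreshold
            ? stackalloc byte[StackallocThreshold]
            : (rented = ArrayPool<byte>.Shared.Rent(maxByteCount));
        try
        {
            int written = Encoding.UTF8.GetBytes(json, utf8);
            return utf8.Slice(0, written).ToArray();
        }
        finally
        {
            if (rented != null)
                ArrayPool<byte>.Shared.Return(rented);
        }
    }
}
```

For payloads under the threshold this avoids touching the pool entirely, which is where the extra gain for small payloads comes from.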

Deserialize Before

| Method | Mean | Error | StdDev | Median | Min | Max | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
|---|---|---|---|---|---|---|---|---|---|---|
| JSON.NET | 2.554 us | 0.0091 us | 0.0081 us | 2.553 us | 2.540 us | 2.567 us | 0.4771 | - | - | 3056 B |
| SystemTextJson | 1.465 us | 0.0088 us | 0.0078 us | 1.464 us | 1.450 us | 1.477 us | 0.1053 | - | - | 680 B |

Deserialize After

| Method | Mean | Error | StdDev | Median | Min | Max | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
|---|---|---|---|---|---|---|---|---|---|---|
| JSON.NET | 2.565 us | 0.0151 us | 0.0134 us | 2.564 us | 2.541 us | 2.590 us | 0.4824 | - | - | 3056 B |
| SystemTextJson | 1.358 us | 0.0097 us | 0.0081 us | 1.357 us | 1.341 us | 1.375 us | 0.0701 | - | - | 448 B |

Serialize Before

| Method | Mean | Error | StdDev | Median | Min | Max | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
|---|---|---|---|---|---|---|---|---|---|---|
| JSON.NET | 1.438 us | 0.0106 us | 0.0099 us | 1.438 us | 1.414 us | 1.450 us | 0.2751 | - | - | 1736 B |
| SystemTextJson | 1.011 us | 0.0084 us | 0.0079 us | 1.007 us | 1.000 us | 1.024 us | 0.0891 | - | - | 584 B |

Serialize After

| Method | Mean | Error | StdDev | Median | Min | Max | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
|---|---|---|---|---|---|---|---|---|---|---|
| JSON.NET | 1,424.7 ns | 7.607 ns | 7.115 ns | 1,423.7 ns | 1,412.6 ns | 1,439.2 ns | 0.2746 | - | - | 1736 B |
| SystemTextJson | 970.9 ns | 7.285 ns | 6.458 ns | 968.5 ns | 964.4 ns | 982.3 ns | 0.0928 | - | - | 584 B |

@steveharter steveharter added this to the 5.0 milestone Sep 20, 2019
@steveharter steveharter self-assigned this Sep 20, 2019
@ahsonkhan ahsonkhan left a comment

In general, I am wondering if the code churn/complexity in this PR justifies the performance gains. We might be making the wrong trade-off here around maintainability and fragility. Can we distill this change down to just the biggest perf wins and leave smaller refactorings/changes to a subsequent PR (that will be for 5.0 only)?

propertyInfo.ReadEnumerable(tokenType, ref state, ref reader);
}
// For performance on release build, don't verify converter correctness for internal converters.
else if (HasInternalConverter)


Is this special-casing/complexity worth the perf gain? I don't know if we should treat the internal converters this way. We are saving a couple of branches and introducing one. If this change doesn't make a big difference, I would revert it.

Contributor Author

I ran the deserialize benchmark 4 times over each of the following factorings and picked the fastest time (the Min column) in the benchmark. Doing that seems to produce the most accurate result. Here's the data:

  • Current code in PR: 1331 ns
  • Previous code: 1377 ns
  • Current code + AggressiveInlining on VerifyRead: 1361 ns

so (1361-1331) / 1361 = ~2.2% faster.

I propose to keep the changes in the PR since ~2% is significant and because it affects every property read (or write).

@ahsonkhan ahsonkhan Sep 25, 2019

Doing that seems to produce the most accurate result

@adamsitnik - is that the approach you suggest we follow?

I would have thought that a big value-prop of using BDN was that the "Median"/actual result is what we would use for perf comparison, and the error column quantifies the variation. If the variation/error is ~2%, then I wouldn't take the results you shared as a strong indication of a perf improvement. However, I don't think the error is that high at all (from the results you shared). Why not use the "Median" column?

I propose to keep the changes in the PR since ~2% is significant and because it affects every property read (or write).

Fair enough. Sounds good to me. Maybe there is a way to restructure the code into helper methods so that the #if DEBUG is isolated and we have less code duplication.

Contributor Author

Note the profiler also shows ~2% for OnRead() which is mostly composed of calling VerifyRead() and its overhead of gathering parameters.

Using "Median" on my local machine is OK but doesn't give me a best-case "non-interference" value, which I need for very small gains that are close to the margin of error. I may have apps open, including VS, which will be doing background compilation etc. A small blip shifts the median. Also, by using "Min" a few times I can see whether there was random interference and I should run again or close some offending apps (such as a browser or VS).

Member

is that the approach you suggest we follow?

I would recommend comparing the entire distribution. Performance is never a single value. "Statistics for Performance Engineers" from Pro .NET Benchmarking contains a great answer to this question.

When it comes to getting super stable results for validating 2% gain you can tell BDN to affinitize the benchmark process to a given CPU. It works great for single-threaded benchmarks.

--affinity $value

$value is an integer representing the bit mask sent to the OS API; for example, if you send 8 it's translated to binary 1000, which affinitizes the benchmark to the 4th processor.

btw you can also tell it to run the benchmark processes many times

--launchCount $times

Member

Is this special-casing/complexity worth the perf gain?

BTW @sebastienros is going to give a modern BCL talk on the 1st of November, where he is going to show us how to use the ASP.NET perf lab to run TechEmpower benchmarks using local builds of CoreFX, to validate whether a change is worth it from a TechEmpower scenario perspective. I personally use it anytime I am not sure whether a small gain in microbenchmarks is worth the extra complexity.

message = $"{SR.Format(SR.DeserializeUnableToConvertValue, propertyType)}";
}

if (message.Contains(EmbedPathInfoFlag))


This seems like a lot of ceremony/complexity. Is it really worth it? I also don't know if it's a good idea to embed canaries like $Path in the exception message (within the Strings resource), especially when it could be affected by localization.

Contributor Author

Another option is to add an internal (for now) bool flag AppendPathInformation to JsonException. This wouldn't have an issue with localization (plus we don't localize currently).

I'll make this change in the next commit.
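The flag-based pattern being proposed can be sketched as follows. The types and members here are hypothetical stand-ins to illustrate the mechanism, not the real System.Text.Json internals:

```csharp
using System;

// Hypothetical stand-in for JsonException with the proposed internal flag.
class PathAwareException : Exception
{
    public bool AppendPathInformation { get; set; }
    public PathAwareException(string message) : base(message) { }
    public PathAwareException(string message, Exception inner) : base(message, inner) { }
}

static class Demo
{
    static void LowLevelRead()
    {
        // Deep in the serializer: no reader or "stack state" parameters
        // need to be threaded down just for error messages.
        throw new PathAwareException("Unable to convert value.")
        {
            AppendPathInformation = true
        };
    }

    public static void Deserialize(string currentPath)
    {
        try
        {
            LowLevelRead();
        }
        catch (PathAwareException ex) when (ex.AppendPathInformation)
        {
            // Only the outer frame knows the JSON path; append it exactly once.
            throw new PathAwareException($"{ex.Message} Path: {currentPath}.", ex);
        }
    }
}
```

Because the flag lives on the exception rather than in the message text, localization of the resource strings is unaffected.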


public bool IsCollectionForClass => IsEnumerable || IsDictionary || IsIDictionaryConstructible;
public bool IsCollectionForProperty => IsEnumerableProperty || IsDictionaryProperty || IsIDictionaryConstructibleProperty;
public bool IsProcessingCollectionForProperty()


A lot of these checks are doing redundant work (at least where they are called), so I am wondering if a cleaner solution would be to make those checks less redundant rather than turning the enums into a bit flag (or maybe we do both).

Contributor Author

I'm open for other ways to do this, but using bit flags was the fastest and cleanest solution I came up with.
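The bit-flag approach can be sketched like this. The enum values and extension method are illustrative; the actual ClassType definition in System.Text.Json may differ:

```csharp
using System;

// Hypothetical sketch of a [Flags] ClassType enabling one-branch membership tests.
[Flags]
enum ClassType
{
    Object                   = 0x01,
    Value                    = 0x02,
    Enumerable               = 0x04,
    Dictionary               = 0x08,
    IDictionaryConstructible = 0x10,
}

static class ClassTypeExtensions
{
    // One mask test replaces a chain of equality comparisons such as
    // (t == Enumerable || t == Dictionary || t == IDictionaryConstructible).
    public static bool IsProcessingCollection(this ClassType type) =>
        (type & (ClassType.Enumerable |
                 ClassType.Dictionary |
                 ClassType.IDictionaryConstructible)) != 0;
}
```

The win is that comparing against several ClassType values collapses into a single AND plus a zero test.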


try
{
int actualByteCount = JsonReaderHelper.GetUtf8FromText(json, utf8);
@ahsonkhan ahsonkhan Sep 20, 2019

If we are going to use these helpers outside of the Utf8JsonReader, let's move them to the JsonHelpers class (which contains helpers used across multiple components). This clean-up can/should be done separately to avoid unnecessary code churn in a PR that is meant for 3.1.

Member

@gfoidl gfoidl left a comment

Nits.

@ahsonkhan ahsonkhan left a comment

Otherwise, LGTM.

<value>The attribute '{0}' cannot exist more than once on '{1}'.</value>
</data>
<data name="SerializeUnableToSerialize" xml:space="preserve">
<value>The object or value could not be serialized. Path: {0}.</value>


Why are we removing the path info here?

Contributor Author

It is automatically added now.

state.Current.JsonPropertyInfo = state.Current.JsonClassInfo.PolicyProperty;
}

Debug.Assert(


Why remove this Debug.Assert? Is it no longer valid?

Contributor Author

Still valid, but the code just above it has an if statement that performs the same check.

// The maximum length of a byte array before we ask for the actual transcoded byte size.
// The "int.MaxValue / 2" is used because max byte[] length is a bit less than int.MaxValue
// and because we don't want to allocate such a large buffer if not necessary.
private const int MaxArrayLengthBeforeCalculatingSize = int.MaxValue / 2 / JsonConstants.MaxExpansionFactorWhileTranscoding;


Why are we dividing by 2 first?

Contributor Author

The reason for dividing by 2 is explained in the comment. Are you suggesting it should come after the / 3? (Same result as / 6.)

We could divide by 4 or some other number instead of 2. It's a trade-off: asking for the transcoded size is CPU expensive, but allocating unnecessary memory can be wasteful (but not so much if coming from a pool).


I was still confused by why we divide by 2. Re-reading the comment is starting to make some sense, but let me verify:
The "int.MaxValue / 2" is used because max byte[] length is a bit less than int.MaxValue

We could divide by 4 or some other number instead of 2. It's a trade-off: asking for the transcoded size is CPU expensive, but allocating unnecessary memory can be wasteful (but not so much if coming from a pool).

Ah yes, now that makes sense. So, the division by 2 is just an arbitrary value then, correct? Rather than say 2 billion, we chose 1 billion because we might as well get the actual byte count once we are above 1 billion. It could have been 500 million (or 250 million) as well.

In that case, maybe we should set the threshold at 1 million, since beyond that the array pool allocates an array anyway, so it's probably not worth going above that. In fact, maybe this change should only apply for lengths < 1 million, and for anything above that we keep using the old code path.
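The sizing trade-off under discussion can be sketched as follows. The constant names mirror the quoted snippet, but the logic is a simplification, not the actual corefx code:

```csharp
using System.Text;

static class BufferSizing
{
    // UTF-16 -> UTF-8 worst-case expansion factor.
    private const int MaxExpansionFactorWhileTranscoding = 3;

    // Below this length, a worst-case estimate cannot overflow int or
    // approach the max array length, so we skip the exact count.
    private const int MaxArrayLengthBeforeCalculatingSize =
        int.MaxValue / 2 / MaxExpansionFactorWhileTranscoding;

    public static int RequiredBytes(string json)
    {
        // Small inputs: cheap worst-case estimate (may over-allocate up to 3x,
        // which is tolerable when the buffer comes from a pool).
        if (json.Length <= MaxArrayLengthBeforeCalculatingSize)
            return json.Length * MaxExpansionFactorWhileTranscoding;

        // Huge inputs: pay the CPU cost of an exact count to avoid a
        // multi-gigabyte over-allocation.
        return Encoding.UTF8.GetByteCount(json);
    }
}
```

Lowering the threshold (e.g. to ~1 million, as suggested) shifts the balance toward exact counting sooner, at the cost of extra CPU on mid-sized payloads.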

}

public bool SkipProperty => Drain ||
ReferenceEquals(JsonPropertyInfo, JsonPropertyInfo.s_missingProperty) ||


So, we changed this condition. Can you explain why?

Contributor Author

JsonPropertyInfo.s_missingProperty now has its ShouldDeserialize property set to false.

ex.AppendPathInformation = true;
}

if (ex.AppendPathInformation)


Question: Why do we do it this way, rather than always appending the path information (like we had before)?

Contributor Author

It mirrors the other code for ReadStack which allows a caller to set AppendPathInformation=true (although currently no code does that here for WriteStack).

string json = $"\"{repeated}\"";
Assert.Equal(length, json.Length);

string str = JsonSerializer.Deserialize<string>(json);


nit: Can we add calls to other overloads of Deserialize for this test? Like the one that takes ROS<byte> as UTF-8 text input?

Contributor Author

This is testing a very specific boundary case / code branch for string only (not byte).

Are you concerned about large input?


Are you concerned about large input?

Yes, and primarily the async overload that reads from a stream.


OnRead(ref state, ref reader);

#if DEBUG


nit: The HasInternalConverter code paths is introducing code duplication. Extract into a helper/avoid the duplication.

Contributor Author

It is not possible to avoid the code duplication here, since the DEBUG statements apply only when HasInternalConverter==true. Even if it were possible, this also avoids an extra function call which may not be inlined (the code is on a hot path).

@steveharter steveharter merged commit 527595f into dotnet:master Sep 25, 2019
@steveharter steveharter deleted the Perf2 branch September 25, 2019 15:46
steveharter added a commit to steveharter/dotnet_corefx that referenced this pull request Oct 14, 2019
steveharter added a commit to steveharter/dotnet_corefx that referenced this pull request Oct 14, 2019
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022