Fix the max token size threshold to correctly compute to 125MB for Base64 bytes. (#40792)
Conversation
scalablecory left a comment:
looks good other than a flaky test.
```csharp
}
catch (OutOfMemoryException)
{
    return;
```
We have this pattern elsewhere in the code base, but you bring up a good point. We can exclude this test on Linux. It should be deterministic for other OSes.
```csharp
@@ -63,7 +63,7 @@ internal static class JsonConstants
public const int MaxEscapedTokenSize = 1_000_000_000; // Max size for already escaped value.
```
Where does this value come from?
We compute the maximum required space to write a property name and value string, and then ask for a buffer of that size from the `IBufferWriter<byte>` (the call to `GetSpan(sizeHint)`).
What's the largest key/value pair that can fit after making a single call to `GetSpan(...)`?
Say you want to write `"really large property name": " really large value"`. You do this as follows:
```csharp
// This needs to call IBW.GetSpan with the required size to write this data into it
jsonWriter.WriteString("really large property name", " really large value");
```
The maximum feasible is roughly `int.MaxValue - 56` (for an array-backed buffer).
Then we have up to 1,000 levels of depth, so accounting for indentation (and quotes, spaces, the colon, etc.), let's round the maximum feasible down to 2 billion.
We have 1 billion for the property name string, and 1 billion for the value string. That's where this limit comes from.
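The derivation above can be checked with simple arithmetic (sketched in Python for brevity; the actual constant is C#'s `JsonConstants.MaxEscapedTokenSize`):

```python
INT_MAX = 2**31 - 1             # int.MaxValue in C#
MAX_ARRAY_BYTES = INT_MAX - 56  # rough ceiling for an array-backed buffer

# Round down to 2 billion to leave headroom for up to 1000 levels of
# indentation plus quotes, the colon, spaces, etc., then split evenly
# between the property name and the value.
FEASIBLE = 2_000_000_000
MAX_ESCAPED_TOKEN_SIZE = FEASIBLE // 2

assert FEASIBLE <= MAX_ARRAY_BYTES
print(MAX_ESCAPED_TOKEN_SIZE)  # 1000000000
```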
Sure, but why not just ask the IBufferWriter for what's needed, and let it fail if it can't provide the space?
Does our read implementation have some sanity limits that we're trying to keep our write implementation under?
No, the reader doesn't have such limitations. The limitation in the writer was mainly to avoid making multiple calls to the interface while writing for perf (for the common path).
```diff
  public const int MaxEscapedTokenSize = 1_000_000_000; // Max size for already escaped value.
  public const int MaxUnescapedTokenSize = MaxEscapedTokenSize / MaxExpansionFactorWhileEscaping; // 166_666_666 bytes
- public const int MaxBase46ValueTokenSize = (MaxEscapedTokenSize >> 2 * 3) / MaxExpansionFactorWhileEscaping; // 125_000_000 bytes
+ public const int MaxBase64ValueTokenSize = (MaxEscapedTokenSize >> 2) * 3 / MaxExpansionFactorWhileEscaping; // 125_000_000 bytes
```
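The renamed constant aside, the root cause is operator precedence: in C# (as in Python), `*` binds tighter than `>>`, so `MaxEscapedTokenSize >> 2 * 3` shifts by 6 instead of shifting by 2 and then multiplying by 3 (Base64 stores 3 raw bytes in every 4 encoded characters). A quick sketch in Python (the real constants are C#; `6` is `MaxExpansionFactorWhileEscaping`, per the `MaxUnescapedTokenSize` comment):

```python
MAX_ESCAPED = 1_000_000_000  # MaxEscapedTokenSize
EXPANSION = 6                # MaxExpansionFactorWhileEscaping

# Buggy: ">> 2 * 3" parses as ">> (2 * 3)", i.e. a shift by 6 (divide by 64).
buggy = (MAX_ESCAPED >> 2 * 3) // EXPANSION

# Fixed: shift by 2 (divide by 4), then multiply by 3 for the
# 3-bytes-per-4-characters Base64 ratio.
fixed = (MAX_ESCAPED >> 2) * 3 // EXPANSION

print(buggy)  # 2604166 -- far below the intended 125MB threshold
print(fixed)  # 125000000
```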
What forces us to have a hardcoded maximum? Why not just let the implementation OOM if the required size would be too big?
It may not OOM, it may cause integer overflow depending on the size when we calculate the maximum required size for the given input. And then, we may pass in an unexpected value to IBW.GetSpan(...). This could be a user-defined IBW which could have custom logic for how the memory is allocated.
corefx/src/System.Text.Json/src/System/Text/Json/Writer/Utf8JsonWriter.cs, lines 1047 to 1055 in a6e7ffd:
For example, if this calculation overflows (which currently is guaranteed not to), then we skip the call to Grow and either overwrite existing data, write only partial data, or throw ArgumentException when trying to copy to a buffer that's too small.
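The overwrite scenario can be simulated (a Python sketch of unchecked 32-bit arithmetic; the lengths and expansion factor here are illustrative, not taken from the writer's actual code):

```python
def to_int32(x: int) -> int:
    """Simulate C# unchecked 32-bit signed arithmetic (wrap-around)."""
    x &= 0xFFFFFFFF
    return x - 2**32 if x >= 2**31 else x

# Hypothetical required-size calculation: length * expansion + overhead.
input_length = 400_000_000
required = to_int32(input_length * 6 + 2)  # wraps past int.MaxValue

print(required)  # -1894967294: wrapped to a negative number

# A capacity check such as "required <= free_space" now passes spuriously,
# so Grow is skipped and the subsequent copy corrupts or truncates data.
free_space = 1024
print(required <= free_space)  # True -- the check that should have failed
```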
> it may cause integer overflow

And you can't use `checked`?
Hmm, let me see if that is feasible, and follow up on this as a separate PR.
This PR is intended to fix the bug in the current code that was blocking user scenario (considering for 3.0), so I'd like to keep the change isolated to just that.
> And you can't use `checked`?
Aren't there performance implications of doing so? Or are you suggesting those would be negligible here?
> Aren't there performance implications of doing so? Or are you suggesting those would be negligible here?
It would add a few instructions, but you'd also get to remove calls like `JsonWriterHelper.ValidateBytes(bytes)`, simplify the code, and avoid these strange hard-coded, semi-accurate limits that appear to come out of nowhere.
Good point. I am trying to assess the side effects of removing these checks (and constants). Currently, the error is detected up front with a deterministic error message. Removing the checks and letting integer overflow/OOM/etc. capture the failure condition, depending on the input and method being called, may not be worth the trade-off. It is easy to reason about these limits today.
For example, let's say we remove the escaping/transcoding based limits. Today, we have the following up-front check on the public API:
corefx/src/System.Text.Json/src/System/Text/Json/Writer/JsonWriterHelper.cs, lines 117 to 121 in 06007cd:
After that, all subsequent method calls can rely on this invariant.
Now, if we remove that, the "error" state (i.e. data too large) propagates to all the places where the previous assertions break:
- GetMaxEscapedLength would need to be checked.
- Calculating the required max size would need to be checked for both indented and minimized.
And now the user gets an OverflowException rather than an ArgumentException.
That said, I think it's worth doing the exercise to see which code paths would need to be fixed and whether reasoning about the kinds of errors that might get surfaced to the caller is easy enough to justify the change.
The main question is: do we want to surface OOMs or OverflowException to the caller if they pass data that's too large (depending on which code path got executed), or do we want to provide a more meaningful and deterministic error message (with ArgumentException) that conveys what went wrong?
Edit: Filed an issue - https://github.com/dotnet/corefx/issues/40795
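The trade-off can be sketched concretely (Python stand-in; the real validation is C#'s `JsonWriterHelper.ValidateBytes`, and the function name below is illustrative):

```python
MAX_UNESCAPED_TOKEN_SIZE = 166_666_666  # MaxEscapedTokenSize / MaxExpansionFactorWhileEscaping

def validate_value_length(length: int) -> None:
    """Up-front check at the public API boundary: fails deterministically,
    with a message that says what went wrong, before any work is done."""
    if length > MAX_UNESCAPED_TOKEN_SIZE:
        raise ValueError(f"Value too large: {length} bytes "
                         f"(maximum is {MAX_UNESCAPED_TOKEN_SIZE})")

# Every subsequent internal computation can rely on this invariant.
# Removing it means each downstream size calculation (GetMaxEscapedLength,
# the indented and minimized required-size math) needs checked arithmetic,
# and the caller sees an overflow error instead of an argument error.
validate_value_length(1_000)  # fine; anything above the limit would raise
```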
`checked` would require catching the overflow exception. It's a forbidden exception type that should not bubble out to callers.
He was referring to the overflow exception, not the out of memory exception. (And it's fine for an overflow exception to escape to callers in some APIs, you just wouldn't expect it here.)
> This PR is intended to fix the bug in the current code that was blocking user scenario (considering for 3.0), so I'd like to keep the change isolated to just that.
Yes, we need to keep this as simple as possible for 3.0 to avoid any potential regression. For 5.0 we can then improve this area by removing the constants and using `checked` (if that turns out to be the best design).
Any other feedback for this PR (specific to fixing the issue blocking writing large byte[] as Base64), or is it good to merge? I'll re-evaluate the feedback around removing the thresholds altogether.

Unrelated test failure for the Windows Build UWP_CoreCLR_x64_Debug leg: https://github.com/dotnet/corefx/issues/31608
…se64 bytes. (dotnet#40792)

* Fix the max token size threshold to correctly compute to 125MB for Base64 bytes.
* Rename constant to fix transpose error: Base46 -> Base64
* Enable the outerloop tests for windows and osx only and update to use platform specific new line.
…se64 bytes. (dotnet/corefx#40792)

Commit migrated from dotnet/corefx@1511f72
Fixes https://github.com/dotnet/corefx/issues/40755 in master
Also:
Rename constant to fix transpose error: Base46 -> Base64
cc @lauxjpn, @steveharter, @scalablecory