Add a fast path to IndexOfQuoteOrAnyControlOrBackSlash #126700
EgorBo wants to merge 9 commits into dotnet:main from
Tagging subscribers to this area: @dotnet/area-system-text-json
Pull request overview
Adds a SIMD fast path to JsonReaderHelper.IndexOfQuoteOrAnyControlOrBackSlash that quickly locates the first quote ("), backslash (\), or control character by scanning the first 16 bytes before falling back to SearchValues<byte>-based searching.
Changes:
- Introduces a Vector128<byte>-based scan of the first 16 bytes for quote/backslash/control bytes.
- Adds an ARM64-specific mask extraction path using AdvSimd plus BitOperations.TrailingZeroCount.
- Moves the existing IndexOfAny(SearchValues<byte>) implementation into a non-inlined fallback helper.
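The overview above boils down to three pieces: a one-vector check of the first 16 bytes, an Arm64-specific way to turn the comparison result into an index, and a non-inlined SearchValues<byte> fallback. Below is a minimal sketch of that shape, assuming .NET 8 (for SearchValues); it is not the code in the PR. QuoteScanSketch, BuildStopBytes, FirstMatchIndex and FallbackFrom are hypothetical names, and the Arm64 branch assumes the common shrn-based nibble-mask trick rather than reproducing the PR's exact AdvSimd sequence.

```csharp
using System;
using System.Buffers;
using System.Numerics;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;

// Hypothetical stand-in for JsonReaderHelper; a sketch, not the merged implementation.
static class QuoteScanSketch
{
    // Stop bytes: '"', '\\' and every control byte (0x00-0x1F).
    private static readonly SearchValues<byte> s_stopBytes = SearchValues.Create(BuildStopBytes());

    private static byte[] BuildStopBytes()
    {
        var set = new byte[34];
        for (int b = 0; b < 0x20; b++) set[b] = (byte)b;
        set[32] = (byte)'"';
        set[33] = (byte)'\\';
        return set;
    }

    public static int IndexOfQuoteOrAnyControlOrBackSlash(ReadOnlySpan<byte> span)
    {
        if (Vector128.IsHardwareAccelerated && span.Length >= Vector128<byte>.Count)
        {
            // Fast path: compare the first 16 bytes in a single vector.
            Vector128<byte> v = Vector128.LoadUnsafe(ref MemoryMarshal.GetReference(span));
            Vector128<byte> matches =
                Vector128.Equals(v, Vector128.Create((byte)'"')) |
                Vector128.Equals(v, Vector128.Create((byte)'\\')) |
                Vector128.LessThan(v, Vector128.Create((byte)0x20)); // unsigned compare for byte

            if (matches != Vector128<byte>.Zero)
                return FirstMatchIndex(matches);

            // Nothing in the first 16 bytes: search only the remainder.
            return FallbackFrom(span, Vector128<byte>.Count);
        }
        return FallbackFrom(span, 0);
    }

    // Precondition: at least one lane of 'matches' is 0xFF.
    private static int FirstMatchIndex(Vector128<byte> matches)
    {
        if (AdvSimd.Arm64.IsSupported)
        {
            // Arm64 has no single-instruction movemask. Shifting each 16-bit lane right by 4
            // and narrowing (shrn) maps every 0x00/0xFF byte to a 0x0/0xF nibble, giving a
            // 64-bit mask with 4 bits per input byte.
            ulong nibbles = AdvSimd.ShiftRightLogicalNarrowingLower(matches.AsUInt16(), 4)
                .AsUInt64().ToScalar();
            return BitOperations.TrailingZeroCount(nibbles) >> 2;
        }
        return BitOperations.TrailingZeroCount(Vector128.ExtractMostSignificantBits(matches));
    }

    // Kept out of line so the fast path above stays small when inlined into callers.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static int FallbackFrom(ReadOnlySpan<byte> span, int offset)
    {
        int idx = span.Slice(offset).IndexOfAny(s_stopBytes);
        return idx < 0 ? -1 : idx + offset;
    }
}
```

For example, a 7-byte span like "abc\"def"u8 is too short for the vector load and goes straight to the fallback, while a backslash at index 5 of a 27-byte span is resolved by the vector check alone. Starting the fallback at offset 16 avoids re-searching bytes the vector check already cleared; that is one reading of the review's "use SearchValues directly after the appropriate slice" remark, not a confirmed detail of the PR.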
src/libraries/System.Text.Json/src/System/Text/Json/Reader/JsonReaderHelper.net8.cs
…nReaderHelper.net8.cs Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@EgorBot -linux_azure_arm -arm -linux_aws_arm -profiler
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.Json;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
BenchmarkSwitcher.FromAssembly(typeof(Benchmarks).Assembly).Run(args);
[MemoryDiagnoser]
public class Benchmarks
{
// ── TokenSerialization fields ────────────────────────────────────────────
private List<object> _tokenObjects;
[ThreadStatic] static Utf8JsonWriter t_writer;
[ThreadStatic] static MemoryStream t_stream;
[GlobalSetup]
public void Setup()
{
// TokenSerialization
_tokenObjects = new List<object>(200);
for (int i = 0; i < 200; i++)
{
if (i % 3 == 0)
_tokenObjects.Add(GenerateRecordJson(1));
else
_tokenObjects.Add(new Dictionary<string, object>
{
["seq"] = i,
["label"] = $"item_{i}",
["blob"] = new byte[100]
});
}
}
private static string GenerateRecordJson(int targetSizeKb = 150)
{
var sb = new StringBuilder(targetSizeKb * 1024 + 512);
sb.Append("{");
sb.Append("\"TypeName\":\"product\",");
sb.Append("\"CategoryCode\":1,");
sb.Append("\"Label\":\"Product\",");
sb.Append("\"IsAction\":false,");
sb.Append("\"IsActionMember\":false,");
sb.Append("\"IsTrackingEnabled\":true,");
sb.Append("\"IsAvailableLocal\":true,");
sb.Append("\"IsChildRecord\":false,");
sb.Append("\"IsLinksEnabled\":true,");
sb.Append("\"IsCustomRecord\":false,");
sb.Append("\"PrimaryKeyField\":\"productid\",");
sb.Append("\"PrimaryLabelField\":\"title\",");
sb.Append("\"Fields\":[");
int targetBytes = targetSizeKb * 1024;
int fieldIndex = 0;
bool firstField = true;
while (sb.Length < targetBytes - 512)
{
if (!firstField) sb.Append(",");
firstField = false;
sb.Append("{");
sb.Append($"\"TypeName\":\"field_{fieldIndex}\",");
sb.Append($"\"InternalName\":\"Field_{fieldIndex}\",");
sb.Append($"\"FieldType\":\"String\",");
sb.Append($"\"Label\":\"Field {fieldIndex}\",");
sb.Append($"\"MaxSize\":100,");
sb.Append($"\"IsReadable\":true,");
sb.Append($"\"IsCreatable\":true,");
sb.Append($"\"IsUpdatable\":true,");
sb.Append($"\"IsTrackingEnabled\":false,");
sb.Append($"\"IsPrimaryKey\":false,");
sb.Append($"\"IsVirtual\":false,");
sb.Append($"\"Requirement\":\"None\"");
sb.Append("}");
fieldIndex++;
}
sb.Append("]");
sb.Append("}");
return sb.ToString();
}
[Benchmark]
public void TokenSerialization()
{
var stream = t_stream ??= new MemoryStream(64 * 1024);
stream.Position = 0;
stream.SetLength(0);
var writer = t_writer;
if (writer == null)
{
writer = new Utf8JsonWriter(stream, new JsonWriterOptions { SkipValidation = true });
t_writer = writer;
}
else
writer.Reset(stream);
writer.WriteStartObject();
writer.WriteStartArray("Catalog");
foreach (var token in _tokenObjects)
{
if (token is string strToken)
{
if (!string.IsNullOrEmpty(strToken))
writer.WriteRawValue(strToken);
}
else if (token is Dictionary<string, object> dictToken)
{
writer.WriteStartObject();
foreach (var kvp in dictToken)
{
writer.WritePropertyName(kvp.Key);
JsonSerializer.Serialize(writer, kvp.Value);
}
writer.WriteEndObject();
}
}
writer.WriteEndArray();
writer.WriteEndObject();
writer.Flush();
if (stream.Length == 0) throw new Exception("unreachable");
}
}
@tannergooding I decided to implement the idea we discussed yesterday; I think your PR makes sense to check in too. I couldn't detect any further improvement from extending the scan from 16 bytes to 32 bytes, so I decided to keep it as is. The fallback doesn't show up in the traces.
…nReaderHelper.net8.cs Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Removed experimental SVE code for finding index of quote or control characters.
@tannergooding do we need anything else here? I am cooking a PR to intrinsify
This LGTM as a fast path optimization. We can clean up more as JIT optimizations come online, and if we improve SearchValues so that it can be used directly after the appropriate slice occurs. We probably want a weigh-in from @eiriktsarpalis as well to ensure that 16 is the "right" size and not just a size that's optimal for this particular first-party scenario (also cc. @jeffhandley as an FYI).
Validate the theory that @tannergooding and I came up with: in this function we mostly find the quote (") character within the first 16 bytes, while the span is usually longer than 16 bytes (e.g. the quote that ends a property name).
This doesn't replace #126678; it just special-cases JSON, where we can indeed assume something is usually found early.
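To sanity-check the "usually found within the first 16 bytes" theory on a particular payload, one can count where the first stop byte lands for each JSON string. The sketch below is a hypothetical model, not how Utf8JsonReader actually drives the helper: it assumes one search per string starting just after the opening quote (plus a fresh search after each escape), with each search covering the rest of the buffer. FirstMatchStats and Measure are illustrative names; the PR's own evidence comes from the benchmark and the profiler traces.

```csharp
using System;
using System.Buffers;

// Hypothetical measurement helper; models the call pattern, does not hook the real reader.
static class FirstMatchStats
{
    // Same stop bytes as the helper: '"', '\\' and 0x00-0x1F.
    private static readonly SearchValues<byte> s_stopBytes = SearchValues.Create(BuildStopBytes());

    private static byte[] BuildStopBytes()
    {
        var set = new byte[34];
        for (int b = 0; b < 0x20; b++) set[b] = (byte)b;
        set[32] = (byte)'"';
        set[33] = (byte)'\\';
        return set;
    }

    // Assumes valid JSON: outside strings, every '"' opens a string.
    public static (int Searches, int FoundInFirst16, int SpanAtLeast16) Measure(ReadOnlySpan<byte> json)
    {
        int searches = 0, foundInFirst16 = 0, spanAtLeast16 = 0;
        bool inString = false;
        int pos = 0;
        while (pos < json.Length)
        {
            if (!inString)
            {
                int quote = json.Slice(pos).IndexOf((byte)'"');
                if (quote < 0) break;
                pos += quote + 1;           // first byte of the string content
                inString = true;
                continue;
            }

            // One modeled call: search the rest of the buffer from the current position.
            ReadOnlySpan<byte> rest = json.Slice(pos);
            int idx = rest.IndexOfAny(s_stopBytes);
            searches++;
            if (rest.Length >= 16) spanAtLeast16++;
            if (idx >= 0 && idx < 16) foundInFirst16++;
            if (idx < 0) break;

            byte stop = rest[idx];
            if (stop == (byte)'"') { pos += idx + 1; inString = false; } // closing quote
            else if (stop == (byte)'\\') { pos += idx + 2; }             // skip backslash and the escaped byte
            else break;                                                   // raw control byte: invalid JSON, stop
        }
        return (searches, foundInFirst16, spanAtLeast16);
    }
}
```

Feeding it the UTF-8 bytes of a payload such as the one GenerateRecordJson builds in the benchmark would show how often the 16-byte check alone could answer: FoundInFirst16 relative to Searches, with SpanAtLeast16 counting the searches where the 16-byte vector load is possible at all.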
Benchmark: 10-13% improvement on Cobalt 100.
arm64 codegen: