Conversation
|
@neuecc This is quite the write-up, and I'm still processing it. But I'm curious what is in the (very large) payload of the PR? We do have a 3.0 milestone for v3 breaking or large ideas. It seems to me each of your many ideas could be an individual issue in that milestone to better track discussions on each. Then as a whole we can decide when the time is right to start v3 development (which may be now, or last week, since you've evidently already done some coding). Another high level concern I have is that with .NET Core (including .NET 5+), .NET lost the ability to load multiple assemblies with the same name but different assembly versions. At least, not without special AssemblyLoadContexts being created. This was a huge adoption blocker for our v2 release in ASP.NET Core. I'm not sure they can use it even now. A v3 library that includes binary compatibility breaks will similarly confuse and block adoption, and even more this time since .NET Core is that much more popular. I'm open to the idea, but we should be thoughtful about whether we can avoid breaking changes while still enhancing. #57 is one issue that I thought would be best done if formatters could pass along a serialization state object to each other. That would be an interface change, that up to now I didn't think was worth it. But if we make breaking changes anyway, that's one issue to consider in the redesign. |
|
The PR itself has been simplified by removing everything. I wanted to show you the concept first, because if we are going to adopt these (some of them), we need to start the big work from there. I agree with your concern about the compatibility of the assemblies. I see. To be honest, I didn't consider #57 because I didn't feel the need to include it (including its performance implications). |
|
I love the v3 concept (the deserialization from ReadOnlySpan argument)! The Circular Reference serialization (#57) needs a serialization state object (possibly The serialization state object would be a reference type object. ISerializationStateObject and its poolpublic interface ISerializationStateObject : IDisposable
{
void Register(object value, ref MessagePackWriter writer);
bool TryGetKey(object value, out long key);
void Register<T>(ref T value, ref MessagePackWriter writer) where T : struct;
bool TryGetKey<T>(ref T value, out long key) where T : struct;
}
public interface ISerializationStateObjectPool
{
ISerializationStateObject Rent();
void Return(ISerializationStateObject obj);
}
public ref struct MessagePackWriter
{
public ISerializationStateObject? StateObject { get; set; }
}
public static class MessagePackSerializer
{
public static void Serialize<T>(IBufferWriter<byte> bufferWriter, in T value)
{
var writer = new MessagePackWriter(writer);
try
{
formatter.Serialize(ref writer, ref Unsafe.AsRef(value), options);
}
finally
{
if (writer.StateObject is not null)
{
options.StateObjectPool.Return(writer.StateObject);
}
}
}
}
public sealed class ReferenceTypeCircularReferenceFormatter<T> : IMessagePackFormatter<T>
where T : class
{
public void Serialize(ref MessagePackWriter writer, ref T? value, MessagePackOptions options)
{
if (value is null) { /* Write Nil. */ return; }
ISerializerStateObject stateObject = writer.StateObject ??= options.StateObjectPool.Rent();
if (stateObject.TryGetKey(value, out long key))
{
// Serialize the key. Is the representation "Map [1] { key => Nil }" appropriate?
}
else
{
stateObject.Register(value, ref writer);
// Serialize the value. Is the representation "Map [1] { Nil => value }" appropriate?
}
}
}
public sealed class ValueTypeCircularReferenceFormatter<T> : IMessagePackFormatter<T>
where T : struct
{
public void Serialize(ref MessagePackWriter writer, ref T value, MessagePackOptions options)
{
ISerializerStateObject stateObject = writer.StateObject ??= options.StateObjectPool.Rent();
if (stateObject.TryGetKey(ref value, out long key))
{
// Serialize the key. Is the representation "Map [1] { key => Nil }" appropriate?
}
else
{
stateObject.Register(ref value, ref writer);
// Serialize the value. Is the representation "Map [1] { Nil => value }" appropriate?
}
}
} |
Crave this! I need to deserialize from a unmanaged buffer. |
|
@ryancheung this tip may work for you now: #81 (comment) |
|
This pull request is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. |
|
keep for implementation memo. |
|
It is better to split public interface IMessagePackSerializer<T>
{
/*static abstract*/
void Serialize<TBuffer>(ref MessagePackWriter<TBuffer> writer, ref T value) where TBuffer : IBufferWriter<byte>;
}
public ref struct MessagePackWriter<TBuffer> where TBuffer : IBufferWriter<byte>
{
public TBuffer Buffer;
public MessagePackSerializerOptions Options;
public MessagePackSerializerStateObject State;
public void Dispose();
}
public interface IMessagePackDeserializer<T>
{
/*static abstract*/
bool TryDeserialize(ref MessagePackReader reader, out T? value);
}
public ref struct MessagePackReader
{
public ReadOnlySpan<byte> Span;
public MessagePackSerializerOptions Options;
public MessagePackDeserializerStateObject State;
public void Dispose();
}
public interface IMessagePackAsyncSerializer<T>
{
/*static abstract*/
ValueTask SerializeAsync(MessagePackAsyncWriter writer, T value, CancellationToken token);
}
public interface IMessagePackAsyncDeserializer<T>
{
/*static abstract*/
ValueTask<T?> DeserializeAsync<TEnumerable>(MessagePackAsyncReader<TEnumerable> reader, CancellationToken token) where TEnumerable : IAsyncEnumerable<byte>;
} |
Will this also help enable deserialization with minimal string memory allocation? |
Not at all related, no. But #1285 already delivered what you're asking for, @Phoenix359. |
|
This pull request is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. |
|
No.3 - |
Proposal of MessagePack v3
@AArnott
It has been a while since the implementation of v2, and I believe there are a few major things that need to be resolved.
MessagePackSerializer.Deserializeshould be acceptReadOnlySpan<byte>(Support for Deserializing fromReadOnlySpan<byte>#81)MessagePackBinary)ref T valuefor large struct(for Serialize) and overwrite(for Deserialize)If we do change it, it will have to be at the same time as v3, as it will be a major interface changes.
I have prepared a minimal implementation and benchmark results as a base for discussion.
(v3 is x2 faster in .NET 5)
1.
MessagePackSerializer.Deserializeshould be acceptReadOnlySpan<byte>I think the lack of acceptance of
ReadOnlySpan<byte>is a major failure of v2.To fix it, we need to change the
MessagePackReader.I think we need to abolish
SequenceReaderand controlReadOnlySpanandReadOnlySequencemanually in MessagePackReader.2. Requires static write/read function(such as v1's
MessagePackBinary)It is a big frustration not to be able to write even a small thing without an
IBufferWriter<byte>.Another drawback is that it cannot be used in an async environment.
We need to API like
System.Buffers.BinaryPrimitives.This will result in the following implementation of
MessagePackWriter.3.
ref T valuefor large struct(for Serialize) and overwrite(for Deserialize)Change the
IMessagePackFormatter<T>interface like the following.The ability to override an existing value with Deserialize has been requested for some time( #107 ) . I also want to use zero-allocation serialization for real-time communication in games, so this feature is definitely important.
I think this can be toggled between new and overwrite by setting the value of Deserialzie to ref, and by using the Overwrite option in MessagePackOptions, for example the following code
We also want to avoid copying when serializing large structs, so let's use ref when serializing as well. The MessagePackSerializer takes values as
in.4. Bufferless StreamingSerializer/StreamingDeserializer
The MessagePack format itself supports streaming read/write, but currently requires a full buffer for both Serialize and Deserialize (Stream APIs, etc. also read into the buffer once).
It is written in Issues several times, but it requires complex code, so we think it is better to provide it as standard.
In reality, it would be better to support only Array/Map, and the unit of reading/writing streaming should be one element at a time, since it would be a performance problem to run a check on every write.
If you assume .NET 6, then only
System.IO.Pipelineswill be supported.Since we think it is important to control the interval between flushes for performance, it is a good idea to prepare a separate Flush strategy option and call FlushAsync accordingly.
As for Deserialize, it will be a 2-pass process, but how about preparing a buffer of 1 unit with TrySkip, and sending it to Reader?
I'd like to reuse the generation of
Writer/Reader, but it's difficult due to the compatibility of async/await and ref struct. I don't have any good ideas.5. Unified code-generation(dynamicassembly/source-generator/standalone code-generator/analyzer)
DynamicAssembly, DynamicMethod, Analyzer, Commandline Code Generator, MSBuild Task, and Source Generator. overlapping and scattered code is very frustrating and a breeding ground for bugs. Duplicate and scattered code is very frustrating and a breeding ground for bugs. I would like to introduce an intermediate abstract metatype to centralize the management. It will also allow users to dynamically create this MetaType and generate custom Formatters.
6. nullable
Unity support needs to continue; LTS does not yet support C# 8.0. We also need to wait for eventual C# 9.0 support for Generic Constraint. It does not work with the current Unity.
In my ZString, ZLogger, and MessagePipe, I handled it by copying from the post-build rewrite. NET side instead of Unity/Client side, so it is easier to handle the source code. NET side, so it's easier to handle the source code. Currently, I'm not able to make everything a source reference, such as copying the T4 stuff, so it should be good to split it up and make everything a copy.
7. .NET 6
using VS2022 preview. record, struct record, new types, etc...
8. CI to GitHub Actions
For OSS, CI with GitHub Actions would be better, and Unity is built as follows
https://github.com/Cysharp/ZString/blob/master/.github/workflows/build-debug.yml
I think it's a good idea to continue to put Canary builds in Azure DevOps. In my other repositories, I have achieved this by doing the following
https://github.com/Cysharp/MagicOnion/blob/master/.github/workflows/build-canary.yml
9. avoid SlowSpan for Unity
In Unity, Span uses Slow Span, and doesn't perform very well (I tested this issue with ZString). If you can use arrays, you can get back better performance by using arrays. Here are the results from a simple benchmark in Unity IL2CPP, arrays are 5 times faster.
for this,
MessagePackWriter/MessagePackReaderholds both array and span.10. more performance improvement
This is a simple benchmark result from the current prototype implementation.
In .NET 5 is twice as fast.
This is probably due in large part to the elimination of the BufferWriter (there are Primitives introduced in 2, and Array/Span switching for 9).
I think it will be much faster in Reader(Deserialize) as well.
Other consider things
IBufferWriter<byte>It is useful but
MessagePackWriter<TBufferWriter>, will affectsIMessagePackFormatter<T, TBufferWriter>, This has a significant impact on code generation, so I reject it.following dotnet/runtime editor.configMy style deviates somewhat from that of
dotnet/runtime. I'm starting to think that I need to start following it instead of my own preferences.ContractlessStandardResolver.Options → StandardResolver.ContractlessOptionsfor the IntelliSensebility.
The APIs of MessagePackSerializer/MessagePackWriter/MessagePackReader/MessagePackSerializerOptions will not be changed.
We believe that this will minimize the impact on general use.