Skip to content

Proposal of MessagePack v3#1272

Closed
neuecc wants to merge 4 commits intomasterfrom
v3-proposal
Closed

Proposal of MessagePack v3#1272
neuecc wants to merge 4 commits intomasterfrom
v3-proposal

Conversation

@neuecc
Copy link
Member

@neuecc neuecc commented Jun 21, 2021

Proposal of MessagePack v3

@AArnott

It has been a while since the implementation of v2, and I believe there are a few major things that need to be resolved.

  1. MessagePackSerializer.Deserialize should be accept ReadOnlySpan<byte> (Support for Deserializing from ReadOnlySpan<byte> #81)
  2. Requires static write/read function(such as v1's MessagePackBinary)
  3. ref T value for large struct(for Serialize) and overwrite(for Deserialize)
  4. Bufferless StreamingSerializer/StreamingDeserializer
  5. Unified code-generation(dynamicassembly/source-generator/standalone code-generator/analyzer)
  6. nullable
  7. .NET 6
  8. CI to GitHub Actions
  9. avoid SlowSpan for Unity
  10. more performance improvement

If we do change it, it will have to be at the same time as v3, as it will be a major interface changes.
I have prepared a minimal implementation and benchmark results as a base for discussion.

image

(v3 is x2 faster in .NET 5)

1. MessagePackSerializer.Deserialize should be accept ReadOnlySpan<byte>

I think the lack of acceptance of ReadOnlySpan<byte> is a major failure of v2.
To fix it, we need to change the MessagePackReader.
I think we need to abolish SequenceReader and control ReadOnlySpan and ReadOnlySequence manually in MessagePackReader.

2. Requires static write/read function(such as v1's MessagePackBinary)

It is a big frustration not to be able to write even a small thing without an IBufferWriter<byte>.
Another drawback is that it cannot be used in an async environment.

We need to API like System.Buffers.BinaryPrimitives.

public static partial class MessagePackPrimitives
{
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static bool TryWriteInt32(Span<byte> buffer, int value)
    {
        if (buffer.Length < 5) return false;

        unchecked
        {
            buffer[4] = (byte)value;
            buffer[3] = (byte)(value >> 8);
            buffer[2] = (byte)(value >> 16);
            buffer[1] = (byte)(value >> 24);
            buffer[0] = MessagePackCode.Int32;
        }

        return true;
    }

This will result in the following implementation of MessagePackWriter.

// MessagePackWriter
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public void WriteInt32(int value)
{
    if (!MessagePackPrimitives.TryWriteInt32(WritableSpan, value))
    {
        Ensure(5);
        MessagePackPrimitives.WriteInt32(WritableSpan, value);
    }
}

3. ref T value for large struct(for Serialize) and overwrite(for Deserialize)

Change the IMessagePackFormatter<T> interface like the following.

public interface IMessagePackFormatter<T>
{
    void Serialize(ref MessagePackWriter writer, ref T? value, MessagePackOptions options);
    void Deserialize(ref MessagePackReader reader, ref T? value, MessagePackOptions options);
}

The ability to override an existing value with Deserialize has been requested for some time( #107 ) . I also want to use zero-allocation serialization for real-time communication in games, so this feature is definitely important.

I think this can be toggled between new and overwrite by setting the value of Deserialzie to ref, and by using the Overwrite option in MessagePackOptions, for example the following code

// generated serializer code
void Deserialize(ref MessagePackWriter writer, ref T? value, MessagePackOptions options);
{
    if (!writer.Overwrite) // Overwrite option will provide from MessagePackOptions
    {
        value = new T();
    }
    value.Foo = Read(); 
}

We also want to avoid copying when serializing large structs, so let's use ref when serializing as well. The MessagePackSerializer takes values as in.

public static class MessagePackSerializer
{
    public static void Serialize<T>(IBufferWriter<byte> bufferWriter, in T value)
    {
        var writer = new MessagePackWriter(writer);
        formatter.Serialize(ref writer, ref Unsafe.AsRef(value), options);
    }
}

4. Bufferless StreamingSerializer/StreamingDeserializer

The MessagePack format itself supports streaming read/write, but currently requires a full buffer for both Serialize and Deserialize (Stream APIs, etc. also read into the buffer once).

It is written in Issues several times, but it requires complex code, so we think it is better to provide it as standard.

In reality, it would be better to support only Array/Map, and the unit of reading/writing streaming should be one element at a time, since it would be a performance problem to run a check on every write.

If you assume .NET 6, then only System.IO.Pipelines will be supported.

public static class MessagePackStreamingSerializer
{
    public static async ValueTask SerializeAsync<T>(PipeWriter pipeWriter, IEnumerable<T?> values, int count, MessagePackOptions options)
    public static async ValueTask SerializeMapAsync<TKey, TValue>(PipeWriter pipeWriter, int count, IEnumerable<KeyValuePair<TKey, TValue>> value, ....)
    public static async IAsyncEnumerable<T> DeserializeAsync<T>(PipeReader pipeReader)
}

Since we think it is important to control the interval between flushes for performance, it is a good idea to prepare a separate Flush strategy option and call FlushAsync accordingly.

public sealed class FlushStrategy
{
    public static FlushStrategy FlushOnEveryWritten = new FlushStrategy { FlushPerItemCount = true, ItemCount = 1 };
    public static FlushStrategy FlushOnEvery64K = new FlushStrategy { FlushPerBytesWritten = true, BytesWritten = 65536 };

    public bool FlushPerItemCount { get; init; }
    public bool FlushPerBytesWritten { get; init; }
    public int ItemCount { get; init; }
    public int BytesWritten { get; init; }

    public bool ShouldFlush(int itemCount, int bytesWritten)
    {
        if (FlushPerItemCount && (ItemCount <= itemCount))
        {
            return true;
        }
        if (FlushPerBytesWritten && (BytesWritten <= bytesWritten))
        {
            return true;
        }
        return false;
    }
}

// MessagePackStreamingSerializer
public static async ValueTask SerializeAsync<T>(PipeWriter pipeWriter, IEnumerable<T?> values, int count, MessagePackOptions options, FlushStrategy flushStrategy, CancellationToken cancellationToken = default)
{
    static int WriteCore(PipeWriter pipeWriter, T? item, IMessagePackFormatter<T> formatter, MessagePackOptions options)
    {
        var writer = new MessagePackWriter(pipeWriter);
        formatter.Serialize(ref writer, ref item, options);
        return writer.WrittenCount;
    }

    IMessagePackFormatter<T> formatter = default!;
    int bytesWritten = 0;
    var itemCount = 0;
    foreach (var item in values)
    {
        bytesWritten += WriteCore(pipeWriter, item, formatter, options);
        itemCount++;
        if (flushStrategy.ShouldFlush(itemCount, bytesWritten))
        {
            await pipeWriter.FlushAsync(cancellationToken).ConfigureAwait(false);
        }
    }

    await pipeWriter.FlushAsync().ConfigureAwait(false);
}

As for Deserialize, it will be a 2-pass process, but how about preparing a buffer of 1 unit with TrySkip, and sending it to Reader?

I'd like to reuse the generation of Writer/Reader, but it's difficult due to the compatibility of async/await and ref struct. I don't have any good ideas.

5. Unified code-generation(dynamicassembly/source-generator/standalone code-generator/analyzer)

DynamicAssembly, DynamicMethod, Analyzer, Commandline Code Generator, MSBuild Task, and Source Generator. overlapping and scattered code is very frustrating and a breeding ground for bugs. Duplicate and scattered code is very frustrating and a breeding ground for bugs. I would like to introduce an intermediate abstract metatype to centralize the management. It will also allow users to dynamically create this MetaType and generate custom Formatters.

public abstract class MessagePackMetaType
{
    public MessagePackMetaMember[] Members { get; set; }
    public MessagePackMetaConstructor[] Constructors { get; set; }

    // Diagnostics of generated type....
    public override string ToString()
    {
        // Serialize:""
        // Deserialize:""
        // return @"new Foo(int , int, int,...) {} ";
    }
}

public abstract class MessagePackMetaConstructor
{
    public MessagePackMetaParameter[] Parameters { get; set; }
}

public abstract class MessagePackMetaMember
{
    public bool IsIntKey { get; set; }
    public int IntKey { get; set; }
    public string? StringKey { get; set; }
    public bool IsWritable { get; set; }
    public bool IsReadable { get; set; }
}

public abstract class MessagePackMetaParameter
{
    public string Name { get; set; }
    public bool HasDefaultValue { get; set; }
    public object DefaultValue { get; set; }
}

// ReflectionType
public sealed class ReflectionMetaType : MessagePackMetaType
{
    public Type Type { get; set; }
}

public sealed class ReflectionMetaMember : MessagePackMetaMember
{
    public MemberInfo MemberInfo { get; set; } // PropertyInfo or FieldInfo
}


public sealed class ReflectionMetaConstructor : MessagePackMetaConstructor
{
    public ConstructorInfo ConstructorInfo { get; set; }
}

public sealed class ReflectionMetaParameter : MessagePackMetaParameter
{
    public ParameterInfo ParameterInfo { get; set; }
}

// RoslynType
// ...

6. nullable

Unity support needs to continue; LTS does not yet support C# 8.0. We also need to wait for eventual C# 9.0 support for Generic Constraint. It does not work with the current Unity.

In my ZString, ZLogger, and MessagePipe, I handled it by copying from the post-build rewrite. NET side instead of Unity/Client side, so it is easier to handle the source code. NET side, so it's easier to handle the source code. Currently, I'm not able to make everything a source reference, such as copying the T4 stuff, so it should be good to split it up and make everything a copy.

7. .NET 6

using VS2022 preview. record, struct record, new types, etc...

8. CI to GitHub Actions

For OSS, CI with GitHub Actions would be better, and Unity is built as follows
https://github.com/Cysharp/ZString/blob/master/.github/workflows/build-debug.yml

I think it's a good idea to continue to put Canary builds in Azure DevOps. In my other repositories, I have achieved this by doing the following
https://github.com/Cysharp/MagicOnion/blob/master/.github/workflows/build-canary.yml

9. avoid SlowSpan for Unity

In Unity, Span uses Slow Span, and doesn't perform very well (I tested this issue with ZString). If you can use arrays, you can get back better performance by using arrays. Here are the results from a simple benchmark in Unity IL2CPP, arrays are 5 times faster.

image

for this, MessagePackWriter / MessagePackReader holds both array and span.

// MessagePackWriter
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public void WriteInt32(int value)
{
    if (bufferArray == null)
    {
        if (!MessagePackPrimitives.TryWriteInt32(WritableSpan, value))
        {
            EnsureAndWriteInt32(value);
        }
    }
    else
    {
        if (!MessagePackPrimitives.TryWriteInt32(bufferArray, bytesWritten, bufferArrayCount, value))
        {
            EnsureAndWriteInt32(value);
        }
        else
        {
            bufferArrayCount -= 5;
        }
    }

    bytesWritten += 5;
    totalWritten += 5;
}

10. more performance improvement

This is a simple benchmark result from the current prototype implementation.

ArrayBufferWriter<byte> bufferWriter = default!;

[GlobalSetup]
public void Setup()
{
    bufferWriter = new ArrayBufferWriter<byte>(1024);
}

[Benchmark]
public byte[] MessagePackV2()
{
    var writer = new MessagePack.MessagePackWriter(bufferWriter);
    writer.WriteArrayHeader(10);
    for (int i = 0; i < 10; i++)
    {
        writer.WriteInt32(1000);
        writer.WriteInt32(2000);
        writer.WriteInt32(3000);
        writer.WriteInt32(4000);
    }
    writer.Flush();
    var xs = bufferWriter.WrittenSpan.ToArray();
    bufferWriter.Clear();
    return xs;
}

[Benchmark]
public byte[] MessagePackV3_Array()
{
    var writer = new MessagePackv3.MessagePackWriter(bufferWriter, true);
    writer.WriteArrayHeader(10);
    for (int i = 0; i < 10; i++)
    {
        writer.WriteInt32(1000);
        writer.WriteInt32(2000);
        writer.WriteInt32(3000);
        writer.WriteInt32(4000);
    }
    writer.Flush();
    var xs = bufferWriter.WrittenSpan.ToArray();
    bufferWriter.Clear();
    return xs;
}

[Benchmark]
public byte[] MessagePackV3_Span()
{
    var writer = new MessagePackv3.MessagePackWriter(bufferWriter, false);
    writer.WriteArrayHeader(10);
    for (int i = 0; i < 10; i++)
    {
        writer.WriteInt32(1000);
        writer.WriteInt32(2000);
        writer.WriteInt32(3000);
        writer.WriteInt32(4000);
    }
    writer.Flush();
    var xs = bufferWriter.WrittenSpan.ToArray();
    bufferWriter.Clear();
    return xs;
}

image

In .NET 5 is twice as fast.

This is probably due in large part to the elimination of the BufferWriter (there are Primitives introduced in 2, and Array/Span switching for 9).

I think it will be much faster in Reader(Deserialize) as well.

Other consider things

  • Generics of IBufferWriter<byte>

It is useful but MessagePackWriter<TBufferWriter>, will affects IMessagePackFormatter<T, TBufferWriter>, This has a significant impact on code generation, so I reject it.

  • following dotnet/runtime editor.config

My style deviates somewhat from that of dotnet/runtime. I'm starting to think that I need to start following it instead of my own preferences.

  • ContractlessStandardResolver.Options → StandardResolver.ContractlessOptions

for the IntelliSensebility.

  • Scope of breaking changes

The APIs of MessagePackSerializer/MessagePackWriter/MessagePackReader/MessagePackSerializerOptions will not be changed.
We believe that this will minimize the impact on general use.

@neuecc neuecc changed the title V3 proposal Proposal of MessagePack v3 Jun 21, 2021
@neuecc neuecc requested a review from AArnott June 21, 2021 10:12
@AArnott
Copy link
Collaborator

AArnott commented Jun 22, 2021

@neuecc This is quite the write-up, and I'm still processing it. But I'm curious what is in the (very large) payload of the PR?

We do have a 3.0 milestone for v3 breaking or large ideas. It seems to me each of your many ideas could be an individual issue in that milestone to better track discussions on each. Then as a whole we can decide when the time is right to start v3 development (which may be now, or last week, since you've evidently already done some coding).

Another high level concern I have is that with .NET Core (including .NET 5+), .NET lost the ability to load multiple assemblies with the same name but different assembly versions. At least, not without special AssemblyLoadContexts being created. This was a huge adoption blocker for our v2 release in ASP.NET Core. I'm not sure they can use it even now. A v3 library that includes binary compatibility breaks will similarly confuse and block adoption, and even more this time since .NET Core is that much more popular. I'm open to the idea, but we should be thoughtful about whether we can avoid breaking changes while still enhancing.

#57 is one issue that I thought would be best done if formatters could pass along a serialization state object to each other. That would be an interface change, that up to now I didn't think was worth it. But if we make breaking changes anyway, that's one issue to consider in the redesign.

@neuecc
Copy link
Member Author

neuecc commented Jun 22, 2021

The PR itself has been simplified by removing everything.
It's just meant to confirm the concept code.
https://github.com/neuecc/MessagePack-CSharp/tree/b72444dc28da2d92cb1b21542a792e1da7a6afbb/src/MessagePack

I wanted to show you the concept first, because if we are going to adopt these (some of them), we need to start the big work from there.
Since many of them are intertwined, we thought it would be easier to see the whole picture rather than individual idea.
With .NET 5(6), C# 9.0(10.0), there are even more options we can take than in v1 and v2, and I feel the need to modernize the code again.
And most importantly, there are many performance improvements.
(Also, for me, lower performance in Unity is a huge problem, so it's pretty high on my priority list!)

I agree with your concern about the compatibility of the assemblies.
Your excellent work has made v2 even more widely used than v1.
That's why I thought that require a big improvement to convince people to accept breaking changes.
And I finally came together to make this PR proposal.

I see. To be honest, I didn't consider #57 because I didn't feel the need to include it (including its performance implications).
Sure, we should think about it.

@pCYSl5EDgo
Copy link
Contributor

I love the v3 concept (the deserialization from ReadOnlySpan argument)!

The Circular Reference serialization (#57) needs a serialization state object (possibly Dictionary<TKey, object> field).

The serialization state object would be a reference type object.
To reduce managed memory allocation, it must be pooled and returned to the object pool.

ISerializationStateObject and its pool
public interface ISerializationStateObject : IDisposable
{
     void Register(object value, ref MessagePackWriter writer);
     bool TryGetKey(object value, out long key);
     void Register<T>(ref T value, ref MessagePackWriter writer) where T : struct;
     bool TryGetKey<T>(ref T value, out long key) where T : struct;
}

public interface ISerializationStateObjectPool
{
    ISerializationStateObject Rent();
    void Return(ISerializationStateObject obj);
}

ISerializationStateObjectPool should be contained in MessagePackOptions, just as ArrayPool is.

public ref struct MessagePackWriter
{
    public ISerializationStateObject? StateObject { get; set; }
}

public static class MessagePackSerializer
{
    public static void Serialize<T>(IBufferWriter<byte> bufferWriter, in T value)
    {
        var writer = new MessagePackWriter(writer);
        try
        {
            formatter.Serialize(ref writer, ref Unsafe.AsRef(value), options);
        }
        finally
        {
            if (writer.StateObject is not null)
            {
                options.StateObjectPool.Return(writer.StateObject);
            }
        }
    }
}

public sealed class ReferenceTypeCircularReferenceFormatter<T> : IMessagePackFormatter<T>
    where T : class
{
    public void Serialize(ref MessagePackWriter writer, ref T? value, MessagePackOptions options)
    {
        if (value is null) { /* Write Nil. */ return; }
        ISerializerStateObject stateObject = writer.StateObject ??= options.StateObjectPool.Rent();
        if (stateObject.TryGetKey(value, out long key))
        {
            // Serialize the key. Is the representation "Map [1] { key => Nil }" appropriate?
        }
        else
        {
            stateObject.Register(value, ref writer);
            // Serialize the value. Is the representation "Map [1] { Nil => value }" appropriate?
        }
    }
}

public sealed class ValueTypeCircularReferenceFormatter<T> : IMessagePackFormatter<T>
    where T : struct
{
    public void Serialize(ref MessagePackWriter writer, ref T value, MessagePackOptions options)
    {
        ISerializerStateObject stateObject = writer.StateObject ??= options.StateObjectPool.Rent();
        if (stateObject.TryGetKey(ref value, out long key))
        {
            // Serialize the key. Is the representation "Map [1] { key => Nil }" appropriate?
        }
        else
        {
            stateObject.Register(ref value, ref writer);
            // Serialize the value. Is the representation "Map [1] { Nil => value }" appropriate?
        }
    }
}

@ryancheung
Copy link

  1. MessagePackSerializer.Deserialize should be accept ReadOnlySpan<byte>

Crave this! I need to deserialize from a unmanaged buffer.

@AArnott
Copy link
Collaborator

AArnott commented Sep 2, 2021

@ryancheung this tip may work for you now: #81 (comment)

@github-actions
Copy link
Contributor

github-actions bot commented Dec 2, 2021

This pull request is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@neuecc
Copy link
Member Author

neuecc commented Dec 2, 2021

keep for implementation memo.

@pCYSl5EDgo
Copy link
Contributor

It is better to split IMessagePackFormatter<T> into 4 interfaces(IMessagePackSerializer<T>, IMessagePackDeserializer<T>, IMessagePackAsyncSerializer<T>, and IMessagePackAsyncDeserializer<T>).
This is because there is no guarantee that serialized binary is always deseialized by counterpart.

public interface IMessagePackSerializer<T>
{
    /*static abstract*/
    void Serialize<TBuffer>(ref MessagePackWriter<TBuffer> writer, ref T value) where TBuffer : IBufferWriter<byte>;
}

public ref struct MessagePackWriter<TBuffer> where TBuffer : IBufferWriter<byte>
{
    public TBuffer Buffer;
    public MessagePackSerializerOptions Options;
    public MessagePackSerializerStateObject State;
    public void Dispose();
}

public interface IMessagePackDeserializer<T>
{
    /*static abstract*/
    bool TryDeserialize(ref MessagePackReader reader, out T? value);
}

public ref struct MessagePackReader
{
    public ReadOnlySpan<byte> Span;
    public MessagePackSerializerOptions Options;
    public MessagePackDeserializerStateObject State;
    public void Dispose();
}

public interface IMessagePackAsyncSerializer<T>
{
    /*static abstract*/
    ValueTask SerializeAsync(MessagePackAsyncWriter writer, T value, CancellationToken token);
}

public interface IMessagePackAsyncDeserializer<T>
{
    /*static abstract*/
    ValueTask<T?> DeserializeAsync<TEnumerable>(MessagePackAsyncReader<TEnumerable> reader, CancellationToken token) where TEnumerable : IAsyncEnumerable<byte>;
}

@Phoenix359
Copy link

  1. MessagePackSerializer.Deserialize should be accept ReadOnlySpan<byte>

Crave this! I need to deserialize from a unmanaged buffer.

Will this also help enable deserialization with minimal string memory allocation?
i.e. Could this be used inside messagepack to construct a dictionary during deserialization where the byte representation is the key and the string result is the value so deserialization can be skipped for duplicate strings and we can instead reuse the already allocated memory?

@AArnott
Copy link
Collaborator

AArnott commented Dec 25, 2021

Will this also help enable deserialization with minimal string memory allocation?
i.e. Could this be used inside messagepack to construct a dictionary during deserialization where the byte representation is the key and the string result is the value so deserialization can be skipped for duplicate strings and we can instead reuse the already allocated memory?

Not at all related, no. But #1285 already delivered what you're asking for, @Phoenix359.

@github-actions
Copy link
Contributor

This pull request is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@neuecc
Copy link
Member Author

neuecc commented Mar 28, 2022

No.3 - ref T can not pass property directly, there is little merit in considering that the code is more complex, yet the target is mostly property.
No.6 - It's time for Unity's LTS lower limit to be C# 8.0, so we can fully nullable support
No.9 - Unity supports FastSpan on IL2CPP so this feature is not required.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants