WIP: Parsing and emitting tagged values #301

Closed
pyfisch wants to merge 1 commit into serde-rs:master from pyfisch:taggedvalues

Conversation

pyfisch commented May 1, 2016

Contributor

Some serialization formats like CBOR or BSON allow the definition of "tagged values" or "subtypes" to extend the format or to add additional type information to values. They can only be supported with some help from serde.

This patch should add support to deserialize and serialize these tagged values. The default behavior will be to discard the tag and only use the value so they are completely optional to use.

For now the patch only supports serialization but I would like to get early feedback about the inclusion in serde.

See also #163 and pyfisch/cbor#3
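
To make the "discard the tag by default" idea concrete, here is a minimal sketch. It is not the code from this patch and not serde's real `Serializer` trait: the trait `SketchSerializer`, the method `serialize_tagged_str`, both example serializers, and the `("example", 1)` entry are hypothetical names made up for illustration. Only the shape (a list of format/tag pairs plus a value) follows the doc comment quoted in the review thread.

```rust
/// Hypothetical stand-in for a serializer trait; NOT serde's real API.
trait SketchSerializer {
    /// Writes a plain, untagged string.
    fn serialize_str(&mut self, value: &str);

    /// Default behavior: ignore every offered tag and write only the value,
    /// so formats without tags need no changes.
    fn serialize_tagged_str(&mut self, _tags: &[(&str, u64)], value: &str) {
        self.serialize_str(value);
    }
}

/// A format without tags: relies entirely on the default method.
struct PlainText {
    out: String,
}

impl SketchSerializer for PlainText {
    fn serialize_str(&mut self, value: &str) {
        self.out.push_str(value);
    }
}

/// A tag-aware format. The output imitates CBOR diagnostic notation,
/// `tag(value)`, as text instead of producing real CBOR bytes.
struct TaggedText {
    out: String,
}

impl SketchSerializer for TaggedText {
    fn serialize_str(&mut self, value: &str) {
        self.out.push_str(value);
    }

    fn serialize_tagged_str(&mut self, tags: &[(&str, u64)], value: &str) {
        // Linear scan for the pair addressed to this format.
        if let Some(&(_, tag)) = tags.iter().find(|&&(name, _)| name == "cbor") {
            self.out.push_str(&format!("{}(", tag));
            self.serialize_str(value);
            self.out.push(')');
        } else {
            // No tag offered for this format: fall back to the bare value.
            self.serialize_str(value);
        }
    }
}
```

Calling `serialize_tagged_str(&[("cbor", 32), ("example", 1)], "https://example.com")` on `PlainText` writes just the URL, while `TaggedText` wraps it in tag 32.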

Comment thread serde/src/ser/mod.rs Outdated
///
/// The tags are provided as a pair of format (like `cbor`) and a numeric
/// tag. It is possible to supply different tags for different formats.
/// The serialization library will select an approbiate tag if available

Member

Typo: Appropriate

oli-obk commented May 1, 2016

Member

This means that types need to know about the formats they serialize to... This would break the current separation of serializers and serializees. I'm assuming you want the same type to yield different tag values depending on the serializer? Otherwise this could maybe be merged with some kind of general enum discriminant mapping. Then your tagged types would simply be an enum whose discriminant is converted to a specific type.

Another possibility might be to allow specialization to create different impls depending on the serializer, but I have not put any thought into whether that is feasible.

pyfisch commented May 1, 2016

Contributor Author

The tag values are defined by the format. Some formats do not support tags at all and formats that support them will have different values for the same thing. For this reason I need to yield different tag values depending on the serializer.

In CBOR every application can define custom tags, so any solution using an enum with a fixed list of types is unusable. Currently there are tags for URLs (tag 32) and UUIDs (tag 37), for example. Such types are useful for multiple formats. But one day a user might define a tag 6877 that they use in an application to mark color names in French. The user must be able to serialize this specific tag. There is a list of IANA registered tags.
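
For readers unfamiliar with how such a tag looks on the wire, the following sketch encodes only the CBOR tag header (major type 6) that precedes the tagged value, following the CBOR specification's rules for unsigned arguments. The function name `encode_cbor_tag` is made up for this illustration and does not come from any particular CBOR library.

```rust
/// Encodes the header of a CBOR tag (major type 6). The tagged value itself
/// would follow these bytes. Hypothetical helper, not a library function.
fn encode_cbor_tag(tag: u64) -> Vec<u8> {
    // The major type occupies the top three bits of the first byte.
    const MAJOR_TAG: u8 = 6 << 5; // 0xC0

    if tag < 24 {
        // Small tags fit directly into the low five bits.
        vec![MAJOR_TAG | tag as u8]
    } else if tag <= u64::from(u8::MAX) {
        // Additional information 24: one following byte.
        vec![MAJOR_TAG | 24, tag as u8]
    } else if tag <= u64::from(u16::MAX) {
        // Additional information 25: two following bytes, big-endian.
        let mut out = vec![MAJOR_TAG | 25];
        out.extend_from_slice(&(tag as u16).to_be_bytes());
        out
    } else if tag <= u64::from(u32::MAX) {
        // Additional information 26: four following bytes, big-endian.
        let mut out = vec![MAJOR_TAG | 26];
        out.extend_from_slice(&(tag as u32).to_be_bytes());
        out
    } else {
        // Additional information 27: eight following bytes, big-endian.
        let mut out = vec![MAJOR_TAG | 27];
        out.extend_from_slice(&tag.to_be_bytes());
        out
    }
}
```

With this helper, tag 32 becomes the bytes `d8 20`, tag 37 becomes `d8 25`, and the application-defined tag 6877 becomes `d9 1a dd`. The tag is an arbitrary number chosen by the application, which is why a closed list of known tags cannot cover it.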

I have not yet worked with specialization and don't know if or how it could be used in this case.

dtolnay commented May 1, 2016

Member

Would it be possible to generalize this to formats in which tags are not u64? YAML tags are strings.

pyfisch commented May 4, 2016

Contributor Author

I should probably generalize it, but I don't know how to do that. If I support two different tag types (u64 and String), I should also support i8 (used by BSON, but it can be mapped to u64) and probably other types.

oli-obk commented May 4, 2016

Member

Well... If we can all agree that aggregate types make no sense here, we have a rather short list:

```rust
enum Tag<'a> {
    U64(u64), U32(u32), U16(u16), U8(u8),
    I64(i64), I32(i32), I16(i16), I8(i8),
    String(&'a str),
    Bytes(&'a [u8]),
    F64(f64), F32(f32),
}
```

Then the second tuple field of the tag arguments can simply be that enum, and everyone can decide what they want to do with it.
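
As a rough illustration of "everyone can decide what they want to do with it", the sketch below repeats the enum from this comment and adds two hypothetical helpers, `as_cbor_tag` and `as_yaml_tag`, which are not part of the patch or of any library. It assumes a CBOR-like backend accepts any non-negative integer as a tag number and a YAML-like backend accepts only string tags; every other variant is rejected with `None`.

```rust
/// The tag enum proposed above, repeated so the sketch is self-contained.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tag<'a> {
    U64(u64), U32(u32), U16(u16), U8(u8),
    I64(i64), I32(i32), I16(i16), I8(i8),
    String(&'a str),
    Bytes(&'a [u8]),
    F64(f64), F32(f32),
}

/// Hypothetical: what a CBOR-like backend might accept as a tag number.
fn as_cbor_tag(tag: Tag<'_>) -> Option<u64> {
    match tag {
        Tag::U64(n) => Some(n),
        Tag::U32(n) => Some(u64::from(n)),
        Tag::U16(n) => Some(u64::from(n)),
        Tag::U8(n) => Some(u64::from(n)),
        // Signed tags are usable only when they are non-negative.
        Tag::I64(n) if n >= 0 => Some(n as u64),
        Tag::I32(n) if n >= 0 => Some(n as u64),
        Tag::I16(n) if n >= 0 => Some(n as u64),
        Tag::I8(n) if n >= 0 => Some(n as u64),
        // Strings, bytes, floats and negative integers have no tag number.
        _ => None,
    }
}

/// Hypothetical: what a YAML-like backend might accept as a tag name.
fn as_yaml_tag<'a>(tag: Tag<'a>) -> Option<&'a str> {
    match tag {
        Tag::String(name) => Some(name),
        _ => None,
    }
}
```

A backend that gets `None` back would fall back to writing the untagged value, which matches the default behavior described in the pull request.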

erickt commented May 4, 2016

Member

I've thought about trying to implement something like this for cbor, because I had to do that annoying hack to get tags to work. I'm not crazy about that mapping from serializer to tag though, especially since it would result in a linear scan of tag types. Have you considered other options?

I suppose one option could be a central registry of tag names that both the serializers and serializees use, but then that'd make it hard for people to create a custom type. The other option could be to just use a string, but that might make things non-portable.

However, tags are to a certain extent a non-portable construct. If I have, say, a DateTime object that I want to serialize to cbor and yaml (assuming they have custom tags), I would have to modify the impl of DateTime to support each of these backends to add in the right tag names for each serializer, so we might not actually need something super generic. Specialization might be the right approach here once it's stabilized.

pyfisch commented May 4, 2016

Contributor Author

@oli-obk Also, floats make no real sense since they are inexact and for this reason make poor keys.

@erickt as I said before people will want to use custom types so this should be easy. An alternative to the linear scan could be a callback. A closure is passed to serialize_tagged_value that takes a single argument describing the used serialization format.
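
The callback idea could look roughly like the sketch below. Everything in it is an assumption made for illustration: the `Format` enum, the `CallbackSerializer` trait, the `Recorder` type and the `uuid_tag` function do not exist in serde, and the only name taken from the discussion is `serialize_tagged_value`. The serializer tells the closure which format it is, and the closure answers with a tag or with `None`, so no list of pairs has to be scanned.

```rust
/// Hypothetical description of the target format handed to the callback.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Format {
    Cbor,
    Json,
}

/// Hypothetical serializer trait; NOT serde's real API.
trait CallbackSerializer {
    fn format(&self) -> Format;
    fn write_tag(&mut self, tag: u64);
    fn write_str(&mut self, value: &str);

    /// Asks the caller's closure for a tag that fits this format.
    /// If the closure returns `None`, only the value is written.
    fn serialize_tagged_value<F>(&mut self, pick_tag: F, value: &str)
    where
        F: FnOnce(Format) -> Option<u64>,
    {
        if let Some(tag) = pick_tag(self.format()) {
            self.write_tag(tag);
        }
        self.write_str(value);
    }
}

/// Records what was written instead of producing real output.
struct Recorder {
    target: Format,
    events: Vec<String>,
}

impl CallbackSerializer for Recorder {
    fn format(&self) -> Format {
        self.target
    }

    fn write_tag(&mut self, tag: u64) {
        self.events.push(format!("tag {}", tag));
    }

    fn write_str(&mut self, value: &str) {
        self.events.push(format!("str {}", value));
    }
}

/// A type's own tag choice: CBOR tag 37 marks a UUID, JSON has no tags.
fn uuid_tag(format: Format) -> Option<u64> {
    match format {
        Format::Cbor => Some(37),
        Format::Json => None,
    }
}
```

Calling `serialize_tagged_value(uuid_tag, "id")` on a `Recorder` targeting `Format::Cbor` records a tag followed by the value, while one targeting `Format::Json` records only the value. A closed `Format` enum would bring back the central-registry concern raised earlier in the thread, so a real design might pass a string or another open-ended descriptor instead.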


4 participants