Skip to content

Add support for tagged values.#408

Closed
pyfisch wants to merge 1 commit into
serde-rs:masterfrom
pyfisch:tagged_values
Closed

Add support for tagged values.#408
pyfisch wants to merge 1 commit into
serde-rs:masterfrom
pyfisch:tagged_values

Conversation

@pyfisch

@pyfisch pyfisch commented Jun 23, 2016

Copy link
Copy Markdown
Contributor

Some serialization formats allow the addition of "tags"
to values to add additional information. These formats
include CBOR, BSON and YAML.

This commit introduces serialization and deserialization
of tags. It introduces a trait Tagger that resolves
the tag for a specific format. It supports integer,
binary, and string tags and is extensible to cover
more yet unknown formats.

By default tags are discarded at deserialization
except when the user has implemented visit_tagged_value
function.

To serialize tags the user has to call
serialize_tagged_value.

Tagged values may be of any type.

Thanks to @dtolnay, @oli-obk and @erickt for their comments on tags.

Supersedes #301

Comment thread serde/src/ser/mod.rs

/// A trait that describes a type that can serialize a stream of values into the underlying format.
pub trait Serializer {
pub trait Serializer: Sized {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is that necessary?

@pyfisch pyfisch Jun 23, 2016

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The compiler says so (and is usually right 😬).

   Compiling serde v0.7.7 (file:///serde/serde)
/serde/serde/src/ser/mod.rs:359:15: 359:24 error: the trait bound `Self: std::marker::Sized` is not satisfied [E0277]
/serde/serde/src/ser/mod.rs:359         value.serialize(self)
                                                                         ^~~~~~~~~
/serde/serde/src/ser/mod.rs:359:15: 359:24 help: run `rustc --explain E0277` to see a detailed explanation
/serde/serde/src/ser/mod.rs:359:15: 359:24 help: consider adding a `where Self: std::marker::Sized` bound
error: aborting due to previous error
error: Could not compile `serde`.

Maybe it can be done in a different way but I don't know how.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well... if you look at all the other default method impls, none call methods on the value. I'm aware of the motivation, that formats not caring about tags don't need to implement a new method. But maybe we need something along the lines of MapVisitor and SeqVisitor.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes adding a TaggedVisitor is possible. In this case the Sized bound can be removed.

@pyfisch

pyfisch commented Jun 23, 2016

Copy link
Copy Markdown
Contributor Author

I have added support for tagged values in a branch of my CBOR crate

Comment thread serde/src/ser/mod.rs Outdated
#[inline]
fn serialize_tagged_value<T, V>(&mut self,
_tag: T,
value: V) -> Result<(), Self::Error>

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: can you update the indent here?

@erickt

erickt commented Jun 30, 2016

Copy link
Copy Markdown
Member

How do you handle tagging higher level types, like a datetime value? Would the idea be that the serializer calls Tagger::string_tag, which returns Some("DateTime"), and it's the responsibility of the serializer to interpret those tags, or error out if it doesn't know how to handle it?

Another thing to consider is that specialization is coming down the pipeline. How could that impact this design? It would be interesting if there was a design that eventually could enable serde to have a registry of generic tag names that then could be used by the different serialization formats without having to do a runtime check.

Some serialization formats allow the addition of "tags"
to values to add additional information. These formats
include CBOR, BSON and YAML.

This commit introduces serialization and deserialization
of tags. It introduces a trait Tagger that resolves
the tag for a specific format. It supports integer,
binary, and string tags and is extensible to cover
more yet unknown formats.

By default tags are discarded at deserialization
except when the user has implemented visit_tagged_value
function.

To serialize tags the user has to call
serialize_tagged_value.

Tagged values may be of any type.

Thanks to @dtolnay, @oli-obk and @erickt for their comments on tags.

Supersedes #301
@pyfisch

pyfisch commented Jun 30, 2016

Copy link
Copy Markdown
Contributor Author

I'll take the rust-url serialization as an example. Currently the URL is serialized as a string:

impl serde::Serialize for Url {
    fn serialize<S>(&self, serializer: &mut S) -> Result<(), S::Error> where S: serde::Serializer {
        format!("{}", self).serialize(serializer)
    }
}

Since CBOR recommends an optional tagging of URIs with tag 32 this information could be added to the implementation of Serialize. As you propose to use a registry for common types there could be a "serde-registry" format mapping types to string tags. It would be up to the formats serializer to map the generic tag to a format specific tag or to ignore it. The introduction of a serde registry is not strictly necessary, but it adds a bit of portability. Now a format the rust-url authors are unaware of can still tag URL values correctly if it knows the mapping from "serde-url" to an internal tag.

impl serde::Serialize for Url {
    fn serialize<S>(&self, serializer: &mut S) -> Result<(), S::Error> where S: serde::Serializer {
        struct Tagger;
        impl serde::Tagger for Tagger {
            fn u64_tag(&self, format: &'static str) -> Option<u64> {
                if format == "cbor" {
                    // URI; see RFC7049 Section 2.4.4.3
                    return Some(32);
                }
                None
            }

            fn string_tag(&self, format: &'static str) -> Option<&str> {
                if format == "serde-registry" {
                    return Some("serde-url");
                }
                None
            }
        }
        serializer.serialize_tagged_value(Tagger, self.as_str())
    }
}

@pyfisch

pyfisch commented Jul 16, 2016

Copy link
Copy Markdown
Contributor Author

@dtolnay Why is this "WIP"? What needs to be changed?

@oli-obk

oli-obk commented Jul 20, 2016

Copy link
Copy Markdown
Member

So I've been thinking some about this. And I still dislike the runtime decision about the tag type.

For serialization I played with impl specialization, and (I think) it works

Output for serializing with DefaultSerializer:

Tagged::serialize
S::serialize_tag (default)
DefaultSerializer::i32: 99

Output for serializing with MySerializer (that wants tags):

Tagged::serialize
MySerializer::serialize_tag
Tag: MySerializer::i32: 42
MySerializer::i32: 99

The impl specialization magic happens here:

struct Tagged(i32);

trait SerializeTag<S: Serialize>: Serializer {
    fn serialize_tag(&mut self, value: &S) -> Result<(), ()>;
}

impl SerializeTag<Tagged> for MySerializer {
    fn serialize_tag(&mut self, value: &Tagged) -> Result<(), ()> {
        println!("MySerializer::serialize_tag");
        self.serialize_tagged(42, value.0)
    }
}

impl<S: Serializer> SerializeTag<Tagged> for S {
    default fn serialize_tag(&mut self, value: &Tagged) -> Result<(), ()> {
        println!("S::serialize_tag (default)");
        value.0.serialize(self)
    }
}

impl Serialize for Tagged {
    fn serialize<S>(&self, serializer: &mut S) -> Result<(), ()>
        where S: Serializer {
        println!("Tagged::serialize");
        serializer.serialize_tag(self)
    }
}

I haven't put any thought into deserialization yet, but I feel confident we can rig something.

Note: this design does not touch the Serialize trait at all, and adds zero implementation work to existing implementations (I considered both alternatives, but found this "workaround" to be suitable)

@oli-obk

oli-obk commented Jul 20, 2016

Copy link
Copy Markdown
Member

Ok, there's an issue with this. It works fine if the SerializeTag trait exists in the same crate as the type to be serialized and the Serializer that requires tags. I can't get it to work yet, if the Serializer comes from a different crate, and the local crate defines the SerializeTag trait and impls it. Might be a specialization-bug though

@oli-obk

oli-obk commented Jul 20, 2016

Copy link
Copy Markdown
Member

Ah, nevermind, it's just horrible error reporting. If the Serializer is generic, make sure to specify ALL bounds that are on the impl Serializer.

@oli-obk

oli-obk commented Jul 20, 2016

Copy link
Copy Markdown
Member

I implemented the serialization part in serde on a branch based on yours (+ I merged master into it) https://github.com/oli-obk/rust-serde/commits/tagged_values

@oli-obk

oli-obk commented Jul 20, 2016

Copy link
Copy Markdown
Member

I got deserialization to mostly work. it's now failing on something really weird. I think syntex is eating my code.

@dtolnay

dtolnay commented Jul 22, 2016

Copy link
Copy Markdown
Member

Closing in favor of #455 which seems more flexible and more promising. We can reopen later if it turns out otherwise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

4 participants