Improve Model serialization/deserialization #620

Merged: n1t0 merged 1 commit into master from fix-model-serde on Feb 4, 2021

n1t0 commented Feb 4, 2021

Fix #600

Because we implement `Serialize` and `Deserialize` manually for the various models, we didn't include the `#[serde(tag = "type")]` that we use everywhere else. When deserializing, we could therefore only tell which Model we were reading from the fields we saw.
This worked fine as long as the models were different enough, but that is no longer the case: WordPiece and WordLevel can both be deserialized from the same serialized JSON.

This PR fixes the problem by adding the type during serialization and using it during deserialization when it is defined. The change is backward compatible because the type is not mandatory; when it is missing, a layer of verification based on the presence of the fields (mainly for WordPiece and WordLevel) is applied.
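
To make the dispatch logic concrete, here is a minimal, std-only sketch of one way to read the description above. It is not the code from this PR: `ModelKind`, `detect_model`, and the field names used to tell WordPiece apart (`continuing_subword_prefix`, `max_input_chars_per_word`) are assumptions made for illustration, and the real implementation works through serde rather than a list of key names.

```rust
// Hypothetical sketch: names and field lists are illustrative assumptions,
// not the actual implementation merged in this PR.

#[derive(Debug, PartialEq)]
enum ModelKind {
    WordPiece,
    WordLevel,
}

/// Decide which model a serialized JSON object describes.
///
/// `type_tag` is the value of the optional "type" key, if present.
/// `fields` lists the other keys found in the object.
fn detect_model(type_tag: Option<&str>, fields: &[&str]) -> Result<ModelKind, String> {
    let has = |name: &str| fields.contains(&name);

    match type_tag {
        // New format: the explicit tag is authoritative.
        Some("WordPiece") => Ok(ModelKind::WordPiece),
        Some("WordLevel") => Ok(ModelKind::WordLevel),
        Some(other) => Err(format!("unknown model type: {}", other)),

        // Legacy format (no tag): fall back to checking which fields exist.
        None => {
            if !has("vocab") {
                return Err("missing `vocab` field".to_string());
            }
            // Assumption: only WordPiece files carry these extra keys.
            if has("continuing_subword_prefix") || has("max_input_chars_per_word") {
                Ok(ModelKind::WordPiece)
            } else {
                Ok(ModelKind::WordLevel)
            }
        }
    }
}
```

With this reading, a file saved after the change is identified by its tag alone, while an older untagged file is still accepted and classified by the fields it contains.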

n1t0 merged commit a8f7564 into master Feb 4, 2021
n1t0 deleted the fix-model-serde branch February 4, 2021 14:59
n1t0 mentioned this pull request Feb 8, 2021

Development

Successfully merging this pull request may close these issues.

Saved wordlevel tokenizer loads as a wordpiece tokenizer.
