Exception: Custom PreTokenizer cannot be serialized #613

@carter54

Hello, I'm trying to train a BPE tokenizer with a custom pre_tokenizer.
The custom pre_tokenizer uses a third-party package, like what is shown in

After training the tokenizer, I tried to use

tokenizer.save(tokenizer_path)

to save the tokenizer, but an exception was raised:

Exception: Custom PreTokenizer cannot be serialized

It seems that a custom pre_tokenizer cannot be saved together with the main tokenizer model, so I should save the main model on its own and manually re-attach the pre_tokenizer when loading the tokenizer. Am I right?
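
A minimal sketch of that save-then-reattach approach, assuming a recent version of the `tokenizers` Python bindings. `DashPreTokenizer` is a hypothetical stand-in for the third-party-based pre_tokenizer, and using `Whitespace` as a serializable placeholder during saving is my own assumption, not something confirmed by the library maintainers:

```python
from tokenizers import NormalizedString, PreTokenizedString, Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import PreTokenizer, Whitespace


class DashPreTokenizer:
    """Hypothetical custom pre_tokenizer: splits on '-' and drops the dash.

    Stands in for a pre_tokenizer backed by a third-party package.
    """

    def _split(self, i: int, normalized: NormalizedString):
        # Return the pieces of this split as a list of NormalizedString.
        return normalized.split("-", "removed")

    def pre_tokenize(self, pretok: PreTokenizedString):
        # Apply the splitting function to every current split, in place.
        pretok.split(self._split)


def save_tokenizer(tokenizer: Tokenizer, path: str) -> None:
    # A custom pre_tokenizer cannot be serialized, so swap in a
    # serializable placeholder (assumption: Whitespace) before saving.
    tokenizer.pre_tokenizer = Whitespace()
    tokenizer.save(path)
    # Re-attach the custom pre_tokenizer so the in-memory object still works.
    tokenizer.pre_tokenizer = PreTokenizer.custom(DashPreTokenizer())


def load_tokenizer(path: str) -> Tokenizer:
    tokenizer = Tokenizer.from_file(path)
    # The file only contains the placeholder; manually add the custom one.
    tokenizer.pre_tokenizer = PreTokenizer.custom(DashPreTokenizer())
    return tokenizer
```

Note that the saved file records the placeholder, not the custom logic, so anyone who loads it without `load_tokenizer` gets whitespace pre-tokenization instead.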
