Closed
Description
Hello~ I'm trying to train a BPE tokenizer with a customized pre_tokenizer. The custom pre_tokenizer uses a third-party package, like the `class JiebaPreTokenizer:` example shown in the docs.
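For context, the custom pre-tokenizer follows the shape the `tokenizers` library expects: a plain Python class with a `pre_tokenize(self, pretok)` method that is later wrapped with `PreTokenizer.custom(...)`. A minimal sketch of that shape — the `segment` method here is a whitespace stand-in for `jieba.tokenize` so the snippet stays self-contained; the real class would call jieba instead:

```python
class JiebaPreTokenizer:
    def segment(self, text):
        # Stand-in for jieba.tokenize(text): yields (word, start, stop) tuples.
        pos = 0
        for word in text.split():
            start = text.index(word, pos)
            stop = start + len(word)
            pos = stop
            yield word, start, stop

    def jieba_split(self, i, normalized_string):
        # Slice the NormalizedString at the segment boundaries.
        return [normalized_string[start:stop]
                for _, start, stop in self.segment(str(normalized_string))]

    def pre_tokenize(self, pretok):
        # pretok is a tokenizers.PreTokenizedString; split it in place.
        pretok.split(self.jieba_split)
```

At training time the instance is attached with `tokenizer.pre_tokenizer = PreTokenizer.custom(JiebaPreTokenizer())`.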
After training the tokenizer, I tried to save it with `tokenizer.save(tokenizer_path)`, but an exception was raised:

```
Exception: Custom PreTokenizer cannot be serialized
```
I can see that a custom pre_tokenizer cannot be saved along with the main tokenizer model, so I should save the main model separately and, when loading the tokenizer, manually re-attach the pre_tokenizer. Am I right?
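If so, the save/load workaround I have in mind could be sketched like this — a sketch, not a confirmed API: the two helper names are hypothetical, and the `placeholder` would be any serializable pre-tokenizer (e.g. `tokenizers.pre_tokenizers.Whitespace()`):

```python
def save_without_custom_pretok(tokenizer, path, placeholder):
    """Temporarily swap the unserializable custom pre-tokenizer for a
    serializable placeholder, save, then restore the custom one on the
    in-memory tokenizer."""
    custom = tokenizer.pre_tokenizer
    tokenizer.pre_tokenizer = placeholder
    try:
        tokenizer.save(path)
    finally:
        tokenizer.pre_tokenizer = custom


def load_with_custom_pretok(path, custom_pretok):
    """Load the saved tokenizer and manually re-attach the custom
    pre-tokenizer, since it was not serialized with the model."""
    from tokenizers import Tokenizer
    from tokenizers.pre_tokenizers import PreTokenizer

    tok = Tokenizer.from_file(path)
    tok.pre_tokenizer = PreTokenizer.custom(custom_pretok)
    return tok
```

The `try/finally` keeps the in-memory tokenizer usable even if `save()` fails.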