Python - Fix Normalizer.normalize with PyNormalizedStringRefMut #618

Merged: n1t0 merged 1 commit into master from fix-normalizer-python on Feb 3, 2021


n1t0 commented Feb 3, 2021

This PR adds support for calling Normalizer.normalize with a NormalizedString obtained from a PreTokenizedString:

from tokenizers import normalizers, pre_tokenizers, PreTokenizedString

normalizer = normalizers.Lowercase()
pre_tokenizer = pre_tokenizers.Whitespace()

# Manually normalize and pre-tokenize a string
text = "Hello World"  # example input
s = PreTokenizedString(text)
# The callback receives each split's NormalizedString and normalizes it in place
s.normalize(lambda n: normalizer.normalize(n))
pre_tokenizer.pre_tokenize(s)
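
The pattern in the snippet above relies on the callback receiving a reference to each split and mutating it in place, rather than returning a new value. As a rough illustration of that calling convention only, here is a pure-Python toy model. Every class and function name in it (`ToyNormalizedString`, `ToyPreTokenizedString`, `ToyLowercase`, `run_example`) is hypothetical and is not part of the tokenizers library, and it does not reflect how the Rust bindings are implemented.

```python
from typing import Callable, List


class ToyNormalizedString:
    """Hypothetical stand-in for NormalizedString: holds text edited in place."""

    def __init__(self, text: str) -> None:
        self.text = text

    def lowercase(self) -> None:
        self.text = self.text.lower()


class ToyLowercase:
    """Hypothetical normalizer that mutates the object it is given."""

    def normalize(self, normalized: ToyNormalizedString) -> None:
        normalized.lowercase()


class ToyPreTokenizedString:
    """Hypothetical stand-in for PreTokenizedString: owns a list of splits."""

    def __init__(self, text: str) -> None:
        self.splits: List[ToyNormalizedString] = [ToyNormalizedString(text)]

    def normalize(self, func: Callable[[ToyNormalizedString], None]) -> None:
        # Each split is passed by reference; the callback's return value is ignored.
        for split in self.splits:
            func(split)

    def split_whitespace(self) -> None:
        # Replace every split with one split per whitespace-separated word.
        self.splits = [
            ToyNormalizedString(word)
            for split in self.splits
            for word in split.text.split()
        ]


def run_example(text: str) -> List[str]:
    s = ToyPreTokenizedString(text)
    normalizer = ToyLowercase()
    s.normalize(lambda n: normalizer.normalize(n))
    s.split_whitespace()
    return [split.text for split in s.splits]
```

Calling `run_example("Hello World")` walks through the same three steps as the snippet: wrap the text, normalize each split through the callback, then split on whitespace.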

n1t0 force-pushed the fix-normalizer-python branch from 2efb361 to dc12d61 on February 3, 2021 17:46
n1t0 changed the base branch from fix-spm-conversion to master on February 3, 2021 17:46
n1t0 force-pushed the fix-normalizer-python branch from dc12d61 to a457da5 on February 3, 2021 20:48
n1t0 merged commit db22cb6 into master on Feb 3, 2021
n1t0 deleted the fix-normalizer-python branch on February 3, 2021 20:48
n1t0 mentioned this pull request on Feb 8, 2021