Sentencepiece tokenizer with tf.text interface.
```python
text.FastSentencepieceTokenizer(
    model, reverse=False, add_bos=False, add_eos=False
)
```
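The constructor flags post-process the token stream. A minimal pure-Python sketch of the flag semantics (this is an illustration, not the library's implementation; the `bos_id`/`eos_id` defaults are hypothetical placeholders, and the real ids come from the trained model):

```python
def postprocess(ids, reverse=False, add_bos=False, add_eos=False,
                bos_id=1, eos_id=2):
    """Toy sketch: apply reverse/add_bos/add_eos to a list of token ids."""
    if reverse:
        ids = ids[::-1]       # reverse the token sequence
    if add_bos:
        ids = [bos_id] + ids  # prepend a beginning-of-sentence id
    if add_eos:
        ids = ids + [eos_id]  # append an end-of-sentence id
    return ids

print(postprocess([10, 11, 12], reverse=True, add_bos=True, add_eos=True))
# → [1, 12, 11, 10, 2]
```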
Methods
detokenize
```python
detokenize(
    input
)
```
Detokenizes tokens into preprocessed text.
| Args | |
|---|---|
| `input` | A `RaggedTensor` or `Tensor` of int32-encoded text with rank >= 1. |

| Returns | |
|---|---|
| An N-1 dimensional string `Tensor` or `RaggedTensor` of the detokenized text. |
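Detokenization reduces rank by one: each innermost row of ids collapses into a single string. A toy sketch of this, using a hypothetical id-to-piece table and SentencePiece's "▁" word-boundary convention (not the library's actual code):

```python
# Hypothetical toy vocabulary; real pieces come from the trained model.
PIECES = {3: "▁Hello", 4: "▁wor", 5: "ld", 6: "▁!"}

def toy_detokenize(rows):
    """Map a rank-2 list of id rows to a rank-1 list of strings."""
    out = []
    for ids in rows:
        text = "".join(PIECES[i] for i in ids)      # concatenate pieces
        out.append(text.replace("▁", " ").strip())  # "▁" marks word starts
    return out

print(toy_detokenize([[3, 4, 5], [3, 6]]))
# → ['Hello world', 'Hello !']
```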
tokenize
```python
tokenize(
    inputs
)
```
The main tokenization function.
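As a rough illustration of subword segmentation, here is a greedy longest-match sketch over a hypothetical piece inventory. Note this is only an illustration of the piece/space conventions: SentencePiece itself segments with a learned unigram or BPE model, not this greedy rule, and the `VOCAB` entries below are made up.

```python
# Hypothetical piece inventory; a real model learns these from data.
VOCAB = {"▁Hello": 3, "▁wor": 4, "ld": 5, "▁": 7}

def toy_tokenize(text):
    """Greedy longest-match segmentation over the toy piece inventory."""
    s = "▁" + text.replace(" ", "▁")  # SentencePiece space convention
    ids, i = [], 0
    while i < len(s):
        # Find the longest vocab piece matching at position i.
        for j in range(len(s), i, -1):
            if s[i:j] in VOCAB:
                ids.append(VOCAB[s[i:j]])
                i = j
                break
        else:
            raise ValueError(f"no piece matches at position {i}")
    return ids

print(toy_tokenize("Hello world"))
# → [3, 4, 5]
```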
vocab_size
```python
vocab_size()
```
Returns the size of the vocabulary in the Sentencepiece model.