Incremental Detokenization #1666
Hello, thank you for building such a great foundational library.
I work on the vllm-project, and we have some nasty, slow code related to the challenges of incremental detokenization for streaming use cases. This is needed to work around the cleanup logic in decode, where the tokenizer decides whether or not to add a space depending on the surrounding ids. Relevant code:
We are trying to optimize this code, since it can be expensive when serving at high batch sizes. Before we do, I was wondering whether tokenizers has any plans to handle incremental detokenization internally?
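For context, here is a minimal, self-contained sketch of the kind of workaround described above. It is not vLLM's actual implementation: `toy_decode` is a hypothetical stand-in for a real tokenizer's `decode()`, with a simple cleanup rule (punctuation attaches to the previous word) so that decoding tokens one at a time produces different text than decoding them together. The streaming loop re-decodes the accumulated tokens and emits only the newly produced suffix:

```python
def toy_decode(tokens):
    """Stand-in for a tokenizer's decode(): join tokens with spaces,
    but attach punctuation to the previous word (a cleanup rule that
    depends on surrounding tokens)."""
    out = ""
    for tok in tokens:
        if tok in {".", ",", "!", "?"} or not out:
            out += tok
        else:
            out += " " + tok
    return out

def stream_detokenize(token_stream):
    """Yield only the newly produced text for each incoming token.

    Decoding each token in isolation would lose the cleanup behaviour,
    so we decode the full accumulated sequence and diff it against the
    text already emitted.
    """
    tokens = []
    emitted = ""
    for tok in token_stream:
        tokens.append(tok)
        full = toy_decode(tokens)
        delta = full[len(emitted):]  # text added by this token
        emitted = full
        yield delta

pieces = list(stream_detokenize(["Hello", ",", "world", "!"]))
print(pieces)           # -> ['Hello', ',', ' world', '!']
print("".join(pieces))  # -> 'Hello, world!'
```

The re-decode-and-diff step is what gets expensive at high batch sizes: in practice, implementations bound the cost by only re-decoding a trailing window of tokens rather than the whole sequence.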