
More cache options. #1675

Merged
Narsil merged 2 commits into main from more_cache_options on Nov 6, 2024

Conversation

Contributor

@Narsil Narsil commented Nov 6, 2024

Fixes #1539

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

@ArthurZucker ArthurZucker left a comment


Thanks, closed #1662!

@wangguanggg

wangguanggg commented Jul 1, 2025

@Narsil @ArthurZucker Excuse me, I am facing a memory-leak problem that may be caused by this PR. Could you help me disable the cache in this version?

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(_path)
tokenizer.backend_tokenizer.model.cache_capacity(0)

This code doesn't work.

@ArthurZucker
Collaborator

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(_path)
tokenizer._tokenizer.model.cache_capacity(0)

where does backend_tokenizer come from?

@wangguanggg

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(_path)
tokenizer._tokenizer.model.cache_capacity(0)

where does backend_tokenizer come from?

AttributeError: 'tokenizers.models.BPE' object has no attribute 'cache_capacity'

@ArthurZucker
Collaborator

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(_path)
tokenizer._tokenizer.model._clear_cache()

or

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(_path)
tokenizer._tokenizer.model._resize_cache(0)
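For intuition on why resizing the cache to 0 disables it, here is a minimal pure-Python sketch of a capacity-bounded word-to-tokens cache. The class and method names are hypothetical; the real cache lives in the Rust tokenizers crate, so this only illustrates the behaviour, not the implementation:

```python
class BoundedCache:
    """Toy word -> tokens cache with a hard capacity limit.

    Illustrative stand-in for the internal BPE cache: resizing to 0
    drops all entries and prevents new ones from being stored.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self._store = {}

    def resize(self, capacity):
        # Analogous to _resize_cache(n): shrink the capacity and evict
        # entries until the store fits. With capacity 0, the cache is
        # effectively disabled.
        self.capacity = capacity
        while len(self._store) > capacity:
            self._store.pop(next(iter(self._store)))

    def put(self, word, tokens):
        # Only store new entries while there is room.
        if len(self._store) < self.capacity:
            self._store[word] = tokens

    def get(self, word):
        return self._store.get(word)


cache = BoundedCache(capacity=2)
cache.put("hello", ["hel", "lo"])
cache.resize(0)                       # like _resize_cache(0)
cache.put("world", ["wor", "ld"])     # silently ignored: capacity is 0
print(cache.get("hello"), cache.get("world"))  # None None
```

After the resize, no input is ever cached again, which is why `_resize_cache(0)` stops the cache from growing on long-running workloads.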

@ArthurZucker
Collaborator

You can find the functions using dir(tokenizer._tokenizer.model)
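As an illustration of that dir() approach: filtering the output by name is a quick way to surface cache-related methods on an object. The DummyModel below is a hypothetical stand-in for tokenizer._tokenizer.model, so the example runs without transformers installed:

```python
class DummyModel:
    """Hypothetical stand-in for tokenizer._tokenizer.model."""

    def _clear_cache(self):
        ...

    def _resize_cache(self, capacity):
        ...

    def tokenize(self, text):
        ...


# dir() lists all attribute names (sorted); keep only the cache-related ones.
cache_methods = [name for name in dir(DummyModel) if "cache" in name]
print(cache_methods)  # ['_clear_cache', '_resize_cache']
```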

@ArthurZucker
Collaborator

do you want to open a PR to add some doc?

@wangguanggg

do you want to open a PR to add some doc?

No, I just want to disable the cache, but I can only do it through backend_tokenizer.

@ArthurZucker
Collaborator

tokenizer._tokenizer.model._resize_cache(0)

will disable it

@ArthurZucker
Collaborator

backend_tokenizer does not exist sir

@wangguanggg

backend_tokenizer does not exist sir

The tokenizers generated by transformers seem to have it.

@ArthurZucker
Collaborator

No, backend_tokenizer is just an alias for _tokenizer.
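A small sketch of that aliasing: in transformers, backend_tokenizer is a read-only property that returns the private _tokenizer attribute, so the two names refer to the same object. The simplified class below is hypothetical and stands in for PreTrainedTokenizerFast:

```python
class FastTokenizerSketch:
    """Simplified, hypothetical stand-in for PreTrainedTokenizerFast."""

    def __init__(self, backend):
        # The underlying Rust-backed tokenizer object.
        self._tokenizer = backend

    @property
    def backend_tokenizer(self):
        # Public alias for the private attribute -- same object, two names.
        return self._tokenizer


tok = FastTokenizerSketch(backend=object())
print(tok.backend_tokenizer is tok._tokenizer)  # True
```

This is why both `tokenizer.backend_tokenizer.model` and `tokenizer._tokenizer.model` reach the same BPE model in the earlier snippets.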



Development

Successfully merging this pull request may close these issues.

Memory leak for large strings

4 participants