
More cache options. #1675

Merged
Narsil merged 2 commits into main from more_cache_options on Nov 6, 2024

Conversation

Contributor

@Narsil Narsil commented Nov 6, 2024

Fixes #1539

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

@ArthurZucker ArthurZucker left a comment


Thanks, closed #1662!

@wangguanggg

wangguanggg commented Jul 1, 2025

@Narsil @ArthurZucker Excuse me, I am facing a memory-leak problem that may be caused by this PR. Could you help me disable the cache in this version?

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(_path)
tokenizer.backend_tokenizer.model.cache_capacity(0)

This code doesn't work.

@ArthurZucker
Collaborator

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(_path)
tokenizer._tokenizer.model.cache_capacity(0)

where does backend_tokenizer come from?

@wangguanggg

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(_path)
tokenizer._tokenizer.model.cache_capacity(0)

where does backend_tokenizer come from?

AttributeError: 'tokenizers.models.BPE' object has no attribute 'cache_capacity'

@ArthurZucker
Collaborator

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(_path)
tokenizer._tokenizer.model._clear_cache()

or

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(_path)
tokenizer._tokenizer.model._resize_cache(0)
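For intuition on why resizing the cache to 0 disables it, here is a minimal pure-Python sketch of a capacity-bounded word-to-tokens cache. The class and method names are hypothetical; the real cache lives in the Rust tokenizers crate, so this only illustrates the behaviour, not the implementation:

```python
class BoundedCache:
    """Toy word -> tokens cache with a hard capacity limit.

    Illustrative stand-in for the internal BPE cache: resizing to 0
    drops all entries and prevents new ones from being stored.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self._store = {}

    def resize(self, capacity):
        # Analogous to _resize_cache(n): shrink the capacity and evict
        # entries until the store fits. With capacity 0, the cache is
        # effectively disabled.
        self.capacity = capacity
        while len(self._store) > capacity:
            self._store.pop(next(iter(self._store)))

    def put(self, word, tokens):
        # Only store new entries while there is room.
        if len(self._store) < self.capacity:
            self._store[word] = tokens

    def get(self, word):
        return self._store.get(word)


cache = BoundedCache(capacity=2)
cache.put("hello", ["hel", "lo"])
cache.resize(0)                       # like _resize_cache(0)
cache.put("world", ["wor", "ld"])     # silently ignored: capacity is 0
print(cache.get("hello"), cache.get("world"))  # None None
```

After the resize, no input is ever cached again, which is why `_resize_cache(0)` stops the cache from growing on long-running workloads.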

@ArthurZucker
Collaborator

You can find the functions using dir(tokenizer._tokenizer.model)
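As an illustration of that dir() approach: filtering the output by name is a quick way to surface cache-related methods on an object. The DummyModel below is a hypothetical stand-in for tokenizer._tokenizer.model, so the example runs without transformers installed:

```python
class DummyModel:
    """Hypothetical stand-in for tokenizer._tokenizer.model."""

    def _clear_cache(self):
        ...

    def _resize_cache(self, capacity):
        ...

    def tokenize(self, text):
        ...


# dir() lists all attribute names (sorted); keep only the cache-related ones.
cache_methods = [name for name in dir(DummyModel) if "cache" in name]
print(cache_methods)  # ['_clear_cache', '_resize_cache']
```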

@ArthurZucker
Collaborator

do you want to open a PR to add some doc?

@wangguanggg

do you want to open a PR to add some doc?

No, I just want to disable the cache, but I can only do it through backend_tokenizer.

@ArthurZucker
Collaborator

tokenizer._tokenizer.model._resize_cache(0)

will disable it

@ArthurZucker
Collaborator

backend_tokenizer does not exist sir

@wangguanggg

backend_tokenizer does not exist sir

The tokenizers generated by transformers seem to have it.

@ArthurZucker
Collaborator

No, backend_tokenizer is just an alias for _tokenizer.
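A small sketch of that aliasing: in transformers, backend_tokenizer is a read-only property that returns the private _tokenizer attribute, so the two names refer to the same object. The simplified class below is hypothetical and stands in for PreTrainedTokenizerFast:

```python
class FastTokenizerSketch:
    """Simplified, hypothetical stand-in for PreTrainedTokenizerFast."""

    def __init__(self, backend):
        # The underlying Rust-backed tokenizer object.
        self._tokenizer = backend

    @property
    def backend_tokenizer(self):
        # Public alias for the private attribute -- same object, two names.
        return self._tokenizer


tok = FastTokenizerSketch(backend=object())
print(tok.backend_tokenizer is tok._tokenizer)  # True
```

This is why both `tokenizer.backend_tokenizer.model` and `tokenizer._tokenizer.model` reach the same BPE model in the earlier snippets.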



Development

Successfully merging this pull request may close these issues.

Memory leak for large strings

4 participants