Memory leak for large strings #1539
Description
This snippet will cause memory usage to rise indefinitely:
from transformers import AutoTokenizer
import gc

tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0", use_fast=True)
refresh_every = 100000
for i in range(100000):
    s = f'{i} {i} ' * 10000
    tokenizer.encode(s)
    gc.collect()
    if i % 100 == 0:
        print(i)
    if i % refresh_every == 0:
        tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0", use_fast=True)

If you set refresh_every to 100000 (as in the snippet), memory usage will keep rising. This Colab notebook crashes after about 15 minutes of executing.
If you set refresh_every to 100, the memory consumption will be stable.
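One way to check this kind of growth yourself is to sample the process's resident set size between iterations. A minimal stdlib-only sketch (not from the issue report): it uses `resource.getrusage` (Unix-only) and a hypothetical `leaky_encode` function that stands in for `tokenizer.encode`, simulating unbounded internal growth:

import gc
import resource  # Unix-only stdlib module

def rss_kb():
    # Peak resident set size: kilobytes on Linux, bytes on macOS
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

# Hypothetical stand-in for tokenizer.encode: appends to a
# module-level list to mimic a cache that is never cleared
_cache = []
def leaky_encode(s):
    _cache.append(s)

before = rss_kb()
for i in range(1000):
    leaky_encode(f"{i} {i} " * 1000)
    gc.collect()  # does not help: the references are still live
after = rss_kb()
print(f"peak RSS grew by roughly {after - before} kB")

With the real tokenizer, a steadily climbing RSS across iterations (despite `gc.collect()`) is what distinguishes this from ordinary allocator churn.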