Eval self.left_padding whenever it is updated in BatchRotatingKVCache by rltakashige · Pull Request #960 · ml-explore/mlx-lm

rltakashige · 2026-03-07T14:08:46Z

Motivation:

I was running into `RuntimeError: [metal::malloc] Resource limit (499000) exceeded.` when using batching for GPT OSS (see the attached log.txt). Upon investigation, this happens for any model with a rotating KV cache.

Steps to Reproduce:
Run my attached reproduce_batch_kvcache_leak.py with any model that uses sliding window attention: `python reproduce_batch_kvcache_leak.py --model <model path> --crash`. This runs the model in a batch generator with two requests decoded together for 50000 steps. I have primarily been using GPT OSS 120B MXFP4 Q8.

Passing --add-eval instead evaluates the left padding, which prevents the crash.

Issue and Proposed Changes:
I think the issue is caused by the left padding never being evaluated: MLX is lazy, so each update only extends an unevaluated graph, and the buffers it references accumulate without bound.
From testing, it is only necessary to evaluate the left padding when it is updated. I am not sure whether you'd prefer moving the evals outside this function.
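To illustrate the suspected mechanism, here is a minimal pure-Python sketch of a lazily evaluated value. It is a toy model, not MLX's implementation and not the code in this PR: `LazyValue`, `run_steps`, and the `+ 1` update are hypothetical names and operations chosen for illustration only. It shows that a value which is updated every step but never evaluated keeps one pending node per step alive, while evaluating right after each update keeps the pending chain empty.

```python
class LazyValue:
    """Toy stand-in for a lazily evaluated array (hypothetical, not MLX's API)."""

    def __init__(self, value=None, parent=None, delta=0):
        self.value = value    # concrete result, or None while still pending
        self.parent = parent  # upstream node kept alive until evaluation
        self.delta = delta    # pending operation: add `delta` to the parent

    def __add__(self, delta):
        # Record the operation instead of computing it.
        return LazyValue(parent=self, delta=delta)

    def pending_depth(self):
        # Count the unevaluated nodes reachable from this one.
        depth, node = 0, self
        while node is not None and node.value is None:
            depth += 1
            node = node.parent
        return depth

    def eval(self):
        # Walk back to the nearest concrete node, then compute forward.
        chain, node = [], self
        while node.value is None:
            chain.append(node)
            node = node.parent
        result = node.value
        for pending in reversed(chain):
            result = result + pending.delta
            pending.value = result
            pending.parent = None  # release the upstream graph
        return self.value


def run_steps(steps, eval_on_update):
    """Update a lazy value `steps` times; return (final value, max pending depth)."""
    left_padding = LazyValue(value=0)
    max_depth = 0
    for _ in range(steps):
        left_padding = left_padding + 1  # arbitrary update, for illustration
        if eval_on_update:
            left_padding.eval()          # evaluate only when it is updated
        max_depth = max(max_depth, left_padding.pending_depth())
    return left_padding.eval(), max_depth


if __name__ == "__main__":
    print(run_steps(1000, eval_on_update=False))  # pending chain grows with steps
    print(run_steps(1000, eval_on_update=True))   # pending chain stays empty
```

In MLX itself, the analogous step would be calling `mx.eval` on `self.left_padding` right after it is reassigned, which is what the title of this PR describes. Exactly where that call sits in `BatchRotatingKVCache` is defined by the diff, not by this sketch.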

Eval self.left_padding whenever it is updated

5e0c484

rltakashige added a commit to exo-explore/exo that referenced this pull request Mar 7, 2026

Update mlx lm to ml-explore/mlx-lm#960

f0b6628

rltakashige wants to merge 1 commit into ml-explore:main from rltakashige:leo/eval-left-padding-in-batched-rotation
