Checklist
Describe the bug
When launching the model server with extra_buffer enabled and running tests for structured output, a slow but steady increase in VRAM consumption occurs, eventually causing OOM errors and model restarts in our production environment (every few hours). No leak is observed when extra_buffer is disabled, nor when it is enabled but JSON responses are not requested.
Reproduction
mamba_scheduler_strategy=extra_buffer
and structured JSON response generation
Environment
main
Checklist
Describe the bug
When launching the model server with
extra_bufferenabled and running tests for structured output, a slow but steady increase in VRAM consumption occurs, eventually causing OOM errors and model restarts in our production environment (every few hours). No leak is observed whenextra_bufferis disabled, nor when it is enabled but JSON responses are not requested.Reproduction
mamba_scheduler_strategy=extra_buffer
and structured JSON response generation
Environment
main