[Continuous Batching] Snapshot generation outputs without mutating request state by Incheonkirin · Pull Request #46670 · huggingface/transformers

Incheonkirin · 2026-06-15T13:16:49Z

What does this PR do?

This fixes RequestState.to_generation_output() in continuous batching so that
building a GenerationOutput does not mutate the active request state or return
live aliases for growing output buffers.

Streaming continuous batching can call to_generation_output() after every
generated token. On current main, that method returns the same list objects
held by the active RequestState for generated_tokens, logprobs, and
timestamps. It also mutates self.generated_tokens and self.initial_tokens
in the _true_initial_tokens soft-reset path.

That creates two observable issues:

a previously returned streaming output can change after later tokens are
generated
after soft reset, repeated output conversion can fold already-generated
prompt-suffix tokens back into state.generated_tokens, so the request can
terminate with fewer generated tokens than requested

This PR makes output conversion a snapshot operation. It keeps prompt_ids by
reference on the normal path to avoid copying the prompt for every streamed
token, copies only the growing buffers, and computes the soft-reset
prompt/generated split locally without changing RequestState.

I added two CPU-only regression tests:

test_generation_outputs_are_snapshots
test_streaming_output_after_soft_reset_does_not_shorten_generation

These fail on the previous implementation and pass with this change.

As an additional check, I ran a CUDA end-to-end run with real continuous
batching streaming requests, use_cuda_graph=True, forced soft reset
(offload_calls=7, soft_reset_calls=7), do_sample=False, and
max_new_tokens=30. On the unpatched source, several requests stopped at 21-23
generated tokens and some previously returned first chunks later had length
20/22/8. With this patch, all 12 requests reached 30 generated tokens and all
first chunks remained length 1. I can provide a small repro harness if useful.

Code Agent Policy

I confirm that this is not a pure code agent PR.

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline,
Pull Request section?
Was this discussed/approved via a Github issue or the forum? Please add a link
to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the
documentation guidelines, and
here are tips on formatting docstrings.
Did you write any new necessary tests?

Rocketknight1 · 2026-06-15T14:20:43Z

cc @remi-or @McPatate for CB

remi-or

Good catch! Thanks for the contribution, I just have some style-related nits :)

remi-or · 2026-06-15T14:35:48Z


    def to_generation_output(self):
        """Convert the request state to a GenerationOutput object."""
+        prompt_ids = self.initial_tokens


Could we move this definition to an else block to improve readability? Same with generated_tokens I think

remi-or · 2026-06-15T14:36:31Z

        """Convert the request state to a GenerationOutput object."""
+        prompt_ids = self.initial_tokens
+        generated_tokens = self.generated_tokens[:]
+        logprobs = self.logprobs[:]


No need to define intermediate variables here, no? We can just inline them when creating GenerationOutput

Incheonkirin · 2026-06-15T23:15:58Z

Thanks, addressed both nits. I kept the list copies for the mutable output fields so the snapshot behavior stays intact.

github-actions · 2026-06-15T23:27:26Z

CI Dashboard: View test results in Grafana

remi-or

LGTM! Thanks for the contribution.

HuggingFaceDocBuilderDev · 2026-06-16T00:54:53Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Incheonkirin · 2026-06-16T10:50:21Z

Thanks @remi-or! Those nits were good calls, and the code reads cleaner now. Appreciate the careful review and merge.

remi-or requested changes Jun 15, 2026

View reviewed changes

Snapshot continuous batching generation outputs

268744f

Incheonkirin force-pushed the fix-cb-streaming-output-snapshot branch from 2d74078 to 268744f Compare June 15, 2026 23:15

remi-or reviewed Jun 16, 2026

View reviewed changes

remi-or approved these changes Jun 16, 2026

View reviewed changes

remi-or added this pull request to the merge queue Jun 16, 2026

Merged via the queue into huggingface:main with commit 7ff490a Jun 16, 2026
121 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Continuous Batching] Snapshot generation outputs without mutating request state#46670

[Continuous Batching] Snapshot generation outputs without mutating request state#46670
remi-or merged 1 commit into
huggingface:mainfrom
Incheonkirin:fix-cb-streaming-output-snapshot

Incheonkirin commented Jun 15, 2026

Uh oh!

Rocketknight1 commented Jun 15, 2026

Uh oh!

remi-or left a comment •

edited

Loading

Uh oh!

remi-or Jun 15, 2026

Uh oh!

remi-or Jun 15, 2026

Uh oh!

Incheonkirin commented Jun 15, 2026

Uh oh!

github-actions Bot commented Jun 15, 2026

Uh oh!

remi-or left a comment

Uh oh!

HuggingFaceDocBuilderDev commented Jun 16, 2026

Uh oh!

Uh oh!

Incheonkirin commented Jun 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Uh oh!

Conversation

Incheonkirin commented Jun 15, 2026

What does this PR do?

Code Agent Policy

Before submitting

Uh oh!

Rocketknight1 commented Jun 15, 2026

Uh oh!

remi-or left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

remi-or Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

remi-or Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

Incheonkirin commented Jun 15, 2026

Uh oh!

github-actions Bot commented Jun 15, 2026

Uh oh!

remi-or left a comment

Choose a reason for hiding this comment

Uh oh!

HuggingFaceDocBuilderDev commented Jun 16, 2026

Uh oh!

Uh oh!

Incheonkirin commented Jun 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

remi-or left a comment •

edited

Loading