[Scheduler] Defer prefill input_ids H2D to forward stream, unify resolve via future_map by hnyls2002 · Pull Request #25945 · sgl-project/sglang

hnyls2002 · 2026-05-21T05:51:26Z

Summary

Defer the prefill input_ids host-to-device (H2D) copy from schedule_stream to forward_stream, and unify input_ids materialization across the overlap and non-overlap paths: input_ids is left as None during scheduling and is resolved at forward entry through the always-on FutureMap relay.

Mechanism

prepare_for_extend stages the prefill prompt tokens in pinned CPU memory (prefill_input_ids_cpu) instead of building the GPU tensor.
decode / mixed-chunk batches relay the last sampled token through FutureMap.output_tokens_buf (stashed on forward, gathered on the next iteration).
resolve_forward_inputs materializes input_ids at forward entry on the forward stream: an H2D copy for prefill, a gather for decode, and a cat of the two for mixed batches.
forward_stream_ctx wraps the overlap isolation, with resolve_forward_inputs sitting between the stream barrier and the isolation.
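The relay described above can be illustrated with a toy, CPU-only model. This is a minimal sketch under stated assumptions: plain Python lists stand in for the pinned-CPU and GPU tensors, and `ToyFutureMap`, `ToyBatch`, and the body of `resolve_forward_inputs` are hypothetical stand-ins that only borrow names from this PR. The prefill-then-decode order used for the mixed case is also an assumption, and none of this is the actual sglang implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class ToyFutureMap:
    """Toy relay: one slot per request-pool index holds the last sampled token."""

    def __init__(self, num_slots: int) -> None:
        # Stand-in for FutureMap.output_tokens_buf (a device buffer in the real code).
        self.output_tokens_buf: List[int] = [-1] * num_slots

    def stash(self, req_pool_indices: List[int], next_token_ids: List[int]) -> None:
        # Remember each request's sampled token under its pool slot.
        for slot, token in zip(req_pool_indices, next_token_ids):
            self.output_tokens_buf[slot] = token

    def gather(self, req_pool_indices: List[int]) -> List[int]:
        # Read the stashed tokens back by slot at the next forward entry.
        return [self.output_tokens_buf[slot] for slot in req_pool_indices]


@dataclass
class ToyBatch:
    # Pool slots of requests that decode this step (empty for pure prefill).
    decode_req_pool_indices: List[int] = field(default_factory=list)
    # Staged prompt tokens (stand-in for prefill_input_ids_cpu); None if no prefill.
    prefill_input_ids_cpu: Optional[List[int]] = None
    # Left as None by scheduling; filled in at forward entry.
    input_ids: Optional[List[int]] = None


def resolve_forward_inputs(batch: ToyBatch, future_map: ToyFutureMap) -> None:
    """Materialize input_ids: copy for prefill, gather for decode, concat for mixed."""
    parts: List[int] = []
    if batch.prefill_input_ids_cpu is not None:
        parts.extend(batch.prefill_input_ids_cpu)  # models the H2D copy
    if batch.decode_req_pool_indices:
        parts.extend(future_map.gather(batch.decode_req_pool_indices))  # models the gather
    batch.input_ids = parts


future_map = ToyFutureMap(num_slots=4)

# Prefill: input_ids stays None until forward entry, then is copied from staging.
prefill = ToyBatch(prefill_input_ids_cpu=[11, 12, 13])
resolve_forward_inputs(prefill, future_map)
future_map.stash([2], [42])  # the forward pass sampled token 42 for slot 2

# Decode: the next iteration gathers the stashed token by slot.
decode = ToyBatch(decode_req_pool_indices=[2])
resolve_forward_inputs(decode, future_map)

# Mixed: staged prefill tokens followed by gathered decode tokens.
mixed = ToyBatch(decode_req_pool_indices=[2], prefill_input_ids_cpu=[21, 22])
resolve_forward_inputs(mixed, future_map)
```

In this model the scheduler never touches `input_ids`; only the forward entry point does, which is the property the PR relies on to move the copy off the scheduling stream.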

Coverage (all forward modes routed through the relay)

PP rank-0 mixed-chunk: stash the pp_outputs next_token_ids for the next-iteration gather
embedding / reward: call resolve_forward_inputs before forward
encoder-decoder extend: rebuild prefill_input_ids_cpu after encoder stripping
hisparse rejoin and disagg PREBUILT (non-spec): routed through future_map
merge_batch: tolerate None input_ids on either side (fall back to the relay gather)
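One plausible reading of the merge_batch item is sketched below, as an assumption rather than the PR's actual code: if either side has not materialized input_ids yet, the merged batch keeps input_ids as None so that the relay gather resolves it at forward entry. `ToyMergeBatch` and this `merge_batch` function are hypothetical and use lists in place of tensors.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyMergeBatch:
    req_pool_indices: List[int]
    input_ids: Optional[List[int]] = None  # None means "not resolved yet"


def merge_batch(dst: ToyMergeBatch, src: ToyMergeBatch) -> None:
    """Merge src into dst, tolerating unresolved input_ids on either side."""
    # Pool indices are always known at scheduling time, so always concatenate them.
    dst.req_pool_indices = dst.req_pool_indices + src.req_pool_indices
    if dst.input_ids is None or src.input_ids is None:
        # Assumed fallback: defer to the relay gather at forward entry.
        dst.input_ids = None
    else:
        dst.input_ids = dst.input_ids + src.input_ids


# Both sides resolved: input_ids are concatenated directly.
resolved = ToyMergeBatch([0], [5])
merge_batch(resolved, ToyMergeBatch([1], [6]))

# One side unresolved: the merged batch stays unresolved.
deferred = ToyMergeBatch([0], [5])
merge_batch(deferred, ToyMergeBatch([3]))
```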

Spec / non-overlap

spec_v1 (non-overlap spec): the worker shape does not match req_pool_indices, so the direct batch.input_ids assignment is kept and the relay stash is skipped (the stash dispatches by payload type).

Fixes

penalty path crashed when reading input_ids=None; it now cumulates from Req.output_ids instead
spec_v2 isolation snapshot re-installed already-consumed staging
spec: read batch.device instead of batch.input_ids.device (input_ids may be None pre-worker)
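The penalty fix can be pictured as follows. This is a hedged sketch, not the PR's penalizer code: it assumes the penalty state is a per-request token count and that the most recent output token is available as the last element of each request's output_ids. `ToyReq` and `cumulate_output_tokens` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyReq:
    # Tokens generated so far; kept on the request, independent of batch.input_ids.
    output_ids: List[int] = field(default_factory=list)


def cumulate_output_tokens(reqs: List[ToyReq], counts: List[Counter]) -> None:
    """Update per-request token counts from Req.output_ids, not batch.input_ids."""
    for req, counter in zip(reqs, counts):
        if req.output_ids:  # a request with no output yet contributes nothing
            counter[req.output_ids[-1]] += 1


reqs = [ToyReq([7, 7]), ToyReq([])]
counts = [Counter({7: 1}), Counter()]  # state before the latest token is counted
cumulate_output_tokens(reqs, counts)
```

Because the function never reads `batch.input_ids`, it keeps working while that field is still None during scheduling.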

Cleanup

initialize forward_stream_ctx unconditionally (PP non-overlap also uses it)
split out a flatten_arrays_to_pinned_cpu helper; adopt _gpu naming; read the CI flag via envs
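The flattening step behind a helper such as flatten_arrays_to_pinned_cpu might look like the sketch below. The real helper's signature is not shown in this PR page, so `flatten_arrays` and its return shape are assumptions, and a standard-library `array` stands in for the pinned CPU buffer that the real code presumably allocates so the later H2D copy can be a single transfer.

```python
from array import array
from typing import List, Sequence, Tuple


def flatten_arrays(arrays: Sequence[Sequence[int]]) -> Tuple[array, List[int]]:
    """Concatenate per-request token lists into one contiguous int64 buffer."""
    flat = array("q")  # signed 64-bit integers, contiguous in memory
    lengths: List[int] = []
    for tokens in arrays:
        flat.extend(tokens)
        lengths.append(len(tokens))  # keep lengths so requests can be split again
    return flat, lengths


# Three requests, one of them with an empty prompt chunk.
flat, lengths = flatten_arrays([[1, 2, 3], [], [4, 5]])
```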

CI States

Latest PR Test (Base): 🚫 Run #26680708733
Latest PR Test (Extra): ✅ Run #26680708677

hnyls2002 · 2026-05-30T09:52:29Z

Base CI: https://github.com/sgl-project/sglang/actions/runs/26672334679/job/78629647197
Extra CI: https://github.com/sgl-project/sglang/actions/runs/26672334643/job/78628293873

…H2D copy - Add --extra-index-url https://pypi.org/simple/ to xpu.Dockerfile pip install - Remove torchao from XPU Docker image (not needed/supported on XPU) - Materialize deferred H2D input_ids copy in bench_one_batch extend() so bench bypassing the scheduler still works after sgl-project#25945

hnyls2002 requested review from Ying1123, merrymercy and xiezhq-hermann as code owners May 21, 2026 05:51

hnyls2002 added run-ci run-ci-extra bypass-fastfail labels May 21, 2026

hnyls2002 closed this May 22, 2026

hnyls2002 force-pushed the lsyin/prefill-h2d-on-fwd-stream branch from c9d3cea to c9153da Compare May 22, 2026 07:35

Kangyan-Zhou temporarily deployed to prod May 22, 2026 07:49 — with GitHub Actions Inactive

hnyls2002 reopened this May 22, 2026

unify input_ids resolve across overlap/non-overlap

2be0b40

hnyls2002 force-pushed the lsyin/prefill-h2d-on-fwd-stream branch from b5a0eb9 to 2be0b40 Compare May 22, 2026 08:00

Kangyan-Zhou mentioned this pull request May 22, 2026

[CI] Drop unused 'environment: prod' from bot-cherry-pick job #26067

Merged

4 tasks

hnyls2002 added 2 commits May 22, 2026 01:29

input_ids relay through future_map for both overlap and non-overlap

ffde8f5

route hisparse rejoin and disagg PREBUILT non-spec through future_map

fc1f99a

hnyls2002 requested review from ByronHsu and ShangmingCai as code owners May 22, 2026 08:36

hnyls2002 added 8 commits May 22, 2026 03:18

fix spec_v1 non-overlap: stash dispatch by payload type; skip spec extras gather

4da5f01

Merge remote-tracking branch 'origin/main' into lsyin/prefill-h2d-on-fwd-stream

6851083

Conflicts: python/sglang/srt/managers/overlap_utils.py, python/sglang/srt/managers/scheduler.py

revert spurious torch.full(-1) init; belongs to assert PR

81bcb54

non-overlap: drop unneeded future_indices capture; read req_pool_indices at stash time

5841787

non-overlap spec_v1: keep batch.input_ids assignment; skip relay stash (shape mismatch)

eb5ef1e

fix: penalty reads input_ids=None (BUG-1); spec_v2 isolation re-installs stale staging (BUG-3)

85eb67b

restructure: forward_stream_ctx wraps isolation; resolve_forward_inputs sits between

af933ba

spec: read batch.device instead of batch.input_ids.device (input_ids may be None pre-worker)

b61e499

hnyls2002 requested a review from Qiaolin-Yu as a code owner May 22, 2026 11:17

hnyls2002 added 3 commits May 22, 2026 13:20

init forward_stream_ctx unconditionally (PP non-overlap also uses it)

850c2cc

embedding/reward: also resolve_forward_inputs before forward

5180879

Merge remote-tracking branch 'origin/main' into lsyin/prefill-h2d-on-fwd-stream

ad48a0d

Conflicts: python/sglang/srt/managers/overlap_utils.py, python/sglang/srt/managers/schedule_batch.py

hnyls2002 removed the bypass-fastfail label May 28, 2026

Merge remote-tracking branch 'origin/main' into lsyin/prefill-h2d-on-fwd-stream

b27ff67

Conflicts: python/sglang/srt/managers/schedule_batch.py, python/sglang/srt/managers/scheduler.py

hnyls2002 added the bypass-fastfail label May 30, 2026

hnyls2002 and others added 7 commits May 29, 2026 19:23

Merge branch 'main' into lsyin/prefill-h2d-on-fwd-stream

3a978a3

overlap_utils: use envs for CI flag; _gpu naming; tidy docstring

49431a4

add fixme: unify relay path with dataclass

a980214

tiny fix

e4d5b4c

split flatten_arrays helper; pinned-cpu variant

d742e89

clarify input_ids relay comments; rename latest_output_ids

e9ba395

Merge branch 'main' into lsyin/prefill-h2d-on-fwd-stream

9f3512c

hnyls2002 changed the title ~~Defer prefill input_ids H2D to resolve_future~~ [Scheduler] Defer prefill input_ids H2D to forward stream, unify resolve via future_map May 30, 2026

hnyls2002 merged commit 282c461 into main May 30, 2026
71 of 147 checks passed

hnyls2002 deleted the lsyin/prefill-h2d-on-fwd-stream branch May 30, 2026 09:58

This was referenced May 30, 2026

[CI Monitor] Daily Report - 2026-05-30 bingxche/sglang-ci-bot#88

Open

[CI Monitor] Daily Report - 2026-05-31 bingxche/sglang-ci-bot#89

Open

[CI Monitor] Daily Report - 2026-06-01 bingxche/sglang-ci-bot#90

Open

ShangmingCai mentioned this pull request Jun 1, 2026

[PP][Bugfix] Handle input_ids assignment in prepare_for_extend #26883

Merged

5 tasks

polisettyvarma mentioned this pull request Jun 1, 2026

[Bug] [NPU] DeepseekV2ForCausalLM.forward crashes on PP non-first ranks with multimodal models (input_ids and input_embeds are both None) #26542

Open

5 tasks

amd-bot mentioned this pull request Jun 2, 2026

[CI Monitor] Daily Report - 2026-06-02 bingxche/sglang-ci-bot#91

Open

xjpang pushed a commit to xjpang/sglang that referenced this pull request Jun 2, 2026

[Scheduler] Defer prefill input_ids H2D to forward stream, unify resolve via future_map (sgl-project#25945)

9f85f9f

mqhc2020 pushed a commit to mqhc2020/sglang that referenced this pull request Jun 2, 2026

[Scheduler] Defer prefill input_ids H2D to forward stream, unify resolve via future_map (sgl-project#25945)

d13039e

hanming-lu pushed a commit that referenced this pull request Jun 3, 2026

[Scheduler] Defer prefill input_ids H2D to forward stream, unify resolve via future_map (#25945)

73bc73c

alphabetc1 pushed a commit to alphabetc1/sglang that referenced this pull request Jun 4, 2026

[Scheduler] Defer prefill input_ids H2D to forward stream, unify resolve via future_map (sgl-project#25945)

4c85d0c

jeynmann pushed a commit to jeynmann/sglang that referenced this pull request Jun 4, 2026

[Scheduler] Defer prefill input_ids H2D to forward stream, unify resolve via future_map (sgl-project#25945)

94a0d8b

edwingao28 pushed a commit to edwingao28/sglang that referenced this pull request Jun 7, 2026

[Scheduler] Defer prefill input_ids H2D to forward stream, unify resolve via future_map (sgl-project#25945)

77c2260

monkeyLoveding pushed a commit to monkeyLoveding/sglang_open that referenced this pull request Jun 9, 2026

[Scheduler] Defer prefill input_ids H2D to forward stream, unify resolve via future_map (sgl-project#25945)

b74696f

hnyls2002 merged 27 commits into main from lsyin/prefill-h2d-on-fwd-stream

2 participants
