
DO NOT MERGE - scripted runtime #26991

Closed
fzyzcjy wants to merge 4 commits into main from tom/scripted_runtime_extracted_chain

Conversation

@fzyzcjy

@fzyzcjy fzyzcjy commented Jun 2, 2026

Collaborator

CI States

Latest PR Test (Base): ✅ Run #27017228482
Latest PR Test (Extra): ✅ Run #27017228308

@gemini-code-assist

Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@github-actions github-actions Bot added the documentation label Jun 2, 2026
@fzyzcjy

fzyzcjy commented Jun 2, 2026

Collaborator Author

/tag-and-rerun-ci

@fzyzcjy

fzyzcjy commented Jun 2, 2026

Collaborator Author

🤖 Posted autonomously by Claude Code acting on the user's behalf. Triaged the red checks: every failure is the pr-gate / *-finish cascade, not a real test failure — diagnosis and re-trigger plan below. Please push back if any conclusion is off.

Triage

All 13 red checks are gate/aggregator jobs (pr-gate, call-gate, pr-test-finish, pr-test-extra-finish, pr-test-amd-finish, pr-test-musa-finish). Root cause from the pr-gate log:

PR Labels: [documentation]
Require run-ci: true
Missing required label 'run-ci'. ... exit code 1

The run-ci / run-ci-extra / bypass-fastfail labels were added after the PR was opened, but the gate's if: reads labels from the opened event payload (which only had documentation). So the gate failed and the 78 real GPU test jobs were all skipped — no actual test ran yet.

Next step

Close + reopen the PR to fire a reopened event carrying the current labels, so the gate passes and the real CI suite runs. Will babysit the fresh run.
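
A minimal Python sketch of the gate behavior described above, assuming the gate is a pure function of the label list carried by the triggering event. `gate_passes` and both label snapshots are hypothetical stand-ins; the real check lives in the workflow's if: condition, which is not reproduced here.

```python
REQUIRED_LABEL = "run-ci"


def gate_passes(labels: list[str], required: str = REQUIRED_LABEL) -> bool:
    """Return True when the required label is in the event's label snapshot."""
    return required in labels


# Hypothetical snapshot carried by the original `opened` event:
# only the auto-applied label existed at that moment.
opened_event_labels = ["documentation"]

# Hypothetical snapshot carried by a later `reopened` event,
# after the CI labels had been added by hand.
reopened_event_labels = ["documentation", "run-ci", "run-ci-extra", "bypass-fastfail"]

for name, labels in [("opened", opened_event_labels), ("reopened", reopened_event_labels)]:
    verdict = "pass" if gate_passes(labels) else "fail"
    print(f"{name}: {verdict}")
```

The point of the sketch is only that the same PR yields different gate verdicts depending on which event's snapshot is read, which is why a fresh event is needed rather than a plain rerun.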

@fzyzcjy fzyzcjy closed this Jun 2, 2026
@fzyzcjy fzyzcjy reopened this Jun 2, 2026
@fzyzcjy

fzyzcjy commented Jun 2, 2026

Collaborator Author

🤖 Posted autonomously by Claude Code acting on the user's behalf. Triaged the fresh CI round after the gate was unblocked; root-caused the CUDA-lane failures to two deterministic bugs (not flakes). Details + fingerprints below. Please push back if any conclusion is off.

CUDA-lane triage (fresh run, gate now green)

Two distinct root causes, both introduced by this PR, both deterministic:

1. Scripted-runtime tests time out — HTTP endpoint not ready when reset POSTs

Affected jobs / tests (all the same failure mode):

  • base-b-test-1-gpu-small: test/registered/scripted_runtime/test_scripted_runtime_core.py::test_abort_all_finishes_running_req
  • extra-a-test-1-gpu-small: test/registered/chunked_prefill/test_scripted_core_1gpu.py
  • extra-b-test-4-gpu-h100: test/registered/chunked_prefill/test_scripted_core_4gpu.py::TestScriptedPpChunkSweep::test_pp_chunk_sweep

Fingerprint:

scripted_runtime: POST /abort_request failed
ConnectionRefusedError: [Errno 111] Connect call failed ('127.0.0.1', <port>)
...
TimeoutError: ScriptedTokenizerRecvProxy: no AbortReq arrived on the recv_from_tokenizer socket within 60.0s
  File ".../scripted_runtime/scheduler_hook.py", line 54, in _reset_engine_state
  File ".../scripted_runtime/context/api.py", line 90, in abort_all
Subprocess scheduler_0 crashed with exit code 1 ... SIGQUIT
RuntimeError: server process died before responding to ... test_abort_all_finishes_running_req
→ test elapsed=1200 (hit the per-test timeout)

Root cause: in _reset_engine_state, the very first abort_all() POSTs to the server's own HTTP port, but the POST is refused — the uvicorn entrypoint is not accepting connections yet. The POST is fire-and-forget, so ScriptedTokenizerRecvProxy.wait_until_arrived then blocks 60s waiting for an AbortReq that was never delivered, the script reset raises, the hook sys.exit(1)s, the server dies, and the test hangs to its 1200s timeout. The harness appears to wait only for the IPC HookReady, not for the HTTP endpoint to be connectable — so it races on slower CI startup. Passes locally where uvicorn is up by the time the script runs.

2. Scheduler.pause_generation assert rejects a mode that main still uses

  • base-a-test-cpu: test/registered/unit/managers/test_scheduler_pause_generation.py
AssertionError: Scheduler.pause_generation got unsupported mode 'abort';
abort is handled in TokenizerManager and must not reach the scheduler

The assert added here claims abort never reaches Scheduler.pause_generation, but test_scheduler_pause_generation.py on main (unchanged by this PR) has test_abort_clears_state and test_abort_drains_overlap_queue, both of which call pause_generation(PauseGenerationReqInput(mode="abort")) and assert the scheduler-side behavior. So on current main, abort legitimately reaches the scheduler and the assert's premise is false. The companion refactor that would make the premise true (moving abort handling out of the scheduler + updating these tests) is not part of this PR's scope.

Not chasing (per lane policy / likely unrelated)

  • AMD (mi325 / mi35x), XPU, NPU, MUSA finish failures — non-CUDA lanes; will only revisit if shown to be caused by this PR.
  • extra-a-test-1-gpu-large: TestStreamingSessionEagle.* (streaming-session + EAGLE), untouched by this PR; treating as unrelated/pre-existing pending a closer look.

Next step

Both CUDA failures are deterministic, so a rerun would not help — they need code changes. Holding for direction from the author since (2) questions the standalone correctness of the assert commit and (1) is a harness-readiness fix.

@fzyzcjy

fzyzcjy commented Jun 2, 2026

Collaborator Author

🤖 Posted autonomously by Claude Code acting on the user's behalf. Pushed fixes for both deterministic CUDA failures; fresh CI is running (gate auto-passed on the synchronize event now that run-ci is present). Summary below.

Fixes pushed (a655011)

1. Scripted-runtime HTTP-readiness race: ScriptedHttpServer.start now polls /health until 200 (server past Starting, uvicorn socket bound) after the HookReady handshake, before any script may run. Previously only HookReady (scheduler dispatch loop started) was awaited, so the first _reset_engine_state abort_all() POST could hit ConnectionRefused, the recv proxy then timed out after 60s, and the test hung to its hard timeout. Fixes test_scripted_runtime_core.py, test_scripted_core_1gpu.py, test_scripted_core_4gpu.py.

2. Scheduler.pause_generation assert — broadened to the real PauseGenerationReqInput.mode Literal domain ("abort", "retract", "in_place") and dropped the incorrect "abort must not reach the scheduler" claim. abort is in fact the default mode and is handled by the general path; test_scheduler_pause_generation.py (test_abort_clears_state, test_abort_drains_overlap_queue) relies on it. Fixes base-a-test-cpu.
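
As an illustration of item 2, here is a hedged sketch of checking a mode against a Literal domain instead of hard-coding which values are forbidden. `PauseMode`, `PauseRequestSketch`, and `check_pause_mode` are hypothetical names for this example; only the three mode strings and the "abort is the default" detail come from the comment above.

```python
from dataclasses import dataclass
from typing import Literal, get_args

# The three values named in the comment above; the alias name is an assumption.
PauseMode = Literal["abort", "retract", "in_place"]


@dataclass
class PauseRequestSketch:
    """Hypothetical stand-in for PauseGenerationReqInput."""

    mode: PauseMode = "abort"  # abort is described above as the default mode


def check_pause_mode(req: PauseRequestSketch) -> str:
    """Assert that req.mode is inside the Literal domain and return it."""
    allowed = get_args(PauseMode)  # ("abort", "retract", "in_place")
    assert req.mode in allowed, (
        f"pause_generation got unsupported mode {req.mode!r}; expected one of {allowed}"
    )
    return req.mode
```

Deriving the allowed set from the type alias keeps the assert and the annotation from drifting apart, which is the failure the original assert ran into.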

Watching the new CUDA jobs (base-a-test-cpu, base-b-test-1-gpu-small, extra-a-test-1-gpu-small, extra-b-test-4-gpu-h100).

@fzyzcjy fzyzcjy force-pushed the tom/scripted_runtime_extracted_chain branch 2 times, most recently from f43fe86 to 829e7d9 Compare June 2, 2026 06:44
@fzyzcjy fzyzcjy force-pushed the tom/scripted_runtime_extracted_chain branch from 829e7d9 to 372ac76 Compare June 2, 2026 08:36
@fzyzcjy

fzyzcjy commented Jun 2, 2026

Collaborator Author

🤖 Posted autonomously by Claude Code acting on the user's behalf. Readiness fix landed; reporting the current CUDA-lane status and one remaining real failure in a new test. Please push back if any conclusion is off.

CUDA status (tip 372ac767)

The readiness fix (wait for the HTTP socket to be bound — accept any response incl. 503 — instead of /health==200, which never happens in scripted mode because warmup cannot complete while the scheduler is script-driven) landed:

Test Result
base-a-test-cpu (pause_generation unit test)
base-b test_scripted_runtime_core.py
extra-b (4-gpu-h100) test_scripted_core_4gpu.py
extra-a (1-gpu) test_scripted_core_1gpu.py ❌ (real test bug, below)
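
A minimal sketch of the readiness idea described above: poll until the HTTP listener answers with any status at all, and treat only connection-level errors as "not ready". This is not the PR's `_await_http_ready` code; `wait_until_http_bound`, its parameters, and the default path are assumptions made for illustration, built on the standard-library `http.client`.

```python
import http.client
import time


def wait_until_http_bound(
    host: str,
    port: int,
    path: str = "/health",
    timeout_s: float = 30.0,
    interval_s: float = 0.2,
) -> int:
    """Block until the server returns any HTTP response; return its status code."""
    deadline = time.monotonic() + timeout_s
    while True:
        conn = http.client.HTTPConnection(host, port, timeout=2.0)
        try:
            conn.request("GET", path)
            # Any status (503 included) proves the socket is bound and serving.
            return conn.getresponse().status
        except (OSError, http.client.HTTPException):
            # Connection refused, reset, or timed out: the listener is not up yet.
            if time.monotonic() >= deadline:
                raise TimeoutError(f"{host}:{port} not reachable within {timeout_s}s")
            time.sleep(interval_s)
        finally:
            conn.close()
```

Accepting any status rather than requiring 200 matters here because, per the comment above, /health never reaches 200 in scripted mode.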

Remaining real failure — test_scripted_core_1gpu.py

test_pause_retract_at_lifecycle_points_then_resume[stage=last_decode]:

File ".../test_scripted_core_1gpu.py", line 81, in _script_pause_retract_at_stage
    len(r.req.output_ids) == _LIFECYCLE_MAX_NEW_TOKENS
AttributeError: 'NoneType' object has no attribute 'output_ids'

At last_decode, pause_generation(retract) + one yield lets the req finish and be removed from all scheduler structures. r.finished stays True (tracked via _seen_rids), but r.req (find_req_by_rid) returns None, so r.req.output_ids raises. The assertion mixes a survives-removal signal (finished) with a gone-after-removal one (req). test_scripted_runtime_core.py passes because its stage/param mix never hits the finished-then-removed instant.

This needs a harness/test-semantics decision (a hook-derived output-token probe vs restructuring the assertion), so holding for the author rather than weakening the assertion.
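
For reference, a hedged sketch of the "restructure the assertion" option: treat finished as the durable signal and inspect output_ids only while the request object is still observable. `ReqSketch`, `ReqHandleSketch`, and `output_len_ok` are hypothetical stand-ins for the harness types, not the actual test code.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ReqSketch:
    """Hypothetical stand-in for the scheduler-side request object."""

    output_ids: list[int] = field(default_factory=list)


@dataclass
class ReqHandleSketch:
    """Hypothetical stand-in for the script-side handle `r`."""

    finished: bool
    req: Optional[ReqSketch]  # None once the scheduler has dropped the request


def output_len_ok(handle: ReqHandleSketch, expected: int) -> bool:
    """True when the request finished and, if still observable, has `expected` tokens."""
    if not handle.finished:
        return False
    if handle.req is None:
        # Finished and already removed: output_ids can no longer be inspected.
        return True
    return len(handle.req.output_ids) == expected
```

The trade-off is that the removed case is accepted without checking the token count, so this variant is weaker than a hook-derived probe that records the count before removal.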

Not chasing

H20 (base-c), extra-b-8-gpu-h200 (test_disaggregation_hybrid_attention.py, untouched by this PR), NPU/XPU/AMD lanes — unrelated to this PR.

@fzyzcjy fzyzcjy force-pushed the tom/scripted_runtime_extracted_chain branch 3 times, most recently from 428763a to 3501f01 Compare June 5, 2026 02:55
@fzyzcjy

fzyzcjy commented Jun 5, 2026

Collaborator Author

🤖 Posted autonomously by Claude Code acting on the user's behalf. Triaging this PR's CI on behalf of the PR owner; classification below is preliminary pending log confirmation. Please push back if any conclusion is off.

CI triage snapshot

Rollup: 34 SUCCESS / 23 SKIPPED / 89 running / 3 FAILURE. The CUDA hard-gate jobs are still running.

The 3 current failures are all non-CUDA lanes:

Job | Workflow | Run
stage-a-test-1-gpu-xpu | PR Test (XPU) | https://github.com/sgl-project/sglang/actions/runs/26992505286/job/79655371683
finish (cascade of the above) | PR Test (XPU) | https://github.com/sgl-project/sglang/actions/runs/26992505286/job/79655384150
stage-b-test-4-npu-a3 | PR Test (NPU) | https://github.com/sgl-project/sglang/actions/runs/26992505261/job/79655372409

Preliminary classification: likely unrelated to this PR. Reasoning from the diff:

  • All behavioral source changes are gated behind SGLANG_TEST_SCRIPTED_RUNTIME (default False); when the flag is off the new scheduler/IPC hooks are inert (scripted_scheduler_hook = None, guarded call sites).
  • The only unconditional source changes are safe: adding zmq.PAIR to a socket-config branch in utils/network.py, a new required enable_scripted_runtime kwarg on SchedulerIpcChannels.create (single caller, correctly passing it), and Union[...] type annotations (import present).
  • The new scripted_runtime / kv_canary test files are not registered into any run-suite, so they are not collected by the XPU/NPU suites.
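
The gating pattern the first bullet relies on can be sketched as follows. This is an assumption-laden illustration, not the PR's code: `env_flag`, `SchedulerSketch`, and the truthy-string parsing are hypothetical; only the flag name, the default of False, and the `scripted_scheduler_hook = None` guard come from the bullets above.

```python
import os
from typing import Callable, Optional


def env_flag(name: str, default: bool = False) -> bool:
    """Parse a boolean environment variable (parsing rules are an assumption)."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in ("1", "true", "yes", "on")


class SchedulerSketch:
    """Hypothetical scheduler with an optional, env-gated hook."""

    def __init__(self) -> None:
        self.hook_calls = 0
        # The hook stays None unless the test-only flag is set.
        self.scripted_scheduler_hook: Optional[Callable[[], None]] = None
        if env_flag("SGLANG_TEST_SCRIPTED_RUNTIME"):
            self.scripted_scheduler_hook = self._on_step

    def _on_step(self) -> None:
        self.hook_calls += 1

    def step(self) -> None:
        # Guarded call site: with the flag off this branch is never taken.
        if self.scripted_scheduler_hook is not None:
            self.scripted_scheduler_hook()
```

With the flag unset, `step()` never enters the hook branch, which is the property the classification above depends on.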

Next step: REST API budget is currently exhausted (resets in ~45 min). Once it resets I will pull the XPU/NPU job logs to confirm the failure fingerprints, and continue watching the CUDA gate.

@fzyzcjy

fzyzcjy commented Jun 5, 2026

Collaborator Author

🤖 Posted autonomously by Claude Code acting on the user's behalf. Pulled the job logs for all current failures and confirmed each fingerprint; classification below is now log-backed. Please push back if any conclusion is off.

Non-CUDA failures confirmed unrelated to this PR

REST budget is back; I fetched the logs for all 5 current FAILURE checks. All are non-CUDA lanes (XPU + NPU) and none are caused by this PR's diff (all source changes are gated behind SGLANG_TEST_SCRIPTED_RUNTIME default-False; the new test files are not registered into any run-suite). The CUDA hard gate (PR Test Base / Extra) is still running with 0 failures.

Job | Root-cause fingerprint | Class
stage-a-test-1-gpu-xpu | Fails at actions/checkout clean step, before any test: EACCES: permission denied, unlink '.../python/sglang.egg-info/PKG-INFO', a leftover file from a prior run that the runner can't remove. | Runner infra
finish (XPU) | Cascade of the XPU checkout failure above. | Cascade
stage-b-test-1-npu-a2 | test/registered/ascend/basic_function/quant/test_npu_w8a8_quantization.py::test_gsm8k: AssertionError: 632.60 not greater than or equal to 700 (output-throughput threshold). Server launched fine, accuracy 0.31 OK; the NPU box just ran the perf assertion below its hard-coded 700 tok/s bar. | NPU perf flake
stage-b-test-4-npu-a3 | test/registered/ascend/basic_function/HiCache/test_npu_hicache_mla.py: RuntimeError: GetDeviceMemories ... NPU function error: aclrtGetMemInfo(ACL_HBM_MEM, ...) error code 107002 / The context is empty / TsdOpen failed. devId=0. | NPU device/driver init
pr-test-npu-finish | Cascade of the two NPU stage failures above. | Cascade

None of these touch the scheduler/IPC hook code this PR adds. No fix is warranted on the PR side. Per non-CUDA-lane policy I am leaving them as-is and continuing to watch the CUDA gate; will report any CUDA-lane failure separately.

@fzyzcjy

fzyzcjy commented Jun 5, 2026

Collaborator Author

🤖 Posted autonomously by Claude Code acting on the user's behalf. Addendum after a 6th failure appeared; I pulled its log and confirmed it is also unrelated. Please push back if any conclusion is off.

Addendum — 6th failure (AMD), also unrelated

Job | Root-cause fingerprint | Class
stage-b-test-large-8-gpu-mi35x-disaggregation-amd | test/registered/amd/disaggregation/test_mori_transfer_engine_e2e.py::test_generate_smoke_hybrid_mamba → PD-disagg decode server never came up: mini_lb aiohttp ClientConnectorError: Cannot connect to host 127.0.0.1:11120, server returned 500, then retry() exceed maximum number of retries; AssertionError: 500 != 200. | AMD PD-disagg server flake

Grepping the full AMD log, the only occurrence of scripted_runtime is in the runner's branch-fetch listing — no server-side traceback references any file this PR touches (scheduler.py, request_receiver.py, ipc_channels.py, tokenizer_recv_proxy.py). The MoRI transfer-engine server crash is on the AMD mi35x disaggregation lane and is independent of this PR's env-gated test-infra changes.

Status: all 6 current failures are non-CUDA-lane and confirmed unrelated to this PR. The CUDA hard gate (PR Test Base / Extra) is still running with 0 failures (~22 jobs left). Continuing to watch; will report any CUDA-lane failure separately.

@fzyzcjy

fzyzcjy commented Jun 5, 2026

Collaborator Author

🤖 Posted autonomously by Claude Code acting on the user's behalf. A CUDA-lane job failed; I pulled the log, found a hang/timeout in an unrelated HiCache test, and am triggering a single-file rerun to classify flake vs. real. Please push back if any conclusion is off.

CUDA-lane failure: extra-b-test-4-gpu-h100 — HiCache test timeout

What happened (from the log): the server (Qwen3.5-27B hybrid-mamba, enable_hierarchical_cache=True) came up fine and the first gsm8k eval passed cleanly (Score: 0.980, 1456 tok/s). After a successful POST /flush_cache → 200 OK at 05:17:06, the second eval started, ran two decode batches at 05:17:08–09, then went silent for ~16 minutes until the watchdog/timeout killed it at 05:33:04. Classic hang after cache flush+reload, not a crash (no CUDA error / traceback).

Relation to this PR: none that I can find. The server's server_args show kv_canary='none' and SGLANG_TEST_SCRIPTED_RUNTIME is unset, so every code path this PR touches (the env-gated scheduler/IPC hooks, kv_canary helper) is inert. Grepping the full job log, no file this PR modifies appears anywhere in the server stderr. HiCache is not touched by this PR.

Next step: classifying as a probable HiCache hang/flake. Triggering a single-file rerun below; will report the verdict. The remaining ~13 CUDA jobs are still running with no other CUDA failures so far.

@fzyzcjy

fzyzcjy commented Jun 5, 2026

Collaborator Author

/rerun-test test/registered/hicache/test_qwen35_hicache.py

@github-actions

github-actions Bot commented Jun 5, 2026

Contributor

Results for /rerun-test test/registered/hicache/test_qwen35_hicache.py:

🚀 4-gpu-h100 (1 test): ❌ View workflow run

cd test/ && python3 registered/hicache/test_qwen35_hicache.py

@fzyzcjy

fzyzcjy commented Jun 5, 2026

Collaborator Author

🤖 Posted autonomously by Claude Code acting on the user's behalf. Found a cross-branch data point proving the HiCache CUDA timeout is pre-existing on this PR's base commit, independent of the PR. Please push back if any conclusion is off.

test_qwen35_hicache.py timeout is pre-existing on main — not this PR

There is already a rerun-test of the exact same file on main from earlier today, and it failed the same way:

Branch / commit | Run | Same file? | Outcome
PR #26991 extra-b-test-4-gpu-h100 | job 79655383617 | yes | FAIL: ✗ TIMEOUT after 1200s (hang after flush_cache, first eval scored 0.980)
main @ c6c1f1a29a | run 26992947929 | yes | FAIL, same shape: first eval Score: 0.980, flush_cache → 200 OK, then server hangs and is force-killed (Terminate orphan process: sglang/python)

c6c1f1a29a is the exact base commit this PR is stacked on — i.e. the test hangs on the base tree without any of this PR's 4 commits applied. Combined with the earlier code-level analysis (every path this PR touches is gated behind SGLANG_TEST_SCRIPTED_RUNTIME / kv_canary, both off here; HiCache is untouched), this is conclusive: the HiCache timeout is a pre-existing main flake/hang, not introduced by this PR.

I had also kicked off a /rerun-test of this file on the PR branch (run 26997570249, still in progress) — its outcome doesn't change the verdict, but I'll note it once it finishes.

Overall status: all 8 current failures (6 non-CUDA infra/flake + the HiCache CUDA timeout + its pr-test-extra-finish cascade) are confirmed unrelated to this PR. No PR-side fix is warranted. 8 CUDA jobs still running with no other CUDA failures; continuing to watch.

@fzyzcjy

fzyzcjy commented Jun 5, 2026

Collaborator Author

🤖 Posted autonomously by Claude Code acting on the user's behalf. Explaining the re-run of the HiCache CUDA job and the cross-branch evidence, in response to a question about job 79678030535. Please push back if any conclusion is off.

What job 79678030535 is, and whether main has the same problem

The job: it is the auto-retry (attempt 2) of extra-b-test-4-gpu-h100 (1) — CI's fast-fail/health system re-ran the failed CUDA stage. It runs test/registered/hicache/test_qwen35_hicache.py (Qwen3.5-27B hybrid-mamba + hierarchical cache).

Failure mode (identical on both attempts):

  • Server starts fine; the first gsm8k eval passes cleanly (Score: 0.980).
  • POST /flush_cache → 200 OK.
  • The second eval runs two decode batches, then goes silent for ~16 min until the watchdog fires ✗ TIMEOUT after 1200s and force-kills the server (Terminate orphan process: sglang/python).
  • No CUDA error / no traceback → this is a hang/deadlock, not a crash. The config uses --hicache-storage-prefetch-policy wait_complete, a plausible place to deadlock.

Because the auto-retry hung the same way, it is a consistent hang, not a one-off flake.

Does main have the same problem? Yes — and it is independent of this PR:

Branch / commit | Run | Same file | Outcome
this PR, extra-b-test-4-gpu-h100 (1), both attempts | attempt 1 · attempt 2 | yes | FAIL: TIMEOUT 1200s
main @ c6c1f1a29a (= this PR's base commit) | run 26992947929 | yes | FAIL: same hang-after-flush_cache shape

c6c1f1a29a is the exact commit this PR is stacked on, with none of this PR's 4 commits applied, and the test already hangs there. On top of that, the server's server_args for the failing run show kv_canary='none' and SGLANG_TEST_SCRIPTED_RUNTIME unset, so every code path this PR touches (the env-gated scheduler/IPC hooks, the kv_canary helper) is inert; HiCache is not touched by this PR at all.

Conclusion: the HiCache CUDA timeout is a pre-existing, consistent hang on main (most likely a wait_complete prefetch deadlock), not introduced by this PR. No fix belongs in this PR's commits.

Root cause found — fixed by #27366

This is not something to fix in this PR. The hang is a known main bug in the HiMamba HiCache prefetch path, and a fix is already up:

  • #27366 — "[BugFix]: Fix HiMamba HiCache prefetch hang after L3 sidecar transfer" (author @hzh0425)
  • Its stated motivation is literally "Fix qwen3.5 hicache hanging", referencing a main PR Test Base failure (run 26985767961) — another main-side hit of this same hang.
  • It touches exactly one file, python/sglang/srt/mem_cache/hi_mamba_radix_cache.py — the HiMambaRadixCache path this test exercises — adding a guard in can_terminate_prefetch so a prefetch is not terminated while operation.pool_transfers are still in flight (not operation.pool_transfers_done). That matches the symptom exactly: --hicache-storage-prefetch-policy wait_complete waiting forever on an L3 sidecar pool transfer that was allowed to "complete" early.
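
To make the shape of that guard concrete, here is a rough sketch under stated assumptions. It is not the code from #27366: `PrefetchOpSketch`, `can_terminate_prefetch_sketch`, and the `main_transfer_done` field are hypothetical; only the `pool_transfers` and `pool_transfers_done` attribute names are taken from the bullet above.

```python
from dataclasses import dataclass, field


@dataclass
class PrefetchOpSketch:
    """Hypothetical prefetch operation; not the real HiCache data structure."""

    main_transfer_done: bool = False  # assumed field, for illustration only
    pool_transfers: list[str] = field(default_factory=list)
    pool_transfers_done: bool = False


def can_terminate_prefetch_sketch(op: PrefetchOpSketch) -> bool:
    """Refuse to terminate while sidecar pool transfers are still in flight."""
    if op.pool_transfers and not op.pool_transfers_done:
        # Guard: terminating now would leave a waiter blocked on these transfers.
        return False
    return op.main_transfer_done
```

Without the first branch, an operation whose main transfer finished would be reported as terminable even though its pool transfers had not completed, which is consistent with the wait_complete symptom described above.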

So the verdict stands and is now root-caused: the test_qwen35_hicache.py CUDA timeout is a pre-existing main HiMamba-prefetch hang, unrelated to this PR, and it will be resolved by #27366 (not by anything in this PR's commits). Once #27366 lands and this PR is rebased, the test should go green.

@fzyzcjy fzyzcjy force-pushed the tom/scripted_runtime_extracted_chain branch from 3501f01 to f958fe8 Compare June 5, 2026 13:19
@fzyzcjy

fzyzcjy commented Jun 5, 2026

Collaborator Author

🤖 Posted autonomously by Claude Code acting on the user's behalf. Confirming the HiCache fix works on this PR and kicking a retry of the remaining non-CUDA infra/flake failures. Please push back if any conclusion is off.

✅ HiCache timeout resolved by #27366 — confirmed on this PR

After rebasing onto main @ faa6286946 (which includes #27366 "[BugFix]: Fix HiMamba HiCache prefetch hang after L3 sidecar transfer"), extra-b-test-4-gpu-h100 (1) — the exact shard that hung twice before — now passes. The previous verdict is confirmed end-to-end: the hang was the pre-existing main HiMamba-prefetch bug, fixed by #27366.

Current round status: 124 SUCCESS / 2 still queued (base-c-test-4-gpu-h100 (2), (3) — waiting on the congested 4-GPU H100 pool) / 0 CUDA failures. XPU passed this round too (last round's leftover-file checkout issue cleared).

The only remaining failures are the chronic non-CUDA lanes (NPU stage-b-test-1-npu-a2 + AMD stage-b-test-1-gpu-small-amd (6) / mi35x-disaggregation / stage-c-large-8-gpu (3) + their finish cascades) — same infra/flake patterns as triaged earlier, unrelated to this PR. Triggering a retry of just the failed jobs below.

@fzyzcjy

fzyzcjy commented Jun 5, 2026

Collaborator Author

/rerun-failed-ci

@fzyzcjy

fzyzcjy commented Jun 6, 2026

Collaborator Author

🤖 Posted autonomously by Claude Code acting on the user's behalf. Final status report for this CI round; retrying the remaining non-CUDA failures one more time. Please push back if any conclusion is off.

✅ CUDA gate fully green — final round status

The round on the rebased head (f958fe841c, base faa6286946 incl. #27366) has completed:

Remaining 5 failures are all chronic non-CUDA lanes (3 real + 2 finish cascades), unrelated to this PR:

Job | Status
NPU stage-b-test-1-npu-a2 (0) | failed again after one retry (Ascend lane health)
AMD stage-b-test-large-8-gpu-mi35x-disaggregation | failed again after one retry (MoRI PD-disagg flake)
AMD stage-c-test-large-8-gpu-amd (3) | failed this round
pr-test-npu-finish / pr-test-amd-finish | cascades of the above

(The earlier /rerun-failed-ci retry did clear AMD stage-b-test-1-gpu-small-amd (6).) Giving these one more retry below; if they stay red it is lane health, not this PR.

@fzyzcjy

fzyzcjy commented Jun 6, 2026

Collaborator Author

/rerun-failed-ci

@fzyzcjy

fzyzcjy commented Jun 6, 2026

Collaborator Author

/rerun-failed-ci

@fzyzcjy

fzyzcjy commented Jun 6, 2026

Collaborator Author

ci

image

@fzyzcjy fzyzcjy closed this Jun 6, 2026
@fzyzcjy

fzyzcjy commented Jun 6, 2026

Collaborator Author

merged

fzyzcjy added a commit to fzyzcjy/sglang that referenced this pull request Jun 6, 2026
…ain) into tom/scripted_runtime_and_chunked_testing

19 files conflicted (mostly add/add: both branches independently grew the
scripted-runtime + kv_canary test infra). This branch is the developed superset;
sgl-project#26991 is the extracted chain. Resolution:

- 16 files -> took ours (this branch is a strict superset; sgl-project#26991 only stripped
  comments/code ours has). Includes req_handle.py: kept ours' SWA-aware,
  None-safe lock_refs; sgl-project#26991 still had the old node.lock_ref that crashes on
  SWA nodes / finished reqs.
- http_server.py -> took sgl-project#26991's _await_http_ready() (waits for the uvicorn port
  to bind before scripts POST control messages -- a real startup-race fix ours
  lacked) and re-inserted ours' overlap-scheduler MUST-run comment.
- scripted_runtime_chunked_helpers.py -> kept ours' comments; took sgl-project#26991's
  advance_to_nth_chunk using accumulated chunks_done instead of sampling
  is_chunking (which undercounts on the step the req leaves chunked_req).
- test/registered/.../test_scripted_core_1gpu.py -> kept ours' comments and the
  finished-case output-length assertion, but made it r.req-None-safe per sgl-project#26991's
  note that a finished req may already be removed (output_ids unobservable).

Verified: no conflict markers, all resolved files compile (py3.12+), no unmerged
files. merge_diff_check.py reported 24 'lost' files -- all false positives (ours
== theirs for those, so they auto-merged identically; confirmed present in the
merged tree, e.g. environ.py keeps all 4 SCRIPTED_RUNTIME env vars).
fzyzcjy added a commit to fzyzcjy/sglang that referenced this pull request Jun 6, 2026
…sting

Upstream landed the extracted scripted-runtime chain (sgl-project#27410-sgl-project#27413), which
conflicts with this branch's further-developed copies of the same files. All 23
conflicts resolve to ours: theirs' only unique content is older forms this
branch has since corrected on H200 (pre-SWA lock_refs, short-circuit
run_until_all_finished, the finished-case-dropping core_1gpu variant, the
narrower abort() signature, pre-stop_token_ids imports); rerere replayed the
matching resolutions from the earlier sgl-project#26991 merge for the rest.

Labels

bypass-fastfail, documentation, run-ci, run-ci-extra
