DO NOT MERGE - scripted runtime by fzyzcjy · Pull Request #26991 · sgl-project/sglang

fzyzcjy · 2026-06-02T02:13:54Z

CI States

Latest PR Test (Base): ✅ Run #27017228482
Latest PR Test (Extra): ✅ Run #27017228308

gemini-code-assist · 2026-06-02T02:13:58Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

fzyzcjy · 2026-06-02T02:17:26Z

/tag-and-rerun-ci

fzyzcjy · 2026-06-02T02:18:47Z

🤖 Posted autonomously by Claude Code acting on the user's behalf. Triaged the red checks: every failure is the pr-gate / *-finish cascade, not a real test failure — diagnosis and re-trigger plan below. Please push back if any conclusion is off.

Triage

All 13 red checks are gate/aggregator jobs (pr-gate, call-gate, pr-test-finish, pr-test-extra-finish, pr-test-amd-finish, pr-test-musa-finish). Root cause from the pr-gate log:

PR Labels: [documentation]
Require run-ci: true
Missing required label 'run-ci'. ... exit code 1

The run-ci / run-ci-extra / bypass-fastfail labels were added after the PR was opened, but the gate's `if:` condition reads labels from the `opened` event payload, which only carried `documentation`. So the gate failed and all 78 real GPU test jobs were skipped; no actual test has run yet.
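The stale-label behaviour can be illustrated with a minimal Python sketch. It assumes the gate's check reduces to looking up `run-ci` in the `pull_request.labels[*].name` list of the triggering event's payload (the GitHub webhook shape). The helpers `labels_from_event` and `gate_passes` are hypothetical and are not the actual pr-gate implementation.

```python
from typing import Any

REQUIRED_LABEL = "run-ci"  # label name taken from the pr-gate log above


def labels_from_event(event: dict[str, Any]) -> set[str]:
    """Return the label names frozen into a pull_request event payload."""
    pr = event.get("pull_request", {})
    return {label["name"] for label in pr.get("labels", [])}


def gate_passes(event: dict[str, Any], required: str = REQUIRED_LABEL) -> bool:
    # The decision only sees the labels captured when this event fired,
    # not the labels the PR carries right now.
    return required in labels_from_event(event)


# Simplified payloads: labels as they were at the moment each event fired.
opened = {"action": "opened", "pull_request": {"labels": [{"name": "documentation"}]}}
reopened = {
    "action": "reopened",
    "pull_request": {
        "labels": [
            {"name": n}
            for n in ("documentation", "run-ci", "run-ci-extra", "bypass-fastfail")
        ]
    },
}

print(gate_passes(opened))    # False: run-ci was added after this event fired
print(gate_passes(reopened))  # True: the reopened payload carries the current labels
```

This is why closing and reopening helps: it produces a fresh event whose payload snapshot includes the labels added after the PR was opened.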

Next step

Close + reopen the PR to fire a reopened event carrying the current labels, so the gate passes and the real CI suite runs. Will babysit the fresh run.

fzyzcjy · 2026-06-02T04:28:15Z

🤖 Posted autonomously by Claude Code acting on the user's behalf. Triaged the fresh CI round after the gate was unblocked; root-caused the CUDA-lane failures to two deterministic bugs (not flakes). Details + fingerprints below. Please push back if any conclusion is off.

CUDA-lane triage (fresh run, gate now green)

Two distinct root causes, both introduced by this PR, both deterministic:

1. Scripted-runtime tests time out — HTTP endpoint not ready when reset POSTs

Affected jobs / tests (all the same failure mode):

base-b-test-1-gpu-small → test/registered/scripted_runtime/test_scripted_runtime_core.py::test_abort_all_finishes_running_req
extra-a-test-1-gpu-small → test/registered/chunked_prefill/test_scripted_core_1gpu.py
extra-b-test-4-gpu-h100 → test/registered/chunked_prefill/test_scripted_core_4gpu.py::TestScriptedPpChunkSweep::test_pp_chunk_sweep

Fingerprint:

scripted_runtime: POST /abort_request failed
ConnectionRefusedError: [Errno 111] Connect call failed ('127.0.0.1', <port>)
...
TimeoutError: ScriptedTokenizerRecvProxy: no AbortReq arrived on the recv_from_tokenizer socket within 60.0s
  File ".../scripted_runtime/scheduler_hook.py", line 54, in _reset_engine_state
  File ".../scripted_runtime/context/api.py", line 90, in abort_all
Subprocess scheduler_0 crashed with exit code 1 ... SIGQUIT
RuntimeError: server process died before responding to ... test_abort_all_finishes_running_req
→ test elapsed=1200 (hit the per-test timeout)

Root cause: in `_reset_engine_state`, the very first `abort_all()` POSTs to the server's own HTTP port, but the connection is refused because the uvicorn entrypoint is not accepting connections yet. The POST is fire-and-forget, so `ScriptedTokenizerRecvProxy.wait_until_arrived` then blocks for 60s waiting for an AbortReq that was never delivered; the script reset raises, the hook calls `sys.exit(1)`, the server dies, and the test hangs until its 1200s timeout. The harness appears to wait only for the IPC HookReady, not for the HTTP endpoint to be connectable, so it races on slower CI startup. It passes locally because uvicorn is already up by the time the script runs.
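The failure chain above can be reproduced in miniature. This is a sketch, not the harness code: `fire_and_forget_post` and `abort_all_then_wait` are hypothetical names, and a `threading.Event` stands in for the recv proxy noticing an AbortReq. Because the refused POST is only logged, the caller cannot tell that nothing was sent and spends the whole timeout waiting.

```python
import socket
import threading
import urllib.request


def fire_and_forget_post(url: str) -> None:
    """Send a control POST, logging (not raising) on failure: the problematic pattern."""
    req = urllib.request.Request(url, data=b"{}", method="POST")
    try:
        urllib.request.urlopen(req, timeout=1)
    except OSError as exc:  # URLError wrapping ConnectionRefusedError lands here
        print(f"POST {url} failed: {exc!r}")  # swallowed; the caller keeps going


def abort_all_then_wait(url: str, arrived: threading.Event, timeout_s: float) -> None:
    fire_and_forget_post(url)
    # If the POST was refused, nothing will ever set `arrived`, so this can only time out.
    if not arrived.wait(timeout_s):
        raise TimeoutError(f"no AbortReq arrived within {timeout_s}s")


def unused_port() -> int:
    # Bind to port 0 to obtain a free port, then close it so nothing listens there.
    with socket.socket() as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]
```

Run against a port nothing listens on, the call logs the refused POST and then raises `TimeoutError` after `timeout_s`, mirroring the 60s stall in the fingerprint.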

2. `Scheduler.pause_generation` assert rejects a mode that main still uses

base-a-test-cpu → test/registered/unit/managers/test_scheduler_pause_generation.py

AssertionError: Scheduler.pause_generation got unsupported mode 'abort';
abort is handled in TokenizerManager and must not reach the scheduler

The assert added here claims abort never reaches Scheduler.pause_generation, but test_scheduler_pause_generation.py on main (unchanged by this PR) has test_abort_clears_state and test_abort_drains_overlap_queue, both of which call pause_generation(PauseGenerationReqInput(mode="abort")) and assert the scheduler-side behavior. So on current main, abort legitimately reaches the scheduler and the assert's premise is false. The companion refactor that would make the premise true (moving abort handling out of the scheduler + updating these tests) is not part of this PR's scope.

Not chasing (per lane policy / likely unrelated)

AMD (mi325 / mi35x), XPU, NPU, MUSA finish failures — non-CUDA lanes; will only revisit if shown to be caused by this PR.
extra-a-test-1-gpu-large → TestStreamingSessionEagle.* (streaming-session + EAGLE) — untouched by this PR; treating as unrelated/pre-existing pending a closer look.

Next step

Both CUDA failures are deterministic, so a rerun would not help — they need code changes. Holding for direction from the author since (2) questions the standalone correctness of the assert commit and (1) is a harness-readiness fix.

fzyzcjy · 2026-06-02T05:05:59Z

🤖 Posted autonomously by Claude Code acting on the user's behalf. Pushed fixes for both deterministic CUDA failures; fresh CI is running (gate auto-passed on the synchronize event now that run-ci is present). Summary below.

Fixes pushed (`a655011`)

1. Scripted-runtime HTTP-readiness race — ScriptedHttpServer.start now polls /health until 200 (server past Starting, uvicorn socket bound) after the HookReady handshake, before any script may run. Previously only HookReady (scheduler dispatch loop started) was awaited, so the first _reset_engine_state abort_all() POST could hit ConnectionRefused, the recv proxy then timed out after 60s, and the test hung to its hard timeout. Fixes test_scripted_runtime_core.py, test_scripted_core_1gpu.py, test_scripted_core_4gpu.py.

2. Scheduler.pause_generation assert — broadened to the real PauseGenerationReqInput.mode Literal domain ("abort", "retract", "in_place") and dropped the incorrect "abort must not reach the scheduler" claim. abort is in fact the default mode and is handled by the general path; test_scheduler_pause_generation.py (test_abort_clears_state, test_abort_drains_overlap_queue) relies on it. Fixes base-a-test-cpu.

Watching the new CUDA jobs (base-a-test-cpu, base-b-test-1-gpu-small, extra-a-test-1-gpu-small, extra-b-test-4-gpu-h100).

fzyzcjy · 2026-06-02T11:06:06Z

🤖 Posted autonomously by Claude Code acting on the user's behalf. Readiness fix landed; reporting the current CUDA-lane status and one remaining real failure in a new test. Please push back if any conclusion is off.

CUDA status (tip `372ac767`)

The revised readiness fix landed: it now waits only for the HTTP socket to be bound (any response counts, including 503) instead of requiring `/health` == 200, which never happens in scripted mode because warmup cannot complete while the scheduler is script-driven. Results:

| Test | Result |
|---|---|
| `base-a-test-cpu` (pause_generation unit test) | ✅ |
| `base-b` `test_scripted_runtime_core.py` | ✅ |
| `extra-b` (4-gpu-h100) `test_scripted_core_4gpu.py` | ✅ |
| `extra-a` (1-gpu) `test_scripted_core_1gpu.py` | ❌ (real test bug, below) |
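The final readiness criterion (any HTTP response, including 503, means the socket is bound) can be sketched with the standard library alone. `wait_for_http_ready` is a hypothetical helper written for illustration; it is not the PR's actual implementation, whose details may differ.

```python
import http.client
import time


def wait_for_http_ready(host: str, port: int, path: str = "/health",
                        timeout_s: float = 30.0, interval_s: float = 0.1) -> bool:
    """Poll until the server returns *any* HTTP response on `path`.

    Any status, including 503 while warmup is pending, proves the socket is
    bound and the HTTP layer is serving, which is all a control POST needs.
    Returns False if nothing answered before `timeout_s` elapsed.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        conn = http.client.HTTPConnection(host, port, timeout=1.0)
        try:
            conn.request("GET", path)
            conn.getresponse().read()  # the status code is deliberately ignored
            return True
        except (OSError, http.client.HTTPException):
            # e.g. ConnectionRefusedError while uvicorn has not bound the port yet
            time.sleep(interval_s)
        finally:
            conn.close()
    return False
```

Ignoring the status code is the key design choice: a `/health` == 200 check couples readiness to warmup, which a script-driven scheduler never completes.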

Remaining real failure — `test_scripted_core_1gpu.py`

test_pause_retract_at_lifecycle_points_then_resume[stage=last_decode]:

File ".../test_scripted_core_1gpu.py", line 81, in _script_pause_retract_at_stage
    len(r.req.output_ids) == _LIFECYCLE_MAX_NEW_TOKENS
AttributeError: 'NoneType' object has no attribute 'output_ids'

At last_decode, pause_generation(retract) + one yield lets the req finish and be removed from all scheduler structures. r.finished stays True (tracked via _seen_rids), but r.req (find_req_by_rid) returns None, so r.req.output_ids raises. The assertion mixes a survives-removal signal (finished) with a gone-after-removal one (req). test_scripted_runtime_core.py passes because its stage/param mix never hits the finished-then-removed instant.

This needs a harness/test-semantics decision (a hook-derived output-token probe vs restructuring the assertion), so holding for the author rather than weakening the assertion.
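One way to restructure the check so that it never dereferences a removed request is sketched below with hypothetical stand-ins (`ReqHandle`, `LiveReq`, `check_finished_output_len`) for the harness's request handle. It asserts the output length only while the request object is still reachable. Note that this skips the length check in the finished-then-removed case, which is exactly the trade-off flagged above; a hook-derived output-token probe would avoid it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LiveReq:
    output_ids: List[int] = field(default_factory=list)


@dataclass
class ReqHandle:
    finished: bool          # survives removal (tracked by rid)
    req: Optional[LiveReq]  # None once the scheduler has dropped the request


def check_finished_output_len(r: ReqHandle, max_new_tokens: int) -> None:
    if not r.finished:
        return
    if r.req is None:
        # Finished and already removed: output_ids is no longer observable here.
        return
    assert len(r.req.output_ids) == max_new_tokens, (
        f"expected {max_new_tokens} output tokens, got {len(r.req.output_ids)}"
    )
```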

Not chasing

H20 (base-c), extra-b-8-gpu-h200 (test_disaggregation_hybrid_attention.py, untouched by this PR), NPU/XPU/AMD lanes — unrelated to this PR.

fzyzcjy · 2026-06-05T03:13:05Z

🤖 Posted autonomously by Claude Code acting on the user's behalf. Triaging this PR's CI on behalf of the PR owner; classification below is preliminary pending log confirmation. Please push back if any conclusion is off.

CI triage snapshot

Rollup: 34 SUCCESS / 23 SKIPPED / 89 running / 3 FAILURE. The CUDA hard-gate jobs are still running.

The 3 current failures are all non-CUDA lanes:

| Job | Workflow | Run |
|---|---|---|
| `stage-a-test-1-gpu-xpu` | PR Test (XPU) | https://github.com/sgl-project/sglang/actions/runs/26992505286/job/79655371683 |
| `finish` (cascade of the above) | PR Test (XPU) | https://github.com/sgl-project/sglang/actions/runs/26992505286/job/79655384150 |
| `stage-b-test-4-npu-a3` | PR Test (NPU) | https://github.com/sgl-project/sglang/actions/runs/26992505261/job/79655372409 |

Preliminary classification: likely unrelated to this PR. Reasoning from the diff:

All behavioral source changes are gated behind SGLANG_TEST_SCRIPTED_RUNTIME (default False); when the flag is off the new scheduler/IPC hooks are inert (scripted_scheduler_hook = None, guarded call sites).
The only unconditional source changes are safe: adding zmq.PAIR to a socket-config branch in utils/network.py, a new required enable_scripted_runtime kwarg on SchedulerIpcChannels.create (single caller, correctly passing it), and Union[...] type annotations (import present).
The new scripted_runtime / kv_canary test files are not registered into any run-suite, so they are not collected by the XPU/NPU suites.
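The "inert when the flag is off" argument relies on a common pattern: the hook object is only created under an opt-in env var, and every call site checks for `None`. The sketch below illustrates that pattern; `env_flag`, `make_hook`, and `scheduler_step` are hypothetical, and sglang's real env parsing and hook wiring may differ.

```python
import os
from typing import Callable, Optional


def env_flag(name: str, default: bool = False) -> bool:
    """Parse a boolean env var (assumed truthy spellings); unset means `default`."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in ("1", "true", "yes", "on")


def make_hook() -> Optional[Callable[[str], None]]:
    # The hook only exists when the opt-in flag is set; otherwise it stays None.
    if not env_flag("SGLANG_TEST_SCRIPTED_RUNTIME"):
        return None
    return lambda event: print(f"scripted hook saw {event}")


scripted_scheduler_hook = make_hook()


def scheduler_step(step: int) -> None:
    # Guarded call site: a no-op when the flag is off.
    if scripted_scheduler_hook is not None:
        scripted_scheduler_hook(f"step {step}")
```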

Next step: REST API budget is currently exhausted (resets in ~45 min). Once it resets I will pull the XPU/NPU job logs to confirm the failure fingerprints, and continue watching the CUDA gate.

fzyzcjy · 2026-06-05T03:59:52Z

🤖 Posted autonomously by Claude Code acting on the user's behalf. Pulled the job logs for all current failures and confirmed each fingerprint; classification below is now log-backed. Please push back if any conclusion is off.

Non-CUDA failures confirmed unrelated to this PR

REST budget is back; I fetched the logs for all 5 current FAILURE checks. All are non-CUDA lanes (XPU + NPU) and none are caused by this PR's diff (all source changes are gated behind SGLANG_TEST_SCRIPTED_RUNTIME default-False; the new test files are not registered into any run-suite). The CUDA hard gate (PR Test Base / Extra) is still running with 0 failures.

| Job | Root-cause fingerprint | Class |
|---|---|---|
| `stage-a-test-1-gpu-xpu` | Fails at the `actions/checkout` clean step, before any test: `EACCES: permission denied, unlink '.../python/sglang.egg-info/PKG-INFO'`; a leftover file from a prior run that the runner can't remove. | Runner infra |
| `finish` (XPU) | Cascade of the XPU checkout failure above. | Cascade |
| `stage-b-test-1-npu-a2` | `test/registered/ascend/basic_function/quant/test_npu_w8a8_quantization.py::test_gsm8k` → `AssertionError: 632.60 not greater than or equal to 700` (output-throughput threshold). The server launched fine and accuracy (0.31) was OK; the NPU box simply measured output throughput below the hard-coded 700 tok/s bar. | NPU perf flake |
| `stage-b-test-4-npu-a3` | `test/registered/ascend/basic_function/HiCache/test_npu_hicache_mla.py` → `RuntimeError: GetDeviceMemories ... NPU function error: aclrtGetMemInfo(ACL_HBM_MEM, ...) error code 107002` / `The context is empty` / `TsdOpen failed. devId=0`. | NPU device/driver init |
| `pr-test-npu-finish` | Cascade of the two NPU stage failures above. | Cascade |

None of these touch the scheduler/IPC hook code this PR adds. No fix is warranted on the PR side. Per non-CUDA-lane policy I am leaving them as-is and continuing to watch the CUDA gate; will report any CUDA-lane failure separately.

fzyzcjy · 2026-06-05T04:50:43Z

🤖 Posted autonomously by Claude Code acting on the user's behalf. Addendum after a 6th failure appeared; I pulled its log and confirmed it is also unrelated. Please push back if any conclusion is off.

Addendum — 6th failure (AMD), also unrelated

| Job | Root-cause fingerprint | Class |
|---|---|---|
| `stage-b-test-large-8-gpu-mi35x-disaggregation-amd` | `test/registered/amd/disaggregation/test_mori_transfer_engine_e2e.py::test_generate_smoke_hybrid_mamba` → PD-disagg decode server never came up: mini_lb `aiohttp ClientConnectorError: Cannot connect to host 127.0.0.1:11120`, server returned `500`, then `retry() exceed maximum number of retries` → `AssertionError: 500 != 200`. | AMD PD-disagg server flake |

In the full AMD log, the only occurrence of scripted_runtime is in the runner's branch-fetch listing; no server-side traceback references any file this PR touches (scheduler.py, request_receiver.py, ipc_channels.py, tokenizer_recv_proxy.py). The MoRI transfer-engine server crash is on the AMD mi35x disaggregation lane and is independent of this PR's env-gated test-infra changes.

Status: all 6 current failures are non-CUDA-lane and confirmed unrelated to this PR. The CUDA hard gate (PR Test Base / Extra) is still running with 0 failures (~22 jobs left). Continuing to watch; will report any CUDA-lane failure separately.

fzyzcjy · 2026-06-05T05:36:16Z

🤖 Posted autonomously by Claude Code acting on the user's behalf. A CUDA-lane job failed; I pulled the log, found a hang/timeout in an unrelated HiCache test, and am triggering a single-file rerun to classify flake vs. real. Please push back if any conclusion is off.

CUDA-lane failure: `extra-b-test-4-gpu-h100` — HiCache test timeout

Job: https://github.com/sgl-project/sglang/actions/runs/26992505263/job/79655383617 (PR Test Extra, 4× H100)
Failing file: test/registered/hicache/test_qwen35_hicache.py — ✗ TIMEOUT after 1200s (Test Summary: 0/4 passed); pr-test-extra-finish is its cascade.

What happened (from the log): the server (Qwen3.5-27B hybrid-mamba, enable_hierarchical_cache=True) came up fine and the first gsm8k eval passed cleanly (Score: 0.980, 1456 tok/s). After a successful POST /flush_cache → 200 OK at 05:17:06, the second eval started, ran two decode batches at 05:17:08–09, then went silent for ~16 minutes until the watchdog/timeout killed it at 05:33:04. Classic hang after cache flush+reload, not a crash (no CUDA error / traceback).

Relation to this PR: none that I can find. The server's server_args show kv_canary='none' and SGLANG_TEST_SCRIPTED_RUNTIME is unset, so every code path this PR touches (the env-gated scheduler/IPC hooks, the kv_canary helper) is inert. A grep of the full job log finds no file this PR modifies anywhere in the server stderr, and HiCache is not touched by this PR.

Next step: classifying as a probable HiCache hang/flake. Triggering a single-file rerun below; will report the verdict. The remaining ~13 CUDA jobs are still running with no other CUDA failures so far.

fzyzcjy · 2026-06-05T05:36:23Z

/rerun-test test/registered/hicache/test_qwen35_hicache.py

github-actions · 2026-06-05T05:36:48Z

Results for /rerun-test test/registered/hicache/test_qwen35_hicache.py:

🚀 4-gpu-h100 (1 test): ❌ View workflow run

cd test/ && python3 registered/hicache/test_qwen35_hicache.py

fzyzcjy · 2026-06-05T06:36:19Z

🤖 Posted autonomously by Claude Code acting on the user's behalf. Found a cross-branch data point proving the HiCache CUDA timeout is pre-existing on this PR's base commit, independent of the PR. Please push back if any conclusion is off.

`test_qwen35_hicache.py` timeout is pre-existing on `main` — not this PR

There is already a rerun-test of the exact same file on main from earlier today, and it failed the same way:

| Branch / commit | Run | Same file? | Outcome |
|---|---|---|---|
| PR #26991, `extra-b-test-4-gpu-h100` | job 79655383617 | ✓ | FAIL: `✗ TIMEOUT after 1200s` (hang after `flush_cache`; first eval scored 0.980) |
| `main` @ `c6c1f1a29a` | run 26992947929 | ✓ | FAIL (same shape): first eval `Score: 0.980`, `flush_cache → 200 OK`, then the server hangs and is force-killed (`Terminate orphan process: sglang/python`) |

c6c1f1a29a is the exact base commit this PR is stacked on — i.e. the test hangs on the base tree without any of this PR's 4 commits applied. Combined with the earlier code-level analysis (every path this PR touches is gated behind SGLANG_TEST_SCRIPTED_RUNTIME / kv_canary, both off here; HiCache is untouched), this is conclusive: the HiCache timeout is a pre-existing main flake/hang, not introduced by this PR.

I had also kicked off a /rerun-test of this file on the PR branch (run 26997570249, still in progress) — its outcome doesn't change the verdict, but I'll note it once it finishes.

Overall status: all 8 current failures (6 non-CUDA infra/flake + the HiCache CUDA timeout + its pr-test-extra-finish cascade) are confirmed unrelated to this PR. No PR-side fix is warranted. 8 CUDA jobs still running with no other CUDA failures; continuing to watch.

fzyzcjy · 2026-06-05T08:22:26Z

🤖 Posted autonomously by Claude Code acting on the user's behalf. Explaining the re-run of the HiCache CUDA job and the cross-branch evidence, in response to a question about job 79678030535. Please push back if any conclusion is off.

What job `79678030535` is, and whether `main` has the same problem

The job: it is the auto-retry (attempt 2) of extra-b-test-4-gpu-h100 (1) — CI's fast-fail/health system re-ran the failed CUDA stage. It runs test/registered/hicache/test_qwen35_hicache.py (Qwen3.5-27B hybrid-mamba + hierarchical cache).

Failure mode (identical on both attempts):

Server starts fine; the first gsm8k eval passes cleanly (Score: 0.980).
POST /flush_cache → 200 OK.
The second eval runs two decode batches, then goes silent for ~16 min until the watchdog fires ✗ TIMEOUT after 1200s and force-kills the server (Terminate orphan process: sglang/python).
No CUDA error / no traceback → this is a hang/deadlock, not a crash. The config uses --hicache-storage-prefetch-policy wait_complete, a plausible place to deadlock.

Because the auto-retry hung the same way, it is a consistent hang, not a one-off flake.

Does main have the same problem? Yes — and it is independent of this PR:

| Branch / commit | Run | Same file | Outcome |
|---|---|---|---|
| this PR, `extra-b-test-4-gpu-h100 (1)`, both attempts | attempt 1 · attempt 2 | ✓ | FAIL: TIMEOUT 1200s |
| `main` @ `c6c1f1a29a` (= this PR's base commit) | run 26992947929 | ✓ | FAIL: same hang-after-`flush_cache` shape |

c6c1f1a29a is the exact commit this PR is stacked on, with none of this PR's 4 commits applied, and the test already hangs there. On top of that, the server's server_args for the failing run show kv_canary='none' and SGLANG_TEST_SCRIPTED_RUNTIME unset, so every code path this PR touches (the env-gated scheduler/IPC hooks, the kv_canary helper) is inert; HiCache is not touched by this PR at all.

Conclusion: the HiCache CUDA timeout is a pre-existing, consistent hang on main (most likely a wait_complete prefetch deadlock), not introduced by this PR. No fix belongs in this PR's commits.

Root cause found — fixed by #27366

This is not something to fix in this PR. The hang is a known main bug in the HiMamba HiCache prefetch path, and a fix is already up:

#27366 — "[BugFix]: Fix HiMamba HiCache prefetch hang after L3 sidecar transfer" (author @hzh0425)
Its stated motivation is literally "Fix qwen3.5 hicache hanging", referencing a main PR Test Base failure (run 26985767961) — another main-side hit of this same hang.
It touches exactly one file, python/sglang/srt/mem_cache/hi_mamba_radix_cache.py — the HiMambaRadixCache path this test exercises — adding a guard in can_terminate_prefetch so a prefetch is not terminated while operation.pool_transfers are still in flight (not operation.pool_transfers_done). That matches the symptom exactly: --hicache-storage-prefetch-policy wait_complete waiting forever on an L3 sidecar pool transfer that was allowed to "complete" early.
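A simplified model of that guard, for intuition only. `PrefetchOp`, its fields, and the policy handling below are hypothetical stand-ins, not the code in `hi_mamba_radix_cache.py`; the only behaviour taken from #27366's description is the check that refuses termination while pool transfers are still in flight.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PrefetchOp:
    # Hypothetical stand-in; field names mirror the ones quoted above.
    pool_transfers: List[str] = field(default_factory=list)  # issued L3 sidecar pool transfers
    pool_transfers_done: bool = False                         # True once all of them have landed
    kv_fetch_complete: bool = False                           # illustrative: main KV fetch finished


def can_terminate_prefetch(op: PrefetchOp, policy: str) -> bool:
    # The guard described for #27366: never end a prefetch while pool transfers are in flight.
    if op.pool_transfers and not op.pool_transfers_done:
        return False
    if policy == "wait_complete":
        return op.kv_fetch_complete
    return True  # other policies are deliberately not modelled in this sketch
```

Without the first check, an op whose main fetch is complete but whose sidecar transfer is still pending would be allowed to finish early, which is the "completed early, then waited forever" symptom described above.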

So the verdict stands and is now root-caused: the test_qwen35_hicache.py CUDA timeout is a pre-existing main HiMamba-prefetch hang, unrelated to this PR, and it will be resolved by #27366 (not by anything in this PR's commits). Once #27366 lands and this PR is rebased, the test should go green.

fzyzcjy · 2026-06-05T20:57:26Z

🤖 Posted autonomously by Claude Code acting on the user's behalf. Confirming the HiCache fix works on this PR and kicking a retry of the remaining non-CUDA infra/flake failures. Please push back if any conclusion is off.

✅ HiCache timeout resolved by #27366 — confirmed on this PR

After rebasing onto main @ faa6286946 (which includes #27366 "[BugFix]: Fix HiMamba HiCache prefetch hang after L3 sidecar transfer"), extra-b-test-4-gpu-h100 (1) — the exact shard that hung twice before — now passes. The previous verdict is confirmed end-to-end: the hang was the pre-existing main HiMamba-prefetch bug, fixed by #27366.

Current round status: 124 SUCCESS / 2 still queued (base-c-test-4-gpu-h100 (2), (3) — waiting on the congested 4-GPU H100 pool) / 0 CUDA failures. XPU passed this round too (last round's leftover-file checkout issue cleared).

The only remaining failures are the chronic non-CUDA lanes (NPU stage-b-test-1-npu-a2 + AMD stage-b-test-1-gpu-small-amd (6) / mi35x-disaggregation / stage-c-large-8-gpu (3) + their finish cascades) — same infra/flake patterns as triaged earlier, unrelated to this PR. Triggering a retry of just the failed jobs below.

fzyzcjy · 2026-06-05T20:57:36Z

/rerun-failed-ci

fzyzcjy · 2026-06-06T00:14:12Z

🤖 Posted autonomously by Claude Code acting on the user's behalf. Final status report for this CI round; retrying the remaining non-CUDA failures one more time. Please push back if any conclusion is off.

✅ CUDA gate fully green — final round status

The round on the rebased head (f958fe841c, base faa6286946 incl. #27366) has completed:

129 SUCCESS / 22 SKIPPED / 5 FAILURE — 0 CUDA failures.
PR Test Base + PR Test Extra (the CUDA hard gate): 100% green, including extra-b-test-4-gpu-h100 (1), the previously hanging HiCache shard, which confirms that #27366 "[BugFix]: Fix HiMamba HiCache prefetch hang after L3 sidecar transfer" is effective, and the full base-c H100/H200/B200/DeepEP set.
XPU, Xeon, Arm64: green.

Remaining 5 failures are all chronic non-CUDA lanes (3 real + 2 finish cascades), unrelated to this PR:

| Job | Status |
|---|---|
| NPU `stage-b-test-1-npu-a2 (0)` | failed again after one retry (Ascend lane health) |
| AMD `stage-b-test-large-8-gpu-mi35x-disaggregation` | failed again after one retry (MoRI PD-disagg flake) |
| AMD `stage-c-test-large-8-gpu-amd (3)` | failed this round |
| `pr-test-npu-finish` / `pr-test-amd-finish` | cascades of the above |

(The earlier /rerun-failed-ci retry did clear AMD stage-b-test-1-gpu-small-amd (6).) Giving these one more retry below; if they stay red it is lane health, not this PR.

fzyzcjy · 2026-06-06T00:14:15Z

/rerun-failed-ci

fzyzcjy · 2026-06-06T00:56:58Z

/rerun-failed-ci

fzyzcjy · 2026-06-06T01:01:06Z

ci

fzyzcjy · 2026-06-06T01:09:21Z

merged

…ain) into tom/scripted_runtime_and_chunked_testing

19 files conflicted (mostly add/add: both branches independently grew the scripted-runtime + kv_canary test infra). This branch is the developed superset; sgl-project#26991 is the extracted chain. Resolution:

- 16 files -> took ours (this branch is a strict superset; sgl-project#26991 only stripped comments/code ours has). Includes req_handle.py: kept ours' SWA-aware, None-safe lock_refs; sgl-project#26991 still had the old node.lock_ref that crashes on SWA nodes / finished reqs.
- http_server.py -> took sgl-project#26991's _await_http_ready() (waits for the uvicorn port to bind before scripts POST control messages -- a real startup-race fix ours lacked) and re-inserted ours' overlap-scheduler MUST-run comment.
- scripted_runtime_chunked_helpers.py -> kept ours' comments; took sgl-project#26991's advance_to_nth_chunk using accumulated chunks_done instead of sampling is_chunking (which undercounts on the step the req leaves chunked_req).
- test/registered/.../test_scripted_core_1gpu.py -> kept ours' comments and the finished-case output-length assertion, but made it r.req-None-safe per sgl-project#26991's note that a finished req may already be removed (output_ids unobservable).

Verified: no conflict markers, all resolved files compile (py3.12+), no unmerged files. merge_diff_check.py reported 24 'lost' files -- all false positives (ours == theirs for those, so they auto-merged identically; confirmed present in the merged tree, e.g. environ.py keeps all 4 SCRIPTED_RUNTIME env vars).
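The `is_chunking` undercount can be shown with a toy model. It assumes each scheduler step processes exactly one chunk of a single request, and that the request stops being `chunked_req` on the step that processes its final chunk. The function names are hypothetical; only the idea (accumulate `chunks_done` rather than sample `is_chunking`) comes from the merge note.

```python
from typing import Iterator, Optional, Tuple


def chunk_steps(num_chunks: int) -> Iterator[Tuple[bool, bool]]:
    """Hypothetical per-step view of one request: (processed_a_chunk, is_chunking_after_step).

    The request stays the scheduler's chunked_req until the step that processes
    its final chunk, so is_chunking is already False when that step is sampled.
    """
    for i in range(num_chunks):
        yield True, i < num_chunks - 1


def count_by_sampling_is_chunking(num_chunks: int) -> int:
    return sum(1 for _, is_chunking in chunk_steps(num_chunks) if is_chunking)


def advance_to_nth_chunk(n: int, num_chunks: int) -> Optional[int]:
    """Return the step index on which the n-th chunk completes, using accumulated chunks_done."""
    chunks_done = 0
    for step, (processed, _) in enumerate(chunk_steps(num_chunks)):
        chunks_done += int(processed)
        if chunks_done >= n:
            return step
    return None


print(count_by_sampling_is_chunking(4))  # 3: the final chunk's step is missed
print(advance_to_nth_chunk(4, 4))        # 3: the fourth chunk completes on step index 3
```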

…sting

Upstream landed the extracted scripted-runtime chain (sgl-project#27410-sgl-project#27413), which conflicts with this branch's further-developed copies of the same files. All 23 conflicts resolve to ours: theirs' only unique content is older forms this branch has since corrected on H200 (pre-SWA lock_refs, short-circuit run_until_all_finished, the finished-case-dropping core_1gpu variant, the narrower abort() signature, pre-stop_token_ids imports); rerere replayed the matching resolutions from the earlier sgl-project#26991 merge for the rest.

fzyzcjy requested review from Ying1123, hnyls2002, merrymercy and xiezhq-hermann as code owners June 2, 2026 02:13

github-actions Bot added the documentation Improvements or additions to documentation label Jun 2, 2026

fzyzcjy added run-ci bypass-fastfail run-ci-extra labels Jun 2, 2026

fzyzcjy closed this Jun 2, 2026

fzyzcjy reopened this Jun 2, 2026

fzyzcjy force-pushed the tom/scripted_runtime_extracted_chain branch 2 times, most recently from f43fe86 to 829e7d9 (June 2, 2026 06:44)

sglang-bot mentioned this pull request Jun 2, 2026

CUDA Coredump Tracker #26340

Open

fzyzcjy force-pushed the tom/scripted_runtime_extracted_chain branch from 829e7d9 to 372ac76 (June 2, 2026 08:36)

fzyzcjy force-pushed the tom/scripted_runtime_extracted_chain branch 3 times, most recently from 428763a to 3501f01 (June 5, 2026 02:55)

fzyzcjy added 4 commits June 5, 2026 21:19

Add kv_canary PP self-test fixture and SWA divergence coverage

c3636b6

Add scripted-runtime harness core and wire scheduler/IPC hooks

4da5163

Add scripted-runtime KV-pool and lock-ref exhauster primitives

4861c62

Add scripted-runtime unit, core integration, and chunked-prefill tests

f958fe8

fzyzcjy force-pushed the tom/scripted_runtime_extracted_chain branch from 3501f01 to f958fe8 (June 5, 2026 13:19)

amd-bot mentioned this pull request Jun 6, 2026

[CI Monitor] Daily Report - 2026-06-06 bingxche/sglang-ci-bot#95

Open

fzyzcjy closed this Jun 6, 2026

fzyzcjy mentioned this pull request Jun 6, 2026

Sandbox: verify full main CI is green on latest main (do not merge) #25647

Closed

4 tasks
