Problem
POST /restore_snapshot with create_new_request=True and continuation_ids hangs indefinitely. The HTTP future in tokenizer_manager.restore_snapshot blocks on snapshot_restore_result_queue.get() and never resolves. Response always returns rid=null, output_text=null.
Affects all models. Step 5 (stateful recall) is BLOCKED in every compat protocol run.
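The hang can be reproduced in isolation with a toy model: a consumer that awaits a result queue no producer ever writes to. This is only a sketch. It assumes the result queue behaves like an asyncio.Queue, and restore_snapshot_wait and the timeout are hypothetical names added for illustration, not part of tokenizer_manager.

```python
import asyncio


async def restore_snapshot_wait(result_queue: asyncio.Queue, timeout_s: float):
    """Hypothetical stand-in for the HTTP handler awaiting the scheduler's reply."""
    try:
        # Without the timeout, this await would block forever if nothing is enqueued.
        return await asyncio.wait_for(result_queue.get(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return None


async def demo():
    queue = asyncio.Queue()
    # Case 1: no producer ever puts a result, which mirrors the reported hang.
    hung = await restore_snapshot_wait(queue, 0.05)
    # Case 2: a result is enqueued, so the wait resolves.
    queue.put_nowait({"success": True, "rid": "r1", "output_ids": [1, 2]})
    resolved = await restore_snapshot_wait(queue, 0.05)
    return hung, resolved


hung_result, ok_result = asyncio.run(demo())
```

A bounded wait like this would not fix the bug, but it may help turn the indefinite hang into a visible failure while debugging.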
Root Cause
The working implementation existed on the A100 (Phase 8: 4/4 PASS, commit 3917c1231 in the runpod backup). It had two components:
1. scheduler.py handle_restore_snapshot, when continuation_ids is present:
- Appends continuation tokens to origin_input_ids
- Sets new_req._stateful_generate = True
- Applies recv_req.max_new_tokens to SamplingParams
- Returns None (deferred; generation completes asynchronously)
2. scheduler_output_processor_mixin.py, on request finish:
- Detects req._stateful_generate == True
- Sends RestoreSnapshotReqOutput(success=True, rid=..., output_ids=[...]) via send_to_tokenizer
- Tokenizer manager detokenizes output_ids → output_text and returns to HTTP caller
During the upstream merge (PR #15/#16), Phase 8 was reconstructed from a lost A100 session. The mixin (part 2) was ported correctly — scheduler_output_processor_mixin.py:1023 has the _stateful_generate check. But scheduler.py (part 1) was reconstructed without the continuation_ids path, so _stateful_generate is never set to True, the mixin's output routing is dead code, and the queue never unblocks.
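The dead-code effect described above can be illustrated with a toy finish hook. This is a sketch under stated assumptions, not the mixin's real code: Req, on_request_finished, and the list-backed send_to_tokenizer are hypothetical stand-ins, and only the _stateful_generate attribute and the RestoreSnapshotReqOutput field names come from the issue.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RestoreSnapshotReqOutput:
    # Field names follow the issue text; the real class may carry more fields.
    success: bool
    rid: Optional[str] = None
    output_ids: List[int] = field(default_factory=list)


class Req:
    """Hypothetical minimal request object."""

    def __init__(self, rid: str, output_ids: List[int]):
        self.rid = rid
        self.output_ids = output_ids
        self._stateful_generate = False  # the scheduler is expected to set this


def on_request_finished(req: Req, send_to_tokenizer: Callable) -> bool:
    """Toy finish hook: reply to the tokenizer only for flagged requests."""
    if getattr(req, "_stateful_generate", False):
        send_to_tokenizer(
            RestoreSnapshotReqOutput(
                success=True, rid=req.rid, output_ids=list(req.output_ids)
            )
        )
        return True
    return False


sent: List[RestoreSnapshotReqOutput] = []

# Current behavior: the flag is never set, so nothing is sent and the caller waits.
unflagged = Req("r1", [10, 11])
sent_unflagged = on_request_finished(unflagged, sent.append)

# Expected behavior: the scheduler sets the flag, so the reply is routed back.
flagged = Req("r2", [20, 21])
flagged._stateful_generate = True
sent_flagged = on_request_finished(flagged, sent.append)
```

In this model the first request finishes without any reply, which corresponds to the queue that never unblocks.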
Evidence
- Working code source: /home/jeanclawdai/runpod-backup/restore/repo at commit 857dd02a6 (latest Phase 8 branch tip)
- Current scheduler.py: grep continuation_ids → no results
- Current mixin scheduler_output_processor_mixin.py:1023: _stateful_generate check present but never triggered
- test/registered/radix_cache/test_mamba_stateful_inference.py: all 4 tests hang at restore_snapshot call
- Confirmed broken across: granite-tiny, granite-small, Nemotron-Cascade-2-30B, Qwen3-Coder-Next
Fix
Restore the create_new_request=True path in handle_restore_snapshot (scheduler.py):
- Read recv_req.continuation_ids, append to origin_input_ids when present
- Apply recv_req.max_new_tokens to SamplingParams if provided
- Set new_req._stateful_generate = stateful_generate (bool flag)
- Return None when stateful_generate=True (deferred), RestoreSnapshotReqOutput otherwise
Also verify RestoreSnapshotReqInput (in io_struct.py) has continuation_ids and max_new_tokens fields.
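The steps in this fix could look roughly like the following. It is a self-contained sketch, not the scheduler's actual code: the dataclasses are simplified stand-ins for the real io_struct and SamplingParams types, the restored_ids and rid parameters are hypothetical, returning the request alongside the reply is only for inspection, and deriving stateful_generate from create_new_request plus non-empty continuation_ids is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RestoreSnapshotReqInput:
    # Simplified stand-in; the issue asks to verify the last two fields exist.
    create_new_request: bool = False
    continuation_ids: Optional[List[int]] = None
    max_new_tokens: Optional[int] = None


@dataclass
class SamplingParams:
    max_new_tokens: int = 128  # hypothetical default


@dataclass
class RestoreSnapshotReqOutput:
    success: bool
    rid: Optional[str] = None


class Req:
    """Hypothetical minimal request object."""

    def __init__(self, rid: str, origin_input_ids: List[int], params: SamplingParams):
        self.rid = rid
        self.origin_input_ids = origin_input_ids
        self.sampling_params = params
        self._stateful_generate = False


def handle_restore_snapshot(
    recv_req: RestoreSnapshotReqInput, restored_ids: List[int], rid: str = "restored-0"
) -> Tuple[Optional[RestoreSnapshotReqOutput], Req]:
    new_req = Req(rid, list(restored_ids), SamplingParams())
    # Assumption: generation is deferred only when continuation tokens are supplied.
    stateful_generate = bool(recv_req.create_new_request and recv_req.continuation_ids)
    if recv_req.continuation_ids:
        new_req.origin_input_ids.extend(recv_req.continuation_ids)
    if recv_req.max_new_tokens is not None:
        new_req.sampling_params.max_new_tokens = recv_req.max_new_tokens
    new_req._stateful_generate = stateful_generate
    if stateful_generate:
        # Deferred: the finish hook is expected to send the reply later.
        return None, new_req
    return RestoreSnapshotReqOutput(success=True, rid=rid), new_req


deferred_reply, deferred_req = handle_restore_snapshot(
    RestoreSnapshotReqInput(
        create_new_request=True, continuation_ids=[7, 8], max_new_tokens=16
    ),
    restored_ids=[1, 2, 3],
)
immediate_reply, immediate_req = handle_restore_snapshot(
    RestoreSnapshotReqInput(), restored_ids=[1]
)
```

In the deferred case the handler returns no reply and leaves the flag set on the request, which is the condition the finish hook checks before sending RestoreSnapshotReqOutput.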
Verification
pytest test/registered/radix_cache/test_mamba_stateful_inference.py -v
# Expected: 4/4 PASS on granite-4.0-h-tiny
# --enable-snapshot-persistence --mamba-scheduler-strategy no_buffer