Description
`SpeculationEngine` is now wired into the agent loop (#3636, PR #3640): the engine is instantiated and `end_turn()` is called correctly. However, `try_dispatch` is never called, so the engine keeps its sweeper running but never receives any speculative tasks.
The missing piece is hooking `try_dispatch` into the LLM SSE streaming path so that, as the decoder emits partial JSON tool-call tokens, the engine can start speculative pre-execution.
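A minimal, dependency-free sketch of where the hook could sit, assuming a simplified event model. `SseEvent`, `Prediction`, and `consume_stream` are hypothetical names, not zeph-core APIs. The `estimate` closure stands in for whatever confidence `PartialJsonParser` reports, and `dispatch` stands in for the call into `engine.try_dispatch(...)`. The point it illustrates is that speculation happens inside the per-chunk loop, at most once per tool call, instead of after the full `ToolCall` has been assembled.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-ins; none of these types exist in zeph-core under these names.

/// One SSE event from the streaming response, reduced to what speculation needs.
#[derive(Debug, Clone)]
pub enum SseEvent {
    /// A fragment of the arguments JSON for the tool call at `index`.
    ToolArgsDelta { index: usize, chunk: String },
    /// The model has finished decoding.
    Done,
}

/// What gets handed to the speculation engine (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct Prediction {
    pub index: usize,
    pub partial_args: String,
    pub confidence: f32,
}

/// Consume the stream, accumulating argument fragments per tool call.
/// `estimate` plays the role of PartialJsonParser's confidence output;
/// `dispatch` plays the role of `engine.try_dispatch(...)`.
/// Returns the assembled argument buffers, keyed by tool-call index.
pub fn consume_stream<I, E, D>(
    events: I,
    threshold: f32,
    mut estimate: E,
    mut dispatch: D,
) -> HashMap<usize, String>
where
    I: IntoIterator<Item = SseEvent>,
    E: FnMut(&str) -> f32,
    D: FnMut(Prediction),
{
    let mut buffers: HashMap<usize, String> = HashMap::new();
    let mut dispatched: HashSet<usize> = HashSet::new();

    for event in events {
        match event {
            SseEvent::ToolArgsDelta { index, chunk } => {
                let buf = buffers.entry(index).or_default();
                buf.push_str(&chunk);

                // Speculate at most once per tool call, on the first fragment
                // whose confidence reaches the threshold.
                if !dispatched.contains(&index) {
                    let confidence = estimate(buf.as_str());
                    if confidence >= threshold {
                        dispatched.insert(index);
                        dispatch(Prediction {
                            index,
                            partial_args: buf.clone(),
                            confidence,
                        });
                    }
                }
            }
            // Stop consuming once decoding is finished.
            SseEvent::Done => break,
        }
    }
    buffers
}
```

Firing at most once per call index keeps a noisy confidence signal from submitting duplicate speculative tasks for the same tool call.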
Scope
- File: `crates/zeph-core/src/agent/` (the streaming consumption path where `ToolStream` SSE events arrive and partial JSON is assembled into `ToolCall` structs)
- `PartialJsonParser` in `crates/zeph-core/src/agent/speculative/` is already implemented and awaiting integration
- When a partial tool call becomes sufficiently confident (`confidence >= config.confidence_threshold`), call `engine.try_dispatch(prediction, TrustLevel::Trusted)` with a timeout, per the await-discipline rules
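The timeout requirement can be sketched in the same spirit. The real `try_dispatch` is presumably async, in which case wrapping the future in something like `tokio::time::timeout` would be the natural fit; to stay std-only, this sketch models the same bounded wait with `std::sync::mpsc::Receiver::recv_timeout`. `dispatch_with_timeout`, `DispatchOutcome`, and the local `TrustLevel` enum are hypothetical stand-ins, and the threshold check is assumed to have happened already in the streaming loop.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Local stand-in for zeph's TrustLevel; only the variant named in the issue.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TrustLevel {
    Trusted,
}

/// Hypothetical result of a bounded dispatch attempt.
#[derive(Debug, PartialEq)]
pub enum DispatchOutcome {
    /// `try_dispatch` returned within the limit.
    Submitted,
    /// The limit elapsed first; the streaming loop moves on.
    TimedOut,
    /// The worker exited without reporting back (e.g. it panicked).
    Failed,
}

/// Call `try_dispatch(prediction, TrustLevel::Trusted)` without blocking the
/// caller for longer than `limit`. Hypothetical std-only helper.
pub fn dispatch_with_timeout<P, F>(prediction: P, limit: Duration, try_dispatch: F) -> DispatchOutcome
where
    P: Send + 'static,
    F: FnOnce(P, TrustLevel) + Send + 'static,
{
    let (tx, rx) = mpsc::channel::<()>();
    // On timeout this thread is left running (detached); an async timeout
    // would instead drop the pending future.
    thread::spawn(move || {
        try_dispatch(prediction, TrustLevel::Trusted);
        // If the caller already timed out, the receiver is gone; ignore that.
        let _ = tx.send(());
    });
    match rx.recv_timeout(limit) {
        Ok(()) => DispatchOutcome::Submitted,
        Err(RecvTimeoutError::Timeout) => DispatchOutcome::TimedOut,
        // The sender was dropped without sending, i.e. the worker failed.
        Err(RecvTimeoutError::Disconnected) => DispatchOutcome::Failed,
    }
}
```

One difference from an async timeout: here the worker thread keeps running after the limit elapses, whereas a timed-out future is dropped along with its `Timeout` wrapper, which cancels it (though not any task it has already spawned).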
Expected Behavior
With `[tools.speculative] mode = "decoding"` set, speculative tool calls should fire during SSE streaming, so that each result is ready (or discarded) by the time the LLM finishes decoding.
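A sketch of the "ready (or discarded)" step once decoding finishes, assuming the speculative run is matched against the final tool call by tool name and argument JSON. `Speculated`, `Resolution`, and `resolve` are hypothetical. Comparing raw JSON strings is a simplification; a real check would more likely compare parsed arguments so that key order and whitespace do not matter.

```rust
/// Hypothetical record of a speculative run started during decoding.
#[derive(Debug, Clone)]
pub struct Speculated {
    pub tool: String,
    pub args_json: String,
    /// `Some` once the run has finished; `None` if it is still in flight.
    pub output: Option<String>,
}

/// What the agent loop does with the final tool call (hypothetical).
#[derive(Debug, PartialEq)]
pub enum Resolution {
    /// Prediction matched and the result is ready: reuse it.
    Reuse(String),
    /// No usable prediction: discard it and execute the call normally.
    Execute,
}

/// Match the fully decoded tool call against the speculative run.
/// Simplifications (assumptions): arguments are compared as raw JSON strings,
/// and an unfinished run is treated as discarded rather than awaited.
pub fn resolve(spec: Option<&Speculated>, final_tool: &str, final_args_json: &str) -> Resolution {
    match spec {
        Some(s) if s.tool == final_tool && s.args_json == final_args_json => match &s.output {
            Some(out) => Resolution::Reuse(out.clone()),
            None => Resolution::Execute,
        },
        _ => Resolution::Execute,
    }
}
```

Awaiting an in-flight run instead of discarding it is an equally plausible policy; the sketch picks the simpler one.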
Depends on
#3636 (merged in PR #3640)
Related