Source
arXiv:2603.18897 — "Act While Thinking: Accelerating LLM Agents via Pattern-Aware Speculative Tool Execution" (PASTE, submitted March 19, 2026)
Key Contribution
PASTE observes that agent tool-call sequences follow stable application-level control flows. It speculatively pre-executes predicted tool sequences based on learned call patterns before the LLM finishes reasoning, hiding execution latency, and reports a 48.5% reduction in task completion time and a 1.8x improvement in tool throughput.
Distinction from #2290
Existing issue #2290 (speculative tool calls) covers pre-fetching based on LLM draft tokens at the decoding level. PASTE is a distinct approach: it learns recurring application-level call sequences (e.g., "search always precedes summarize in research tasks") as patterns. A different mechanism, a different implementation point, and concrete throughput numbers make it worth tracking separately.
Relevance to Zeph
zeph-tools / zeph-core agent loop — ToolOrchestrator could learn per-skill tool sequence patterns (e.g., web_search → read → summarize for research skills), then pre-execute the first N predicted tools while the LLM is still generating the response that selects them.
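The commit/cancel flow could look like the following minimal sketch. All names here (`run_with_speculation`, the tool callables, the choice callback) are illustrative, not the actual ToolOrchestrator API; it assumes tool arguments are already bound and that the LLM's choice arrives via a blocking callback.

```python
# Hypothetical sketch: speculatively run predicted tools while the LLM is
# still deciding, then commit the matching result or fall back on a miss.
from concurrent.futures import Future, ThreadPoolExecutor


def run_with_speculation(predicted_tools, tools, llm_choice_fn):
    """predicted_tools: ordered tool names the pattern model expects next.
    tools: dict mapping tool name -> zero-arg callable (args assumed bound).
    llm_choice_fn: blocks until the LLM names the tool it actually wants."""
    with ThreadPoolExecutor(max_workers=max(len(predicted_tools), 1)) as pool:
        inflight: dict[str, Future] = {
            name: pool.submit(tools[name]) for name in predicted_tools
        }
        chosen = llm_choice_fn()  # runs concurrently with speculative calls
        if chosen in inflight:
            # Commit: the speculative call already hid (part of) the latency.
            result = inflight.pop(chosen).result()
        else:
            # Misprediction: execute the chosen tool normally.
            result = tools[chosen]()
        # Cancel any speculative work that was not selected.
        for fut in inflight.values():
            fut.cancel()
        return result
```

Real tools with side effects would need an idempotency or sandboxing story before speculative execution is safe; this sketch assumes read-only tools.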
Implementation Sketch
- Track per-skill tool invocation sequences in SQLite (extend scheduler/memory tables)
- On skill activation, predict top-K likely tool calls; pre-warm them speculatively
- Cancel if LLM selects a different tool path; commit if prediction matches
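The tracking and prediction steps above could be sketched as follows. The table schema, function names, and the bigram-count prediction model are all assumptions for illustration, not the actual Zeph scheduler/memory schema.

```python
# Hypothetical sketch: log per-skill tool transitions in SQLite and predict
# the top-K likely next tools from simple bigram counts.
import sqlite3


def init_db(conn):
    # Illustrative schema, not the real Zeph scheduler/memory tables.
    conn.execute("""CREATE TABLE IF NOT EXISTS tool_transitions (
        skill TEXT, prev_tool TEXT, next_tool TEXT, count INTEGER,
        PRIMARY KEY (skill, prev_tool, next_tool))""")


def record_sequence(conn, skill, sequence):
    # Bump the bigram count for each consecutive tool pair in a finished run.
    for prev_tool, next_tool in zip(sequence, sequence[1:]):
        conn.execute("""INSERT INTO tool_transitions VALUES (?, ?, ?, 1)
            ON CONFLICT(skill, prev_tool, next_tool)
            DO UPDATE SET count = count + 1""", (skill, prev_tool, next_tool))


def predict_top_k(conn, skill, prev_tool, k=2):
    # Most frequent successors of prev_tool for this skill, best first.
    rows = conn.execute("""SELECT next_tool FROM tool_transitions
        WHERE skill = ? AND prev_tool = ?
        ORDER BY count DESC LIMIT ?""", (skill, prev_tool, k))
    return [r[0] for r in rows]
```

Bigram counts are the simplest possible pattern model; PASTE's learned patterns are presumably richer, but this is enough to drive a first pre-warm/cancel experiment.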
Priority Assessment
P3 (research) — High impact if tool call latency becomes user-visible bottleneck. Distinct from #2290; implement independently.