Description
ReasoningBank is a memory framework that distills generalizable reasoning strategies from an agent's self-judged successful and failed experiences. At inference time, the agent retrieves relevant reasoning strategies from the bank and injects them as system context, improving performance on new tasks. The paper also introduces memory-aware test-time scaling (MaTTS) to generate diverse experience signals.
This complements Zeph's SKILL.md self-learning (which captures tool usage patterns) by capturing why a reasoning approach succeeded or failed — a different granularity of experience than skills or episodic memory.
Relevance to Zeph
- Zeph's SKILL.md self-learning captures successful_patterns and failure_patterns at a coarse level
- ReasoningBank adds a structured middle layer: distilled strategies with associated context fingerprints
- Integration point: zeph-skills or a new zeph-reasoning sub-component storing strategy vectors in Qdrant
- Retrieval at context-build time (similar to semantic memory recall) to inject strategy hints before the LLM call
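The retrieval side can be pictured as a cosine-similarity top-k search over strategy entries keyed by their context fingerprints. This is a minimal sketch, not Zeph code: `StrategyEntry`, `cosine` and `top_k` are hypothetical names, and in Zeph the linear scan would be replaced by a Qdrant vector search.

```rust
use std::cmp::Ordering;

/// A distilled strategy plus the embedding of the task it came from.
/// Hypothetical type for illustration; not an existing Zeph struct.
#[derive(Debug, Clone)]
pub struct StrategyEntry {
    /// Generalizable strategy text produced by the distillation step.
    pub summary: String,
    /// Task "context fingerprint" embedding used as the search key.
    pub fingerprint: Vec<f32>,
}

/// Cosine similarity between two vectors; returns 0.0 if either has zero norm.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Returns up to `k` entries, most similar to `query` first.
/// Linear scan for clarity; a vector database would do this in Zeph.
pub fn top_k<'a>(bank: &'a [StrategyEntry], query: &[f32], k: usize) -> Vec<&'a StrategyEntry> {
    let mut scored: Vec<(f32, &'a StrategyEntry)> = bank
        .iter()
        .map(|e| (cosine(&e.fingerprint, query), e))
        .collect();
    // Sort descending by score; NaN scores compare as equal instead of panicking.
    scored.sort_by(|x, y| y.0.partial_cmp(&x.0).unwrap_or(Ordering::Equal));
    scored.into_iter().take(k).map(|(_, e)| e).collect()
}
```

Keeping the fingerprint separate from the summary text means the search key describes the task situation, while only the strategy text is injected into the prompt.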
Key Mechanism
- Agent executes a task and self-judges success/failure
- Successful chains are distilled into reusable strategy summaries (not raw trajectories)
- Failed chains contribute contrastive signals that sharpen strategy boundaries
- At query time: embedding search → top-k strategy injection → LLM call
- MaTTS: more compute per task → more diverse experiences → richer memory bank
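The write path in the list above might look roughly like this. The `judge` and `distill` closures stand in for the fast-model and mid-tier-model calls, and `Verdict`, `MemoryItem`, `record` and `record_scaled` are hypothetical names. `record_scaled` is a deliberately simplified take on MaTTS parallel scaling: it only runs several attempts of one task and records each, while the paper also compares the attempts against each other, which is omitted here.

```rust
/// Outcome of the self-judge step.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Success,
    Failure,
}

/// One memory item: the distilled strategy and whether it came from a success or a failure.
#[derive(Debug, Clone, PartialEq)]
pub struct MemoryItem {
    pub verdict: Verdict,
    pub strategy: String,
}

/// Write path for one finished trajectory: self-judge, then distill.
/// `judge` and `distill` are placeholders for the fast-model and mid-tier-model calls.
/// Failures are kept too; `distill` sees the verdict, so it can phrase them as pitfalls.
pub fn record<J, D>(trajectory: &str, judge: J, distill: D) -> MemoryItem
where
    J: Fn(&str) -> Verdict,
    D: Fn(&str, Verdict) -> String,
{
    let verdict = judge(trajectory);
    let strategy = distill(trajectory, verdict);
    MemoryItem { verdict, strategy }
}

/// Simplified MaTTS-style parallel scaling: spend more compute by running `n` attempts
/// of the same task and recording every one, so the bank gets both successes and failures.
/// The cross-attempt comparison described in the paper is not implemented here.
pub fn record_scaled<R, J, D>(n: usize, mut rollout: R, judge: J, distill: D) -> Vec<MemoryItem>
where
    R: FnMut(usize) -> String,
    J: Fn(&str) -> Verdict,
    D: Fn(&str, Verdict) -> String,
{
    (0..n)
        .map(|i| {
            let trajectory = rollout(i);
            record(&trajectory, &judge, &distill)
        })
        .collect()
}
```

Passing the verdict into `distill` is what lets failed chains act as contrastive signal ("avoid X") instead of being discarded.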
Results
ReasoningBank outperforms raw trajectory storage and success-only memory banks on web browsing (WebArena) and software engineering (SWE-bench) benchmarks.
Implementation Sketch
- New ReasoningMemory struct in zeph-memory storing strategy summaries with task fingerprint embeddings
- Self-judge step (fast model) after each completed turn: decide whether the agent succeeded and extract the reasoning chain.
- Distillation step (mid-tier model): compress reasoning chain → generalizable strategy
- Retrieval: top-3 strategies injected into the context preamble via ContextBuilder
- Config: [memory.reasoning] with extract_provider, distill_provider, store_limit
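One possible shape for the config and the bounded store is sketched below. It assumes store_limit caps the number of stored strategies and that the oldest entry is evicted first; the eviction policy is an assumption, since the implementation sketch does not specify one. `ReasoningConfig`, `ReasoningStore` and `render_preamble` are hypothetical names, and `render_preamble` only approximates the text a ContextBuilder hook might emit.

```rust
use std::collections::VecDeque;

/// Mirrors the proposed `[memory.reasoning]` keys. Hypothetical struct; values are placeholders.
#[derive(Debug, Clone)]
pub struct ReasoningConfig {
    /// Provider for self-judging and chain extraction (presumably the fast model).
    pub extract_provider: String,
    /// Provider for distillation (presumably the mid-tier model).
    pub distill_provider: String,
    /// Maximum number of strategies kept in the bank.
    pub store_limit: usize,
}

/// Bounded strategy store. Evicting the oldest entry first (FIFO) is an assumption.
pub struct ReasoningStore {
    limit: usize,
    items: VecDeque<String>,
}

impl ReasoningStore {
    pub fn new(cfg: &ReasoningConfig) -> Self {
        Self { limit: cfg.store_limit, items: VecDeque::new() }
    }

    /// Adds a strategy, evicting the oldest one when the store is full.
    pub fn insert(&mut self, strategy: String) {
        if self.limit == 0 {
            return; // a limit of 0 disables storage in this sketch
        }
        if self.items.len() >= self.limit {
            self.items.pop_front();
        }
        self.items.push_back(strategy);
    }

    /// Stored strategies, oldest first.
    pub fn strategies(&self) -> Vec<&str> {
        self.items.iter().map(String::as_str).collect()
    }
}

/// Formats retrieved strategies as a preamble block, keeping at most `max` (3 in the proposal).
/// The header wording is invented for illustration.
pub fn render_preamble(strategies: &[&str], max: usize) -> String {
    if strategies.is_empty() {
        return String::new();
    }
    let mut out = String::from("Relevant strategies from past tasks:\n");
    for (i, s) in strategies.iter().take(max).enumerate() {
        out.push_str(&format!("{}. {}\n", i + 1, s));
    }
    out
}
```

Note that store_limit (how many strategies are kept) and the top-3 cut (how many are injected per call) are independent knobs: the first bounds storage, the second bounds prompt size.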
Estimated Complexity
Medium-high. Core read/write path: ~2 weeks. MaTTS scaling variant: additional sprint.
Source: https://arxiv.org/abs/2509.25140