research(memory): ReasoningBank — distill generalizable reasoning strategies from agent successes/failures (arXiv:2509.25140) #3312

@bug-ops

Description

ReasoningBank is a memory framework that distills generalizable reasoning strategies from an agent's self-judged successful and failed experiences. At inference time, the agent retrieves relevant reasoning strategies from the bank and injects them as system context, improving performance on new tasks. The paper also introduces memory-aware test-time scaling (MaTTS) to generate diverse experience signals.

This complements Zeph's SKILL.md self-learning (which captures tool usage patterns) by capturing why a reasoning approach succeeded or failed — a different granularity of experience than skills or episodic memory.

Relevance to Zeph

  • Zeph's SKILL.md self-learning captures successful_patterns and failure_patterns at a coarse level
  • ReasoningBank adds a structured middle layer: distilled strategies with associated context fingerprints
  • Integration point: zeph-skills or a new zeph-reasoning sub-component storing strategy vectors in Qdrant
  • Retrieval at context-build time (similar to semantic memory recall) to inject strategy hints before LLM call

Key Mechanism

  1. Agent executes a task and self-judges success/failure
  2. Successful chains are distilled into reusable strategy summaries (not raw trajectories)
  3. Failed chains contribute contrastive signals that sharpen strategy boundaries
  4. At query time: embedding search → top-k strategy injection → LLM call
  5. MaTTS: more compute per task → more diverse experiences → richer memory bank
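The write path in steps 1 to 3 can be sketched as below. This is a minimal illustration, not the paper's or Zeph's implementation: the `Judge` and `Distiller` traits, the `Strategy` struct, and `learn_from_task` are hypothetical names, and the two stub implementations stand in for the LLM calls that would do the real judging and distillation.

```rust
/// Result of the agent's self-judgment for one task (step 1).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Outcome {
    Success,
    Failure,
}

/// A distilled, reusable strategy summary (not a raw trajectory).
#[derive(Debug, Clone, PartialEq)]
pub struct Strategy {
    pub summary: String,
    /// True when derived from a failed chain, i.e. a contrastive signal.
    pub contrastive: bool,
}

/// Hypothetical abstraction over the model that self-judges a trajectory.
pub trait Judge {
    fn judge(&self, task: &str, trajectory: &[String]) -> Outcome;
}

/// Hypothetical abstraction over the model that compresses a trajectory.
pub trait Distiller {
    fn distill(&self, task: &str, trajectory: &[String], outcome: Outcome) -> String;
}

/// Judge the trajectory, distill it, and append the result to the bank.
pub fn learn_from_task(
    judge: &dyn Judge,
    distiller: &dyn Distiller,
    task: &str,
    trajectory: &[String],
    bank: &mut Vec<Strategy>,
) -> Outcome {
    let outcome = judge.judge(task, trajectory);
    let summary = distiller.distill(task, trajectory, outcome);
    bank.push(Strategy {
        summary,
        contrastive: outcome == Outcome::Failure,
    });
    outcome
}

/// Stub judge (assumption): success if the last step mentions "done".
pub struct LastStepJudge;

impl Judge for LastStepJudge {
    fn judge(&self, _task: &str, trajectory: &[String]) -> Outcome {
        match trajectory.last() {
            Some(step) if step.contains("done") => Outcome::Success,
            _ => Outcome::Failure,
        }
    }
}

/// Stub distiller (assumption): fills a fixed template instead of calling a model.
pub struct TemplateDistiller;

impl Distiller for TemplateDistiller {
    fn distill(&self, task: &str, trajectory: &[String], outcome: Outcome) -> String {
        let verb = match outcome {
            Outcome::Success => "Prefer",
            Outcome::Failure => "Avoid",
        };
        format!(
            "{} a {}-step approach for tasks like '{}'",
            verb,
            trajectory.len(),
            task
        )
    }
}
```

Keeping the judge and distiller behind traits would let the fast and mid-tier models be swapped independently, and lets the pipeline be tested without any model call.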

Results

According to the paper, ReasoningBank outperforms baselines that store raw trajectories or only successful experiences, on both web browsing (WebArena) and software engineering (SWE-bench) benchmarks.

Implementation Sketch

  • New ReasoningMemory struct in zeph-memory storing strategy summaries with task fingerprint embeddings
  • Self-judge step (fast model) after each completed turn: did the agent succeed? Extract reasoning chain.
  • Distillation step (mid-tier model): compress reasoning chain → generalizable strategy
  • Retrieval: top-3 strategies injected into context preamble via ContextBuilder
  • Config: [memory.reasoning] with extract_provider, distill_provider, store_limit
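One possible shape for the store and the retrieval step is sketched below. Everything here is an assumption for illustration: the field and method names are hypothetical, an in-memory linear cosine scan stands in for the Qdrant search, oldest-first eviction is just one way to honor `store_limit`, and `preamble` only produces the text that a `ContextBuilder` integration might inject.

```rust
/// Hypothetical mirror of the proposed `[memory.reasoning]` config section.
#[derive(Debug, Clone)]
pub struct ReasoningConfig {
    pub extract_provider: String,
    pub distill_provider: String,
    pub store_limit: usize,
}

/// A strategy summary plus the embedding of the task it came from.
#[derive(Debug, Clone)]
pub struct StrategyEntry {
    pub summary: String,
    pub fingerprint: Vec<f32>,
}

/// In-memory stand-in for the proposed store (assumption: no Qdrant here).
pub struct ReasoningMemory {
    entries: Vec<StrategyEntry>,
    store_limit: usize,
}

impl ReasoningMemory {
    pub fn new(config: &ReasoningConfig) -> Self {
        Self {
            entries: Vec::new(),
            store_limit: config.store_limit,
        }
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }

    /// Insert an entry, evicting the oldest one once `store_limit` is reached.
    pub fn insert(&mut self, entry: StrategyEntry) {
        if self.store_limit == 0 {
            return;
        }
        if self.entries.len() >= self.store_limit {
            self.entries.remove(0);
        }
        self.entries.push(entry);
    }

    /// Return up to `k` entries, ordered by descending cosine similarity.
    pub fn top_k(&self, query: &[f32], k: usize) -> Vec<&StrategyEntry> {
        let mut scored: Vec<(f32, &StrategyEntry)> = self
            .entries
            .iter()
            .map(|e| (cosine(query, &e.fingerprint), e))
            .collect();
        scored.sort_by(|a, b| b.0.total_cmp(&a.0));
        scored.into_iter().take(k).map(|(_, e)| e).collect()
    }

    /// Render the retrieved strategies as a numbered context preamble.
    pub fn preamble(&self, query: &[f32], k: usize) -> String {
        let hits = self.top_k(query, k);
        if hits.is_empty() {
            return String::new();
        }
        let mut out = String::from("Relevant reasoning strategies:\n");
        for (i, e) in hits.iter().enumerate() {
            out.push_str(&format!("{}. {}\n", i + 1, e.summary));
        }
        out
    }
}

/// Cosine similarity; returns 0.0 for mismatched lengths or zero vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}
```

With the proposed default, the caller would pass `k = 3`. A real implementation would likely delegate `top_k` to the vector store and might apply a similarity threshold so that weakly related strategies are not injected.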

Estimated Complexity

Medium-high. Core read/write path: ~2 weeks. MaTTS scaling variant: additional sprint.

Source: https://arxiv.org/abs/2509.25140

Metadata

Labels

  • P3: Research, medium-high complexity
  • memory: zeph-memory crate (SQLite)
  • research: Research-driven improvement
