research(memory): ReasoningBank — distill generalizable reasoning strategies from agent successes/failures (arXiv:2509.25140) #3312

## Description

ReasoningBank is a memory framework that distills generalizable reasoning strategies from an agent's self-judged successful and failed experiences. At inference time, the agent retrieves relevant strategies from the bank and injects them as system context, which improves performance on new tasks. The paper also introduces memory-aware test-time scaling (MaTTS), which spends extra compute per task to generate more diverse experiences and therefore richer signals for the bank.

This complements Zeph's SKILL.md self-learning (which captures tool usage patterns) by capturing **why** a reasoning approach succeeded or failed — a different granularity of experience than skills or episodic memory.

## Relevance to Zeph

- Zeph's SKILL.md self-learning captures `successful_patterns` and `failure_patterns` at a coarse level
- ReasoningBank adds a structured middle layer: distilled strategies, each keyed by a task-fingerprint embedding
- Integration point: `zeph-skills` or a new `zeph-reasoning` sub-component storing strategy vectors in Qdrant
- Retrieval happens at context-build time (similar to semantic memory recall) so that strategy hints are injected before the LLM call
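A minimal sketch of the injection step, in Rust: strategies that retrieval has already selected are rendered into one preamble section placed ahead of the LLM call. `Strategy` and `render_strategy_preamble` are hypothetical names, not existing Zeph APIs, and the heading text is an arbitrary choice.

```rust
/// Hypothetical strategy record; field names are illustrative, not Zeph's API.
#[derive(Debug, Clone)]
struct Strategy {
    title: String,
    content: String,
}

/// Renders retrieved strategies as a system-context preamble.
/// Returns `None` when nothing was retrieved, so the caller can omit the
/// section entirely instead of emitting an empty header.
fn render_strategy_preamble(strategies: &[Strategy]) -> Option<String> {
    if strategies.is_empty() {
        return None;
    }
    let mut out = String::from("## Reasoning strategies from past tasks\n");
    for (i, s) in strategies.iter().enumerate() {
        // One numbered line per strategy: "<n>. <title>: <content>"
        out.push_str(&format!("{}. {}: {}\n", i + 1, s.title, s.content));
    }
    Some(out)
}

fn main() {
    let hits = vec![Strategy {
        title: "Verify before submitting".into(),
        content: "Re-run the failing test after each edit.".into(),
    }];
    if let Some(preamble) = render_strategy_preamble(&hits) {
        println!("{preamble}");
    }
}
```

Returning `Option` keeps the "no relevant strategies" case explicit, so a context builder never spends tokens on an empty section.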

## Key Mechanism

1. Agent executes a task and self-judges success/failure
2. Successful chains are distilled into reusable strategy summaries (not raw trajectories)
3. Failed chains contribute contrastive signals that sharpen strategy boundaries
4. At query time: embedding search → top-k strategy injection → LLM call
5. MaTTS: more compute per task → more diverse experiences → richer memory bank
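Step 4 (embedding search followed by top-k selection) can be sketched as a brute-force cosine-similarity scan. This is an illustration only; in the proposed design the search would be delegated to the vector store (Qdrant). The function names and the plain `Vec<f32>` embedding representation are assumptions.

```rust
/// Cosine similarity between two equal-length embedding vectors.
/// Returns 0.0 if either vector has zero norm, to avoid NaN scores.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    debug_assert_eq!(a.len(), b.len());
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

/// Returns the indices of the `k` stored strategy embeddings most similar
/// to `query`, best first. Brute-force O(n) scan over the whole bank.
fn top_k(query: &[f32], bank: &[Vec<f32>], k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = bank
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine(query, v)))
        .collect();
    // Sort by descending similarity; total_cmp gives a total order on f32.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().take(k).map(|(i, _)| i).collect()
}

fn main() {
    let bank: Vec<Vec<f32>> = vec![vec![1.0, 0.0], vec![0.0, 1.0], vec![0.7, 0.7]];
    let hits = top_k(&[1.0, 0.1], &bank, 2);
    println!("{hits:?}"); // prints [0, 2]
}
```

With `k = 3`, this matches the "top-3 strategies" retrieval in the implementation sketch below; a minimum-similarity cutoff could be added so weakly related strategies are not injected at all.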

## Results

Outperforms raw trajectory storage and success-only memory banks on web browsing (WebArena) and software engineering (SWE-bench) benchmarks.

## Implementation Sketch

- New `ReasoningMemory` struct in `zeph-memory` storing strategy summaries with task fingerprint embeddings
- Self-judge step (fast model) after each completed turn: decide whether the agent succeeded and extract the reasoning chain.
- Distillation step (mid-tier model): compress reasoning chain → generalizable strategy
- Retrieval: top-3 strategies injected into context preamble via `ContextBuilder`
- Config: `[memory.reasoning]` with `extract_provider`, `distill_provider`, `store_limit`
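The write path (self-judge, then distill, then store under `store_limit`) might look roughly like the sketch below. `ReasoningConfig`, `Provider`, `ReasoningMemory`, and `record` are hypothetical names that only mirror the config keys listed above. The prompts are placeholders, and FIFO eviction is an assumption, since the issue does not specify an eviction policy.

```rust
use std::collections::VecDeque;

/// Mirrors the proposed `[memory.reasoning]` keys (hypothetical struct).
#[derive(Debug, Clone)]
struct ReasoningConfig {
    extract_provider: String,
    distill_provider: String,
    store_limit: usize,
}

/// Hypothetical minimal LLM interface; not an existing Zeph trait.
trait Provider {
    fn complete(&self, prompt: &str) -> String;
}

#[derive(Debug, Clone, PartialEq)]
enum Outcome {
    Success,
    Failure,
}

#[derive(Debug, Clone)]
struct StrategyEntry {
    outcome: Outcome,
    summary: String,
}

struct ReasoningMemory {
    limit: usize,
    entries: VecDeque<StrategyEntry>,
}

impl ReasoningMemory {
    fn new(cfg: &ReasoningConfig) -> Self {
        Self { limit: cfg.store_limit, entries: VecDeque::new() }
    }

    /// Judge -> distill -> store. Failures are kept too, as contrastive signals.
    /// Evicts the oldest entries (FIFO, assumed) once `store_limit` is reached.
    fn record(&mut self, judge: &dyn Provider, distiller: &dyn Provider, trajectory: &str) {
        let verdict = judge.complete(&format!(
            "Did this trajectory succeed? Answer SUCCESS or FAILURE.\n{trajectory}"
        ));
        let outcome = if verdict.trim().eq_ignore_ascii_case("success") {
            Outcome::Success
        } else {
            Outcome::Failure
        };
        let prompt = match outcome {
            Outcome::Success => format!("State the generalizable strategy that worked:\n{trajectory}"),
            Outcome::Failure => format!("State the pitfall to avoid as a general rule:\n{trajectory}"),
        };
        let summary = distiller.complete(&prompt);
        if self.limit == 0 {
            return; // storage disabled
        }
        while self.entries.len() >= self.limit {
            self.entries.pop_front();
        }
        self.entries.push_back(StrategyEntry { outcome, summary });
    }
}

/// Stub provider for the demo; a real one would call the configured model.
struct Stub;
impl Provider for Stub {
    fn complete(&self, prompt: &str) -> String {
        if prompt.starts_with("Did") { "SUCCESS".into() } else { "Check inputs before acting.".into() }
    }
}

fn main() {
    let cfg = ReasoningConfig {
        extract_provider: "fast".into(),
        distill_provider: "mid".into(),
        store_limit: 100,
    };
    println!("judge={} distill={}", cfg.extract_provider, cfg.distill_provider);
    let mut mem = ReasoningMemory::new(&cfg);
    mem.record(&Stub, &Stub, "agent ran tests; all passed");
    println!("{:?}", mem.entries);
}
```

Taking both providers as trait objects keeps the judge (fast model) and the distiller (mid-tier model) independently configurable, matching the split between `extract_provider` and `distill_provider`.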

## Estimated Complexity

Medium-high. Core read/write path: ~2 weeks. MaTTS scaling variant: additional sprint.

Source: https://arxiv.org/abs/2509.25140
