research(memory): OmniMem autoresearch-guided discovery of lifelong multimodal agent memory — +411% F1 on LoCoMo

## Description

arXiv:2604.01007 (April 1, 2026) — *Omni-SimpleMem: Autoresearch-Guided Discovery of Lifelong Multimodal Agent Memory* by Liu et al. (UNC-Chapel Hill, UC Berkeley, UC Santa Cruz, Cisco).

OmniMem proposes an autonomous research pipeline that systematically discovers and validates memory architecture improvements through self-experimentation — running ~50 experiments across two benchmarks to diagnose failure modes and propose architectural modifications.

Code: https://github.com/aiming-lab/OmniMem

## Key Results

- LoCoMo benchmark: F₁ improved from 0.117 → 0.598 (+411%)
- Mem-Gallery benchmark: F₁ improved from 0.254 → 0.797 (+214%)
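
The percentages above are relative gains over the baseline F₁, not absolute differences. A quick check of the arithmetic, using only the four scores quoted in this issue (the function name is illustrative):

```python
def relative_gain_pct(baseline: float, improved: float) -> float:
    """Relative improvement over the baseline score, in percent."""
    return (improved - baseline) / baseline * 100.0


# F1 scores as quoted in the Key Results list above.
locomo = relative_gain_pct(0.117, 0.598)
mem_gallery = relative_gain_pct(0.254, 0.797)

print(f"LoCoMo:      +{locomo:.0f}%")       # +411%
print(f"Mem-Gallery: +{mem_gallery:.0f}%")  # +214%
```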

## What the Autoresearch Pipeline Discovered

Top gains by category:
- **Bug fixes**: +175% — the autoresearch pipeline finds and patches its own memory bugs
- **Prompt engineering**: +188% on specific categories
- **Architectural changes**: +44%

Crucially, architecture changes from self-experimentation outperformed all hyperparameter tuning combined.

## Relevance to Zeph

Zeph's self-learning system (`zeph-skills`, `ReasoningMemory`, `SkillHeuristics`) already accumulates outcomes and extracts heuristics. OmniMem extends this concept to the memory architecture itself:

1. **Memory architecture self-improvement**: Rather than only tuning skill strategies, the pipeline diagnoses *why* memory retrieval fails and proposes architectural patches
2. **Multimodal memory**: OmniMem handles image + text memories; Zeph is currently text-only. This opens a path to image/attachment memory.
3. **Autoresearch evaluation loop**: The pipeline runs automated evaluations (LoCoMo-style) without human oversight — a natural fit for Zeph's CI cycle
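
As a rough illustration of what an unattended evaluation step could look like, the sketch below scores answers with token-overlap F1, a metric commonly used for QA-style benchmarks. This issue does not describe the exact scoring OmniMem or LoCoMo uses, so the metric choice is an assumption. `token_f1`, `evaluate`, and the `answer_fn` callback are hypothetical names, not Zeph or OmniMem APIs.

```python
from collections import Counter
from typing import Callable, List, Tuple


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer.

    Assumption: lowercase whitespace tokenization, no further normalization.
    """
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def evaluate(cases: List[Tuple[str, str]], answer_fn: Callable[[str], str]) -> float:
    """Mean F1 of answer_fn over (question, gold_answer) pairs."""
    scores = [token_f1(answer_fn(question), gold) for question, gold in cases]
    return sum(scores) / len(scores) if scores else 0.0
```

A CI job could call `evaluate` with a fixed set of question/answer pairs and fail the run when the mean score drops below a stored baseline.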

## Proposed Design Direction

- Extend the `skill_outcomes` / `reasoning_strategies` self-learning loop to also log memory retrieval failures (no-hit turns, low-confidence recalls)
- Add a periodic background task (via `zeph-scheduler`) that runs a self-evaluation micro-benchmark on recent memory retrievals
- Use retrieval failure analysis to tune SYNAPSE spreading activation parameters (decay factor, depth, MMR threshold)
- Track memory-architecture improvement suggestions in the `skill_heuristics` table, tagged so they stay distinct from skill heuristics
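
The first bullet could start from something as small as the sketch below, which labels each retrieval as a no-hit, a low-confidence recall, or a success, and reports the failure rate over a window of turns. Everything here is hypothetical: the `RetrievalEvent` shape, the `min_confidence` threshold of 0.5, and the outcome names are assumptions for illustration and do not reflect Zeph's existing schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class RetrievalOutcome(Enum):
    OK = "ok"
    NO_HIT = "no_hit"                   # retrieval returned nothing
    LOW_CONFIDENCE = "low_confidence"   # best hit scored below the threshold


@dataclass
class RetrievalEvent:
    """One memory lookup (hypothetical shape)."""
    turn_id: int
    query: str
    scores: List[float]  # similarity scores of returned memories, possibly empty


def classify(event: RetrievalEvent, min_confidence: float = 0.5) -> RetrievalOutcome:
    """Label a retrieval by its best score; the threshold is an assumed default."""
    if not event.scores:
        return RetrievalOutcome.NO_HIT
    if max(event.scores) < min_confidence:
        return RetrievalOutcome.LOW_CONFIDENCE
    return RetrievalOutcome.OK


def failure_rate(events: List[RetrievalEvent], min_confidence: float = 0.5) -> float:
    """Fraction of events that were no-hit or low-confidence."""
    if not events:
        return 0.0
    failures = sum(
        classify(e, min_confidence) is not RetrievalOutcome.OK for e in events
    )
    return failures / len(events)
```

A scheduled task could compute `failure_rate` over recent turns and only trigger the more expensive self-evaluation or parameter tuning when the rate rises.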

## References

- Paper: https://arxiv.org/abs/2604.01007
- Code: https://github.com/aiming-lab/OmniMem
- Related: #3312 (ReasoningBank), #3564 (MemCoT), #3222 (MemReader), APEX-MEM spec
