research(memory): OmniMem autoresearch-guided discovery of lifelong multimodal agent memory — +411% F1 on LoCoMo #3566

@bug-ops

Description

arXiv:2604.01007 (April 1, 2026) — Omni-SimpleMem: Autoresearch-Guided Discovery of Lifelong Multimodal Agent Memory by Liu et al. (UNC-Chapel Hill, UC Berkeley, UC Santa Cruz, Cisco).

OmniMem proposes an autonomous research pipeline that systematically discovers and validates memory architecture improvements through self-experimentation — running ~50 experiments across two benchmarks to diagnose failure modes and propose architectural modifications.

Code: https://github.com/aiming-lab/OmniMem

Key Results

  • LoCoMo benchmark: F₁ improved from 0.117 → 0.598 (+411%)
  • Mem-Gallery benchmark: F₁ improved from 0.254 → 0.797 (+214%)
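
The percentages are relative gains over the baseline F₁, which can be checked with a few lines of Python. The helper name `relative_gain_pct` is illustrative and does not come from the paper or its code.

```python
def relative_gain_pct(before: float, after: float) -> float:
    """Relative improvement of `after` over `before`, in percent."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return (after - before) / before * 100.0


# F1 values as reported in the Key Results list.
for name, before, after in [("LoCoMo", 0.117, 0.598), ("Mem-Gallery", 0.254, 0.797)]:
    print(f"{name}: +{relative_gain_pct(before, after):.0f}%")
```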

What the Autoresearch Pipeline Discovered

Top gains by category:

  • Bug fixes: +175% — the autoresearch pipeline finds and patches its own memory bugs
  • Prompt engineering: +188% on specific categories
  • Architectural changes: +44%

Crucially, architecture changes from self-experimentation outperformed all hyperparameter tuning combined.

Relevance to Zeph

Zeph's self-learning system (zeph-skills, ReasoningMemory, SkillHeuristics) already accumulates outcomes and extracts heuristics. OmniMem extends this concept to the memory architecture itself:

  1. Memory architecture self-improvement: Rather than only tuning skill strategies, the pipeline diagnoses why memory retrieval fails and proposes architectural patches
  2. Multimodal memory: OmniMem handles image + text memories; Zeph is currently text-only. This opens a path to image/attachment memory.
  3. Autoresearch evaluation loop: The pipeline runs automated evaluations (LoCoMo-style) without human oversight — a natural fit for Zeph's CI cycle
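
For the evaluation loop in item 3, a LoCoMo-style check comes down to scoring each generated answer against a reference answer. A minimal sketch, assuming token-level F₁ with lowercasing and whitespace tokenization (the benchmark's official scorer may normalize text differently); `token_f1` and `mean_f1` are hypothetical names, not part of Zeph or OmniMem:

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        # Both empty counts as a match; exactly one empty side is a miss.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def mean_f1(pairs: list[tuple[str, str]]) -> float:
    """Average token F1 over (prediction, reference) pairs."""
    return sum(token_f1(p, r) for p, r in pairs) / len(pairs) if pairs else 0.0
```

A CI job could compare `mean_f1` on a fixed question set before and after a memory change and flag regressions; the choice of question set and threshold is left open here.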

Proposed Design Direction

  • Extend the skill_outcomes / reasoning_strategies self-learning loop to also log memory retrieval failures (no-hit turns, low-confidence recalls)
  • Add a periodic background task (via zeph-scheduler) that runs a self-evaluation micro-benchmark on recent memory retrievals
  • Use retrieval failure analysis to tune SYNAPSE spreading activation parameters (decay factor, depth, MMR threshold)
  • Track memory architecture improvement suggestions in the skill_heuristics table, kept distinct from ordinary skill heuristics
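
The first two bullets can be sketched as a small retrieval log plus a summary query that a periodic task would run. This is a sketch under assumptions: the `memory_retrieval_log` table, its columns, and the 0.5 confidence cutoff are all hypothetical and not part of Zeph's existing schema, and Python's `sqlite3` module stands in here for the zeph-memory crate's SQLite layer.

```python
import sqlite3

LOW_CONFIDENCE = 0.5  # hypothetical cutoff for a "low-confidence recall"

# Hypothetical table; not part of Zeph's current schema.
SCHEMA = """
CREATE TABLE IF NOT EXISTS memory_retrieval_log (
    id         INTEGER PRIMARY KEY,
    query      TEXT NOT NULL,
    hit_count  INTEGER NOT NULL,
    top_score  REAL,              -- NULL when nothing was retrieved
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
)
"""


def log_retrieval(conn, query, scores):
    """Record one retrieval turn; `scores` holds the similarity of each hit."""
    top = max(scores) if scores else None
    conn.execute(
        "INSERT INTO memory_retrieval_log (query, hit_count, top_score) VALUES (?, ?, ?)",
        (query, len(scores), top),
    )


def failure_summary(conn, last_n=100):
    """Micro-benchmark over the most recent turns: no-hit and low-confidence rates."""
    total, no_hit, low_conf = conn.execute(
        """
        SELECT COUNT(*),
               COALESCE(SUM(hit_count = 0), 0),
               COALESCE(SUM(hit_count > 0 AND top_score < ?), 0)
        FROM (SELECT * FROM memory_retrieval_log ORDER BY id DESC LIMIT ?)
        """,
        (LOW_CONFIDENCE, last_n),
    ).fetchone()
    if total == 0:
        return {"turns": 0, "no_hit_rate": 0.0, "low_confidence_rate": 0.0}
    return {
        "turns": total,
        "no_hit_rate": no_hit / total,
        "low_confidence_rate": low_conf / total,
    }


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
log_retrieval(conn, "where did we park?", [])            # no-hit turn
log_retrieval(conn, "user's favourite editor", [0.31])   # low-confidence recall
log_retrieval(conn, "project deadline", [0.92, 0.40])    # healthy recall
print(failure_summary(conn))
```

A zeph-scheduler task could call the summary on an interval and feed the two rates into the SYNAPSE parameter tuning described in the third bullet; how that task is registered and how the parameters are searched is not specified here.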

Metadata

Labels

  • P3: Research, medium-high complexity
  • memory: zeph-memory crate (SQLite)
  • research: Research-driven improvement
