research(memory): OCR-Memory — visual trajectory encoding for scalable long-horizon agent memory without lossy summarization #3571

@bug-ops

Description

OCR-Memory (arXiv:2604.26622, April 29, 2026) proposes a memory framework that encodes historical agent trajectories as images rather than text, using the visual modality as a high-density representation of agent experience.

Key Technical Approach

Current agent memory systems face a fundamental constraint: token budgets. Storing raw trajectories is prohibitively expensive; summarization loses information; text-only retrieval returns fragmented evidence.

OCR-Memory addresses this by:

  1. Rendering historical trajectories (tool calls, observations, reasoning chains) into annotated images with unique visual identifiers
  2. Retrieval via a locate-and-transcribe paradigm: visual anchors select relevant image regions; retrieval becomes explicit index selection rather than free-form generation
  3. Adaptive resolution and active-recall up-sampling: look far with manageable token cost while preserving high fidelity for salient memories

Key property: encoding into visual tokens avoids the trade-off between memory capacity and completeness — arbitrarily long histories can be stored without lossy summarization or truncation.
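The locate-and-transcribe paradigm above can be illustrated with a toy sketch. This is not the paper's implementation; all names (`MemoryRegion`, `render_trajectory`, `locate`, `transcribe`) are hypothetical, and string matching stands in for the visual-embedding match a real system would use. The point it demonstrates is the paper's framing: retrieval selects explicit anchor indices rather than generating recalled text free-form, and transcription recovers content only from the selected regions.

```python
from dataclasses import dataclass

@dataclass
class MemoryRegion:
    anchor_id: str   # unique visual identifier stamped on the image region
    step_kind: str   # "tool_call", "observation", or "reasoning"
    content: str     # what transcribing this region would recover

def render_trajectory(steps):
    """Assign a unique anchor ID to each step, as the rendered image would."""
    return [
        MemoryRegion(anchor_id=f"A{i:03d}", step_kind=kind, content=text)
        for i, (kind, text) in enumerate(steps)
    ]

def locate(regions, query):
    """'Locate': select anchor IDs whose content matches the query.
    (A real system would match against visual embeddings, not substrings.)"""
    return [r.anchor_id for r in regions if query in r.content]

def transcribe(regions, anchor_ids):
    """'Transcribe': recover text only from the selected regions."""
    index = {r.anchor_id: r.content for r in regions}
    return [index[a] for a in anchor_ids]

steps = [
    ("tool_call", "run pytest on module foo"),
    ("observation", "3 tests failed in foo/test_bar.py"),
    ("reasoning", "failures caused by renamed fixture"),
]
regions = render_trajectory(steps)
anchors = locate(regions, "failed")  # explicit index selection, not generation
print(transcribe(regions, anchors))  # ['3 tests failed in foo/test_bar.py']
```

Because retrieval returns indices into a fixed store, the recalled content is exact rather than paraphrased, which is what lets arbitrarily long histories survive without lossy summarization.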

Relevance to Zeph

Zeph's current memory pipeline:

  • Short-term: sliding context window (lost on compact)
  • Long-term: MAGMA graph (structured entities/relations), SYNAPSE spreading activation
  • Episodic: scene storage (SQLite)
  • Compaction: microcompact + hard compact (lossy summarization)

OCR-Memory is not a replacement but a retrieval complement: visual trajectory snapshots could serve as a dense, lossless episodic store for complex multi-step tool workflows (e.g., a 40-turn code refactor session), where text summarization loses critical intermediate state.

Potential integration point: zeph-memory episodic store — for sessions exceeding a token threshold, render the trajectory to a compact visual artifact (JSON → canvas → PNG) and store as a Qdrant vector point with the image embedding. Retrieval identifies the relevant session snapshot; transcription extracts the needed context region.
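A minimal sketch of that pipeline, under loudly stated assumptions: `TOKEN_THRESHOLD`, `render_png`, `embed_image`, and `maybe_snapshot` are all hypothetical names, the JSON serialization is a stand-in for real canvas-to-PNG rendering, and an in-memory dict stands in for the Qdrant collection so the sketch is self-contained.

```python
import hashlib
import json

TOKEN_THRESHOLD = 8_000  # assumed cutoff for "long" sessions

def render_png(trajectory: list[dict]) -> bytes:
    """Placeholder for the JSON -> canvas -> PNG rendering step."""
    return json.dumps(trajectory).encode()  # stand-in for real PNG bytes

def embed_image(png: bytes) -> list[float]:
    """Placeholder image embedding; a real system would use a VLM encoder."""
    digest = hashlib.sha256(png).digest()
    return [b / 255.0 for b in digest[:8]]

episodic_store: dict[str, dict] = {}  # stands in for a Qdrant collection

def maybe_snapshot(session_id: str, trajectory: list[dict], token_count: int) -> bool:
    """Snapshot a session only when it exceeds the token threshold."""
    if token_count < TOKEN_THRESHOLD:
        return False  # session fits the normal compaction path
    png = render_png(trajectory)
    episodic_store[session_id] = {
        "vector": embed_image(png),    # image embedding for similarity search
        "payload": {"artifact": png},  # snapshot kept losslessly for transcription
    }
    return True

trajectory = [{"turn": i, "tool": "edit_file"} for i in range(40)]
maybe_snapshot("refactor-session", trajectory, token_count=12_000)
```

In a real integration the dict would be replaced by an upsert into a Qdrant collection, with the PNG artifact (or a pointer to it) in the point payload so the transcription step can recover the exact context region later.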

Practical concerns: retrieval transcription requires a VLM; per-session storage footprint; rendering infrastructure. Likely P3 research until Zeph has a VLM integration path.

Metadata

Labels

  • P3: Research, medium-high complexity
  • memory: zeph-memory crate (SQLite)
  • research: Research-driven improvement
