Description
OCR-Memory (arXiv:2604.26622, April 29, 2026) proposes a memory framework that encodes historical agent trajectories as images rather than text, using the visual modality as a high-density representation of agent experience.
Key Technical Approach
Current agent memory systems face a fundamental constraint: token budgets. Storing raw trajectories is prohibitively expensive; summarization loses information; text-only retrieval returns fragmented evidence.
OCR-Memory addresses this by:
- Rendering historical trajectories (tool calls, observations, reasoning chains) into annotated images with unique visual identifiers
- Retrieval via a locate-and-transcribe paradigm: visual anchors select relevant image regions; retrieval becomes explicit index selection rather than free-form generation
- Adaptive resolution and active-recall up-sampling: look far with manageable token cost while preserving high fidelity for salient memories
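The rendering-plus-anchors idea above can be sketched in a few lines. This is an illustrative Pillow-based mock-up, not the paper's implementation: `render_trajectory`, the `A1`/`A2` anchor naming, and the fixed line-height layout are all assumptions made here for clarity.

```python
# Sketch (not the paper's renderer): draw a tool-call trajectory into an
# image, tagging each step with a visual anchor ("A1", "A2", ...) so that
# retrieval can name a region (explicit index selection) instead of
# free-form generation.
from PIL import Image, ImageDraw

def render_trajectory(steps, width=800, line_height=18):
    """steps: list of (kind, text) tuples, e.g. ("tool_call", "grep -r foo")."""
    img = Image.new("RGB", (width, line_height * (len(steps) + 1)), "white")
    draw = ImageDraw.Draw(img)
    anchors = {}
    for i, (kind, text) in enumerate(steps):
        anchor = f"A{i + 1}"                      # unique visual identifier
        y = line_height * i
        draw.text((4, y), f"[{anchor}] {kind}: {text}", fill="black")
        anchors[anchor] = (0, y, width, y + line_height)  # region bounding box
    return img, anchors

trajectory = [
    ("tool_call", "read_file('src/main.rs')"),
    ("observation", "fn main() { ... } (212 lines)"),
    ("reasoning", "refactor: extract parse_args into its own module"),
]
snapshot, anchor_index = render_trajectory(trajectory)
```

At retrieval time, selecting anchor `A3` amounts to cropping `anchor_index["A3"]` from the snapshot and transcribing only that region.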
Key property: encoding into visual tokens avoids the trade-off between memory capacity and completeness — arbitrarily long histories can be stored without lossy summarization or truncation.
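The adaptive-resolution point above reduces to a simple policy: keep a cheap low-resolution copy for scanning far back, and serve the full-resolution artifact only on active recall of a salient memory. A toy Pillow sketch (the factor-of-4 downscale and the `recall` helper are our assumptions, not the paper's):

```python
# Toy adaptive-resolution policy: a thumbnail is cheap to "look far" over
# many sessions; the full-resolution snapshot is fetched only when a
# memory is judged salient.
from PIL import Image

full = Image.new("RGB", (800, 600), "white")   # stand-in for a rendered snapshot
thumbnail = full.resize((full.width // 4, full.height // 4))

def recall(salient: bool):
    """Return the high-fidelity image only for salient memories."""
    return full if salient else thumbnail
```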
Relevance to Zeph
Zeph's current memory pipeline:
- Short-term: sliding context window (lost on compact)
- Long-term: MAGMA graph (structured entities/relations), SYNAPSE spreading activation
- Episodic: scene storage (SQLite)
- Compaction: microcompact + hard compact (lossy summarization)
OCR-Memory is not a replacement but a retrieval complement: visual trajectory snapshots could serve as a dense, lossless episodic store for complex multi-step tool workflows (e.g., a 40-turn code refactor session), where text summarization loses critical intermediate state.
Potential integration point: zeph-memory episodic store — for sessions exceeding a token threshold, render the trajectory to a compact visual artifact (JSON → canvas → PNG) and store as a Qdrant vector point with the image embedding. Retrieval identifies the relevant session snapshot; transcription extracts the needed context region.
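The hook described above might look like the following. Everything here is hypothetical: `TOKEN_THRESHOLD`, `embed_image`, and `archive_session` are not existing Zeph APIs, and a plain dict stands in for the Qdrant collection (a real version would `upsert` a `PointStruct` with the image embedding as its vector).

```python
# Sketch of the proposed episodic-store hook. Illustrative names throughout:
# TOKEN_THRESHOLD, embed_image, and the dict standing in for a Qdrant
# collection are assumptions, not existing Zeph code.
import hashlib
import json

TOKEN_THRESHOLD = 8_000      # assumed cutoff for "render instead of summarize"
snapshot_store = {}          # stand-in for a Qdrant collection of vector points

def embed_image(png_bytes):
    """Placeholder embedding: a real pipeline would call a vision encoder."""
    digest = hashlib.sha256(png_bytes).digest()
    return [b / 255.0 for b in digest[:8]]       # toy 8-dim vector

def archive_session(session_id, trajectory_json, png_bytes, token_count):
    if token_count < TOKEN_THRESHOLD:
        return False                             # small sessions stay in text memory
    snapshot_store[session_id] = {
        "vector": embed_image(png_bytes),        # what Qdrant would index
        "payload": {"trajectory": trajectory_json, "tokens": token_count},
        "image": png_bytes,                      # the rendered artifact itself
    }
    return True

trajectory = json.dumps([{"step": 1, "tool": "read_file"}])
archived = archive_session("sess-42", trajectory, b"\x89PNG...", token_count=12_500)
```

Retrieval would then be a vector search over the stored embeddings to identify the session snapshot, followed by VLM transcription of the relevant region.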
Practical concerns: retrieval-time transcription requires a VLM; per-session storage footprint; rendering infrastructure. Likely P3 research until Zeph has a VLM integration path.