feat(session): add memory cold-storage archival via hotness scoring by mvanhorn · Pull Request #620 · volcengine/OpenViking

mvanhorn · 2026-03-15T04:20:27Z

Summary

Add a MemoryArchiver that moves cold memories (below a configurable hotness threshold) to an archive directory, reducing token consumption from stale abstracts and overviews during retrieval. Non-destructive: archived memories can be restored.

Problem Statement

The hotness_score() function in memory_lifecycle.py and active_count tracking in session.commit() are already in place, but memories never get cleaned up. Over time, stale memories accumulate, wasting tokens when parent directories regenerate abstracts/overviews.

Evidence

Source	Evidence	Engagement
#269	"How to delete individual long-term memories?"	@qin-ctx acknowledged gap
#578	Abstracts longer than source, 150k tokens per overview regen	1 thumbs up, 3 comments
#350	Token waste from accumulated content during ingestion	3 thumbs up
#296	Retrieval info management - added `active_count` (this PR builds on it)	closed/implemented

Proposed Solution

MemoryArchiver provides three operations:

scan(scope_uri) - Queries the vector index for all L2 memories under a scope, computes hotness_score() for each, and returns those that are both below the threshold and older than min_age_days.
archive(candidates) - Moves cold memories to {parent}/_archive/ via viking_fs.mv(), which atomically updates the vector index. Supports dry_run=True.
restore(archived_uri) - Moves an archived memory back to its original location by removing the _archive/ path segment.
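
As a rough illustration of the path mapping that archive() and restore() describe, the sketch below derives the {parent}/_archive/ location and reverses it. The helper names archive_uri and restore_uri and the example URI are hypothetical and are not taken from memory_archiver.py; only the _archive segment comes from the description above. The actual move is done by viking_fs.mv(), which is not modelled here.

```python
from posixpath import basename, dirname, join

ARCHIVE_DIR = "_archive"  # segment name taken from the PR description


def archive_uri(uri: str) -> str:
    """Map {parent}/{name} to {parent}/_archive/{name} (hypothetical helper)."""
    return join(dirname(uri), ARCHIVE_DIR, basename(uri))


def restore_uri(archived_uri: str) -> str:
    """Drop the _archive segment; reject URIs that are not archived."""
    parent = dirname(archived_uri)
    if basename(parent) != ARCHIVE_DIR:
        raise ValueError(f"not an archived URI: {archived_uri}")
    return join(dirname(parent), basename(archived_uri))


# Example URI is made up for illustration.
original = "viking://user/memories/prefs/note.md"
archived = archive_uri(original)
print(archived)               # viking://user/memories/prefs/_archive/note.md
print(restore_uri(archived))  # viking://user/memories/prefs/note.md
```

Keeping the mapping a pure string transformation is what makes a dry run cheap: the target path can be reported without touching the filesystem.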

Key design decisions:

Non-destructive - Files are moved, not deleted. Follows the filesystem paradigm.
L0/L1 protection - Abstracts and overviews are never archived (only L2 content).
Min age guard - Recent memories (default: <7 days) are always kept, regardless of score.
Scope-aware - Scanning is limited to the requested URI prefix.
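
The guards listed above can be sketched as a single selection pass. This is a minimal illustration, not the code in memory_archiver.py: MemoryRecord and select_cold are hypothetical names, the hotness value is passed in precomputed because the hotness_score() formula is not shown in this PR description, the use of updated_at as the age field is an assumption, and the coldest-first ordering is an assumption as well.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional


@dataclass
class MemoryRecord:  # hypothetical stand-in for a vector-index row
    uri: str
    level: int            # 0/1 = abstract/overview, 2 = content
    updated_at: datetime  # assumed age field
    score: float          # precomputed hotness (real formula: hotness_score())


def select_cold(
    records: Iterable[MemoryRecord],
    scope_uri: str,
    threshold: float,
    min_age_days: int = 7,
    now: Optional[datetime] = None,
) -> List[MemoryRecord]:
    """Return archival candidates, coldest first (ordering is an assumption)."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=min_age_days)
    scope = scope_uri.rstrip("/") + "/"
    cold = [
        r for r in records
        if r.level == 2                  # L0/L1 protection
        and r.uri.startswith(scope)      # scope-aware
        and "/_archive/" not in r.uri    # skip already-archived memories
        and r.updated_at < cutoff        # min age guard
        and r.score < threshold          # below the hotness threshold
    ]
    return sorted(cold, key=lambda r: r.score)
```

Because every guard is a plain predicate on the record, the same conditions could later be expressed as index filters instead of Python checks.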

Changes

New: openviking/session/memory_archiver.py (328 lines) - MemoryArchiver class with scan/archive/restore
New: tests/unit/session/test_memory_archiver.py (423 lines) - 30 unit tests
Modified: openviking/session/__init__.py - Export MemoryArchiver, ArchivalCandidate, ArchivalResult

Testing

All 30 unit tests pass. Tests cover:

Scan: cold detection, recent-memory skip, already-archived skip, out-of-scope skip, hot-memory retention, sort order, empty store
Archive: file movement, dry-run, error handling, empty candidates
Restore: round-trip, non-archived URI rejection, error handling
Convenience: scan_and_archive with and without dry-run

======================== 30 passed, 1 warning in 0.03s =========================

Future Work

This PR provides the core archival mechanism. Follow-up work could include:

CLI commands: ov memory archive / ov memory restore
Optional post-commit hook in session.commit() for auto-archival
Config schema additions for threshold/min_age defaults

The MemoryArchiver builds on hotness_score() in memory_lifecycle.py and the active_count tracking wired into session.commit(). It uses viking_fs.mv() to stay within the filesystem paradigm rather than introducing a new deletion mechanism.

This contribution was developed with AI assistance (Claude Code).

Relates to #269, #578, #350

Add MemoryArchiver that moves cold memories (below a configurable hotness threshold) to an archive directory, reducing token consumption from stale abstracts and overviews during retrieval.

- scan() queries vector index for L2 memories and computes hotness scores
- archive() moves cold memories to {parent}/_archive/ via viking_fs.mv()
- restore() recovers archived memories to their original location
- Respects min_age_days to avoid archiving recent memories
- Skips L0/L1 files (abstracts and overviews are never archived)
- Includes dry-run mode and scan_and_archive convenience method
- 30 unit tests covering scan, archive, restore, edge cases

This contribution was developed with AI assistance (Claude Code).

codeCraft-Ritik

Great work! The Python implementation is clean and easy to understand.

qin-ctx

Thanks for the well-structured PR and thorough test coverage. Left one design suggestion on scan filtering efficiency.

qin-ctx · 2026-03-17T12:23:44Z

openviking/session/memory_archiver.py

+            now = datetime.now(timezone.utc)
+
+        candidates: List[ArchivalCandidate] = []
+


[Design] (non-blocking) The server-side filter here only uses Eq("level", 2), then all other filtering (scope prefix, _archive exclusion, min age) is done client-side in Python. For large memory stores this pulls far more records than necessary.

expr.py already provides filters that can push these checks to the server:

    from openviking.storage.expr import And, Eq, PathScope

    filter_expr = And(conds=[
        Eq("level", 2),
        PathScope(prefix=scope_uri),  # scope filtering
        # Could also add a TimeRange on updated_at for min_age_days,
        # and exclude _archive paths if the backend supports it.
    ])

Pushing scope, time range, and archive-exclusion filters to the server would reduce data transfer and client-side iteration significantly.

mvanhorn

Good catch on the filter push-down. The PathScope and Eq filters would significantly reduce the data transferred for large stores. I'll update the scan to push the scope prefix and _archive exclusion to the server side.

chenjw · 2026-03-17T12:39:38Z

Does the archiving operation need to take into account possible links between file contents, and will it have any impact on future linking mechanisms? @qin-ctx

mvanhorn · 2026-03-17T16:13:37Z

Good question. The current implementation archives memories independently based on hotness scoring - it doesn't trace cross-references between memories. If a future linking mechanism is added (e.g., memory-to-memory references), archived memories would need a resolution step during retrieval to check cold storage for linked items.

For now, the archival is reversible - is_archived is a metadata flag, not a deletion. Retrieval can be extended to include archived memories when following links.
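
One way the resolution step mentioned above might look, purely as a sketch of a possible follow-up and not something this PR implements: when a linked URI is no longer at its live location, check the sibling _archive/ directory before giving up. The function name resolve_link and the exists callback are hypothetical; a real version would go through viking_fs rather than an in-memory set.

```python
from posixpath import basename, dirname, join
from typing import Callable, Optional


def resolve_link(uri: str, exists: Callable[[str], bool]) -> Optional[str]:
    """Return the live URI if present, else its archived location, else None."""
    if exists(uri):
        return uri
    # Fall back to {parent}/_archive/{name}, the layout described in this PR.
    archived = join(dirname(uri), "_archive", basename(uri))
    return archived if exists(archived) else None


# In-memory stand-in for the store; paths are made up for illustration.
store = {"mem/prefs/_archive/old.md", "mem/prefs/new.md"}
print(resolve_link("mem/prefs/new.md", store.__contains__))   # mem/prefs/new.md
print(resolve_link("mem/prefs/old.md", store.__contains__))   # mem/prefs/_archive/old.md
print(resolve_link("mem/prefs/gone.md", store.__contains__))  # None
```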

chenjw · 2026-03-17T16:33:27Z

Thanks for your contribution! The archiving idea is great. Looking forward to seeing further refinements on this design.

github-project-automation bot added this to OpenViking project Mar 15, 2026

github-project-automation bot moved this to Backlog in OpenViking project Mar 15, 2026

codeCraft-Ritik reviewed Mar 15, 2026

qin-ctx self-assigned this Mar 17, 2026

qin-ctx reviewed Mar 17, 2026

chenjw approved these changes Mar 17, 2026

chenjw merged commit 5225ef4 into volcengine:main Mar 17, 2026
6 checks passed

github-project-automation bot moved this from Backlog to Done in OpenViking project Mar 17, 2026

mvanhorn mentioned this pull request Mar 18, 2026

fix(mcp): add api_key support and configurable defaults to MCP query server #691

Merged

chenjw merged 1 commit into volcengine:main from mvanhorn:osc/feat-memory-cold-storage-archival
