
feat(memory): log memory retrieval failures for OmniMem self-improvement loop #3597

Merged
bug-ops merged 4 commits into main from log-memory-retrieval-failures
May 5, 2026


@bug-ops (Owner) commented May 5, 2026

Summary

  • Adds RetrievalFailureLogger — async fire-and-forget subsystem that records memory retrieval failures (no-hit, low-confidence, timeout, error) into a new memory_retrieval_failures SQLite table (migration 083)
  • Integrates at fetch_graph_facts_raw and fetch_semantic_recall_raw in zeph-agent-context
  • Implements the minimum viable failure dataset for the OmniMem self-improvement loop (arXiv:2604.01007); scheduler-driven SYNAPSE tuning is out of scope and tracked separately
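For illustration, the four failure types and the low-confidence check can be sketched as a small Rust enum. This is a minimal sketch, not the crate's actual API: `FailureType`, `classify_recall`, and the rule that a recall counts as low-confidence when its best score falls below `low_confidence_threshold` are assumptions; only the type names and the 0.3 default come from this PR.

```rust
/// Hypothetical mirror of the four failure types; the string forms match
/// the names used in this PR (no_hit, low_confidence, timeout, error).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FailureType {
    NoHit,
    LowConfidence,
    Timeout,
    Error,
}

impl FailureType {
    /// Value that would be stored in the failure-type column.
    pub fn as_str(self) -> &'static str {
        match self {
            FailureType::NoHit => "no_hit",
            FailureType::LowConfidence => "low_confidence",
            FailureType::Timeout => "timeout",
            FailureType::Error => "error",
        }
    }
}

/// Classify a recall that completed without error, given its hit scores.
/// Returns `None` when the recall counts as a success.
/// Assumption: confidence is judged on the best-scoring hit.
pub fn classify_recall(scores: &[f32], low_confidence_threshold: f32) -> Option<FailureType> {
    if scores.is_empty() {
        return Some(FailureType::NoHit);
    }
    let best = scores.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    if best < low_confidence_threshold {
        Some(FailureType::LowConfidence)
    } else {
        None
    }
}
```

Timeouts and errors come from the retrieval call itself rather than from its scores, so they are not produced by `classify_recall` in this sketch.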

Design

  • Write path: bounded mpsc channel (256 cap) + background batch writer (16 records / 100 ms flush); the hot path uses try_send, which never awaits, so logging adds no blocking latency to retrieval
  • Four failure types: no_hit, low_confidence, timeout, error
  • Config ([memory.retrieval_failures]): enabled = false by default (privacy-safe opt-in), low_confidence_threshold = 0.3, retention_days = 90
  • Cleanup: automatic DELETE of rows older than retention_days every 500 flushes
  • Shutdown: tx/handle wrapped in Option<_>; shutdown() drains the channel cleanly; Drop aborts the background task if shutdown() is skipped
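The write path described above can be sketched with only the standard library: a bounded `sync_channel` (256 slots), a non-blocking `try_send` on the hot path, and a writer thread that flushes at 16 records or after 100 ms, whichever comes first. This is a hypothetical stand-in, not the PR's code: the real logger runs as an async background task and writes to SQLite, while here `FailureLogger`, `FailureRecord`, and the `flush` closure (standing in for the batch INSERT) are assumptions. Because std threads cannot be aborted, dropping this sketch without calling `shutdown()` only closes the channel and detaches the writer, unlike the PR's `Drop`, which aborts the background task.

```rust
use std::sync::mpsc::{self, Receiver, RecvTimeoutError, SyncSender, TrySendError};
use std::thread::{self, JoinHandle};
use std::time::{Duration, Instant};

const CHANNEL_CAP: usize = 256; // bounded: the hot path never waits for space
const BATCH_SIZE: usize = 16; // flush once this many records are buffered...
const FLUSH_INTERVAL: Duration = Duration::from_millis(100); // ...or after this long

/// Hypothetical record; the real table stores more columns.
#[derive(Debug, Clone)]
pub struct FailureRecord {
    pub failure_type: &'static str,
    pub query: String,
}

pub struct FailureLogger {
    tx: Option<SyncSender<FailureRecord>>,
    handle: Option<JoinHandle<usize>>,
}

impl FailureLogger {
    /// `flush` is a stand-in for the batched SQLite INSERT.
    pub fn spawn<F>(flush: F) -> Self
    where
        F: FnMut(&[FailureRecord]) + Send + 'static,
    {
        let (tx, rx) = mpsc::sync_channel(CHANNEL_CAP);
        let handle = thread::spawn(move || writer_loop(rx, flush));
        Self { tx: Some(tx), handle: Some(handle) }
    }

    /// Hot path: `try_send` returns immediately; a full or closed channel drops the record.
    pub fn log(&self, record: FailureRecord) -> bool {
        match &self.tx {
            Some(tx) => match tx.try_send(record) {
                Ok(()) => true,
                Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => false,
            },
            None => false,
        }
    }

    /// Close the channel, wait for the writer to drain it, and return the records flushed.
    pub fn shutdown(mut self) -> usize {
        drop(self.tx.take()); // last sender gone: writer drains, then sees Disconnected
        self.handle.take().map(|h| h.join().unwrap_or(0)).unwrap_or(0)
    }
}

fn writer_loop<F: FnMut(&[FailureRecord])>(rx: Receiver<FailureRecord>, mut flush: F) -> usize {
    let mut batch: Vec<FailureRecord> = Vec::with_capacity(BATCH_SIZE);
    let mut written = 0;
    let mut deadline = Instant::now() + FLUSH_INTERVAL;
    loop {
        let wait = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(wait) {
            Ok(record) => {
                batch.push(record);
                if batch.len() < BATCH_SIZE && Instant::now() < deadline {
                    continue; // keep filling until the batch is full or the interval elapses
                }
            }
            Err(RecvTimeoutError::Timeout) => {}
            Err(RecvTimeoutError::Disconnected) => {
                // Buffered records were already received; flush the remainder and exit.
                if !batch.is_empty() {
                    flush(&batch[..]);
                    written += batch.len();
                }
                return written;
            }
        }
        if !batch.is_empty() {
            flush(&batch[..]);
            written += batch.len();
            batch.clear();
        }
        deadline = Instant::now() + FLUSH_INTERVAL;
    }
}
```

Dropping the last sender is what lets `shutdown()` drain cleanly: std `mpsc` still delivers buffered records before `recv_timeout` reports `Disconnected`.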

Test plan

  • 6 unit tests: no_hit_failure_is_persisted, low_confidence_failure_is_persisted, log_does_not_block_when_channel_is_full, query_text_truncated_to_512_chars, logger_disabled_when_option_is_none, multiple_records_batch_flushed
  • 8841 workspace tests pass
  • Live session: set enabled = true under [memory.retrieval_failures] and verify that more than 100 rows are recorded after a standard test session (see playbook .local/testing/playbooks/memory-retrieval-failures.md)
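Two of the unit tests above check properties that can be shown in isolation. The sketch below assumes the 512-character cap counts Unicode scalar values rather than bytes (so multi-byte text is never cut mid-character) and uses a std `sync_channel` to show why `try_send` cannot block on a full channel; `truncate_query` and `try_send_when_full` are hypothetical helpers, not names from the PR.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Hypothetical helper: cap stored query text at 512 characters (not bytes),
/// so multi-byte UTF-8 text is never split mid-character.
pub fn truncate_query(query: &str) -> String {
    query.chars().take(512).collect()
}

/// Shape of a "does not block when full" check: with nobody draining the
/// channel, try_send must return Full immediately instead of waiting.
pub fn try_send_when_full() -> bool {
    // Keep the receiver alive (`_rx`, not `_`) so the channel is full, not disconnected.
    let (tx, _rx) = sync_channel::<u32>(2);
    tx.try_send(1).unwrap();
    tx.try_send(2).unwrap();
    matches!(tx.try_send(3), Err(TrySendError::Full(3)))
}
```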

References

@github-actions (bot) added the documentation, memory, rust, and core labels May 5, 2026
@bug-ops enabled auto-merge (squash) May 5, 2026 12:21
@github-actions (bot) added the enhancement and size/XL labels May 5, 2026
bug-ops added 2 commits May 5, 2026 14:30
…ent loop

Add `RetrievalFailureLogger` to `zeph-memory` — an async fire-and-forget
subsystem that records no-hit turns, low-confidence recalls, timeouts, and
errors into a new `memory_retrieval_failures` SQLite table (migration 083).

The write path uses a bounded mpsc channel (256 cap) with a background batch
writer (16 records / 100 ms flush). `try_send` on the hot path never awaits,
so logging adds no blocking latency to the retrieval critical path (INV-1
satisfied).

Integration points:
- `fetch_graph_facts_raw`: logs no-hit and all error paths (Bfs/AStar/
  WaterCircles/BeamSearch/Hybrid) before propagating errors (B1 fix)
- `fetch_semantic_recall_raw`: logs no-hit and low-confidence paths
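The "log before propagating" pattern for `fetch_graph_facts_raw` can be sketched as follows. The signature is hypothetical: `FailureSink`, `GraphError`, `fetch_graph_facts`, the `run` closure, and the detail-string format are assumptions; only the strategy names and the no-hit/error split come from this commit message.

```rust
use std::fmt;

/// Hypothetical error type for a graph traversal.
#[derive(Debug)]
pub struct GraphError(pub String);

impl fmt::Display for GraphError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

/// Traversal strategies named in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Strategy {
    Bfs,
    AStar,
    WaterCircles,
    BeamSearch,
    Hybrid,
}

/// Hypothetical hook standing in for the logger's non-blocking `log` call.
pub trait FailureSink {
    fn log(&self, failure_type: &'static str, detail: String);
}

/// Run one traversal, record no-hit and error outcomes, then return the result unchanged.
pub fn fetch_graph_facts<S, R>(sink: &S, strategy: Strategy, run: R) -> Result<Vec<String>, GraphError>
where
    S: FailureSink,
    R: FnOnce(Strategy) -> Result<Vec<String>, GraphError>,
{
    let facts = match run(strategy) {
        Ok(facts) => facts,
        Err(e) => {
            // Record the failure first, whichever strategy produced it, then propagate.
            sink.log("error", format!("{strategy:?}: {e}"));
            return Err(e);
        }
    };
    if facts.is_empty() {
        sink.log("no_hit", format!("{strategy:?}"));
    }
    Ok(facts)
}
```

Logging inside the error arm, rather than after the call returns, is what keeps every strategy's error path covered: once the error is propagated, the caller no longer knows which traversal failed.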

Config via `[memory.retrieval_failures]`:
- `enabled` (default `false`) — privacy-safe opt-in
- `low_confidence_threshold` (default `0.3`)
- `retention_days` (default `90`) with automatic cleanup every 500 flushes
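The config defaults and the cleanup cadence can be mirrored in a few lines. This sketch assumes a `created_at` column stored as Unix seconds; `RetrievalFailuresConfig`, `should_cleanup`, `retention_cutoff`, and `CLEANUP_SQL` are hypothetical names, while the defaults (`false`, `0.3`, `90`) and the every-500-flushes interval come from this PR.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Hypothetical mirror of `[memory.retrieval_failures]` with the PR's defaults.
#[derive(Debug, Clone, PartialEq)]
pub struct RetrievalFailuresConfig {
    pub enabled: bool,
    pub low_confidence_threshold: f32,
    pub retention_days: u32,
}

impl Default for RetrievalFailuresConfig {
    fn default() -> Self {
        // Opt-in by default, so nothing is recorded unless a user enables it.
        Self { enabled: false, low_confidence_threshold: 0.3, retention_days: 90 }
    }
}

const CLEANUP_EVERY_N_FLUSHES: u64 = 500;

/// Hypothetical statement; `created_at` is an assumed column name.
pub const CLEANUP_SQL: &str = "DELETE FROM memory_retrieval_failures WHERE created_at < ?1";

/// True on every 500th flush, when the writer would also run the retention DELETE.
pub fn should_cleanup(flush_count: u64) -> bool {
    flush_count > 0 && flush_count % CLEANUP_EVERY_N_FLUSHES == 0
}

/// Unix-seconds cutoff: rows created before this are older than `retention_days`.
pub fn retention_cutoff(now: SystemTime, retention_days: u32) -> u64 {
    let age = Duration::from_secs(u64::from(retention_days) * 86_400);
    now.checked_sub(age)
        .and_then(|t| t.duration_since(UNIX_EPOCH).ok())
        .map(|d| d.as_secs())
        .unwrap_or(0)
}
```

Tying cleanup to the flush counter rather than a separate timer keeps the DELETE on the same background writer, off the retrieval path.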

`RetrievalFailureLogger` wraps `tx` and `handle` in `Option<_>` so
`shutdown()` drains the channel cleanly and `Drop` aborts the background task
if shutdown is skipped. The `flush_batch` span uses `.instrument()` to capture
actual SQLite INSERT latency in traces.

This provides the minimum viable failure dataset required by the OmniMem
self-improvement loop (arXiv:2604.01007) for future SYNAPSE parameter tuning.

Closes #3576
@bug-ops force-pushed the log-memory-retrieval-failures branch from 97e1e1e to 5393f2c May 5, 2026 12:30
@bug-ops merged commit 551b025 into main May 5, 2026
32 checks passed
@bug-ops deleted the log-memory-retrieval-failures branch May 5, 2026 12:46

Labels

core (zeph-core crate), documentation, enhancement, memory (zeph-memory crate, SQLite), rust, size/XL (500+ lines)


Development

Successfully merging this pull request may close these issues.

feat(memory): log memory retrieval failures in skill_outcomes for OmniMem self-improvement loop

1 participant