feat(memory): persistent agent memory system (Phase 1) — foundation#245
Conversation
- File-based wiki storage under ~/.aws/cli-agent-orchestrator/memory/ - 4 scopes: global, project (cwd-hashed), session, agent - MCP tools: memory_store, memory_recall, memory_forget - CLI commands: cao memory list/show/delete/clear - Auto-inject <cao-memory> context block on first message - Auto-save hooks: Claude Code (Stop + PreCompact), Kiro CLI (agentSpawn + userPromptSubmit) - Tiered retention with background cleanup - fcntl.flock() for concurrent-write safety - 1296 tests passing
CI black --check flagged 5 files. Run black to align with the formatter version pinned in CI. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Phase 1 ships memory primitives (store/recall/forget) only. Hook-driven auto-save is delivered via per-provider plugins in a subsequent PR (per @patricka3125's recommendation in #179 to use Claude Code's plugin system rather than in-line settings.local.json edits). Removes: - src/cli_agent_orchestrator/hooks/registration.py - src/cli_agent_orchestrator/hooks/*.sh - terminal_service hook registration call site - api/main.py install_hooks() call - cleanup_service stale-hook-flag sweep - docs/memory.md auto-save section Side effect: 14 CodeQL findings on hooks/registration.py vaporise with the file. Resolves the path-injection warnings flagged on PR #179. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Introduces Phase 1 of a persistent, file-backed “memory” subsystem for CLI Agent Orchestrator, including a new MemoryService, MCP tools (memory_store/recall/forget), CLI commands (cao memory ...), and first-message context injection (<cao-memory>), plus retention cleanup.
Changes:
- Added file-based wiki storage + indexing for scoped memories (
global/project/session/agent) viaMemoryService. - Exposed memory operations through MCP tools and added CLI commands to inspect/manage stored memories.
- Added background cleanup for retention and extensive unit/integration test coverage.
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
src/cli_agent_orchestrator/services/memory_service.py |
Core file-based storage/recall/forget + context block generation. |
src/cli_agent_orchestrator/services/terminal_service.py |
Injects <cao-memory> into first message (provider-dependent) and cleans up injection state on terminal deletion. |
src/cli_agent_orchestrator/services/cleanup_service.py |
Adds memory retention cleanup pass. |
src/cli_agent_orchestrator/models/memory.py |
Introduces Memory, MemoryScope, MemoryType model/enums. |
src/cli_agent_orchestrator/mcp_server/server.py |
Adds MCP tools: memory_store, memory_recall, memory_forget. |
src/cli_agent_orchestrator/cli/commands/memory.py |
Adds cao memory list/show/delete/clear commands. |
src/cli_agent_orchestrator/cli/main.py |
Registers the new memory CLI command group. |
src/cli_agent_orchestrator/api/main.py |
Runs memory cleanup on startup; adds /terminals/{id}/memory-context endpoint. |
src/cli_agent_orchestrator/constants.py |
Adds MEMORY_BASE_DIR constant under CAO home directory. |
docs/memory.md |
Documents scopes, types, tools, injection behavior, storage layout, retention. |
README.md |
Adds memory system feature to top-level project feature list. |
CHANGELOG.md |
Adds Unreleased entry describing Phase 1 memory functionality. |
src/cli_agent_orchestrator/agent_store/developer.md |
Adds guidance to use memory tools. |
src/cli_agent_orchestrator/agent_store/reviewer.md |
Adds guidance to use memory tools. |
src/cli_agent_orchestrator/agent_store/code_supervisor.md |
Adds guidance to use memory tools. |
test/services/test_memory_service.py |
Extensive store/recall/forget/cleanup/security tests for MemoryService. |
test/providers/test_memory_injection.py |
Tests first-message injection behavior + “no config file mutation” guarantees. |
test/cli/commands/test_memory.py |
Tests CLI command behaviors with mocked MemoryService. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
PR Review — Memory System Phase 1Combined findings — Copilot + manual review (verified)All findings re-verified against PR head ( P0 — block merge; ships the feature broken
Why P0:
P1 — block merge; user-visible correctness bugs
Why P1:
P2 — fix in this PR; cheap correctness
Verification notes:
P3 — defer; nits and process
(Earlier draft had a separate "file roundtrip in Tally
Merge gate summary
Copilot's review accuracy on this PR was 100% — every cited line was verified against actual code at PR head Process notes (not findings)
Suggested fix order
Net ask: ~10–15 lines of substantive code change to clear P0+P1, ~30–40 lines to also clear P2, plus one doc update. Reasonable in a single revision. |
P0 — silent breakage:
- _get_terminal_context() now resolves cwd via the existing tmux
working-directory helper, so project-scope storage works in
production rather than only in tests that monkeypatch the helper.
- send_input() injects memory context for every provider, including
Kiro (the default). The original Phase 1 commit skipped Kiro
expecting the AgentSpawn hook to handle it; that hook lands in a
later plugin-migration PR, so inline injection is the correct
Phase 1 path for now.
P1 — correctness gaps:
- Session and agent scopes nest scope_id into wiki paths, so two
sessions or two agent profiles can no longer collide on the same
key. Index entries follow the same nested shape.
- Retention is keyed on scope (global/agent never expire; project 90d;
session 14d) rather than memory_type, with user/feedback as a
scope-independent override. Matches docs/memory.md.
- store(scope="project") raises when no working directory is
resolvable, instead of silently mixing project memories into the
global container.
P2 — cheap correctness:
- tags: regex no longer truncates hyphens, so tags like ci-cd parse
intact.
- forget() passes the current timestamp into the index header so
removals don't blank the <!-- Updated: --> line.
- updated_at serialises via strftime("%Y-%m-%dT%H:%M:%SZ"), avoiding
the invalid "+00:00Z" form for tz-aware datetimes.
- store() returns the action ("created" or "updated") on the Memory
model so the MCP layer no longer infers it from second-granularity
timestamps.
- CLI key validation enforces the same 60-char ceiling as
MemoryService._sanitize_key.
- cleanup_expired_memories() offloads sync glob/parse work to
asyncio.to_thread so it doesn't block the FastAPI event loop on
startup.
P3 — process:
- Added a production-path test for _get_terminal_context() that
doesn't monkeypatch the function itself, so a P0-A regression
surfaces immediately.
- docs/memory.md now matches the scope-keyed retention semantics.
67 tests pass; black and isort clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
Thanks for the input @haofeif. All 12 findings addressed in 5af8217 except P3-A (stale CodeQL bot comment, dismissed in UI). Per-row resolution:
Test plan: 67 tests pass locally (65 + 2 new for P3-B). Implementation notes worth flagging:
Ready for re-review. |
Six findings from the second Copilot pass: rewrite wiki header on upsert so recall() sees fresh type/tags; normalize tag whitespace so the index regex still matches; offload forget() in cleanup via asyncio.to_thread; add success:True to memory_recall envelope; fix misleading container comment and storage-layout doc diagram. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
fix as suggested Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
| async def store_one(i: int): | ||
| return await svc.store( | ||
| content=f"Concurrent memory entry {i}", | ||
| scope="global", | ||
| memory_type="project", | ||
| key=f"concurrent-{i}", | ||
| terminal_context=ctx, | ||
| ) | ||
|
|
||
| async def run_concurrent(): | ||
| return await asyncio.gather(*[store_one(i) for i in range(5)]) | ||
|
|
||
| results = asyncio.run(run_concurrent()) |
| response = requests.get(f"{API_BASE_URL}/terminals/{terminal_id}") | ||
| response.raise_for_status() | ||
| meta = response.json() | ||
| ctx: Dict[str, Any] = { | ||
| "terminal_id": meta["id"], | ||
| "session_name": meta["session_name"], | ||
| "provider": meta["provider"], | ||
| "agent_profile": meta.get("agent_profile"), | ||
| } | ||
| # Try to get working directory for project scope resolution | ||
| try: | ||
| wd_resp = requests.get(f"{API_BASE_URL}/terminals/{terminal_id}/working-directory") | ||
| if wd_resp.status_code == 200: | ||
| ctx["cwd"] = wd_resp.json().get("working_directory") | ||
| except Exception: |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
|
One last High issue fixed in last commit, should be good to merge |
| # Track terminals that have already received memory injection (first message only). | ||
| _memory_injected_terminals: set = set() | ||
|
|
||
|
|
||
| def inject_memory_context(first_message: str, terminal_id: str) -> str: | ||
| """Prepend <cao-memory> context block to the first user message. | ||
|
|
||
| Tracks which terminals have already been injected so that only the very | ||
| first user message after init receives the memory block. | ||
|
|
||
| Calls MemoryService.get_memory_context_for_terminal() which returns | ||
| a formatted <cao-memory>...</cao-memory> block (or empty string if | ||
| no memories exist). Stateless — no file mutation, no backup/restore. | ||
| """ | ||
| if terminal_id in _memory_injected_terminals: | ||
| return first_message | ||
|
|
||
| _memory_injected_terminals.add(terminal_id) | ||
|
|
| @app.get("/terminals/{terminal_id}/memory-context") | ||
| async def get_terminal_memory_context(terminal_id: TerminalId): | ||
| """Return the CAO memory context block for a terminal as plain text. | ||
|
|
||
| Used by the Kiro AgentSpawn hook to inject memory into agent context. | ||
| Returns empty 200 if no memories exist for this terminal. | ||
| """ | ||
| from fastapi.responses import PlainTextResponse | ||
|
|
||
| try: | ||
| from cli_agent_orchestrator.services.memory_service import MemoryService | ||
|
|
||
| svc = MemoryService() | ||
| context = svc.get_memory_context_for_terminal(terminal_id) | ||
| return PlainTextResponse(content=context) | ||
| except ValueError as e: |
| async def store( | ||
| self, | ||
| content: str, | ||
| scope: str = "project", | ||
| memory_type: str = "project", | ||
| key: Optional[str] = None, | ||
| tags: str = "", | ||
| terminal_context: Optional[dict] = None, | ||
| ) -> Memory: | ||
| """Store or update a memory. Upserts wiki file + index.md. | ||
|
|
||
| Declared async for compatibility with async callers (MCP server, FastAPI). | ||
| File I/O is synchronous; a future improvement would use aiofiles. | ||
| """ |
| # Sort by updated_at descending | ||
| results.sort(key=lambda m: m.updated_at, reverse=True) | ||
|
|
||
| # Apply scope precedence ordering when no scope filter | ||
| if not scope: | ||
| precedence = { | ||
| MemoryScope.SESSION.value: 0, | ||
| MemoryScope.PROJECT.value: 1, | ||
| MemoryScope.GLOBAL.value: 2, | ||
| MemoryScope.AGENT.value: 3, | ||
| } | ||
| results.sort(key=lambda m: (precedence.get(m.scope, 99), -m.updated_at.timestamp())) | ||
|
|
| def _get_terminal_context_from_env() -> Optional[Dict[str, Any]]: | ||
| """Build terminal context dict from the calling terminal's CAO_TERMINAL_ID.""" | ||
| terminal_id = os.environ.get("CAO_TERMINAL_ID") | ||
| if not terminal_id: | ||
| return None | ||
|
|
||
| try: | ||
| response = requests.get(f"{API_BASE_URL}/terminals/{terminal_id}") | ||
| response.raise_for_status() | ||
| meta = response.json() | ||
| ctx: Dict[str, Any] = { | ||
| "terminal_id": meta["id"], | ||
| "session_name": meta["session_name"], | ||
| "provider": meta["provider"], | ||
| "agent_profile": meta.get("agent_profile"), | ||
| } | ||
| # Try to get working directory for project scope resolution | ||
| try: | ||
| wd_resp = requests.get(f"{API_BASE_URL}/terminals/{terminal_id}/working-directory") | ||
| if wd_resp.status_code == 200: | ||
| ctx["cwd"] = wd_resp.json().get("working_directory") | ||
| except Exception: | ||
| pass | ||
| return ctx |
First of 8 PRs splitting #179 (which exceeded GitHub's reviewable size — Copilot literally returned "exceeds the maximum number of lines (20,000)") and reviewer bandwidth. This PR ships Phase 1 only — file-based wiki storage, MCP store/recall/forget, CLI commands, and tiered retention.
Phases 2 onwards (SQLite metadata, BM25 search, hardening, plugin-based auto-save, LLM compilation, lint/heal, import/export, federation) follow in subsequent PRs.
What's in this PR
What's explicitly NOT in this PR
Resolves from #179
Test plan
Note for dogfood
Until the plugin migration PR lands, agents will not auto-save. They must call `memory_store` explicitly. The agent-profile guidance blocks instruct them to do so when storing facts that should outlive the session.
🤖 Generated with Claude Code