feat(memory): add persistent agent memory system (Phase 1)#179
feat(memory): add persistent agent memory system (Phase 1)#179fanhongy wants to merge 15 commits into
Conversation
- File-based wiki storage under ~/.aws/cli-agent-orchestrator/memory/ - 4 scopes: global, project (cwd-hashed), session, agent - MCP tools: memory_store, memory_recall, memory_forget - CLI commands: cao memory list/show/delete/clear - Auto-inject <cao-memory> context block on first message - Auto-save hooks: Claude Code (Stop + PreCompact), Kiro CLI (agentSpawn + userPromptSubmit) - Tiered retention with background cleanup - fcntl.flock() for concurrent-write safety - 1296 tests passing
| @@ -0,0 +1,197 @@ | |||
| # Memory System | |||
|
|
|||
There was a problem hiding this comment.
can we make this an opt-in feature? such as add a enableMemory config key in ~/.aws/cli-agent-orchestrator/settings.json.
There was a problem hiding this comment.
Yes, this would shiped in phase 2/3
| ### Claude Code | ||
|
|
||
| Registered in `.claude/settings.local.json` (in the working directory) at terminal creation: | ||
|
|
There was a problem hiding this comment.
I have some pushback concern regarding this approach. Doing this would result in any provider-specific config to contain hook configuration for CAO memory usage even when outside of CAO orchestration flows.
At least for claude code, I would recommend
- Creating a Claude plugin and load it with
claude --plugin-dirThis CLI flag will load all the plugin related hooks & references scoped to the session - Perhaps building on top of the hook system introduced in feat: Build support for external plugins #172 to inject context hints that can substitute stop hooks.
PreCompactdoes not seem feasible though, so option 1 may be preferred.
| ```markdown | ||
| ## Memory | ||
|
|
||
| When you discover something worth remembering — user preferences, project conventions, |
There was a problem hiding this comment.
this doesn't seem to align with the diffs made to the agent_store profiles
| | Scope | Storage location | Use when | | ||
| |---|---|---| | ||
| | `global` | `memory/global/wiki/global/` | Cross-project facts: user preferences, coding standards | | ||
| | `project` | `memory/{cwd_hash}/wiki/project/` | Project-specific: architecture decisions, conventions | |
There was a problem hiding this comment.
I think cwd hash has one major problem, which is that any change made to the filesystem (i.e. folder renaming, mv project to a different directory) would immediately break project memory reference
There was a problem hiding this comment.
this also makes any type of workflow involving worktrees incompatible
There was a problem hiding this comment.
good point.
This should be indexed in to SQL in phase 2
|
Feedback on memory usage guidance for LLMs — the current agent-profile block and MCP docstrings will likely produce store-happy but low-signal behavior. Some gaps worth closing:
Net effect: the "ALWAYS store" rule is forceful enough that agents will store things, but scope/type selection will be near-random and recall will be underused mid-task. Consider expanding the profile Memory section with (a) a scope/type decision rubric with examples, (b) an explicit exclusion list, and (c) explicit save/recall triggers. |
| **Kiro CLI:** Memory injection happens via the `agentSpawn` hook registered at `~/.kiro/agents/{profile}.json`, which fires before the first user message. CAO does not double-inject for Kiro. | ||
|
|
||
| ## Auto-Save Hooks | ||
|
|
There was a problem hiding this comment.
I would gently recommend de-scoping or deferring auto-save hook to future phases since
A. there isn't a clear path to provide this for all supported providers today
B. stale config point in the comment i made below
| - **Stop hook**: fires every 15 human messages, prompts agent to save key findings | ||
| - **PreCompact hook**: fires before context compression, prompts emergency save | ||
|
|
||
| ### Kiro CLI |
There was a problem hiding this comment.
missing a section for codex
| # Step 3d: Register provider-specific hooks for memory self-save | ||
| try: | ||
| from cli_agent_orchestrator.hooks.registration import ( | ||
| register_hooks_claude_code, | ||
| register_hooks_codex, | ||
| register_hooks_kiro, | ||
| ) | ||
|
|
||
| if provider == ProviderType.CLAUDE_CODE.value and working_directory: | ||
| register_hooks_claude_code(working_directory) | ||
| elif provider == ProviderType.CODEX.value and working_directory: | ||
| register_hooks_codex(working_directory) | ||
| elif provider == ProviderType.KIRO_CLI.value: | ||
| register_hooks_kiro(agent_profile) | ||
| except Exception as e: | ||
| logger.warning(f"Failed to register hooks for terminal {terminal_id}: {e}") |
There was a problem hiding this comment.
Suggestion: move provider-specific hook registration into providers/ — the current hooks/registration.py is feature-envy on the provider abstraction.
Why
hooks/registration.py encodes intimate knowledge of three different config file formats (Claude Code's nested settings.local.json, Codex's flat .codex/hooks.json, Kiro's ~/.kiro/agents/{name}.json with event keys), and the service layer dispatches on provider type to pick the right function:
# services/terminal_service.py:182-197 (current)
from cli_agent_orchestrator.hooks.registration import (
register_hooks_claude_code,
register_hooks_codex,
register_hooks_kiro,
)
if provider == ProviderType.CLAUDE_CODE.value and working_directory:
register_hooks_claude_code(working_directory)
elif provider == ProviderType.CODEX.value and working_directory:
register_hooks_codex(working_directory)
elif provider == ProviderType.KIRO_CLI.value:
register_hooks_kiro(agent_profile)That provider == ... ladder is exactly what the provider interface is supposed to eliminate. Every new provider requires touching hooks/registration.py and terminal_service.py.
Proposed shape
1. Add a method to BaseProvider (providers/base.py) with a default no-op:
# providers/base.py
class BaseProvider(ABC):
...
def register_memory_hooks(
self,
working_directory: Optional[str],
agent_profile: str,
) -> None:
"""Register CAO memory self-save hooks in this provider's config files.
Default: no-op for providers that don't support hooks.
Overridden by providers that do (Claude Code, Codex, Kiro).
"""
return None2. Each provider overrides register_memory_hooks with its own config-file logic — Claude Code writes its nested settings.local.json entries, Codex writes the flat .codex/hooks.json, Kiro writes agentSpawn/userPromptSubmit into ~/.kiro/agents/{profile}.json. Providers that don't support hooks (QCliProvider, etc.) inherit the no-op default — no code needed.
3. The call site collapses to one line:
# services/terminal_service.py (proposed)
try:
provider_instance.register_memory_hooks(working_directory, agent_profile)
except Exception as e:
logger.warning(f"Failed to register hooks for terminal {terminal_id}: {e}")Tradeoff worth naming
Dispersing registration across providers loses the single-file audit surface for "what hooks does CAO install?". Mitigate by keeping path constants centralized in hooks/paths.py and adding a short table in docs/memory.md.
Net
Three concrete wins:
- Adding a new provider stops requiring edits to
hooks/registration.pyand theterminal_servicedispatch. _SAFE_PROFILE_REand the agents-dir normalization inregister_hooks_kiromove next to the rest of Kiro's agent-config handling, where they're easier to keep consistent.- Provider-level unit tests already exist per provider — hook registration fits naturally there rather than as a cross-cutting test file.
|
@fanhongy thanks for the great work, it delivered majority of the features as discussed. Here are some feedbacks: PR #179 — Phase 1 Scoped ReviewTL;DR
Scope: This review compares PR #179 against only the Phase 1 (MVP) section of Method: item-by-item walk of each Phase 1 deliverable against the shipped code. Phase 1 deliverables§Phase 1 — Memory creation
§Phase 1 — Hook-triggered self-save
§Phase 1 — Context delivery
§Phase 1 — Storage setup
§Phase 1 Success Criteria — verdict per item
Summary scorecardCounting only the 20 main deliverables (treating #3a as a sub-note on #3):
Sub-note #3a ( Phase 1 comments — by severityMust fix for Phase 1 mergeC1 — Current state — shipped without doc update — means the design doc lies to anyone reading it. C2 — REST CRUD endpoints missing. Fix options (pick one):
Either works — the PR and doc just need to agree. C3 — SQLite metadata layer absent. One-line summary: The design describes a hybrid wiki + SQLite architecture in four separate places; the code ships file-only. Design citations (all in Why this still needs action:
Fix options (pick one):
The doc and the code should agree. Right now they don't. Should fix before mergeC4 — Per-scope injection cap (§Rec 6) not enforced. Fix: In C5 — Universal first-message injection vs per-provider table in §6a. This is primarily a design-doc ambiguity, not a code failure. L543-544 and §6a L765-772 prescribe per-provider injection; L774 then licenses "Phase 1 simplification: Use instruction file/system prompt injection for all providers". The PR follows L774. Action is on the doc owner to pick one reading, then update Phase 1 Success Criterion S3 (L591) to match. Shipped behavior: If the doc settles on per-provider: push the injection logic behind a Nice to fix — low risk items worth notingC6 — C7 — Index regex drift risk (tied to C3). def test_parse_index_reads_what_update_index_writes():
svc = MemoryService(base_dir=tmp_path)
await svc.store(content="x", key="test-key", memory_type="project", scope="project", terminal_context={...})
entries = svc._parse_index(svc.get_index_path("project", scope_id))
assert entries[0]["key"] == "test-key"
# Also parse with cleanup regex and assert it matches the same lineOr, if C3 is resolved by adding SQLite, this goes away entirely. C8 — Hook flag file cleanup is time-based, not PID-based. C9 —
C10 — Testing — Phase 1 gapsPhase 1 Success Criteria S1-S6 should each have at least one automated test. Coverage review:
Recommended test additions (low effort, high signal):
Net — Phase 113 / 20 deliverables ✅, 5 / 20 Merge-blocking: two items.
Strongly recommended before merge: Post-merge acceptable (Phase 2 rollover):
The Phase 1 implementation is solid work — security posture is good, retention is faithful, hook registration is idempotent and self-healing, and the wiki+index pattern matches §Rec 1 exactly. The gaps are specific and fixable. Fix C1 and C3, reconcile the doc, and Phase 1 is mergeable. |
- Remove "Persistent agent memory system" bullet — PR #179 is still open and was not part of this release - Add missing #106 (stale spinners blocking Claude Code inbox delivery) - Add missing #135 to cryptography version-bump coverage (46.0.5 → 46.0.7) - Correct #137 and #157 references to "resolves" — they are issues, not PRs - Fix install hint in #169 bullet to use real install path (uv tool install git+https://... rather than PyPI, which CAO isn't on) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…efs (#197) - Remove "Persistent agent memory system" bullet — PR #179 is still open and was not part of this release - Add missing #106 (stale spinners blocking Claude Code inbox delivery) - Add missing #135 to cryptography version-bump coverage (46.0.5 → 46.0.7) - Correct #137 and #157 references to "resolves" — they are issues, not PRs - Fix install hint in #169 bullet to use real install path (uv tool install git+https://... rather than PyPI, which CAO isn't on) Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Injects the CAO memory context into <cwd>/.claude/CLAUDE.md on post_create_terminal. Path-validates the target, replaces any prior block, logs-not-raises on every error path. 7 tests. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Writes <cwd>/.kiro/steering/cao-memory.md on post_create_terminal for kiro_cli. Coexists with U7's agent-identity.md (separate owners). Rename CC plugin test file to avoid rootdir collision. 8 tests for Kiro plugin. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Accepted regression matching CC + Kiro: periodic save-memory reminder dropped. Q/Gemini/Kimi/Copilot/OpenCode never had hooks, so Codex joins them on BaseProvider's no-op default. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Move CC/Kiro memory plugins to src/.../plugins/builtin/ with cao.plugins entry points. Delete hooks/ dir and register_hooks overrides; BaseProvider no-op stays. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Relabel memory_forget as Phase 1 MCP tool in MEMORY_SYSTEM_DESIGN.md (closes haofeif C1). Add Codex + Other providers + Plugin architecture subsections to docs/memory.md (closes patricka3125 Codex gap). Split agent-profile guidance into Minimal vs Aggressive (closes patricka3125 L190). Ignore local MEMORY_TESTING.md runbook. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
The work evolved through Phase 2 (cao-memory-feature), Phase 2.5 (feat/memory-implementation-phase-1), and a plugin-migration pass. Summary of what landed. Storage + correctness
Architecture
Scope cleanup
Documentation
Quality gates
Known deferrals (Phase 3) Non-blocking items rolled forward: 9 INFO-severity findings from security review + the 3 REST write endpoints. Full list in aidlc-docs/phase2.5/phase3-backlog.md. Happy to walk any specific change in more detail. |
AIDLC working docs are internal team artefacts, not project deliverables. Remove from index and add to .gitignore to keep them out of PR diffs. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Non-git projects fall back to sha256(realpath(cwd)) for identity, which breaks on folder rename or mv. Add an opt-in .cao/project_id marker (JSON: project_id + nonce + created_at) so identity travels with the folder. Nonce + recorded realpath in SQLite enable rename vs copy detection: if recorded path is gone treat as rename, else mint a fresh id. Marker logic only runs in the cwd-hash fallback branch — override and git-remote paths untouched. Opt out with memory.project_marker=false or CAO_PROJECT_MARKER=0. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- test_send_input_success and the parametrized post_send_message test mocked terminal metadata without a "provider" key, but the first- message path in send_input reads metadata["provider"] for log_session_event. Add provider to both mock dicts. - BM25 perf budget of 500ms was too tight; bump to 1s. The 100-file corpus still catches pathological regressions with 2x headroom. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
Converting this PR to Draft to split it into multiple for review purpose |
Phase 1 ships memory primitives (store/recall/forget) only. Hook-driven auto-save is delivered via per-provider plugins in a subsequent PR (per @patricka3125's recommendation in #179 to use Claude Code's plugin system rather than in-line settings.local.json edits). Removes: - src/cli_agent_orchestrator/hooks/registration.py - src/cli_agent_orchestrator/hooks/*.sh - terminal_service hook registration call site - api/main.py install_hooks() call - cleanup_service stale-hook-flag sweep - docs/memory.md auto-save section Side effect: 14 CodeQL findings on hooks/registration.py vaporise with the file. Resolves the path-injection warnings flagged on PR #179. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…245) * feat(memory): add persistent agent memory system (Phase 1) - File-based wiki storage under ~/.aws/cli-agent-orchestrator/memory/ - 4 scopes: global, project (cwd-hashed), session, agent - MCP tools: memory_store, memory_recall, memory_forget - CLI commands: cao memory list/show/delete/clear - Auto-inject <cao-memory> context block on first message - Auto-save hooks: Claude Code (Stop + PreCompact), Kiro CLI (agentSpawn + userPromptSubmit) - Tiered retention with background cleanup - fcntl.flock() for concurrent-write safety - 1296 tests passing * style: apply black formatter CI black --check flagged 5 files. Run black to align with the formatter version pinned in CI. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * refactor(memory): remove hooks/registration.py from Phase 1 Phase 1 ships memory primitives (store/recall/forget) only. Hook-driven auto-save is delivered via per-provider plugins in a subsequent PR (per @patricka3125's recommendation in #179 to use Claude Code's plugin system rather than in-line settings.local.json edits). Removes: - src/cli_agent_orchestrator/hooks/registration.py - src/cli_agent_orchestrator/hooks/*.sh - terminal_service hook registration call site - api/main.py install_hooks() call - cleanup_service stale-hook-flag sweep - docs/memory.md auto-save section Side effect: 14 CodeQL findings on hooks/registration.py vaporise with the file. Resolves the path-injection warnings flagged on PR #179. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(memory): address P0/P1/P2/P3 findings from review P0 — silent breakage: - _get_terminal_context() now resolves cwd via the existing tmux working-directory helper, so project-scope storage works in production rather than only in tests that monkeypatch the helper. - send_input() injects memory context for every provider, including Kiro (the default). The original Phase 1 commit skipped Kiro expecting the AgentSpawn hook to handle it; that hook lands in a later plugin-migration PR, so inline injection is the correct Phase 1 path for now. P1 — correctness gaps: - Session and agent scopes nest scope_id into wiki paths, so two sessions or two agent profiles can no longer collide on the same key. Index entries follow the same nested shape. - Retention is keyed on scope (global/agent never expire; project 90d; session 14d) rather than memory_type, with user/feedback as a scope-independent override. Matches docs/memory.md. - store(scope="project") raises when no working directory is resolvable, instead of silently mixing project memories into the global container. P2 — cheap correctness: - tags: regex no longer truncates hyphens, so tags like ci-cd parse intact. - forget() passes the current timestamp into the index header so removals don't blank the <!-- Updated: --> line. - updated_at serialises via strftime("%Y-%m-%dT%H:%M:%SZ"), avoiding the invalid "+00:00Z" form for tz-aware datetimes. - store() returns the action ("created" or "updated") on the Memory model so the MCP layer no longer infers it from second-granularity timestamps. - CLI key validation enforces the same 60-char ceiling as MemoryService._sanitize_key. - cleanup_expired_memories() offloads sync glob/parse work to asyncio.to_thread so it doesn't block the FastAPI event loop on startup. P3 — process: - Added a production-path test for _get_terminal_context() that doesn't monkeypatch the function itself, so a P0-A regression surfaces immediately. - docs/memory.md now matches the scope-keyed retention semantics. 67 tests pass; black and isort clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(memory): address Copilot follow-up comments on 5af8217 Six findings from the second Copilot pass: rewrite wiki header on upsert so recall() sees fresh type/tags; normalize tag whitespace so the index regex still matches; offload forget() in cleanup via asyncio.to_thread; add success:True to memory_recall envelope; fix misleading container comment and storage-layout doc diagram. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Potential fix for pull request finding fix as suggested Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * fix(memory): address Copilot review findings * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * fix(memory): isolate recall by session/agent scope_id --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Co-authored-by: haofeif <56006724+haofeif@users.noreply.github.com>
Summary
Adds a persistent, cross-session memory system to CAO. Agents can store facts, decisions, and preferences during a session; CAO automatically injects relevant memories as context when the agent's next session starts.
memory_store,memory_recall,memory_forgetexposed viacao-mcp-serverglobal(cross-project),project(cwd-hashed),session,agentuser,feedback,project,reference(classification label, no effect on storage)<cao-memory>block prepended to first user message on session startagentSpawn+userPromptSubmitcao memory list/show/delete/clear~/.aws/cli-agent-orchestrator/memory/; no SQLitefcntl.flock()on index writesdocs/memory.mdfull referenceChanged files
models/memory.pyservices/memory_service.pymcp_server/server.pyapi/main.py(+/terminals/{id}/memory-contextendpoint)cli/commands/memory.py,cli/main.pyhooks/(4 scripts +registration.py)services/cleanup_service.pyservices/terminal_service.py(injection + hook registration)agent_store/{code_supervisor,developer,reviewer}.mdconstants.pydocs/memory.md,README.md,CHANGELOG.mdtest/services/test_memory_service.py,test/providers/test_memory_injection.py,test/cli/commands/test_memory.pyTesting