feat(memory): add persistent agent memory system (Phase 1) by fanhongy · Pull Request #179 · awslabs/cli-agent-orchestrator

fanhongy · 2026-04-17T04:20:05Z

Summary

Adds a persistent, cross-session memory system to CAO. Agents can store facts, decisions, and preferences during a session; CAO automatically injects relevant memories as context when the agent's next session starts.

MCP tools — memory_store, memory_recall, memory_forget exposed via cao-mcp-server
4 scopes — global (cross-project), project (cwd-hashed), session, agent
4 types — user, feedback, project, reference (classification label, no effect on storage)
Context injection — <cao-memory> block prepended to first user message on session start
Auto-save hooks — Claude Code: Stop + PreCompact; Kiro CLI: agentSpawn + userPromptSubmit
CLI commands — cao memory list/show/delete/clear
File-based storage — wiki markdown files under ~/.aws/cli-agent-orchestrator/memory/; no SQLite
Concurrent-write safety — fcntl.flock() on index writes
Tiered retention — global/agent: never expire; project: 90d; session: 14d
Docs — docs/memory.md full reference

Changed files

Area	Files
Models	`models/memory.py`
Service	`services/memory_service.py`
MCP tools	`mcp_server/server.py`
API	`api/main.py` (+`/terminals/{id}/memory-context` endpoint)
CLI	`cli/commands/memory.py`, `cli/main.py`
Hooks	`hooks/` (4 scripts + `registration.py`)
Cleanup	`services/cleanup_service.py`
Terminal	`services/terminal_service.py` (injection + hook registration)
Agent profiles	`agent_store/{code_supervisor,developer,reviewer}.md`
Config	`constants.py`
Docs	`docs/memory.md`, `README.md`, `CHANGELOG.md`
Tests	`test/services/test_memory_service.py`, `test/providers/test_memory_injection.py`, `test/cli/commands/test_memory.py`

Testing

# Unit tests (1296 passing, 7 pre-existing failures unrelated to this PR)
uv run pytest test/ --ignore=test/e2e --ignore=test/providers/test_q_cli_integration.py -v

# Manual test with local build
uv tool install --force --reinstall .
cao-server &
cao launch --agents developer --provider claude_code

Notes

- Phase 1 is file-based only — no SQLite, no vector search. Semantic search and backend abstraction are Phase 2.
- Kiro CLI hooks require cao-server to be running at terminal creation time to register correctly.
- async def on store/recall/forget is intentional for caller compatibility; proper aiofiles migration is a future improvement.

- File-based wiki storage under ~/.aws/cli-agent-orchestrator/memory/ - 4 scopes: global, project (cwd-hashed), session, agent - MCP tools: memory_store, memory_recall, memory_forget - CLI commands: cao memory list/show/delete/clear - Auto-inject <cao-memory> context block on first message - Auto-save hooks: Claude Code (Stop + PreCompact), Kiro CLI (agentSpawn + userPromptSubmit) - Tiered retention with background cleanup - fcntl.flock() for concurrent-write safety - 1296 tests passing

… compatibility

patricka3125 · 2026-04-17T05:51:14Z

@@ -0,0 +1,197 @@
+# Memory System
+


can we make this an opt-in feature? such as add a enableMemory config key in ~/.aws/cli-agent-orchestrator/settings.json.

Yes, this would shiped in phase 2/3

patricka3125 · 2026-04-17T06:37:59Z

+### Claude Code
+
+Registered in `.claude/settings.local.json` (in the working directory) at terminal creation:
+


I have some pushback concern regarding this approach. Doing this would result in any provider-specific config to contain hook configuration for CAO memory usage even when outside of CAO orchestration flows.

At least for claude code, I would recommend

Creating a Claude plugin and load it with claude --plugin-dir This CLI flag will load all the plugin related hooks & references scoped to the session

Perhaps building on top of the hook system introduced in feat: Build support for external plugins #172 to inject context hints that can substitute stop hooks. PreCompact does not seem feasible though, so option 1 may be preferred.

patricka3125 · 2026-04-17T06:42:10Z

+```markdown
+## Memory
+
+When you discover something worth remembering — user preferences, project conventions,


this doesn't seem to align with the diffs made to the agent_store profiles

patricka3125 · 2026-04-17T06:48:53Z

+| Scope | Storage location | Use when |
+|---|---|---|
+| `global` | `memory/global/wiki/global/` | Cross-project facts: user preferences, coding standards |
+| `project` | `memory/{cwd_hash}/wiki/project/` | Project-specific: architecture decisions, conventions |


I think cwd hash has one major problem, which is that any change made to the filesystem (i.e. folder renaming, mv project to a different directory) would immediately break project memory reference

this also makes any type of workflow involving worktrees incompatible

good point.

This should be indexed in to SQL in phase 2

patricka3125 · 2026-04-17T06:56:11Z

Feedback on memory usage guidance for LLMs — the current agent-profile block and MCP docstrings will likely produce store-happy but low-signal behavior. Some gaps worth closing:

No guidance on scope. The tool accepts global|project|session|agent but neither the docstrings nor the profile explain when to use which. Given the project default, an LLM told to persist "user preferences" will almost always land in project when most belong in global. This is the biggest likely failure mode.
No guidance on memory_type. Four types (user|feedback|project|reference) with no decision criteria; the default is project, so the type dimension will collapse regardless of content.
No "when NOT to store" list. Without explicit exclusions (code patterns derivable from the repo, git history, ephemeral task state, fix recipes), agents will accumulate redundant/decaying entries like "fixed bug in X" or "ran tests and they passed."
No key-naming convention. The auto-slug from the first 6 words will collide and hurt search. Agents need a nudge to pick stable, topical slugs.
recall triggers are too narrow. "Check before asking the user" misses the more useful trigger: before starting a task in an area, recall what we know about it. LLMs tend to skip recall entirely when a task feels self-contained.
No post-injection feedback loop. The <cao-memory> block shows what's loaded on first message, but after that the agent has no visibility into the store unless it calls recall, and nothing nudges it to.

Net effect: the "ALWAYS store" rule is forceful enough that agents will store things, but scope/type selection will be near-random and recall will be underused mid-task. Consider expanding the profile Memory section with (a) a scope/type decision rubric with examples, (b) an explicit exclusion list, and (c) explicit save/recall triggers.

patricka3125 · 2026-04-17T07:00:32Z

+**Kiro CLI:** Memory injection happens via the `agentSpawn` hook registered at `~/.kiro/agents/{profile}.json`, which fires before the first user message. CAO does not double-inject for Kiro.
+
+## Auto-Save Hooks
+


I would gently recommend de-scoping or deferring auto-save hook to future phases since
A. there isn't a clear path to provide this for all supported providers today
B. stale config point in the comment i made below

patricka3125 · 2026-04-17T07:06:25Z

+- **Stop hook**: fires every 15 human messages, prompts agent to save key findings
+- **PreCompact hook**: fires before context compression, prompts emergency save
+
+### Kiro CLI


missing a section for codex

patricka3125 · 2026-04-17T07:09:10Z

+        # Step 3d: Register provider-specific hooks for memory self-save
+        try:
+            from cli_agent_orchestrator.hooks.registration import (
+                register_hooks_claude_code,
+                register_hooks_codex,
+                register_hooks_kiro,
+            )
+
+            if provider == ProviderType.CLAUDE_CODE.value and working_directory:
+                register_hooks_claude_code(working_directory)
+            elif provider == ProviderType.CODEX.value and working_directory:
+                register_hooks_codex(working_directory)
+            elif provider == ProviderType.KIRO_CLI.value:
+                register_hooks_kiro(agent_profile)
+        except Exception as e:
+            logger.warning(f"Failed to register hooks for terminal {terminal_id}: {e}")


Suggestion: move provider-specific hook registration into providers/ — the current hooks/registration.py is feature-envy on the provider abstraction.

Why

hooks/registration.py encodes intimate knowledge of three different config file formats (Claude Code's nested settings.local.json, Codex's flat .codex/hooks.json, Kiro's ~/.kiro/agents/{name}.json with event keys), and the service layer dispatches on provider type to pick the right function:

# services/terminal_service.py:182-197 (current) from cli_agent_orchestrator.hooks.registration import ( register_hooks_claude_code, register_hooks_codex, register_hooks_kiro, ) if provider == ProviderType.CLAUDE_CODE.value and working_directory: register_hooks_claude_code(working_directory) elif provider == ProviderType.CODEX.value and working_directory: register_hooks_codex(working_directory) elif provider == ProviderType.KIRO_CLI.value: register_hooks_kiro(agent_profile)

That provider == ... ladder is exactly what the provider interface is supposed to eliminate. Every new provider requires touching hooks/registration.py and terminal_service.py.

Proposed shape

1. Add a method to BaseProvider (providers/base.py) with a default no-op:

# providers/base.py class BaseProvider(ABC): ... def register_memory_hooks( self, working_directory: Optional[str], agent_profile: str, ) -> None: """Register CAO memory self-save hooks in this provider's config files. Default: no-op for providers that don't support hooks. Overridden by providers that do (Claude Code, Codex, Kiro). """ return None

2. Each provider overrides register_memory_hooks with its own config-file logic — Claude Code writes its nested settings.local.json entries, Codex writes the flat .codex/hooks.json, Kiro writes agentSpawn/userPromptSubmit into ~/.kiro/agents/{profile}.json. Providers that don't support hooks (QCliProvider, etc.) inherit the no-op default — no code needed.

3. The call site collapses to one line:

# services/terminal_service.py (proposed) try: provider_instance.register_memory_hooks(working_directory, agent_profile) except Exception as e: logger.warning(f"Failed to register hooks for terminal {terminal_id}: {e}")

Tradeoff worth naming

Dispersing registration across providers loses the single-file audit surface for "what hooks does CAO install?". Mitigate by keeping path constants centralized in hooks/paths.py and adding a short table in docs/memory.md.

Net

Three concrete wins:

Adding a new provider stops requiring edits to hooks/registration.py and the terminal_service dispatch.

_SAFE_PROFILE_RE and the agents-dir normalization in register_hooks_kiro move next to the rest of Kiro's agent-config handling, where they're easier to keep consistent.

Provider-level unit tests already exist per provider — hook registration fits naturally there rather than as a cross-cutting test file.

haofeif · 2026-04-18T05:04:50Z

@fanhongy thanks for the great work, it delivered majority of the features as discussed. Here are some feedbacks:

PR #179 — Phase 1 Scoped Review

TL;DR

Status: Request changes. ~70% of Phase 1 spec shipped (13 ✅ / 5 ⚠️ / 2 ❌).
Two merge blockers:
1. C1 — memory_forget registered as a Phase 1 MCP tool; design explicitly labels it Phase 2. (Given the complexity of the PR, i suggest move this to next PR)
2. C3 — SQLite metadata layer spec'd in four separate places in the doc; not shipped.
Both blockers are doc-vs-code mismatches — resolution can be "fix the code" or "fix the doc," but today they disagree.

Scope: This review compares PR #179 against only the Phase 1 (MVP) section of memory-management.md (L530-550) and Phase 1 Success Criteria (L587-594). Out-of-scope items (Phase 2–4) are not graded.

Method: item-by-item walk of each Phase 1 deliverable against the shipped code.

Phase 1 deliverables

§Phase 1 — Memory creation

#	Phase 1 deliverable	Verdict	Evidence
1	`MemoryMetadataModel` database table + migration in `database.py`	❌ Not shipped	`clients/database.py` has no memory model; `memory_service.py` comment L1 states "Phase 1 — file-based, no SQLite"
2	`Memory` pydantic model in `models/memory.py`	✅ Shipped	`models/memory.py` defines `Memory`, `MemoryScope`, `MemoryType`
3	`MemoryService` (store, recall, forget, resolve_scope_id, get_memory_context_for_terminal)	✅ Shipped	`services/memory_service.py` has all 5 methods
3a	Phase 1 doc note: "`forget` is an internal method for cleanup; MCP exposure via `memory_forget` is Phase 2"	⚠️ Violated	`forget()` is ALSO registered as `memory_forget` MCP tool at `mcp_server/server.py:765-794`
4	`memory_store` MCP tool in `server.py`	✅ Shipped	`mcp_server/server.py:653-704`
5	`memory_recall` MCP tool in `server.py`	✅ Shipped	`mcp_server/server.py:707-762`
6	REST endpoints: `POST /memory`, `GET /memory`, `DELETE /memory`, `GET /terminals/{id}/memory`	⚠️ Partial	Only `GET /terminals/{id}/memory-context` exists (`api/main.py:437-458`). No CRUD endpoints.
7	Memory cleanup in `cleanup_service.py` (tiered retention per Rec #9)	✅ Shipped	`cleanup_expired_memories()` + `RETENTION_POLICY` matches §Rec 9 table exactly

§Phase 1 — Hook-triggered self-save

#	Phase 1 deliverable	Verdict	Evidence
8	Register Stop hook for Claude Code (`.claude/settings.local.json`)	✅ Shipped	`hooks/registration.py:60-122`
9	Register Stop hook for Codex (`.codex/hooks.json`)	✅ Shipped	`hooks/registration.py:192-220`
10	Register PreCompact hook for Claude Code	✅ Shipped	`hooks/registration.py:117-119`
11	Other providers: instruction-based save reminder	⚠️ Partial	Agent-profile instruction shipped (developer/reviewer/code_supervisor). Kiro additionally gets a `userPromptSubmit` hook counter — exceeds the Phase 1 spec. Gemini/Kimi/Copilot: instruction-only, which matches spec.

§Phase 1 — Context delivery

#	Phase 1 deliverable	Verdict	Evidence
12	Rule-based injection: SQLite metadata query → read matching wiki files → fill token budget	⚠️ Adapted	Fills budget correctly, but query path is `_parse_index()` regex scan (no SQLite).
13	Scope precedence session > project > global on recall with no scope filter	✅ Shipped	`memory_service.py:402-408`
14	Hard budget ~2KB per §Rec 6 (or ~2-4KB per §Phase 1 L543) total injection	⚠️ Off-spec	`get_memory_context_for_terminal(budget_chars=3000)` — 3KB sits between the §Rec 6 ceiling (~2KB) and the §Phase 1 ceiling (2-4KB). More importantly, no per-scope cap ("10 most recent per scope" per §Rec 6 L756) — a single scope can monopolize the whole budget.
15	Memory injection at terminal creation for all 6 providers (per Provider-Specific table)	⚠️ Simplified	Per `terminal_service.py:52-74,324-325`: one universal path — prepend `<cao-memory>` to first user message. Kiro alone uses `agentSpawn` hook. Spec requires per-provider mechanism (GEMINI.md append for Gemini, `-c developer_instructions` for Codex, etc. — see §6a table L765-772). Phase 1 explicitly allows "instruction file/system prompt injection for all providers" as a Phase 1 simplification (L774) — so this is within spec if you read L774 as authoritative, off-spec if you read L544 + §6a table as authoritative. The design doc is internally ambiguous.
16	Memory instruction added to built-in agent profiles	✅ Shipped	`developer.md`, `reviewer.md`, `code_supervisor.md` each have a `## Memory` section

§Phase 1 — Storage setup

#	Phase 1 deliverable	Verdict	Evidence
17	`~/.aws/cli-agent-orchestrator/memory/{project_hash}/wiki/` structure	✅ Shipped	`memory_service.py:112-123`
18	`index.md` master map — created on first `memory_store`, updated on each new topic file	✅ Shipped	`_update_index()` with `fcntl.flock`
19	Each `memory_store` writes content to topic file under `wiki/{scope}/`	✅ Shipped	`get_wiki_path()` resolves to `wiki/{scope}/{key}.md`
20	SQLite metadata (key, scope, tags, timestamps, file_path) for lookup + scope resolution	❌ Not shipped	Same gap as #1.

§Phase 1 Success Criteria — verdict per item

#	Success Criterion	Verdict	Notes
S1	Agent `memory_store` → wiki file created + SQLite upsert + `index.md` updated	⚠️ Partial	Wiki + index OK; SQLite step absent
S2	Different agent can `memory_recall` the same fact	✅	`recall()` works cross-terminal via shared wiki
S3	Injection at terminal creation for all 6 providers	⚠️ Partial	Universal first-message prepend; per-provider table in §6a not followed
S4	Hook-triggered self-save fires reliably for Claude Code + Codex	✅	Stop hooks installed; Claude also has PreCompact
S5	Memories survive server restart (wiki files + SQLite metadata)	⚠️ Partial	Wiki files persist; no SQLite. Since metadata is in `index.md`, data survives but via a different mechanism than spec'd
S6	Tiered retention cleanup removes expired session/project/reference while preserving user/feedback	✅	`RETENTION_POLICY` matches §Rec 9 table; cleanup exercises all tiers

Summary scorecard

Counting only the 20 main deliverables (treating #3a as a sub-note on #3):

Outcome	Count	Items
✅ Shipped as spec'd	13	#2, #3, #4, #5, #7, #8, #9, #10, #13, #16, #17, #18, #19
⚠️ Partial / adapted / off-spec	5	#6 (REST), #11 (hook extras), #12 (no SQLite-backed query), #14 (no per-scope cap), #15 (universal injection)
❌ Not shipped	2	#1 (SQLite table), #20 (SQLite metadata layer) — one concern surfacing in two rows

Sub-note #3a (memory_forget exposed via MCP) is a doc-vs-code mismatch on top of shipped functionality and is tracked separately under C1 below.

Phase 1 comments — by severity

Must fix for Phase 1 merge

C1 — memory_forget is explicitly Phase 2; shipped as Phase 1 MCP tool.
The design doc is unambiguous at L169 (### memory_forget (Phase 2)) and L535 ("forget is an internal method for cleanup; MCP exposure via memory_forget is Phase 2"). The PR registers it anyway at mcp_server/server.py:765-794.-- Sugest remove this to a separate PR

Current state — shipped without doc update — means the design doc lies to anyone reading it.

C2 — REST CRUD endpoints missing.
§Phase 1 L537 calls for POST /memory, GET /memory, DELETE /memory, GET /terminals/{id}/memory. Only the last exists — and even that is shipped as GET /terminals/{terminal_id}/memory-context (api/main.py:437-458), with a different URL suffix than the spec.

Fix options (pick one):

Implement the three missing endpoints — they're thin wrappers over MemoryService calls.
Update §Phase 1 L537 to say "MCP + CLI only; REST CRUD deferred to Phase 4 alongside Web UI" and explain the rationale (web UI is Phase 4, no non-UI HTTP consumer exists yet).

Either works — the PR and doc just need to agree.

C3 — SQLite metadata layer absent.

One-line summary: The design describes a hybrid wiki + SQLite architecture in four separate places; the code ships file-only.

Design citations (all in memory-management.md): L533 (Phase 1 deliverable: MemoryMetadataModel table + migration), L543 (context delivery spec'd as "SQLite metadata query"), L593 (success criterion: "wiki files + SQLite metadata persisted"), L670-671 (§Rec 1 hybrid call-out), L743-749 (§Rec 5 full "Hybrid Storage" argument). The PR description acknowledges the divergence.

Why this still needs action:

cleanup_service._find_expired_entries (L164-167) and memory_service._parse_index (L461-464) are two hand-rolled regex variants of the same format — drift risk §Rec 5 explicitly warns about.
recall() and get_memory_context_for_terminal do O(N) regex-parse + file-read per call. Fine at Phase 1 scale; the design chose SQLite specifically to not worry about this.

Fix options (pick one):

Add a MemoryMetadataModel SQLAlchemy table that mirrors the index.md bullet line. Write to both on store; read from SQLite on recall and _find_expired_entries. Keep index.md as the human-readable index.
Update memory-management.md to re-scope Phase 1 to file-only — specifically: L533 (drop the MemoryMetadataModel line), L543 (replace with "index.md metadata scan"), L593 (drop "+ SQLite metadata"), and add a note to §Rec 5 explaining why Phase 1 ships without SQLite despite the hybrid pitch.

The doc and the code should agree. Right now they don't.

Should fix before merge

C4 — Per-scope injection cap (§Rec 6) not enforced.
get_memory_context_for_terminal(budget_chars=3000) is a single global budget (L615-628). §Rec 6 L756 specifies "Hard limit of 10 most recent memories per scope, max ~2KB total injection" — note per scope. A scope with 100 memories will monopolize the budget.

Fix: In get_memory_context_for_terminal, take at most N=10 memories per scope before applying the byte budget. Small change — one [:10] slice inside the scope loop. Keeps injection balanced across session/project/global.

C5 — Universal first-message injection vs per-provider table in §6a.

This is primarily a design-doc ambiguity, not a code failure. L543-544 and §6a L765-772 prescribe per-provider injection; L774 then licenses "Phase 1 simplification: Use instruction file/system prompt injection for all providers". The PR follows L774. Action is on the doc owner to pick one reading, then update Phase 1 Success Criterion S3 (L591) to match.

Shipped behavior: terminal_service.inject_memory_context (L52-74) prepends <cao-memory> to the first user message for all providers except Kiro. Kiro alone uses the agentSpawn hook.

If the doc settles on per-provider: push the injection logic behind a BaseProvider.format_memory_injection() method — matches @patricka3125 comment #4 and sets up Phase 2 cache-aware separation cleanly.

Nice to fix — low risk items worth noting

C6 — _memory_injected_terminals is module-level in-memory state.
terminal_service.py:49 — lost on server restart. Next user message after a restart mid-session will re-inject the memory block (harmless but surprising). Could be backed by a DB column or a per-terminal flag file. Low priority for Phase 1 — worth a comment at the definition site noting the known limitation.

C7 — Index regex drift risk (tied to C3).
memory_service._parse_index regex (L461-464) and cleanup_service._find_expired_entries regex (L164-167) are two separate hand-rolled patterns for the same format. Right now they agree — but there's no invariant enforcing that. Add a single test:

def test_parse_index_reads_what_update_index_writes():
    svc = MemoryService(base_dir=tmp_path)
    await svc.store(content="x", key="test-key", memory_type="project", scope="project", terminal_context={...})
    entries = svc._parse_index(svc.get_index_path("project", scope_id))
    assert entries[0]["key"] == "test-key"
    # Also parse with cleanup regex and assert it matches the same line

Or, if C3 is resolved by adding SQLite, this goes away entirely.

C8 — Hook flag file cleanup is time-based, not PID-based.
_cleanup_stale_hook_flags removes flag files older than 5 min (cleanup_service.py:218-242). If a Claude Code session runs a long operation inside a Stop hook (rare but possible during large memory_store calls), the flag could be deleted while still in use. PID-based cleanup (kill -0 $pid check) is more precise. Low priority — time-based is fine for Phase 1 if documented.

C9 — register_hooks_kiro always rewrites the agent config.
Every cao launch triggers register_hooks_kiro which self-heals the .json by removing all existing CAO entries and re-adding them (hooks/registration.py:174-186). Two concerns:

If a user hand-edits ~/.kiro/agents/{profile}.json to tweak a CAO hook, their edit is silently erased on next launch.
Writing to a user-managed config outside the CAO-owned filesystem is what @patricka3125's comment Feature: async delegate #3 flagged. The spec doesn't explicitly prohibit it, but Kiro's AgentSpawn hook is the only way CAO can inject memory into Kiro, so this is load-bearing. Worth either documenting the behavior in docs/memory.md or adding a "managed-by: cao" marker comment in the hook entries.

C10 — cao memory forget CLI + memory_forget MCP both delegate to MemoryService.forget().
If C1 is resolved by removing the MCP tool, the CLI command (cli/commands/memory.py delete/clear) still exposes the operation to the user — which is the intent of §User-control question (L641): "Should users be able to view, edit, and delete memories via CLI before Phase 4 web UI?". So the CLI path is correct Phase 1 behavior; only the MCP surface needs adjustment.

Testing — Phase 1 gaps

Phase 1 Success Criteria S1-S6 should each have at least one automated test. Coverage review:

Criterion	Test coverage
S1 (store → wiki + index)	✅ `test_memory_service.py` covers this
S2 (cross-agent recall)	✅ Covered — any test that stores then calls `recall()` in a fresh `MemoryService()`
S3 (injection all 6 providers)	✅ for shipped behavior — `test_memory_injection.py` covers the universal prepend path + Kiro skip. Per-provider variants aren't tested because the code doesn't do them; if C5 is resolved in favor of per-provider, add coverage.
S4 (hooks fire reliably)	✅ for unit-level — `test_memory_injection.py` covers `inject_memory_context`. Shell-script end-to-end tests for Claude Code / Codex Stop hooks are an acceptable Phase 1 gap.
S5 (survive server restart)	⚠️ No test that restarts the service and recalls. Add: create `MemoryService()`, `store()`, discard, create fresh `MemoryService()`, `recall()` — passes because wiki is durable. Easy to add.
S6 (tiered retention)	✅ Cleanup tests likely exist — verify with `rg "cleanup_expired_memories" src/tests`

Recommended test additions (low effort, high signal):

Writer/reader regex round-trip test (guards against C7 drift).
Concurrent _update_index test — two threads calling store() simultaneously should produce two index entries, not one corrupted one. fcntl.flock is the safety net being tested.
Restart durability test (S5).

Net — Phase 1

13 / 20 deliverables ✅, 5 / 20 ⚠️, 2 / 20 ❌ (plus sub-note #3a — doc/code mismatch). Call it ~70% spec-fidelity for Phase 1 — same number as the broader review arrived at, but this time with each item pinned to a specific design-doc line.

Merge-blocking: two items.

C1 (memory_forget in Phase 1 MCP) — pick one of the two fixes above. Currently the code ships a tool the doc says is Phase 2, with no rationale.
C3 (SQLite metadata layer absent) — pick one of the two fixes above. Currently the design doc describes a hybrid architecture and the code ships a file-only one. Either implement the hybrid or re-scope the doc.

Strongly recommended before merge:
3. C2 (REST CRUD) — implement or re-scope.
4. C4 (per-scope injection cap) — one-line fix, trivial.
5. Reply to @patricka3125's 7 inline comments — at least items 3, 4, 6, 7 have concrete code actions; 1, 2, 5 are design discussions the PR author should weigh in on.

Post-merge acceptable (Phase 2 rollover):

C5 (per-provider injection), C6 (in-memory tracker), C7 (regex drift if SQLite adopted), C8 (PID-based flag cleanup), C9 (Kiro config ownership)

The Phase 1 implementation is solid work — security posture is good, retention is faithful, hook registration is idempotent and self-healing, and the wiki+index pattern matches §Rec 1 exactly. The gaps are specific and fixable. Fix C1 and C3, reconcile the doc, and Phase 1 is mergeable.

- Remove "Persistent agent memory system" bullet — PR #179 is still open and was not part of this release - Add missing #106 (stale spinners blocking Claude Code inbox delivery) - Add missing #135 to cryptography version-bump coverage (46.0.5 → 46.0.7) - Correct #137 and #157 references to "resolves" — they are issues, not PRs - Fix install hint in #169 bullet to use real install path (uv tool install git+https://... rather than PyPI, which CAO isn't on) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…efs (#197) - Remove "Persistent agent memory system" bullet — PR #179 is still open and was not part of this release - Add missing #106 (stale spinners blocking Claude Code inbox delivery) - Add missing #135 to cryptography version-bump coverage (46.0.5 → 46.0.7) - Correct #137 and #157 references to "resolves" — they are issues, not PRs - Fix install hint in #169 bullet to use real install path (uv tool install git+https://... rather than PyPI, which CAO isn't on) Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

Pulls in PR #172 plugin system and PR #193 opencode_cli so later phases can build plugins instead of provider hooks. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Injects the CAO memory context into <cwd>/.claude/CLAUDE.md on post_create_terminal. Path-validates the target, replaces any prior block, logs-not-raises on every error path. 7 tests. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Writes <cwd>/.kiro/steering/cao-memory.md on post_create_terminal for kiro_cli. Coexists with U7's agent-identity.md (separate owners). Rename CC plugin test file to avoid rootdir collision. 8 tests for Kiro plugin. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Accepted regression matching CC + Kiro: periodic save-memory reminder dropped. Q/Gemini/Kimi/Copilot/OpenCode never had hooks, so Codex joins them on BaseProvider's no-op default. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Move CC/Kiro memory plugins to src/.../plugins/builtin/ with cao.plugins entry points. Delete hooks/ dir and register_hooks overrides; BaseProvider no-op stays. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Relabel memory_forget as Phase 1 MCP tool in MEMORY_SYSTEM_DESIGN.md (closes haofeif C1). Add Codex + Other providers + Plugin architecture subsections to docs/memory.md (closes patricka3125 Codex gap). Split agent-profile guidance into Minimal vs Aggressive (closes patricka3125 L190). Ignore local MEMORY_TESTING.md runbook. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

fanhongy · 2026-05-07T06:07:00Z

The work evolved through Phase 2 (cao-memory-feature), Phase 2.5 (feat/memory-implementation-phase-1), and a plugin-migration pass. Summary of what landed.

Storage + correctness

SQLite metadata layer — MemoryMetadataModel in clients/database.py:72 with WAL mode and UniqueConstraint(key, scope, scope_id). Wiki files remain the source of truth; SQLite is the fast-lookup + concurrency mechanism.
Per-scope injection cap — MEMORY_MAX_PER_SCOPE=10 and MEMORY_SCOPE_BUDGET_CHARS=1000 enforced in get_memory_context_for_terminal(). No single scope can monopolize the envelope.
Index round-trip invariant — writer/reader regex drift is locked by a test (test_memory_service_index_roundtrip.py); ISO-8601 Z-suffix strptime format is part of the contract.
Durability + concurrent writes — MemoryService reinstantiation recovers all prior memories via SQLite; concurrent writers to the same scope are serialized by per-connection SQLite locking + atomic os.replace(). fcntl.flock remains as defense-in-depth in the fallback path.

Architecture

Opt-in flag — memory.enabled settings key. When false, writes raise, get_memory_context_for_terminal() returns "", and no filesystem or SQLite side-effects occur. Default behavior unchanged.
Stable project identity — three-tier resolver (explicit config → git remote → cwd hash) plus an alias table. Memories survive directory rename and resolve identically across git worktrees. Dry-run migration planner (plan_project_dir_migration) classifies renames as none | rename | merge | conflict; live --apply is deferred to Phase 3.
Hook registration on BaseProvider — terminal_service.py no longer contains any if provider == ProviderType.* hook conditionals. BaseProvider.register_hooks() is a no-op default; providers that need hooks override it.
Plugin migration — memory hooks for Claude Code and Kiro CLI now ship as entry-point plugins under src/cli_agent_orchestrator/plugins/builtin/. The hooks/ directory has been retired entirely. Non-CAO Claude Code sessions are now hook-free.

Scope cleanup

REST CRUD endpoints — formally deferred to Phase 3. Surface is now 1 shipped (GET /terminals/{id}/memory-context) + 3 deferred write endpoints. Rationale and Phase 3 prerequisites in aidlc-docs/phase2.5/phase3-backlog.md.
Spec sync — design docs reconciled against shipped reality. Stale Phase-2 labels on memory_forget corrected to Phase 1 (already shipped as an MCP tool in this PR).

Documentation

docs/memory.md gained dedicated subsections for Codex, Gemini/Kimi/Copilot (all instruction-only; no native hooks), and the plugin architecture.
Agent-profile guidance is now split into Minimal (default template; matches built-in profiles) and Aggressive (test-harness template with concrete store/recall triggers plus session-start / session-end rituals). The Aggressive template is used by the -local variants under ~/.aws/cli-agent-orchestrator/agent-store/; built-ins will be upgraded once the memory feature exits dogfood.

Quality gates

All 9 Phase 2.5 units passed a two-gate review (challenger + security-reviewer) using a source-mutation teeth-test methodology. Audit trail: aidlc-docs/phase2.5/audit.md.
1,504+ tests passing, zero critical findings across both gates.
14 CodeQL path-expression alerts previously flagged on hooks/registration.py — the file is now deleted, so the alerts should auto-resolve on this push.

Known deferrals (Phase 3)

Non-blocking items rolled forward: 9 INFO-severity findings from security review + the 3 REST write endpoints. Full list in aidlc-docs/phase2.5/phase3-backlog.md.

Happy to walk any specific change in more detail.

AIDLC working docs are internal team artefacts, not project deliverables. Remove from index and add to .gitignore to keep them out of PR diffs. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Non-git projects fall back to sha256(realpath(cwd)) for identity, which breaks on folder rename or mv. Add an opt-in .cao/project_id marker (JSON: project_id + nonce + created_at) so identity travels with the folder. Nonce + recorded realpath in SQLite enable rename vs copy detection: if recorded path is gone treat as rename, else mint a fresh id. Marker logic only runs in the cwd-hash fallback branch — override and git-remote paths untouched. Opt out with memory.project_marker=false or CAO_PROJECT_MARKER=0. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

- test_send_input_success and the parametrized post_send_message test mocked terminal metadata without a "provider" key, but the first- message path in send_input reads metadata["provider"] for log_session_event. Add provider to both mock dicts. - BM25 perf budget of 500ms was too tight; bump to 1s. The 100-file corpus still catches pathological regressions with 2x headroom. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot

Copilot wasn't able to review this pull request because it exceeds the maximum number of lines (20,000). Try reducing the number of changed lines and requesting a review from Copilot again.

fanhongy · 2026-05-19T04:34:15Z

Converting this PR to Draft to split it into multiple for review purpose

@patricka3125

Phase 1 ships memory primitives (store/recall/forget) only. Hook-driven auto-save is delivered via per-provider plugins in a subsequent PR (per @patricka3125's recommendation in #179 to use Claude Code's plugin system rather than in-line settings.local.json edits). Removes: - src/cli_agent_orchestrator/hooks/registration.py - src/cli_agent_orchestrator/hooks/*.sh - terminal_service hook registration call site - api/main.py install_hooks() call - cleanup_service stale-hook-flag sweep - docs/memory.md auto-save section Side effect: 14 CodeQL findings on hooks/registration.py vaporise with the file. Resolves the path-injection warnings flagged on PR #179. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@patricka3125

…245) * feat(memory): add persistent agent memory system (Phase 1) - File-based wiki storage under ~/.aws/cli-agent-orchestrator/memory/ - 4 scopes: global, project (cwd-hashed), session, agent - MCP tools: memory_store, memory_recall, memory_forget - CLI commands: cao memory list/show/delete/clear - Auto-inject <cao-memory> context block on first message - Auto-save hooks: Claude Code (Stop + PreCompact), Kiro CLI (agentSpawn + userPromptSubmit) - Tiered retention with background cleanup - fcntl.flock() for concurrent-write safety - 1296 tests passing * style: apply black formatter CI black --check flagged 5 files. Run black to align with the formatter version pinned in CI. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * refactor(memory): remove hooks/registration.py from Phase 1 Phase 1 ships memory primitives (store/recall/forget) only. Hook-driven auto-save is delivered via per-provider plugins in a subsequent PR (per @patricka3125's recommendation in #179 to use Claude Code's plugin system rather than in-line settings.local.json edits). Removes: - src/cli_agent_orchestrator/hooks/registration.py - src/cli_agent_orchestrator/hooks/*.sh - terminal_service hook registration call site - api/main.py install_hooks() call - cleanup_service stale-hook-flag sweep - docs/memory.md auto-save section Side effect: 14 CodeQL findings on hooks/registration.py vaporise with the file. Resolves the path-injection warnings flagged on PR #179. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(memory): address P0/P1/P2/P3 findings from review P0 — silent breakage: - _get_terminal_context() now resolves cwd via the existing tmux working-directory helper, so project-scope storage works in production rather than only in tests that monkeypatch the helper. - send_input() injects memory context for every provider, including Kiro (the default). The original Phase 1 commit skipped Kiro expecting the AgentSpawn hook to handle it; that hook lands in a later plugin-migration PR, so inline injection is the correct Phase 1 path for now. P1 — correctness gaps: - Session and agent scopes nest scope_id into wiki paths, so two sessions or two agent profiles can no longer collide on the same key. Index entries follow the same nested shape. - Retention is keyed on scope (global/agent never expire; project 90d; session 14d) rather than memory_type, with user/feedback as a scope-independent override. Matches docs/memory.md. - store(scope="project") raises when no working directory is resolvable, instead of silently mixing project memories into the global container. P2 — cheap correctness: - tags: regex no longer truncates hyphens, so tags like ci-cd parse intact. - forget() passes the current timestamp into the index header so removals don't blank the  line. - updated_at serialises via strftime("%Y-%m-%dT%H:%M:%SZ"), avoiding the invalid "+00:00Z" form for tz-aware datetimes. - store() returns the action ("created" or "updated") on the Memory model so the MCP layer no longer infers it from second-granularity timestamps. - CLI key validation enforces the same 60-char ceiling as MemoryService._sanitize_key. - cleanup_expired_memories() offloads sync glob/parse work to asyncio.to_thread so it doesn't block the FastAPI event loop on startup. P3 — process: - Added a production-path test for _get_terminal_context() that doesn't monkeypatch the function itself, so a P0-A regression surfaces immediately. - docs/memory.md now matches the scope-keyed retention semantics. 67 tests pass; black and isort clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(memory): address Copilot follow-up comments on 5af8217 Six findings from the second Copilot pass: rewrite wiki header on upsert so recall() sees fresh type/tags; normalize tag whitespace so the index regex still matches; offload forget() in cleanup via asyncio.to_thread; add success:True to memory_recall envelope; fix misleading container comment and storage-layout doc diagram. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Potential fix for pull request finding fix as suggested Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * fix(memory): address Copilot review findings * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * fix(memory): isolate recall by session/agent scope_id --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Co-authored-by: haofeif <56006724+haofeif@users.noreply.github.com>

github-advanced-security AI found potential problems Apr 17, 2026

View reviewed changes

fix registartion.py issue

7e24834

github-advanced-security AI found potential problems Apr 17, 2026

View reviewed changes

Comment thread src/cli_agent_orchestrator/hooks/registration.py Fixed

fix: resolve CI failures — CodeQL path injection, formatting, asyncio…

c928e22

… compatibility

github-advanced-security AI found potential problems Apr 17, 2026

View reviewed changes

Comment thread src/cli_agent_orchestrator/hooks/registration.py Fixed

fix the registration

23f3359

github-advanced-security AI found potential problems Apr 17, 2026

View reviewed changes

Comment thread src/cli_agent_orchestrator/hooks/registration.py Fixed

fix the registration

206f744

patricka3125 reviewed Apr 17, 2026

View reviewed changes

haofeif added the enhancement New feature or request label Apr 17, 2026

patricka3125 reviewed Apr 17, 2026

View reviewed changes

haofeif mentioned this pull request Apr 22, 2026

docs(changelog): correct 2.1.0 entry — remove unmerged feature, fix refs #197

Merged

fanhongy and others added 7 commits April 25, 2026 16:19

wip: phase 2.5 snapshot before pypi work

364ab8f

Merge origin/main (Phase A of plugin migration)

d16f0e5

Pulls in PR #172 plugin system and PR #193 opencode_cli so later phases can build plugins instead of provider hooks. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Phase B: add cao-memory-claude-code plugin

e5ce782

Injects the CAO memory context into <cwd>/.claude/CLAUDE.md on post_create_terminal. Path-validates the target, replaces any prior block, logs-not-raises on every error path. 7 tests. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Phase D: remove Codex stop hook

844be0c

Accepted regression matching CC + Kiro: periodic save-memory reminder dropped. Q/Gemini/Kimi/Copilot/OpenCode never had hooks, so Codex joins them on BaseProvider's no-op default. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Phase B+E: bundle memory plugins, retire hook infra

a14b756

Move CC/Kiro memory plugins to src/.../plugins/builtin/ with cao.plugins entry points. Delete hooks/ dir and register_hooks overrides; BaseProvider no-op stays. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

fanhongy and others added 3 commits May 13, 2026 20:56

chore: untrack aidlc-docs from git

d716b22

AIDLC working docs are internal team artefacts, not project deliverables. Remove from index and add to .gitignore to keep them out of PR diffs. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

haofeif requested a review from Copilot May 19, 2026 00:15

Copilot AI reviewed May 19, 2026

fanhongy marked this pull request as draft May 19, 2026 04:33

fanhongy mentioned this pull request May 19, 2026

feat(memory): persistent agent memory system (Phase 1) — foundation #245

Merged

5 tasks

		### Claude Code

		Registered in `.claude/settings.local.json` (in the working directory) at terminal creation:

		Kiro CLI: Memory injection happens via the `agentSpawn` hook registered at `~/.kiro/agents/{profile}.json`, which fires before the first user message. CAO does not double-inject for Kiro.

		## Auto-Save Hooks

Conversation

fanhongy commented Apr 17, 2026

Summary

Changed files

Testing

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

patricka3125 Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

fanhongy Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

patricka3125 Apr 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

patricka3125 Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

patricka3125 Apr 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

patricka3125 Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

fanhongy Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

patricka3125 commented Apr 17, 2026

Uh oh!

patricka3125 Apr 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

patricka3125 Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

patricka3125 Apr 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Why

Proposed shape

Tradeoff worth naming

Net

Uh oh!

haofeif commented Apr 18, 2026

PR #179 — Phase 1 Scoped Review

TL;DR

Phase 1 deliverables

§Phase 1 — Memory creation

§Phase 1 — Hook-triggered self-save

§Phase 1 — Context delivery

§Phase 1 — Storage setup

§Phase 1 Success Criteria — verdict per item

Summary scorecard

Phase 1 comments — by severity

Must fix for Phase 1 merge

Should fix before merge

Nice to fix — low risk items worth noting

Testing — Phase 1 gaps

Net — Phase 1

Uh oh!

fanhongy commented May 7, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

patricka3125 Apr 17, 2026 •

edited

Loading

patricka3125 Apr 17, 2026 •

edited

Loading

patricka3125 Apr 17, 2026 •

edited

Loading

patricka3125 Apr 17, 2026 •

edited

Loading