Add governed memory model, provider boundaries, diagnostics, and CLI validation#1145
Conversation
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add safe log-level diagnostics for memory commands and document how to enable them. Sync memory governance guidance into the Squad agent template mirrors. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add a bounded real CLI experiment runner that isolates COPILOT_HOME per repository and variant, captures command/stdio/log artifacts, and records measured memory diagnostics evidence without faking in-turn command use. Co-authored-by: GitHub Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Update compatibility expectations for the added memory governance tools and wrap the memory help entry so the terminal-width UX gate passes.\n\nCo-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove the extra memory action line so the help output remains within the speed-gate line budget while preserving the shorter memory command description.\n\nCo-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add a deterministic benchmark that compares baseline full-memory context loading against governed retrieval for context size, precision, recall, decision consistency, and stale/unsafe memory exclusion. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…prototype providers Implements issue #3287: prove external/local provider seams can plug into the memory governance model without impersonating Copilot Memory. Changes: - Add MemoryProviderStatus and MemoryProvider interfaces - Add MemPalaceMemoryProvider: in-memory spatial store, loci-based, accepts LOCAL / DECISION / POLICY - Add IndexServerMemoryProvider: in-memory knowledge catalog, topic-based, accepts LOCAL / DECISION / POLICY - LocalMemoryStoreOptions.registeredProviders: optional provider list - LocalMemoryStore.write(): routes governed writes to matching providers AFTER local write; FORBIDDEN/TRANSIENT/COPILOT_MEMORY never reach providers - LocalMemoryStore.search(): merges and deduplicates provider results - LocalMemoryStore.providerStatus(): includes registered provider statuses - Audit records for provider replication contain only id/class/title/path (no raw content or query text) - All new types exported from sdk barrel via existing export * Tests (test/memory-providers.test.ts, 19 tests): - MemPalaceMemoryProvider: status, write/search LOCAL and DECISION, delete, empty search - IndexServerMemoryProvider: status, write/search POLICY and DECISION, delete, empty search - LocalMemoryStore routing: LOCAL/POLICY writes reach both providers, search merges provider hits, FORBIDDEN/TRANSIENT rejected before providers, COPILOT_MEMORY and provider=copilot still fail closed, providerStatus reflects registered providers, audit records are safe
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Addressed the review feedback from Brady's screenshot in commit
Validation: focused memory tests, SDK build, lint, memory-value benchmark, and PR CI are passing. |
Code Review — Memory Governance ProviderOverall: Well-designed governance model with solid test coverage. The architecture cleanly separates classification, storage, audit, and provider concerns. A few issues worth addressing before merge: 🔴 CriticalRace condition in concurrent index updates ( The memory index uses read-modify-write without locking or atomic writes. Concurrent agents/CLI commands can cause lost entries when writing simultaneously. The proposal doc acknowledges this as future work, but it's a real data-loss risk in multi-agent scenarios. Fix: Write to temp file + atomic rename at minimum; consider file-based locking. 🟡 Medium
🟢 Low / Informational
✅ Strengths
|
…bstone ordering - readIndex() now throws with a .corrupt backup instead of silently resetting to [] when index.json contains invalid JSON or a non-array root. - writeIndex() now writes to index.json.tmp then renames into place, preventing partial writes from corrupting the index on crash. - Added backupCorruptIndex() private helper used by readIndex(). - Regression tests added (30 total, was 23): * concurrent writes: 2 writers and 10 writers both produce all entries * corrupt index (invalid JSON, non-array): throws with message + .corrupt file * delete tombstone ordering: tombstone present after delete, source gone * tombstone survives source-delete failure via faulting proxy
Fix summary — commit 7458b81Picking up from commit 1. Race condition in concurrent index updates (FIXED)Added an async serialization mutex to
Regression tests (
2. Corrupted index silently resets to
|
Summary
This PR represents the full memory governance/provider work on
squad/memory-governance-provider, not just diagnostics config.It adds:
TRANSIENT,LOCAL,DECISION,POLICY,COPILOT_MEMORY,FORBIDDEN[ALWAYS],[ON-DEMAND],[ARCHIVE],[NEVER]squad memory classify|write|search|audit|providerprovider=copilotreported unavailable without a concrete callable API, host-injected adapter opt-in and fail-closed.squad/config.jsonsupportCOPILOT_HOMEThis work is based on Tamir Dresher's blog post, The Ship's Computer Has a Memory Problem — Designing Memory for AI Agent Squads. The core idea is that agent memory is runtime state with lifecycle, governance, and context-budget implications — not just more Markdown in the repo or more text in the prompt.
Short guide: how to use it
Initialize or upgrade a Squad repo so local governance files are scaffolded:
squad init # or, for an existing repo: squad upgradeBy default, governance is local-only:
.squad/memory/.squad/memory remains the source of truthUse the CLI bridge:
Diagnostics and logs
For one command:
squad memory search --query "Vitest" --log-level info squad memory provider --verboseFor project-level diagnostics, add this to
.squad/config.json:{ "memory": { "logLevel": "info" } }Supported levels:
Precedence:
--log-level/--verboseSQUAD_MEMORY_LOG_LEVEL.squad/config.jsonmemory.logLevelnoneDiagnostics go to stderr; JSON output remains on stdout. Diagnostics include safe metadata such as command name, provider, paths, result counts, load guidance, and timing. They intentionally do not print raw memory content or raw search text.
How to measure memory value
Run the deterministic benchmark:
The benchmark compares naive baseline behavior (
load all memory into context) against governed retrieval (ALWAYS+ relevantON-DEMAND, excludingARCHIVE,NEVER, deleted, or superseded facts). It measures context payload, estimated token pressure, precision, recall, decision consistency, and stale/unsafe-memory avoidance.Latest local result from this branch:
Validation evidence
Latest branch validation:
Results:
56bd3136: passed (run 26141373561)Real CLI A/B evidence
The branch includes a real Copilot CLI paired A/B harness and prior reruns across two local Squad repos:
This proves the harness, isolated
COPILOT_HOMEruns, memory command invocation diagnostics, and baseline-vs-governance signal separation.Current verdict
Stronger than before, but still honest.
This PR now proves:
This PR still does not prove:
The right claim is: this makes the blog-post promise measurable and demonstrates the local governed-memory mechanism can reduce context size and improve memory selection quality in deterministic tests. Full live-agent longitudinal proof still requires repeated real-world paired runs once a callable provider/host adapter exists.