Add governed memory model, provider boundaries, diagnostics, and CLI validation by tamirdresher · Pull Request #1145 · bradygaster/squad

tamirdresher · 2026-05-20T03:27:16Z

Summary

This PR represents the full memory governance/provider work on squad/memory-governance-provider, not just diagnostics config.

It adds:

Governed memory classes: TRANSIENT, LOCAL, DECISION, POLICY, COPILOT_MEMORY, FORBIDDEN
Load guidance: [ALWAYS], [ON-DEMAND], [ARCHIVE], [NEVER]
Local memory store with classification, write/search/audit, promotion, deletion, tombstones, and safe audit records
CLI bridge: squad memory classify|write|search|audit|provider
Honest provider boundaries: local default, provider=copilot reported unavailable without a concrete callable API, host-injected adapter opt-in and fail-closed
Safe diagnostics to stderr with .squad/config.json support
Real Copilot CLI A/B harness with isolated COPILOT_HOME
Deterministic memory-value benchmark for context size, retrieval quality, decision consistency, and stale/unsafe memory exclusion
Tests, docs, proposals, template guidance, and changeset coverage

This work is based on Tamir Dresher's blog post, The Ship's Computer Has a Memory Problem — Designing Memory for AI Agent Squads. The core idea is that agent memory is runtime state with lifecycle, governance, and context-budget implications — not just more Markdown in the repo or more text in the prompt.

Short guide: how to use it

Initialize or upgrade a Squad repo so local governance files are scaffolded:

squad init
# or, for an existing repo:
squad upgrade

By default, governance is local-only:

memory config lives under .squad/memory/
audit records are written locally
local .squad/ memory remains the source of truth
Copilot Memory provider integration is not assumed or faked

Use the CLI bridge:

squad memory classify "Always run tests before merge"

squad memory write \
  --content "Use Vitest for SDK regression tests" \
  --class DECISION \
  --author scribe \
  --approved

squad memory search --query "Vitest"
squad memory audit
squad memory provider

Diagnostics and logs

For one command:

squad memory search --query "Vitest" --log-level info
squad memory provider --verbose

For project-level diagnostics, add this to .squad/config.json:

{
  "memory": {
    "logLevel": "info"
  }
}

Supported levels:

none | error | info | debug

Precedence:

CLI --log-level / --verbose
SQUAD_MEMORY_LOG_LEVEL
.squad/config.json memory.logLevel
default none

Diagnostics go to stderr; JSON output remains on stdout. Diagnostics include safe metadata such as command name, provider, paths, result counts, load guidance, and timing. They intentionally do not print raw memory content or raw search text.

How to measure memory value

Run the deterministic benchmark:

npm run build -w packages/squad-sdk
npm run experiment:memory-value
npm run experiment:memory-value -- --json

The benchmark compares naive baseline behavior (load all memory into context) against governed retrieval (ALWAYS + relevant ON-DEMAND, excluding ARCHIVE, NEVER, deleted, or superseded facts). It measures context payload, estimated token pressure, precision, recall, decision consistency, and stale/unsafe-memory avoidance.

Latest local result from this branch:

Verdict: PASS
Context bytes: baseline=3540, governed=1601
Estimated tokens: baseline=885, governed=401
Context reduction: 54.8%
Precision: baseline=0.16, governed=0.41
Recall: baseline=1.00, governed=1.00
Decision consistency: baseline=0.50, governed=1.00
Stale/unsafe facts avoided: 12/12

Validation evidence

Latest branch validation:

npm run build -w packages/squad-sdk
npm run lint
npm run test -- test/memory-value-benchmark.test.ts test/memory-governance.test.ts test/real-cli-ab.test.ts test/bench-runner.test.ts
npm run experiment:memory-value
npm run experiment:real-cli-ab -- --repo <repo> --variants memory-governance --dry-run

Results:

SDK build: passed
TypeScript lint: passed
Related memory/harness tests: 41 passed
Deterministic memory-value benchmark: PASS
Real CLI A/B dry-run plan/artifact path generation: passed
Previous full GitHub Actions Squad CI before this benchmark commit: passed (run 26140594530)
Current GitHub Actions Squad CI for head 56bd3136: passed (run 26141373561)

Real CLI A/B evidence

The branch includes a real Copilot CLI paired A/B harness and prior reruns across two local Squad repos:

ADC Squad runner demo:
- baseline: 0 memory diagnostic events
- memory-governance variant: 10 memory diagnostic events
Squad repo:
- baseline: 0 memory diagnostic events
- memory-governance variant: 9 memory diagnostic events

This proves the harness, isolated COPILOT_HOME runs, memory command invocation diagnostics, and baseline-vs-governance signal separation.

Current verdict

Stronger than before, but still honest.

This PR now proves:

governed local memory operations work
classification/isolation behavior is enforced
forbidden/transient content is rejected or kept out of durable memory
provider status is honest
host-injected provider mode fails closed without a real host client
diagnostics are configurable and safe
the real CLI A/B harness can isolate runs and capture evidence
deterministic governed retrieval can reduce context pressure while preserving recall and improving precision/decision consistency in seeded memory tasks

This PR still does not prove:

real Copilot Memory provider write/search/delete integration
statistically significant live-agent quality gains from external semantic memory
long-run production improvement across many repositories and real issues

The right claim is: this makes the blog-post promise measurable and demonstrates the local governed-memory mechanism can reduce context size and improve memory selection quality in deterministic tests. Full live-agent longitudinal proof still requires repeated real-world paired runs once a callable provider/host adapter exists.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add safe log-level diagnostics for memory commands and document how to enable them. Sync memory governance guidance into the Squad agent template mirrors. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add a bounded real CLI experiment runner that isolates COPILOT_HOME per repository and variant, captures command/stdio/log artifacts, and records measured memory diagnostics evidence without faking in-turn command use. Co-authored-by: GitHub Copilot <copilot@github.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Update compatibility expectations for the added memory governance tools and wrap the memory help entry so the terminal-width UX gate passes.\n\nCo-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Remove the extra memory action line so the help output remains within the speed-gate line budget while preserving the shorter memory command description.\n\nCo-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add a deterministic benchmark that compares baseline full-memory context loading against governed retrieval for context size, precision, recall, decision consistency, and stale/unsafe memory exclusion. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…prototype providers Implements issue #3287: prove external/local provider seams can plug into the memory governance model without impersonating Copilot Memory. Changes: - Add MemoryProviderStatus and MemoryProvider interfaces - Add MemPalaceMemoryProvider: in-memory spatial store, loci-based, accepts LOCAL / DECISION / POLICY - Add IndexServerMemoryProvider: in-memory knowledge catalog, topic-based, accepts LOCAL / DECISION / POLICY - LocalMemoryStoreOptions.registeredProviders: optional provider list - LocalMemoryStore.write(): routes governed writes to matching providers AFTER local write; FORBIDDEN/TRANSIENT/COPILOT_MEMORY never reach providers - LocalMemoryStore.search(): merges and deduplicates provider results - LocalMemoryStore.providerStatus(): includes registered provider statuses - Audit records for provider replication contain only id/class/title/path (no raw content or query text) - All new types exported from sdk barrel via existing export * Tests (test/memory-providers.test.ts, 19 tests): - MemPalaceMemoryProvider: status, write/search LOCAL and DECISION, delete, empty search - IndexServerMemoryProvider: status, write/search POLICY and DECISION, delete, empty search - LocalMemoryStore routing: LOCAL/POLICY writes reach both providers, search merges provider hits, FORBIDDEN/TRANSIENT rejected before providers, COPILOT_MEMORY and provider=copilot still fail closed, providerStatus reflects registered providers, audit records are safe

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

tamirdresher · 2026-05-20T21:51:25Z

Addressed the review feedback from Brady's screenshot in commit 0139af50:

Removed raw secret-shaped fixture content from the deterministic memory-value benchmark and added a regression assertion.
Added bounded audit log rotation via policy.auditMaxBytes / policy.auditMaxArchives.
Replaced silent registered-provider write/search failures with safe provider-error audit entries that omit raw error text and query/content.
Preserved provider provenance/class/load guidance/score in registered provider search results instead of normalizing them back to local results.
Bounded the in-memory MemPalace/IndexServer prototype providers and sanitized provider path segments.
Hardened local index path resolution and tightened promote successor-id handling.

Validation: focused memory tests, SDK build, lint, memory-value benchmark, and PR CI are passing.

bradygaster · 2026-05-20T23:04:26Z

Code Review — Memory Governance Provider

Overall: Well-designed governance model with solid test coverage. The architecture cleanly separates classification, storage, audit, and provider concerns. A few issues worth addressing before merge:

🔴 Critical

Race condition in concurrent index updates (memory/index.ts)

The memory index uses read-modify-write without locking or atomic writes. Concurrent agents/CLI commands can cause lost entries when writing simultaneously. The proposal doc acknowledges this as future work, but it's a real data-loss risk in multi-agent scenarios.

Fix: Write to temp file + atomic rename at minimum; consider file-based locking.

🟡 Medium

Partial failure in delete() leaves inconsistent state — Tombstone is written after the index is updated and source file deleted. If tombstone write fails, state is irrecoverable. Reorder: write tombstone → update index → delete source.
Provider search hardcodes class: 'LOCAL' — Provider results lose their actual classification. The CopilotMemoryProviderSearchResult interface lacks a class field, so all provider results appear as LOCAL regardless of their true class.
Silent provider write failures — Empty catch block on provider replication means auth/network failures are invisible. Suggest writing a safe audit entry (no content) on failure.
Corrupted index silently resets to empty — If index.json is malformed, readIndex() returns [] with no backup or notification. Could silently lose an entire memory store.

🟢 Low / Informational

Unbounded audit log growth — audit.jsonl is append-only with no rotation. Long-running projects will see unbounded growth.
Promote path has weak type safety — MemoryWriteResult.id is typed as optional, but promote dereferences it unconditionally. Runtime is safe (implementation always sets it), but type should enforce the invariant.

✅ Strengths

Clean classification taxonomy (TRANSIENT → FORBIDDEN) with enforcement
Honest provider boundaries — Copilot Memory reported unavailable without a real API
Fail-closed host-injected adapter pattern
Deterministic benchmark proving 54.8% context reduction with maintained recall
Good test coverage (753 lines governance tests, 412 lines provider tests)
Safe diagnostics (no raw content in stderr)

…bstone ordering - readIndex() now throws with a .corrupt backup instead of silently resetting to [] when index.json contains invalid JSON or a non-array root. - writeIndex() now writes to index.json.tmp then renames into place, preventing partial writes from corrupting the index on crash. - Added backupCorruptIndex() private helper used by readIndex(). - Regression tests added (30 total, was 23): * concurrent writes: 2 writers and 10 writers both produce all entries * corrupt index (invalid JSON, non-array): throws with message + .corrupt file * delete tombstone ordering: tombstone present after delete, source gone * tombstone survives source-delete failure via faulting proxy

tamirdresher · 2026-05-21T05:51:29Z

Fix summary — commit `7458b81`

Picking up from commit 0139af50 (which addressed provider search class, silent write failures, audit log rotation, and promote id invariant), this commit resolves the remaining three open items from the Brady PR review:

1. Race condition in concurrent index updates (FIXED)

Added an async serialization mutex to LocalMemoryStore:

New private field indexLockTail: Promise<unknown> (initialized to Promise.resolve())
New private method withIndexLock<T>(fn: () => Promise<T>): Promise<T> — each caller chains onto the previous tail and awaits it before running its critical section
All readIndex() → mutate → writeIndex() sequences in write(), promote(), and delete() are now wrapped in withIndexLock

Regression tests (LocalMemoryStore — concurrent writes and index integrity):

2 concurrent write() calls → both IDs in persisted index
10 concurrent write() calls → all 10 IDs in persisted index

2. Corrupted index silently resets to `[]` (FIXED)

readIndex() previously caught any JSON parse error and returned [], causing silent data loss.
Now:

On invalid JSON or non-array root: writes the raw bytes to index.json.corrupt as a recovery aid, then throws a descriptive Error so the caller surfaces the problem instead of silently overwriting the index
backupCorruptIndex() private helper wraps the backup write; failure to write the backup is suppressed so callers always see the original error

Regression tests (LocalMemoryStore — corrupted index.json handling):

Invalid JSON → throws /Memory index is corrupt/; .corrupt backup file written with original content
Non-array root → same throw

3. Delete tombstone ordering (FIXED)

Original code deleted the source file before writing the tombstone and updating the index. A crash between those steps left the entry unrecoverable.

Fixed order for local entries: tombstone write → index update → source file delete
For COPILOT_MEMORY entries: external provider delete is gated first (returns false on failure); then tombstone → index (source is external).

Regression tests (LocalMemoryStore — delete tombstone ordering):

After a successful delete(): tombstone exists, source gone, audit contains delete record
When source delete fails (faulting storage proxy): call rejects, tombstone survives

4. Atomic `writeIndex()` (FIXED)

writeIndex() previously wrote directly to index.json, leaving a window where a crash mid-write produces a partial/corrupt file.
Now: writes to index.json.tmp → storage.rename(tmp, indexPath) (wraps fs.rename, atomic on POSIX, near-atomic on Windows same-volume).

Test results

All 30 tests pass (23 pre-existing + 7 new regression tests). SDK build clean (tsc zero errors).

Copilot and others added 4 commits May 19, 2026 18:49

feat(memory): add governance provider model

70f5d4f

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

feat(memory): add CLI diagnostics and squad guidance

940bf05

Add safe log-level diagnostics for memory commands and document how to enable them. Sync memory governance guidance into the Squad agent template mirrors. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

feat(memory): support config-driven diagnostics

244554c

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

tamirdresher changed the title ~~Add memory governance diagnostics config~~ Add governed memory model, provider boundaries, diagnostics, and CLI validation May 20, 2026

Copilot and others added 5 commits May 20, 2026 06:57

fix(memory): satisfy CI gates for memory tools

e0f3624

Update compatibility expectations for the added memory governance tools and wrap the memory help entry so the terminal-width UX gate passes.\n\nCo-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

fix(cli): keep help under speed gate

c5c6a8c

Remove the extra memory action line so the help output remains within the speed-gate line budget while preserving the shorter memory command description.\n\nCo-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

fix(memory): address provider review safety gaps

0139af5

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

tamirdresher marked this pull request as ready for review May 21, 2026 17:57

Copilot AI review requested due to automatic review settings May 21, 2026 17:57

Copilot started reviewing on behalf of tamirdresher May 21, 2026 17:58 View session

bradygaster merged commit c6148e7 into bradygaster:dev May 21, 2026
6 checks passed

tamirdresher review requested due to automatic review settings May 21, 2026 18:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add governed memory model, provider boundaries, diagnostics, and CLI validation#1145

Add governed memory model, provider boundaries, diagnostics, and CLI validation#1145
bradygaster merged 10 commits into
bradygaster:devfrom
tamirdresher:squad/memory-governance-provider

tamirdresher commented May 20, 2026 •

edited

Loading

Uh oh!

tamirdresher commented May 20, 2026

Uh oh!

bradygaster commented May 20, 2026

Uh oh!

tamirdresher commented May 21, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

tamirdresher commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Short guide: how to use it

Diagnostics and logs

How to measure memory value

Validation evidence

Real CLI A/B evidence

Current verdict

Uh oh!

tamirdresher commented May 20, 2026

Uh oh!

bradygaster commented May 20, 2026

Code Review — Memory Governance Provider

🔴 Critical

🟡 Medium

🟢 Low / Informational

✅ Strengths

Uh oh!

tamirdresher commented May 21, 2026

Fix summary — commit 7458b81

1. Race condition in concurrent index updates (FIXED)

2. Corrupted index silently resets to [] (FIXED)

3. Delete tombstone ordering (FIXED)

4. Atomic writeIndex() (FIXED)

Test results

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

tamirdresher commented May 20, 2026 •

edited

Loading

Fix summary — commit `7458b81`

2. Corrupted index silently resets to `[]` (FIXED)

4. Atomic `writeIndex()` (FIXED)