refactor(memory): remove Domain concept in favor of audience isolation by Aaronontheweb · Pull Request #588 · netclaw-dev/netclaw

Aaronontheweb · 2026-04-10T23:10:05Z

Summary

Deletes the half-implemented Domain concept from the memory system. SessionId.ToMemoryDomain() unconditionally returned the same constant regardless of session ID, so per-project scoping was never a real isolation mechanism — just a ranker contaminant that made recall pollution (#582) worse. Audience (Public ⊂ Team ⊂ Personal) is now the sole isolation axis; lexical/facet/anchor matches are the only relevance signals.

Closes #584.

Changes

Schema migration: idempotent DROP COLUMN domain on memory_anchors, memory_documents, memory_records, memory_edges. Drops and recreates policy indexes without the domain column.
Delete Domain field from SQLite record types, MemoryCheckpointPayload, MemoryCheckpointCandidate, ObservedMemoryCheckpointPayload, SubAgentFinding, AcceptedSubAgentFinding, AutomaticRecallItem.
Delete ToMemoryDomain() and SecurityPolicyDefaults.DefaultMemoryDomain.
Simplify signatures: MemoryPolicyGates.Evaluate/Accept, MemoryPolicyEvaluator.EvaluateWrite, MemoryCurationPipeline.BuildFingerprint, MemoryCurationActor.StartEvaluation, SessionMemoryObserverActor.RunDistillationAsync, SQLiteMemoryStore.CreateDefaultAnchor/SearchAutoRecallDocumentsAsync/FindFuzzyAnchorMatchesAsync/FindCandidatesByContentAsync all lose the domain parameter.
Rename SearchAcrossDomainsByPlanAsync → SearchByPlanAsync.
Delete now-meaningless tests: Domain_is_not_a_scoring_signal, Cross_domain_candidate_not_excluded, Review_defers_domain_mismatch.
OpenSpec: update netclaw-agent-memory/spec.md durable memory policy envelope requirement to reference audience instead of domain.

Net: 37 files changed, 194 insertions, 404 deletions.

Test plan

dotnet build — clean, 0 warnings, 0 errors
dotnet test — all 1,964 tests pass (837 Actors + 428 Daemon + 406 Cli + 139 Configuration + 117 Security + 32 Search + 5 MemoryRetrievalPoC)
dotnet slopwatch analyze — 0 issues
CI green

Last remaining callers of SecurityPolicyDefaults.InferLegacyBoundaryFromDomain were in MemoryCurationPipeline — both at the sentinel-upgrade site when a payload arrives without an explicit boundary. The function always returned TrustedInstanceBoundary for the domains the system actually produces, so the two call sites get inlined to the constant and the function itself deletes cleanly. No new dead code remains in the boundary-inference path.

The per-session `Domain` string (e.g. "project:default") was a half-implemented isolation mechanism — `SessionId.ToMemoryDomain()` unconditionally returned the same constant, so it was a coin flip rather than a real scoping signal. Worse, domain affinity contributed +5 to the ranker's composite score and caused unrelated noise documents to outrank signal on off-topic queries (#582). Audience (Public ⊂ Team ⊂ Personal) is now the sole isolation axis. Lexical, facet, and anchor matches remain the only relevance signals. Changes: - Delete `Domain` field from SQLiteMemoryDocument, SQLiteMemoryAnchor, SQLiteMemoryHydratedItem, SQLiteMemorySearchResult, SQLiteMemoryCurationOperation, AutomaticRecallItem, MemoryCheckpointPayload, MemoryCheckpointCandidate, ObservedMemoryCheckpointPayload, SubAgentFinding, AcceptedSubAgentFinding - Delete `ToMemoryDomain` and `DefaultMemoryDomain` constant - Drop `domain` column from memory_anchors, memory_documents, memory_records, memory_edges via idempotent DROP COLUMN migration - Rename `SearchAcrossDomainsByPlanAsync` → `SearchByPlanAsync` - Update OpenSpec `netclaw-agent-memory/spec.md` policy envelope requirement - Strip `Domain_is_not_a_scoring_signal`, `Cross_domain_candidate_not_excluded`, `Review_defers_domain_mismatch` tests that no longer have a referent

…NULL Two independent regressions surfaced together in production Slack sessions D0AC6CKBK5K/1776051715.090089 and D0AC6CKBK5K/1776075016.334849. 1. Tool-loop acknowledgement loop (PR #618 regression). SessionMessageAssembler emitted the volatile turn-context tail (memory recall, current time, working-context) as a ChatRole.User message at the end of the assembled list. During a tool loop, Qwen3's ChatML template read the trailing user-role block as a fresh user turn, so the model restarted its assistant response on every iteration, scanned back for the last real user content, and re-emitted "You're right - I had that backwards" before each tool call until context hit 262144/262144 and was force-compacted. Flip the role back to System; keep the placement at the end of the list so #618's cache stability win is preserved (llama.cpp KV cache is byte-level prefix matching, role tag at the end does not affect prefix stability). Update 4 existing SessionMessageAssembler tests for the role flip and add a new Volatile_tail_does_not_create_fake_user_turn_after_tool_result regression test that builds a mid-tool-loop history and asserts the trailing message is System-role with no User-role after the last Tool/Assistant message. 2. Legacy memory_anchors.domain NOT NULL constraint. PR #588 removed the Domain concept from the in-code schema but intentionally shipped no migration. CREATE TABLE IF NOT EXISTS is a no-op on existing databases, so production DBs still had `domain TEXT NOT NULL` on memory_anchors, memory_documents, memory_records, and memory_edges. Every new-anchor INSERT has been failing with SQLite Error 19 since the refactor, blocking memory curation entirely - no new memories have been written since 2026-04-12 20:37 despite successful distillation proposals. Add scripts/repair-memory-schema.sql, a one-off rename/rebuild/copy repair that drops the legacy column from all four tables while preserving rows. FTS5 virtual tables are standalone (no content= mode) and do not need touching. Also handle the AcceptedDistillationProposalsRecorded dead letter: the observer correctly replies to LlmSessionActor, but the existing handler only lived inside Passivating(). Add no-op Command<> registrations in Ready(), Processing(), and Compacting() so the informational reply does not land in DeadLetters during normal session states. Capture Sender into a local in SessionMemoryObserverActor.HandleRecordAcceptedDistillationProposals before calling Persist - the standard Akka.NET defense against Sender being overwritten by an interleaved message before the persist callback fires.

…NULL (#634) * fix(sessions): recover from volatile-tail loop and legacy domain NOT NULL Two independent regressions surfaced together in production Slack sessions D0AC6CKBK5K/1776051715.090089 and D0AC6CKBK5K/1776075016.334849. 1. Tool-loop acknowledgement loop (PR #618 regression). SessionMessageAssembler emitted the volatile turn-context tail (memory recall, current time, working-context) as a ChatRole.User message at the end of the assembled list. During a tool loop, Qwen3's ChatML template read the trailing user-role block as a fresh user turn, so the model restarted its assistant response on every iteration, scanned back for the last real user content, and re-emitted "You're right - I had that backwards" before each tool call until context hit 262144/262144 and was force-compacted. Flip the role back to System; keep the placement at the end of the list so #618's cache stability win is preserved (llama.cpp KV cache is byte-level prefix matching, role tag at the end does not affect prefix stability). Update 4 existing SessionMessageAssembler tests for the role flip and add a new Volatile_tail_does_not_create_fake_user_turn_after_tool_result regression test that builds a mid-tool-loop history and asserts the trailing message is System-role with no User-role after the last Tool/Assistant message. 2. Legacy memory_anchors.domain NOT NULL constraint. PR #588 removed the Domain concept from the in-code schema but intentionally shipped no migration. CREATE TABLE IF NOT EXISTS is a no-op on existing databases, so production DBs still had `domain TEXT NOT NULL` on memory_anchors, memory_documents, memory_records, and memory_edges. Every new-anchor INSERT has been failing with SQLite Error 19 since the refactor, blocking memory curation entirely - no new memories have been written since 2026-04-12 20:37 despite successful distillation proposals. Add scripts/repair-memory-schema.sql, a one-off rename/rebuild/copy repair that drops the legacy column from all four tables while preserving rows. FTS5 virtual tables are standalone (no content= mode) and do not need touching. Also handle the AcceptedDistillationProposalsRecorded dead letter: the observer correctly replies to LlmSessionActor, but the existing handler only lived inside Passivating(). Add no-op Command<> registrations in Ready(), Processing(), and Compacting() so the informational reply does not land in DeadLetters during normal session states. Capture Sender into a local in SessionMemoryObserverActor.HandleRecordAcceptedDistillationProposals before calling Persist - the standard Akka.NET defense against Sender being overwritten by an interleaved message before the persist callback fires. * refactor(sessions): collapse distillation ack no-op and trim test noise Review cleanup on top of the prior volatile-tail and NOT-NULL fixes. - LlmSessionActor: extract the three identical no-op `Command<AcceptedDistillationProposalsRecorded>(_ => { })` registrations from Ready/Processing/Compacting into a single `CommandDistillationAckNoOp()` helper with one canonical comment explaining why the non-passivation states need to swallow the observer's informational reply. Net: 3 copies of the handler and 3 near-duplicate comment blocks collapse to 1 helper + 1 comment + 3 single-line call sites. - SessionMessageAssemblerTests: * Recall_is_not_in_leading_system_prefix_even_when_resolved: drop the StringBuilder in favour of the per-message / break-on-non-System pattern already established by AssertNoVolatileContentInSystemPrefix. Same invariant, less allocation, same shape as the rest of the file. * Prefix_extends_through_history_when_startup_layers_settled, Volatile_tail_message_is_System_role_at_end_of_list, and Working_context_update_does_not_poison_system_prefix: trim WHAT- narrating comments that the identifiers and assertions already convey. Load-bearing WHY comments on the new regression test, the SessionMessageAssembler XML doc, and the SessionMemoryObserverActor Sender-capture rationale are preserved. No behavioural change. Verified with `dotnet test --filter "FullyQualifiedName~Sessions"` (253/253 passing) and `dotnet slopwatch analyze` (0 issues).

Pre-OSS cleanup. The repair-memory-schema.sql script was a one-off fix for databases created before PR #588 dropped the domain column. No longer needed for the public release.

Aaronontheweb added 2 commits April 10, 2026 22:20

Aaronontheweb merged commit ce6163c into dev Apr 10, 2026
3 checks passed

Aaronontheweb deleted the memory-domain-cleanup branch April 10, 2026 23:16

Aaronontheweb mentioned this pull request Apr 13, 2026

fix(sessions): recover from volatile-tail loop and legacy domain NOT NULL #634

Merged

7 tasks

Aaronontheweb mentioned this pull request May 1, 2026

chore: remove one-off memory schema repair script #845

Merged

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(memory): remove Domain concept in favor of audience isolation#588

refactor(memory): remove Domain concept in favor of audience isolation#588
Aaronontheweb merged 2 commits into
devfrom
memory-domain-cleanup

Aaronontheweb commented Apr 10, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Aaronontheweb commented Apr 10, 2026

Summary

Changes

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant