Skip to content

refactor(memory): remove Domain concept in favor of audience isolation#588

Merged
Aaronontheweb merged 2 commits into
devfrom
memory-domain-cleanup
Apr 10, 2026
Merged

refactor(memory): remove Domain concept in favor of audience isolation#588
Aaronontheweb merged 2 commits into
devfrom
memory-domain-cleanup

Conversation

@Aaronontheweb

Copy link
Copy Markdown
Collaborator

Summary

Deletes the half-implemented Domain concept from the memory system. SessionId.ToMemoryDomain() unconditionally returned the same constant regardless of session ID, so per-project scoping was never a real isolation mechanism — just a ranker contaminant that made recall pollution (#582) worse. Audience (Public ⊂ Team ⊂ Personal) is now the sole isolation axis; lexical/facet/anchor matches are the only relevance signals.

Closes #584.

Changes

  • Schema migration: idempotent DROP COLUMN domain on memory_anchors, memory_documents, memory_records, memory_edges. Drops and recreates policy indexes without the domain column.
  • Delete Domain field from SQLite record types, MemoryCheckpointPayload, MemoryCheckpointCandidate, ObservedMemoryCheckpointPayload, SubAgentFinding, AcceptedSubAgentFinding, AutomaticRecallItem.
  • Delete ToMemoryDomain() and SecurityPolicyDefaults.DefaultMemoryDomain.
  • Simplify signatures: MemoryPolicyGates.Evaluate/Accept, MemoryPolicyEvaluator.EvaluateWrite, MemoryCurationPipeline.BuildFingerprint, MemoryCurationActor.StartEvaluation, SessionMemoryObserverActor.RunDistillationAsync, SQLiteMemoryStore.CreateDefaultAnchor/SearchAutoRecallDocumentsAsync/FindFuzzyAnchorMatchesAsync/FindCandidatesByContentAsync all lose the domain parameter.
  • Rename SearchAcrossDomainsByPlanAsyncSearchByPlanAsync.
  • Delete now-meaningless tests: Domain_is_not_a_scoring_signal, Cross_domain_candidate_not_excluded, Review_defers_domain_mismatch.
  • OpenSpec: update netclaw-agent-memory/spec.md durable memory policy envelope requirement to reference audience instead of domain.

Net: 37 files changed, 194 insertions, 404 deletions.

Test plan

  • dotnet build — clean, 0 warnings, 0 errors
  • dotnet test — all 1,964 tests pass (837 Actors + 428 Daemon + 406 Cli + 139 Configuration + 117 Security + 32 Search + 5 MemoryRetrievalPoC)
  • dotnet slopwatch analyze — 0 issues
  • CI green

Last remaining callers of SecurityPolicyDefaults.InferLegacyBoundaryFromDomain
were in MemoryCurationPipeline — both at the sentinel-upgrade site when a
payload arrives without an explicit boundary. The function always returned
TrustedInstanceBoundary for the domains the system actually produces, so
the two call sites get inlined to the constant and the function itself
deletes cleanly.

No new dead code remains in the boundary-inference path.
The per-session `Domain` string (e.g. "project:default") was a half-implemented
isolation mechanism — `SessionId.ToMemoryDomain()` unconditionally returned the
same constant, so it was a coin flip rather than a real scoping signal. Worse,
domain affinity contributed +5 to the ranker's composite score and caused
unrelated noise documents to outrank signal on off-topic queries (#582).

Audience (Public ⊂ Team ⊂ Personal) is now the sole isolation axis. Lexical,
facet, and anchor matches remain the only relevance signals.

Changes:
- Delete `Domain` field from SQLiteMemoryDocument, SQLiteMemoryAnchor,
  SQLiteMemoryHydratedItem, SQLiteMemorySearchResult,
  SQLiteMemoryCurationOperation, AutomaticRecallItem,
  MemoryCheckpointPayload, MemoryCheckpointCandidate,
  ObservedMemoryCheckpointPayload, SubAgentFinding, AcceptedSubAgentFinding
- Delete `ToMemoryDomain` and `DefaultMemoryDomain` constant
- Drop `domain` column from memory_anchors, memory_documents, memory_records,
  memory_edges via idempotent DROP COLUMN migration
- Rename `SearchAcrossDomainsByPlanAsync` → `SearchByPlanAsync`
- Update OpenSpec `netclaw-agent-memory/spec.md` policy envelope requirement
- Strip `Domain_is_not_a_scoring_signal`, `Cross_domain_candidate_not_excluded`,
  `Review_defers_domain_mismatch` tests that no longer have a referent
@Aaronontheweb Aaronontheweb merged commit ce6163c into dev Apr 10, 2026
3 checks passed
@Aaronontheweb Aaronontheweb deleted the memory-domain-cleanup branch April 10, 2026 23:16
Aaronontheweb added a commit that referenced this pull request Apr 13, 2026
…NULL

Two independent regressions surfaced together in production Slack sessions
D0AC6CKBK5K/1776051715.090089 and D0AC6CKBK5K/1776075016.334849.

1. Tool-loop acknowledgement loop (PR #618 regression). SessionMessageAssembler
   emitted the volatile turn-context tail (memory recall, current time,
   working-context) as a ChatRole.User message at the end of the assembled
   list. During a tool loop, Qwen3's ChatML template read the trailing
   user-role block as a fresh user turn, so the model restarted its assistant
   response on every iteration, scanned back for the last real user content,
   and re-emitted "You're right - I had that backwards" before each tool call
   until context hit 262144/262144 and was force-compacted. Flip the role back
   to System; keep the placement at the end of the list so #618's cache
   stability win is preserved (llama.cpp KV cache is byte-level prefix
   matching, role tag at the end does not affect prefix stability). Update 4
   existing SessionMessageAssembler tests for the role flip and add a new
   Volatile_tail_does_not_create_fake_user_turn_after_tool_result regression
   test that builds a mid-tool-loop history and asserts the trailing message
   is System-role with no User-role after the last Tool/Assistant message.

2. Legacy memory_anchors.domain NOT NULL constraint. PR #588 removed the
   Domain concept from the in-code schema but intentionally shipped no
   migration. CREATE TABLE IF NOT EXISTS is a no-op on existing databases,
   so production DBs still had `domain TEXT NOT NULL` on memory_anchors,
   memory_documents, memory_records, and memory_edges. Every new-anchor
   INSERT has been failing with SQLite Error 19 since the refactor,
   blocking memory curation entirely - no new memories have been written
   since 2026-04-12 20:37 despite successful distillation proposals. Add
   scripts/repair-memory-schema.sql, a one-off rename/rebuild/copy repair
   that drops the legacy column from all four tables while preserving
   rows. FTS5 virtual tables are standalone (no content= mode) and do not
   need touching.

Also handle the AcceptedDistillationProposalsRecorded dead letter: the
observer correctly replies to LlmSessionActor, but the existing handler
only lived inside Passivating(). Add no-op Command<> registrations in
Ready(), Processing(), and Compacting() so the informational reply does
not land in DeadLetters during normal session states. Capture Sender into
a local in SessionMemoryObserverActor.HandleRecordAcceptedDistillationProposals
before calling Persist - the standard Akka.NET defense against Sender
being overwritten by an interleaved message before the persist callback
fires.
Aaronontheweb added a commit that referenced this pull request Apr 13, 2026
…NULL (#634)

* fix(sessions): recover from volatile-tail loop and legacy domain NOT NULL

Two independent regressions surfaced together in production Slack sessions
D0AC6CKBK5K/1776051715.090089 and D0AC6CKBK5K/1776075016.334849.

1. Tool-loop acknowledgement loop (PR #618 regression). SessionMessageAssembler
   emitted the volatile turn-context tail (memory recall, current time,
   working-context) as a ChatRole.User message at the end of the assembled
   list. During a tool loop, Qwen3's ChatML template read the trailing
   user-role block as a fresh user turn, so the model restarted its assistant
   response on every iteration, scanned back for the last real user content,
   and re-emitted "You're right - I had that backwards" before each tool call
   until context hit 262144/262144 and was force-compacted. Flip the role back
   to System; keep the placement at the end of the list so #618's cache
   stability win is preserved (llama.cpp KV cache is byte-level prefix
   matching, role tag at the end does not affect prefix stability). Update 4
   existing SessionMessageAssembler tests for the role flip and add a new
   Volatile_tail_does_not_create_fake_user_turn_after_tool_result regression
   test that builds a mid-tool-loop history and asserts the trailing message
   is System-role with no User-role after the last Tool/Assistant message.

2. Legacy memory_anchors.domain NOT NULL constraint. PR #588 removed the
   Domain concept from the in-code schema but intentionally shipped no
   migration. CREATE TABLE IF NOT EXISTS is a no-op on existing databases,
   so production DBs still had `domain TEXT NOT NULL` on memory_anchors,
   memory_documents, memory_records, and memory_edges. Every new-anchor
   INSERT has been failing with SQLite Error 19 since the refactor,
   blocking memory curation entirely - no new memories have been written
   since 2026-04-12 20:37 despite successful distillation proposals. Add
   scripts/repair-memory-schema.sql, a one-off rename/rebuild/copy repair
   that drops the legacy column from all four tables while preserving
   rows. FTS5 virtual tables are standalone (no content= mode) and do not
   need touching.

Also handle the AcceptedDistillationProposalsRecorded dead letter: the
observer correctly replies to LlmSessionActor, but the existing handler
only lived inside Passivating(). Add no-op Command<> registrations in
Ready(), Processing(), and Compacting() so the informational reply does
not land in DeadLetters during normal session states. Capture Sender into
a local in SessionMemoryObserverActor.HandleRecordAcceptedDistillationProposals
before calling Persist - the standard Akka.NET defense against Sender
being overwritten by an interleaved message before the persist callback
fires.

* refactor(sessions): collapse distillation ack no-op and trim test noise

Review cleanup on top of the prior volatile-tail and NOT-NULL fixes.

- LlmSessionActor: extract the three identical no-op
  `Command<AcceptedDistillationProposalsRecorded>(_ => { })` registrations
  from Ready/Processing/Compacting into a single `CommandDistillationAckNoOp()`
  helper with one canonical comment explaining why the non-passivation
  states need to swallow the observer's informational reply. Net: 3
  copies of the handler and 3 near-duplicate comment blocks collapse
  to 1 helper + 1 comment + 3 single-line call sites.

- SessionMessageAssemblerTests:
  * Recall_is_not_in_leading_system_prefix_even_when_resolved: drop the
    StringBuilder in favour of the per-message / break-on-non-System
    pattern already established by AssertNoVolatileContentInSystemPrefix.
    Same invariant, less allocation, same shape as the rest of the file.
  * Prefix_extends_through_history_when_startup_layers_settled,
    Volatile_tail_message_is_System_role_at_end_of_list, and
    Working_context_update_does_not_poison_system_prefix: trim WHAT-
    narrating comments that the identifiers and assertions already
    convey. Load-bearing WHY comments on the new regression test, the
    SessionMessageAssembler XML doc, and the SessionMemoryObserverActor
    Sender-capture rationale are preserved.

No behavioural change. Verified with
`dotnet test --filter "FullyQualifiedName~Sessions"` (253/253 passing)
and `dotnet slopwatch analyze` (0 issues).
Aaronontheweb added a commit that referenced this pull request May 1, 2026
Pre-OSS cleanup. The repair-memory-schema.sql script was a one-off fix
for databases created before PR #588 dropped the domain column. No longer
needed for the public release.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Memory domain scoping is half-implemented: all sessions collapse to project:default

1 participant