feat(openclaw): improve extraction quality with noise filtering, deduplication, and better instructions #4302
Merged
whysosaket merged 13 commits into main on Mar 18, 2026
c04edab to 009f5cb
…plication, and better instructions
- Add noise filtering pipeline (isNoiseMessage, stripNoiseFromContent, filterMessagesForExtraction) to drop cron heartbeats, acknowledgments, and system routing metadata before extraction
- Add word-overlap deduplication (deduplicateByContent) for recalled memories to avoid redundant context injection
- Rewrite DEFAULT_CUSTOM_INSTRUCTIONS with temporal anchoring, conciseness guidelines, outcome-over-intent extraction, and explicit exclusion rules
- Expand agent_end message selection from last 10 to last 20 messages plus earlier summary messages
- Add client-side threshold filtering and broad recall for short/new-session prompts in before_agent_start
- Add pre-check for near-duplicate memories in memory_store tool
- Add comprehensive unit tests for all new filtering and deduplication functions
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
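To make the word-overlap idea in this commit concrete, here is a minimal sketch. It is not the plugin's deduplicateByContent: the RecalledMemory shape, the 0.8 threshold, and all function names are assumptions chosen for illustration. A later commit in this PR removes client-side deduplication in favor of mem0's internal handling, so this only shows the technique.

```typescript
// Hypothetical shape of a recalled memory; the plugin's real type may differ.
interface RecalledMemory {
  id: string;
  memory: string;
  score: number;
}

// Lowercase the text, drop punctuation (ASCII-only tokenizer), collect distinct words.
function toWordSet(text: string): Set<string> {
  const words = text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ")
    .split(/\s+/)
    .filter((w) => w.length > 0);
  return new Set(words);
}

// Share of the smaller set's words that also appear in the larger set.
function wordOverlap(a: Set<string>, b: Set<string>): number {
  const smaller = a.size <= b.size ? a : b;
  const larger = a.size <= b.size ? b : a;
  if (smaller.size === 0) return 0;
  let shared = 0;
  smaller.forEach((w) => {
    if (larger.has(w)) shared += 1;
  });
  return shared / smaller.size;
}

// Greedy pass in score order: a memory is kept unless it overlaps
// too much with one that was already kept. The 0.8 default is an assumption.
function dedupeByWordOverlap(memories: RecalledMemory[], maxOverlap = 0.8): RecalledMemory[] {
  const kept: { item: RecalledMemory; words: Set<string> }[] = [];
  const byScore = memories.slice().sort((x, y) => y.score - x.score);
  for (const item of byScore) {
    const words = toWordSet(item.memory);
    const isDuplicate = kept.some((k) => wordOverlap(k.words, words) >= maxOverlap);
    if (!isDuplicate) kept.push({ item, words });
  }
  return kept.map((k) => k.item);
}
```

Measuring overlap against the smaller set means a short memory fully contained in a longer one counts as a duplicate, which is one reasonable choice among several.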
Add isGenericAssistantMessage() to detect boilerplate assistant responses like "I see you've shared an update. How can I help?" that contain no extractable facts. Integrates into filterMessagesForExtraction to drop these before sending to mem0.add(), preventing the extraction model from wasting capacity on empty acknowledgments. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
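The filtering described in the two commits above could look roughly like the following sketch. The pattern lists, the 200-character cutoff, and the function names (looksLikeNoise, looksLikeGenericAssistant, filterForExtraction) are illustrative assumptions; the plugin's actual rules are not shown in this thread.

```typescript
interface ChatMessage {
  role: "user" | "assistant" | "system";
  content: string;
}

// Illustrative patterns only: heartbeats, NO_REPLY markers, single-word acks.
const NOISE_PATTERNS: RegExp[] = [
  /^heartbeat(_ok)?$/i,
  /^no_reply$/i,
  /^(ok|okay|thanks|yes|no|sure|done)[.!]?$/i,
];

// True when the whole message is empty or matches a known noise pattern.
function looksLikeNoise(content: string): boolean {
  const trimmed = content.trim();
  return trimmed.length === 0 || NOISE_PATTERNS.some((p) => p.test(trimmed));
}

// True for short assistant boilerplate that carries no extractable facts.
function looksLikeGenericAssistant(msg: ChatMessage, maxLength = 200): boolean {
  if (msg.role !== "assistant") return false;
  const text = msg.content.trim();
  if (text.length > maxLength) return false; // long messages pass through
  return /^(i see|got it|understood|noted)\b.*how can i help\??$/i.test(text);
}

// Drop noise and boilerplate before the messages are sent for extraction.
function filterForExtraction(messages: ChatMessage[]): ChatMessage[] {
  return messages.filter(
    (m) => !looksLikeNoise(m.content) && !looksLikeGenericAssistant(m),
  );
}
```

Running the cheap whole-message checks first keeps the extraction model from spending capacity on messages that cannot yield a memory.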
Extract types, providers, config, filtering, and isolation into separate files for better maintainability. No behavioral changes — all 55 tests pass unchanged. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add message filtering and deduplication sections to README
- Fix searchThreshold default to 0.5 in code, docs, and config
- Add v0.3.1 changelog entry with all new features
- Bump version from 0.3.0 to 0.3.1
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… triggers
- Fix extractAgentId() to handle OpenClaw's subagent session key format (agent:main:subagent:<uuid>) so subagent memories go to isolated namespaces (utkarsh:agent:subagent-<uuid>) instead of the base userId
- Add isNonInteractiveTrigger() to skip autocapture/autorecall for cron, heartbeat, automation, and schedule triggers, which prevents system prompts from polluting the user's memory store
- Fallback detection via session key patterns (:cron:, :heartbeat:) when ctx.trigger is not set
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
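As a rough sketch of the trigger check this commit describes: the HookContext shape and the function name isNonInteractive are assumptions, and only the trigger names and key patterns come from the commit message.

```typescript
// Assumed context shape: the commit names only ctx.trigger and the session key.
interface HookContext {
  trigger?: string;
  sessionKey?: string;
}

const NON_INTERACTIVE_TRIGGERS = new Set(["cron", "heartbeat", "automation", "schedule"]);
const NON_INTERACTIVE_KEY_PATTERNS = [":cron:", ":heartbeat:"];

// Prefer the explicit trigger; fall back to session key patterns
// only when no trigger is set.
function isNonInteractive(ctx: HookContext): boolean {
  if (ctx.trigger !== undefined) {
    return NON_INTERACTIVE_TRIGGERS.has(ctx.trigger.toLowerCase());
  }
  const key = ctx.sessionKey ?? "";
  return NON_INTERACTIVE_KEY_PATTERNS.some((p) => key.includes(p));
}
```

A hook would call this first and return early, skipping both capture and recall, when it yields true.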
- Add user identity to extraction preamble so memories are attributed to the correct user instead of cross-referencing cached patterns (OPE-6 #1)
- Skip mem0.add() when no user messages remain after noise filtering, avoiding wasted API calls on assistant-only payloads (OPE-6 #2)
- Raise auto-recall threshold to 0.6 (vs 0.5 for explicit search) and add dynamic thresholding that drops memories below 50% of the top result's score to reduce irrelevant context injection (OPE-6 #3)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
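The two recall cutoffs in this commit (a fixed 0.6 floor and a relative cut at 50% of the top score) can be sketched as below. How the plugin combines them is not shown here; this version assumes a result must pass both, and the ScoredMemory shape and function name are hypothetical.

```typescript
// Hypothetical result shape; only the score matters for thresholding.
interface ScoredMemory {
  memory: string;
  score: number;
}

// Keep results that clear both the absolute floor and a fraction of the top score.
function filterRecallResults(
  results: ScoredMemory[],
  absoluteMin = 0.6,
  relativeFraction = 0.5,
): ScoredMemory[] {
  if (results.length === 0) return [];
  const topScore = results.reduce((best, r) => Math.max(best, r.score), -Infinity);
  const cutoff = Math.max(absoluteMin, topScore * relativeFraction);
  return results.filter((r) => r.score >= cutoff);
}
```

If scores are normalized to [0, 1], half of the top score never exceeds 0.5, so with these defaults the 0.6 floor is the binding cutoff. The relative cut only takes effect when the floor is configured lower or scores fall outside that range.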
…ution
The auto-recall injection now tells the agent whose memories are being provided, so it can correctly distinguish the current user from third parties mentioned in memories.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…outing
Subagents get ephemeral UUIDs (agent:main:subagent:<uuid>) that create empty, orphaned namespaces. This fix:
- Adds isSubagentSession() to detect subagent session keys
- Routes subagent recall to parent (main user) namespace so they get the user's long-term context instead of searching an empty namespace
- Skips capture for subagents to prevent orphaned memories that are never read again (main agent captures consolidated output)
- Adds subagent-specific preamble to prevent identity assumption
Tested with 8 subagent spawns: 0 orphaned entities, 0 orphaned memories, 100% cross-session recall accuracy.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
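A minimal sketch of the routing rule this commit describes follows. The session key layout beyond agent:main:subagent:<uuid> is an assumption (here, the segment after "agent" is treated as the agent id), and NamespaceDecision, isSubagentKey, and resolveNamespace are hypothetical names rather than the plugin's.

```typescript
// Hypothetical result: which namespace to use and whether to capture.
interface NamespaceDecision {
  userId: string;
  captureEnabled: boolean;
}

function isSubagentKey(sessionKey: string): boolean {
  return sessionKey.includes(":subagent:");
}

function resolveNamespace(baseUserId: string, sessionKey: string): NamespaceDecision {
  // Subagents recall from the parent namespace and never capture.
  if (isSubagentKey(sessionKey)) {
    return { userId: baseUserId, captureEnabled: false };
  }
  // Assumed layout "agent:<agentId>:..."; "main" maps to the base user.
  const parts = sessionKey.split(":");
  const agentId = parts[0] === "agent" ? parts[1] : undefined;
  if (agentId === undefined || agentId === "" || agentId === "main") {
    return { userId: baseUserId, captureEnabled: true };
  }
  // Named agents get an isolated namespace.
  return { userId: `${baseUserId}:agent:${agentId}`, captureEnabled: true };
}
```

Checking for the subagent marker before parsing the agent id matters, because a subagent key also starts with agent:main and would otherwise be treated as the main agent with capture enabled.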
Adds changelog entries for all enhancements since 0.3.1:
- Non-interactive trigger filtering (cron/heartbeat)
- Subagent hallucination prevention (namespace routing)
- User identity in recall/extraction preambles
- Dynamic recall thresholding
- User-content guard
- 72 unit tests
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… changelog
Ports the SQLite resilience fix (bfe730a) from main into the refactored module files:
- types.ts: add disableHistory to oss config
- providers.ts: init error recovery + retry with history disabled
- index.ts: re-export mem0ConfigSchema and createProvider for tests
- CHANGELOG.md: include SQLite resilience in 0.4.0 release notes
All 82 tests passing (72 plugin + 10 SQLite resilience).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
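The "retry with history disabled" behavior mentioned in this commit might be structured like the sketch below. OssConfig, MemoryFactory, and createWithHistoryFallback are hypothetical names; the factory stands in for whatever constructs the OSS memory backend, and the real providers.ts may differ.

```typescript
// Assumed config shape: only disableHistory is named in the commit message.
interface OssConfig {
  disableHistory?: boolean;
}

// Hypothetical factory standing in for the backend constructor.
type MemoryFactory<T> = (config: OssConfig) => T;

function createWithHistoryFallback<T>(factory: MemoryFactory<T>, config: OssConfig): T {
  try {
    return factory(config);
  } catch (err) {
    // History was already off, so there is nothing further to disable.
    if (config.disableHistory === true) throw err;
    // Retry once with history disabled; a second failure propagates to the caller.
    return factory({ ...config, disableHistory: true });
  }
}
```

The trade-off in this shape is that the plugin keeps working without the history store instead of failing outright when the history database cannot be opened.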
009f5cb to 9175f14
…meter
- Hooks now use ctx.sessionKey directly instead of shared mutable currentSessionId, preventing cross-session data leaks when multiple sessions run concurrently (e.g. multiple Telegram users)
- Removed userId from memory_search, memory_store, memory_list tool parameters to prevent LLM prompt injection from accessing other users' namespaces. agentId is kept (safe: always namespaced)
- Fixed tsconfig for modular imports (allowImportingTsExtensions)
- Fixed providers.ts MemoryClient type for DTS generation
- Updated README with subagent handling, concurrency safety, trigger filtering, security note, and disableHistory docs
- Updated CHANGELOG with race condition fix and userId removal
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
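To illustrate why a shared mutable session variable leaks across concurrent sessions, here is a small deterministic sketch. Each "begin" function returns a closure standing in for the part of a hook that runs after an await. All names are hypothetical, and the plugin's hooks are not structured this way.

```typescript
interface SessionContext {
  sessionKey: string;
}

// Anti-pattern: module-level state that every session overwrites.
let sharedSessionId = "";

// The returned closure models work that resumes later and reads shared state.
function beginCaptureShared(ctx: SessionContext): () => string {
  sharedSessionId = ctx.sessionKey;
  return () => sharedSessionId;
}

// Fix: capture the key from the per-call context, so later work sees its own value.
function beginCaptureScoped(ctx: SessionContext): () => string {
  const key = ctx.sessionKey;
  return () => key;
}

// Two sessions start before either finishes.
const finishSharedA = beginCaptureShared({ sessionKey: "session-a" });
const finishSharedB = beginCaptureShared({ sessionKey: "session-b" });
const finishScopedA = beginCaptureScoped({ sessionKey: "session-a" });
const finishScopedB = beginCaptureScoped({ sessionKey: "session-b" });
```

With the shared variable, finishSharedA() resolves to session B's key once B has started, which is the cross-session leak. The scoped version keeps each key with its own call.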
Regenerate lockfile to match package.json dependency bump. Fixes CI frozen-lockfile failure. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…serId, relax prompts
Per Saket's review feedback:
1. Remove client-side deduplication (deduplicateByContent): mem0 handles dedup internally via its new algo
2. Restore userId tool parameter: not a security boundary since any user with org/project access can already see all memories
3. Relax extraction prompts: keep related facts together instead of forcing atomic 1-2 sentence memories, preserving context
What's kept: noise filtering, trigger filtering, subagent isolation, temporal anchoring, dynamic thresholding, cold-start broadening, race condition fix, SQLite resilience.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
whysosaket approved these changes Mar 18, 2026
jamebobob pushed a commit to jamebobob/mem0-vigil-recall that referenced this pull request Mar 29, 2026
…plication, and better instructions (mem0ai#4302)
Co-authored-by: utkarsh240799 <utkarsh240799@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Description
Problem
The OpenClaw mem0 plugin was suffering from poor extraction and recall quality in production. Analysis of real-world conversation data (~29K events) revealed compounding issues:
- A shared mutable currentSessionId variable caused cross-session data leaks when multiple sessions ran concurrently.
Solution
Noise filtering pipeline (isNoiseMessage → isGenericAssistantMessage → stripNoiseFromContent → truncateMessage):
Improved recall pipeline:
Rewritten custom extraction instructions:
Non-interactive trigger filtering (isNonInteractiveTrigger):
- Skips autocapture/autorecall for cron, heartbeat, automation, schedule triggers
- Fallback detection via session key patterns (:cron:, :heartbeat:)
User identity in recall preamble:
- … cfg.userId for better attribution
Subagent hallucination prevention (isSubagentSession):
- Detects :subagent: in session keys
Multi-agent isolation:
- extractAgentId correctly parses subagent session keys
- effectiveUserId produces isolated namespaces per named agent
Session race condition fix:
- Hooks use ctx.sessionKey directly from the event context instead of shared mutable currentSessionId
- … currentSessionId as best-effort fallback (they don't receive ctx)
SQLite resilience for OSS mode:
- oss.disableHistory config option
User-content guard: Skip extraction when no meaningful user content remains after filtering.
Expanded extraction window: from last 10 → last 20 messages + earlier summary messages.
Code refactor: Split monolithic 1772-line index.ts into 6 focused modules.
Build pipeline: Fixed tsconfig for modular .ts imports and typed MemoryClient opts for clean DTS generation.
Type of change
How Has This Been Tested?
Unit Tests (78 tests passing)
- isNoiseMessage: heartbeats, NO_REPLY, timestamps, single-word acks, system routing, post-compaction audit, real content passthrough
- isGenericAssistantMessage: boilerplate detection, long messages passthrough, substantive responses passthrough
- stripNoiseFromContent: embedded timestamps, system prefixes, routing lines, mixed content preservation
- filterMessagesForExtraction: end-to-end pipeline, generic assistant dropping, assistant-only payload filtering
- extractAgentId: main agent, named agents, subagent session keys
- effectiveUserId: main user, agent-scoped users
- isNonInteractiveTrigger: cron, heartbeat, automation, schedule triggers; session key patterns (10 tests)
- isSubagentSession: subagent detection, main agent passthrough, named agent passthrough (4 tests)
- mem0ConfigSchema - disableHistory: config parsing for oss.disableHistory (4 tests)
- OSSProvider - disableHistory passthrough: Memory constructor receives flag (2 tests)
- OSSProvider - SQLite fallback: retry with history disabled on failure (3 tests)
- PlatformProvider - init recovery: initPromise reset on failure (1 test)
Live E2E Tests: npm v0.3.3 vs Local v0.4.0 Comparison
Identical test inputs were run on both plugin versions, with a clean mem0 platform between runs.
Test scenarios:
Comparison results:
Additional Test Rounds (prior to reviewer feedback)
Known Issues (Not Plugin Bugs)
- … $0.08 → /bin/zsh.08): upstream mem0 platform bug
Checklist:
Maintainer Checklist