Conversation
This commit enables anyone to create an app and add three types of data sources:

* PDF file
* YouTube video
* website

It exposes a function called `query` which first retrieves similar docs from the vector DB and then passes them to the LLM to produce the final answer.
Adds a base chunker from which any chunker can inherit. Existing chunkers are refactored to inherit from this base chunker.
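The base-chunker refactor might look roughly like this minimal sketch (class and method names here are assumptions, not the actual embedchain internals): the base class owns the splitting logic, and each source-specific chunker only supplies its own text extraction.

```python
class BaseChunker:
    """Base chunker; subclasses override get_text() for their source type."""

    def __init__(self, chunk_size=500):
        self.chunk_size = chunk_size

    def get_text(self, source):
        raise NotImplementedError

    def create_chunks(self, source):
        # Shared splitting logic inherited by every chunker.
        text = self.get_text(source)
        return [text[i:i + self.chunk_size]
                for i in range(0, len(text), self.chunk_size)]


class WebsiteChunker(BaseChunker):
    def get_text(self, source):
        # A real implementation would fetch the URL and strip HTML; stubbed here.
        return source
```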
raghavtyagii pushed a commit to raghavtyagii/embedchain that referenced this pull request on Sep 25, 2023
merlinfrombelgium pushed a commit to merlinfrombelgium/mem0 that referenced this pull request on Jul 4, 2025
Add simple app & query functionality
xiangnuans added a commit to xiangnuans/mem0 that referenced this pull request on Dec 5, 2025
add openrouter and deepinfra llm
ywmail added a commit to ywmail/mem0 that referenced this pull request on Jan 19, 2026
[TicketNo.] US20251217073371 mem0ai#1 [Binary Source] NA
This was referenced Feb 17, 2026
utkarsh240799 added a commit that referenced this pull request on Mar 13, 2026
- Add user identity to extraction preamble so memories are attributed to the correct user instead of cross-referencing cached patterns (OPE-6 #1)
- Skip mem0.add() when no user messages remain after noise filtering, avoiding wasted API calls on assistant-only payloads (OPE-6 #2)
- Raise auto-recall threshold to 0.6 (vs 0.5 for explicit search) and add dynamic thresholding that drops memories below 50% of the top result's score to reduce irrelevant context injection (OPE-6 #3)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
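The dynamic thresholding in point 3 combines an absolute floor with a floor relative to the best hit. A minimal sketch of that logic (function and parameter names are illustrative, not the commit's actual code):

```python
def filter_memories(scored, base_threshold=0.6, relative_floor=0.5):
    """Keep memories scoring >= base_threshold AND >= 50% of the top kept score.

    `scored` is a list of (memory, similarity_score) pairs.
    """
    kept = [(m, s) for m, s in scored if s >= base_threshold]
    if not kept:
        return []
    top = max(s for _, s in kept)
    # Dynamic part: drop anything far below the best result.
    return [(m, s) for m, s in kept if s >= top * relative_floor]
```

With scores in [0, 1] and a 0.6 absolute floor the relative floor rarely bites, but it guards against injecting marginal hits when one memory dominates.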
utkarsh240799 added a commit that referenced this pull request on Mar 16, 2026
- Add user identity to extraction preamble so memories are attributed to the correct user instead of cross-referencing cached patterns (OPE-6 #1)
- Skip mem0.add() when no user messages remain after noise filtering, avoiding wasted API calls on assistant-only payloads (OPE-6 #2)
- Raise auto-recall threshold to 0.6 (vs 0.5 for explicit search) and add dynamic thresholding that drops memories below 50% of the top result's score to reduce irrelevant context injection (OPE-6 #3)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
VictorECDSA added a commit to VictorECDSA/mem0 that referenced this pull request on Mar 23, 2026
…operations

# Fixes mem0ai#4490: Preserve original `actor_id` during memory UPDATE operations

Fixes mem0ai#4490

## Summary

This PR attempts to fix an issue where `actor_id` metadata gets overwritten during UPDATE operations, breaking actor-level memory isolation in multi-actor scenarios. I've tested this fix locally and it seems to resolve the problem, but I would greatly appreciate maintainer review to ensure it aligns with the project's design goals.

**Changes**:
- Modified `_update_memory()` in `mem0/memory/main.py` (lines 1251-1252 and 2347-2349)
- Changed conditional preservation to unconditional preservation of the original `actor_id`

## Problem Statement

Please see issue mem0ai#4490 for full details on the problem.

**Quick summary**: When using `metadata={"actor_id": ...}` with a shared `user_id` in multi-actor scenarios, the `actor_id` field appears to be overwritten when a different actor triggers an UPDATE event:

```python
# Actor A creates memory
m.add([{"role": "user", "content": "I am player mem0ai#1"}], user_id="team", metadata={"actor_id": "Alice"})
# ✓ Memory created with actor_id="Alice"

# Actor B updates A's memory
m.add([{"role": "user", "content": "Player mem0ai#1 is a good person"}], user_id="team", metadata={"actor_id": "Bob"})
# ❌ Memory updated but actor_id becomes "Bob"

# Query fails
m.search(query="", filters={"actor_id": "Alice"})  # Returns empty!
```

**Impact**:
1. Cannot query by original creator after UPDATE
2. Memory ownership tracking lost
3. Actor isolation fails in shared `user_id` scenarios

## Changes Made

### Code Changes

**File**: `mem0/memory/main.py`

**Location**:
- Lines 1251-1252 (sync version of `_update_memory`)
- Lines 2347-2348 (async version of `_update_memory`)

**Before**:
```python
if "actor_id" not in new_metadata and "actor_id" in existing_memory.payload:
    new_metadata["actor_id"] = existing_memory.payload["actor_id"]
```

**After**:
```python
if "actor_id" in existing_memory.payload:
    new_metadata["actor_id"] = existing_memory.payload["actor_id"]
```

**Rationale**: Based on my investigation, it appears the condition check always evaluates to `False` because `new_metadata` contains the current actor's ID (passed from line 1668/1662). Removing this condition seems to make `actor_id` preservation work as (I believe) intended.

This change treats `actor_id` as **"memory owner"** rather than **"last updater"**. The history table already tracks all contributors via `db.add_history()`, so update history is preserved.

## Testing

### Test Methodology

I've attempted to validate this fix using a monkey-patch approach that applies the proposed change to my local codebase without modifying source files. This allowed me to test the behavior before submitting the PR.

### Before Fix (Bug Demonstration)

**Test scenario**: Alice creates a memory, Bob updates it with a different `actor_id`

**Result**:
```
=== State AFTER Bob's UPDATE ===
Event: UPDATE
This actor_id: 'Bob'  ← Overwritten!

ISOLATION TEST:
Query Alice's memories: 0 found
❌ BUG: Alice's memory lost!
```

### After Fix (Monkey Patch Validation)

**Applied patch**:
```python
from mem0.memory.main import Memory as MemoryClass

def _patched_update_memory(self, memory_id, data, existing_embeddings, metadata=None):
    # ... (identical to original except actor_id handling)
    # ===== FIX =====
    if "actor_id" in existing_memory.payload:
        new_metadata["actor_id"] = existing_memory.payload["actor_id"]
    # ===============
    # ... (rest unchanged)

MemoryClass._update_memory = _patched_update_memory
```

**Result**:
```
=== State AFTER Bob's UPDATE ===
Event: UPDATE
This actor_id: 'Alice'  ← Preserved!

ISOLATION TEST:
Query Alice's memories: 1 found
✅ SUCCESS: Alice's memory preserved!
```

### Key Verification Points

✅ UPDATE event correctly triggered (not ADD)
✅ Memory content correctly updated
✅ `actor_id` preserved as original creator
✅ Query filtering by `actor_id` works correctly
✅ No side effects on other metadata fields

## Backward Compatibility

To the best of my understanding, this change should not introduce breaking changes:
- Existing code should continue to work without modification
- API signature remains unchanged
- Only affects behavior in multi-actor UPDATE scenarios
- History table continues to track all contributors

However, I may be missing some edge cases, so maintainer review would be greatly appreciated!

### Behavior Change Details

**Before this fix:** When calling `add()` with a different `actor_id` that triggers an UPDATE event, the `actor_id` would be overwritten to the new actor.

**After this fix:** The original `actor_id` (memory creator) is preserved during UPDATE operations.

**Scenarios NOT affected:**
- ✅ Calling the `update()` method directly (already preserves `actor_id` correctly)
- ✅ Updating memories without an `actor_id` field (behavior unchanged)
- ✅ Adding new memories (no UPDATE event triggered)

**If you were relying on UPDATE to change `actor_id`:** you should use this pattern instead:

```python
# Delete the old memory
m.delete(memory_id)
# Add a new memory with the new actor
m.add(..., metadata={"actor_id": "new_actor"})
```

This change aligns with the semantic meaning of `actor_id` as the **memory creator** (immutable), not the **last updater** (tracked in the history table).

## Additional Context

### Design Consistency

This fix aligns with how other session identifiers are handled:

```python
# Lines 1245-1250 already preserve these (if not in new_metadata):
if "user_id" not in new_metadata and "user_id" in existing_memory.payload:
    new_metadata["user_id"] = existing_memory.payload["user_id"]
if "agent_id" not in new_metadata and "agent_id" in existing_memory.payload:
    new_metadata["agent_id"] = existing_memory.payload["agent_id"]
if "run_id" not in new_metadata and "run_id" in existing_memory.payload:
    new_metadata["run_id"] = existing_memory.payload["run_id"]
```

The fix makes `actor_id` follow the same preservation pattern, but removes the always-false condition.

### Alternative Approaches I Considered

I also thought about these alternatives, but decided against them (open to feedback, though!):

1. **Track both `created_by` and `updated_by`**
   - Would require more complex schema changes
   - Less backward-compatible
   - Might not be necessary since the history table already tracks updates
2. **Make preservation configurable**
   - Would add API complexity
   - I couldn't think of a clear use case for wanting `actor_id` to change
   - Seems to go against the semantic meaning of "actor"

If either of these approaches seems better, I'm happy to revise the PR!

### Affected Implementations

- ✅ Sync: `_update_memory()` (lines 1251-1252)
- ✅ Async: `async _update_memory()` (lines 2347-2349)

## Checklist

- [x] Code changes implemented (both sync and async)
- [x] Tested with before/after comparison (monkey patch validation)
- [x] Verified backward compatibility (to the best of my understanding)
- [x] No breaking changes to existing API (as far as I can tell)
- [x] Documented in PR description
- [ ] Unit tests added (awaiting maintainer guidance on test location/structure)

---

**Environment**:
- Mem0 version: v1.0.7
- Python: 3.13
- Vector Store: Qdrant (local)

Happy to make any adjustments based on maintainer feedback. Thanks for the great library!
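The essence of the one-line fix can be illustrated on plain dicts. This is a simplified sketch of the metadata merge, not the actual `_update_memory` implementation; `merge_metadata` is a hypothetical name:

```python
def merge_metadata(existing_payload, new_metadata):
    """Sketch of the fixed metadata handling: the original creator's
    actor_id always wins over the updating actor's."""
    merged = dict(new_metadata)
    # Unconditional preservation (the fix): new_metadata always carries the
    # *current* actor's ID, so the old conditional never fired.
    if "actor_id" in existing_payload:
        merged["actor_id"] = existing_payload["actor_id"]
    return merged
```

Usage: if Alice's memory (`{"actor_id": "Alice"}`) is updated in Bob's session (`{"actor_id": "Bob"}`), the merged metadata keeps `"Alice"`, so `filters={"actor_id": "Alice"}` still finds it.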
MDGreyMatter pushed a commit to Go-Grey-Matter/ai-memory-system that referenced this pull request on Mar 31, 2026
## 307 Redirect Fixes
- memories.py: Added @router.get(""), @router.post(""), @router.delete("")
aliases alongside "/" root routes so Next.js proxy never gets 307'd
- config.py: Added @router.get(""), @router.put(""), @router.patch("")
aliases for config root routes
- (stats.py and apps.py were already fixed in Task mem0ai#1)
## Vector Encoding Pipeline Fix
- Root cause: Supabase unreachable from Replit via IPv6; SUPABASE_CONNECTION_STRING
had [YOUR-PASSWORD] placeholder
- Enabled pgvector extension in Replit's built-in PostgreSQL
- Updated memory.py to prefer DATABASE_URL (pgvector) over SUPABASE_CONNECTION_STRING
- Added fastembed as free local embedder (no OpenAI quota needed):
- _build_fastembed_embedder_config factory added
- _EMBEDDER_DEFAULT_DIMS mapping for dynamic dim selection
- _get_embedder_dims() helper function
- Restructured get_default_memory_config() to detect embedder first
- EMBEDDER_PROVIDER=fastembed env var set (shared)
- Patched mem0 FastEmbedEmbedding.embed() in main.py to return Python
lists (.tolist()) instead of numpy arrays (psycopg2 compat fix)
- fastembed added to start.sh pip install
## Model Updates
- Updated deprecated claude-3-5-sonnet-20240620 → claude-3-haiku-20240307
in categorization.py, memory.py, config.py, default_config.json, config.json
## Verified End-to-End
- POST /api/v1/memories returns 200 with memory object (no 307)
- Memory appears in filter/list endpoints
- Backend logs: fastembed → pgvector (384-dim) → stored successfully
- Anthropic LLM connected (200 OK) for inference
MDGreyMatter pushed a commit to Go-Grey-Matter/ai-memory-system that referenced this pull request on Mar 31, 2026
## 307 Redirect Fixes
- memories.py: Added @router.get(""), @router.post(""), @router.delete("")
aliases alongside "/" root routes to prevent Next.js proxy from getting 307'd
- config.py: Added @router.get(""), @router.put(""), @router.patch("") aliases
(stats.py and apps.py were already fixed in Task mem0ai#1)
## Error Handling Fix
- create_memory now raises HTTPException(503) when memory client unavailable
and HTTPException(500) when vector store operation fails, instead of silently
returning {"error": ...} dict with a 200 status
## Vector Encoding Pipeline
- Supabase is unreachable from Replit (host resolves to IPv6 only)
- SUPABASE_CONNECTION_STRING had [YOUR-PASSWORD] placeholder
- Solution: enabled pgvector extension in Replit's built-in PostgreSQL,
updated memory.py to prefer DATABASE_URL (pgvector) over Supabase
- Added fastembed as free local embedder (no API key needed):
* _build_fastembed_embedder_config factory
* _EMBEDDER_DEFAULT_DIMS dict for dynamic embedding dimension selection
* _get_embedder_dims() helper
* Restructured get_default_memory_config() to detect embedder first
* EMBEDDER_PROVIDER=fastembed set as shared env var
- Patched mem0 FastEmbedEmbedding.embed() at startup to return Python lists
instead of numpy arrays (required for psycopg2 pgvector compatibility)
- fastembed added to start.sh pip install line
## Verified End-to-End
- POST /api/v1/memories: 200 OK, memory stored in pgvector (384-dim)
- GET /api/v1/memories: 200 OK, returns stored memories
- POST /api/v1/memories/filter: 200 OK
- GET /api/v1/stats and GET /api/v1/apps: 200 OK
- No 307 redirects on any route
- LLM model name kept as original (claude-3-5-sonnet-20240620) per task scope
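The "prefer DATABASE_URL over Supabase, and treat the unfilled placeholder as unusable" selection logic might be sketched like this (function name and fallback behavior are assumptions, not the commit's actual code):

```python
import os

def pick_connection_string(env=None):
    """Prefer DATABASE_URL (local pgvector) over SUPABASE_CONNECTION_STRING;
    a string still containing the [YOUR-PASSWORD] placeholder is unusable."""
    if env is None:
        env = os.environ
    db_url = env.get("DATABASE_URL")
    if db_url:
        return db_url
    supabase = env.get("SUPABASE_CONNECTION_STRING", "")
    if supabase and "[YOUR-PASSWORD]" not in supabase:
        return supabase
    return None  # caller falls back to a non-vector or error path
```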
MDGreyMatter pushed a commit to Go-Grey-Matter/ai-memory-system that referenced this pull request on Mar 31, 2026
New tool: session_start (mcp_server.py)
- Inserted before get_current_user so it appears first in the tool list
- Parameters: user_email (str), first_message (str)
- Does three things atomically in one round-trip:
1. Resolves user identity: exact email match → fuzzy ILIKE name fallback
Returns {matched, user_email, name, user_id} or {matched:false, message}
2. Searches all team memories (cross-user, no user_id filter, limit=5)
using first_message as the query. Returns related_work list so Claude
can surface teammate progress to the user before responding.
3. Stores "{display_name} started working on: {first_message}" as an
exact memory (infer=False) under the resolved user, including DB sync
to openmemory.memories + memory_status_history.
Tool description instructs Claude to:
- Call session_start as the VERY FIRST tool in every new conversation
- MUST surface non-empty related_work to the user before responding
- Use judgment about relevance; not every hit needs mentioning
- Fall back gracefully if the tool errors
Updated get_current_user description to defer to session_start for
conversation starts; use get_current_user only for mid-conversation
identity lookups.
Bug fixed: old_state in MemoryStatusHistory must be NOT NULL per DB
constraint; used MemoryState.deleted as the sentinel value for new
additions (consistent with existing add_memories behavior).
Verified:
- session_start is tool mem0ai#1 in the tool list
- Returns correct user profile + related_work (em dash Oden memory
surfaces as top result for Oden email copy query, score 0.35)
- Session note "Michael Dunn started working on: ..." stored in DB
- Unknown user returns matched:false + empty related_work
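Step 1's two-stage identity resolution (exact email match, then fuzzy ILIKE-style name fallback) can be sketched in pure Python. This is an in-memory illustration; the real tool runs these as SQL queries, and `resolve_user` is a hypothetical name:

```python
def resolve_user(users, user_email):
    """Resolve identity: exact email match first, then a case-insensitive
    substring match (ILIKE-style) of the email's local part against names."""
    for u in users:
        if u["email"] == user_email:
            return {"matched": True, **u}
    needle = user_email.split("@")[0].lower()
    for u in users:
        if needle and needle in u["name"].lower():
            return {"matched": True, **u}
    return {"matched": False, "message": f"No user found for {user_email}"}
```

The unmatched branch mirrors the verified behavior above: an unknown user gets `matched: false` rather than an error.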