Add simple app & query functionality by taranjeet · Pull Request #1 · mem0ai/mem0

taranjeet · 2023-06-20T11:08:44Z

This enables anyone to create an app and add 3 types of data sources:

• pdf file
• youtube video
• website

It exposes a function called query, which first retrieves similar docs from the vector db and then passes them to the LLM to get the final answer.
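The retrieve-then-answer flow described above can be sketched as follows. This is a minimal illustration, not the PR's implementation: `ToyApp`, the bag-of-words similarity, and the `llm` callable are all hypothetical stand-ins for the real vector db and LLM client.

```python
import math
from collections import Counter
from typing import Callable, List


def _vectorize(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyApp:
    """Hypothetical app: stores text chunks and answers queries against them."""

    def __init__(self, llm: Callable[[str], str]):
        self.llm = llm  # any callable mapping a prompt to an answer
        self.chunks: List[str] = []

    def add(self, text: str) -> None:
        self.chunks.append(text)

    def retrieve(self, question: str, k: int = 1) -> List[str]:
        """Step 1: rank stored chunks by similarity to the question."""
        q = _vectorize(question)
        ranked = sorted(self.chunks, key=lambda c: _cosine(q, _vectorize(c)), reverse=True)
        return ranked[:k]

    def query(self, question: str) -> str:
        """Step 2: pass the retrieved context plus the question to the LLM."""
        context = "\n".join(self.retrieve(question))
        prompt = f"Use the context to answer.\nContext: {context}\nQuestion: {question}"
        return self.llm(prompt)
```

Passing the LLM in as a callable keeps the sketch testable without network access; a real app would swap in an actual embedding model and LLM client.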


- Add user identity to extraction preamble so memories are attributed to the correct user instead of cross-referencing cached patterns (OPE-6 #1)
- Skip mem0.add() when no user messages remain after noise filtering, avoiding wasted API calls on assistant-only payloads (OPE-6 #2)
- Raise auto-recall threshold to 0.6 (vs 0.5 for explicit search) and add dynamic thresholding that drops memories below 50% of the top result's score to reduce irrelevant context injection (OPE-6 #3)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
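The thresholding rule in the last bullet can be sketched as below. This is an assumption-laden illustration: `filter_recalled`, its parameter names, and the result shape (dicts with a `score` key) are hypothetical, and the commit message does not say how the absolute and relative cutoffs are combined. Here a memory must pass both.

```python
from typing import Dict, List


def filter_recalled(
    results: List[Dict],
    min_score: float = 0.6,        # absolute auto-recall threshold from the commit message
    relative_cutoff: float = 0.5,  # drop anything below 50% of the top score
) -> List[Dict]:
    """Keep results that clear both the absolute and the relative threshold."""
    if not results:
        return []
    top = max(r["score"] for r in results)
    # Assumption: the stricter of the two floors applies.
    floor = max(min_score, top * relative_cutoff)
    return [r for r in results if r["score"] >= floor]
```

Note that if scores are bounded by 1.0, a 0.6 absolute floor is always stricter than half the top score, so the relative rule only matters on other score scales or with a lower absolute threshold.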

…operations

# Fixes mem0ai#4490: Preserve original `actor_id` during memory UPDATE operations

Fixes mem0ai#4490

## Summary

This PR attempts to fix an issue where `actor_id` metadata gets overwritten during UPDATE operations, breaking actor-level memory isolation in multi-actor scenarios. I've tested this fix locally and it seems to resolve the problem, but I would greatly appreciate maintainer review to ensure it aligns with the project's design goals.

**Changes**:
- Modified `_update_memory()` in `mem0/memory/main.py` (lines 1251-1252 and 2347-2349)
- Changed conditional preservation to unconditional preservation of original `actor_id`

## Problem Statement

Please see issue mem0ai#4490 for full details on the problem.

**Quick summary**: When using `metadata={"actor_id": ...}` with shared `user_id` in multi-actor scenarios, the `actor_id` field appears to be overwritten when a different actor triggers an UPDATE event:

```python
# Actor A creates memory
m.add([{"role": "user", "content": "I am player #1"}], user_id="team", metadata={"actor_id": "Alice"})
# ✓ Memory created with actor_id="Alice"

# Actor B updates A's memory
m.add([{"role": "user", "content": "Player #1 is a good person"}], user_id="team", metadata={"actor_id": "Bob"})
# ❌ Memory updated but actor_id becomes "Bob"

# Query fails
m.search(query="", filters={"actor_id": "Alice"})
# Returns empty!
```

**Impact**:
1. Cannot query by original creator after UPDATE
2. Memory ownership tracking lost
3. Actor isolation fails in shared `user_id` scenarios

## Changes Made

### Code Changes

**File**: `mem0/memory/main.py`

**Location**:
- Lines 1251-1252 (sync version of `_update_memory`)
- Lines 2347-2348 (async version of `_update_memory`)

**Before**:
```python
if "actor_id" not in new_metadata and "actor_id" in existing_memory.payload:
    new_metadata["actor_id"] = existing_memory.payload["actor_id"]
```

**After**:
```python
if "actor_id" in existing_memory.payload:
    new_metadata["actor_id"] = existing_memory.payload["actor_id"]
```

**Rationale**: Based on my investigation, it appears the condition check always evaluates to `False` because `new_metadata` contains the current actor's ID (passed from line 1668/1662). Removing this condition seems to make `actor_id` preservation work as (I believe) intended.

This change treats `actor_id` as **"memory owner"** rather than **"last updater"**. The history table already tracks all contributors via `db.add_history()`, so update history is preserved.

## Testing

### Test Methodology

I've attempted to validate this fix using a monkey patch approach that applies the proposed change to my local codebase without modifying source files. This allowed me to test the behavior before submitting the PR.

### Before Fix (Bug Demonstration)

**Test scenario**: Alice creates a memory, Bob updates it with different `actor_id`

**Result**:
```
=== State AFTER Bob's UPDATE ===
Event: UPDATE
This actor_id: 'Bob' ← Overwritten!

ISOLATION TEST:
Query Alice's memories: 0 found
❌ BUG: Alice's memory lost!
```

### After Fix (Monkey Patch Validation)

**Applied patch**:
```python
from mem0.memory.main import Memory as MemoryClass

def _patched_update_memory(self, memory_id, data, existing_embeddings, metadata=None):
    # ... (identical to original except actor_id handling)
    # ===== FIX =====
    if "actor_id" in existing_memory.payload:
        new_metadata["actor_id"] = existing_memory.payload["actor_id"]
    # ===============
    # ... (rest unchanged)

MemoryClass._update_memory = _patched_update_memory
```

**Result**:
```
=== State AFTER Bob's UPDATE ===
Event: UPDATE
This actor_id: 'Alice' ← Preserved!

ISOLATION TEST:
Query Alice's memories: 1 found
✅ SUCCESS: Alice's memory preserved!
```

### Key Verification Points

✅ UPDATE event correctly triggered (not ADD)
✅ Memory content correctly updated
✅ `actor_id` preserved as original creator
✅ Query filtering by `actor_id` works correctly
✅ No side effects on other metadata fields

## Backward Compatibility

To the best of my understanding, this change should not introduce breaking changes:
- Existing code should continue to work without modification
- API signature remains unchanged
- Only affects behavior in multi-actor UPDATE scenarios
- History table continues to track all contributors

However, I may be missing some edge cases, so maintainer review would be greatly appreciated!

### Behavior Change Details

**Before this fix:** When calling `add()` with a different `actor_id` that triggers an UPDATE event, the `actor_id` would be overwritten to the new actor.

**After this fix:** The original `actor_id` (memory creator) is preserved during UPDATE operations.

**Scenarios NOT affected:**
- ✅ Calling `update()` method directly (already preserves `actor_id` correctly)
- ✅ Updating memories without `actor_id` field (behavior unchanged)
- ✅ Adding new memories (no UPDATE event triggered)

**If you were relying on UPDATE to change `actor_id`:** You should use this pattern instead:

```python
# Delete the old memory
m.delete(memory_id)
# Add a new memory with new actor
m.add(..., metadata={"actor_id": "new_actor"})
```

This change aligns with the semantic meaning of `actor_id` as the **memory creator** (immutable), not the **last updater** (tracked in history table).

## Additional Context

### Design Consistency

This fix aligns with how other session identifiers are handled:

```python
# Lines 1245-1250 already preserve these when they are missing from new_metadata:
if "user_id" not in new_metadata and "user_id" in existing_memory.payload:
    new_metadata["user_id"] = existing_memory.payload["user_id"]
if "agent_id" not in new_metadata and "agent_id" in existing_memory.payload:
    new_metadata["agent_id"] = existing_memory.payload["agent_id"]
if "run_id" not in new_metadata and "run_id" in existing_memory.payload:
    new_metadata["run_id"] = existing_memory.payload["run_id"]
```

The fix makes `actor_id` follow the same preservation pattern, but removes the always-false condition.

### Alternative Approaches I Considered

I also thought about these alternatives, but decided against them (open to feedback though!):

1. **Track both `created_by` and `updated_by`**
   - Would require more complex schema changes
   - Less backward-compatible
   - Might not be necessary since history table already tracks updates
2. **Make preservation configurable**
   - Would add API complexity
   - I couldn't think of a clear use case for wanting `actor_id` to change
   - Seems to go against the semantic meaning of "actor"

If either of these approaches seems better, I'm happy to revise the PR!

### Affected Implementations

- ✅ Sync: `_update_memory()` (lines 1251-1252)
- ✅ Async: `async _update_memory()` (lines 2347-2349)

## Checklist

- [x] Code changes implemented (both sync and async)
- [x] Tested with before/after comparison (monkey patch validation)
- [x] Verified backward compatibility (to the best of my understanding)
- [x] No breaking changes to existing API (as far as I can tell)
- [x] Documented in PR description
- [ ] Unit tests added (awaiting maintainer guidance on test location/structure)

---

**Environment**:
- Mem0 version: v1.0.7
- Python: 3.13
- Vector Store: Qdrant (local)

Happy to make any adjustments based on maintainer feedback. Thanks for the great library!
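To make the before/after difference concrete, here is a small standalone sketch that applies both conditions to plain dictionaries. It does not touch mem0 itself; `merge_conditional` and `merge_preserving` are hypothetical helper names that only mirror the two `if` statements quoted in the description.

```python
from typing import Dict


def merge_conditional(existing_payload: Dict, new_metadata: Dict) -> Dict:
    """'Before' condition: keep the stored actor_id only if the update supplies none."""
    merged = dict(new_metadata)
    if "actor_id" not in merged and "actor_id" in existing_payload:
        merged["actor_id"] = existing_payload["actor_id"]
    return merged


def merge_preserving(existing_payload: Dict, new_metadata: Dict) -> Dict:
    """'After' condition: the stored actor_id always wins when it exists."""
    merged = dict(new_metadata)
    if "actor_id" in existing_payload:
        merged["actor_id"] = existing_payload["actor_id"]
    return merged


# Alice created the memory; Bob's add() triggers an UPDATE carrying his own actor_id.
existing = {"user_id": "team", "actor_id": "Alice"}
incoming = {"user_id": "team", "actor_id": "Bob"}

before = merge_conditional(existing, incoming)  # actor_id is overwritten with "Bob"
after = merge_preserving(existing, incoming)    # actor_id stays "Alice"
```

Because the incoming metadata always carries an `actor_id` in this scenario, the first branch never fires, which matches the rationale given in the description.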

## 307 Redirect Fixes

- memories.py: Added @router.get(""), @router.post(""), @router.delete("") aliases alongside "/" root routes so Next.js proxy never gets 307'd
- config.py: Added @router.get(""), @router.put(""), @router.patch("") aliases for config root routes
- (stats.py and apps.py were already fixed in Task #1)

## Vector Encoding Pipeline Fix

- Root cause: Supabase unreachable from Replit via IPv6; SUPABASE_CONNECTION_STRING had [YOUR-PASSWORD] placeholder
- Enabled pgvector extension in Replit's built-in PostgreSQL
- Updated memory.py to prefer DATABASE_URL (pgvector) over SUPABASE_CONNECTION_STRING
- Added fastembed as free local embedder (no OpenAI quota needed):
  - _build_fastembed_embedder_config factory added
  - _EMBEDDER_DEFAULT_DIMS mapping for dynamic dim selection
  - _get_embedder_dims() helper function
  - Restructured get_default_memory_config() to detect embedder first
  - EMBEDDER_PROVIDER=fastembed env var set (shared)
- Patched mem0 FastEmbedEmbedding.embed() in main.py to return Python lists (.tolist()) instead of numpy arrays (psycopg2 compat fix)
- fastembed added to start.sh pip install

## Model Updates

- Updated deprecated claude-3-5-sonnet-20240620 → claude-3-haiku-20240307 in categorization.py, memory.py, config.py, default_config.json, config.json

## Verified End-to-End

- POST /api/v1/memories returns 200 with memory object (no 307)
- Memory appears in filter/list endpoints
- Backend logs: fastembed → pgvector (384-dim) → stored successfully
- Anthropic LLM connected (200 OK) for inference
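The embedder patch mentioned above (returning Python lists instead of numpy arrays) might look roughly like the sketch below. It is a generic illustration under stated assumptions: `return_lists`, `FakeArray`, and `FakeEmbedder` are hypothetical names, and the fake classes stand in for numpy and the real embedder so the example runs with the standard library only.

```python
import functools


def return_lists(embed_fn):
    """Wrap an embed function so array-like results become plain Python lists."""
    @functools.wraps(embed_fn)
    def wrapper(*args, **kwargs):
        result = embed_fn(*args, **kwargs)
        # numpy arrays expose tolist(); anything else is passed through unchanged.
        return result.tolist() if hasattr(result, "tolist") else result
    return wrapper


class FakeArray:
    """Stand-in for a numpy array; only implements tolist()."""

    def __init__(self, values):
        self._values = list(values)

    def tolist(self):
        return list(self._values)


class FakeEmbedder:
    """Hypothetical embedder whose embed() returns an array-like object."""

    def embed(self, text):
        return FakeArray([float(len(text)), 0.0])


# Patch at class level once, e.g. at application startup.
FakeEmbedder.embed = return_lists(FakeEmbedder.embed)
```

Wrapping rather than rewriting the method keeps the patch small and leaves list-returning embedders untouched.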

## 307 Redirect Fixes

- memories.py: Added @router.get(""), @router.post(""), @router.delete("") aliases alongside "/" root routes to prevent Next.js proxy from getting 307'd
- config.py: Added @router.get(""), @router.put(""), @router.patch("") aliases (stats.py and apps.py were already fixed in Task #1)

## Error Handling Fix

- create_memory now raises HTTPException(503) when memory client unavailable and HTTPException(500) when vector store operation fails, instead of silently returning {"error": ...} dict with a 200 status

## Vector Encoding Pipeline

- Supabase is unreachable from Replit (host resolves to IPv6 only)
- SUPABASE_CONNECTION_STRING had [YOUR-PASSWORD] placeholder
- Solution: enabled pgvector extension in Replit's built-in PostgreSQL, updated memory.py to prefer DATABASE_URL (pgvector) over Supabase
- Added fastembed as free local embedder (no API key needed):
  * _build_fastembed_embedder_config factory
  * _EMBEDDER_DEFAULT_DIMS dict for dynamic embedding dimension selection
  * _get_embedder_dims() helper
  * Restructured get_default_memory_config() to detect embedder first
  * EMBEDDER_PROVIDER=fastembed set as shared env var
- Patched mem0 FastEmbedEmbedding.embed() at startup to return Python lists instead of numpy arrays (required for psycopg2 pgvector compatibility)
- fastembed added to start.sh pip install line

## Verified End-to-End

- POST /api/v1/memories: 200 OK, memory stored in pgvector (384-dim)
- GET /api/v1/memories: 200 OK, returns stored memories
- POST /api/v1/memories/filter: 200 OK
- GET /api/v1/stats and GET /api/v1/apps: 200 OK
- No 307 redirects on any route
- LLM model name kept as original (claude-3-5-sonnet-20240620) per task scope

New tool: session_start (mcp_server.py)

- Inserted before get_current_user so it appears first in the tool list
- Parameters: user_email (str), first_message (str)
- Does three things atomically in one round-trip:
  1. Resolves user identity: exact email match → fuzzy ILIKE name fallback. Returns {matched, user_email, name, user_id} or {matched:false, message}
  2. Searches all team memories (cross-user, no user_id filter, limit=5) using first_message as the query. Returns related_work list so Claude can surface teammate progress to the user before responding.
  3. Stores "{display_name} started working on: {first_message}" as an exact memory (infer=False) under the resolved user, including DB sync to openmemory.memories + memory_status_history.

Tool description instructs Claude to:

- Call session_start as the VERY FIRST tool in every new conversation
- MUST surface non-empty related_work to the user before responding
- Use judgment about relevance; not every hit needs mentioning
- Fall back gracefully if the tool errors

Updated get_current_user description to defer to session_start for conversation starts; use get_current_user only for mid-conversation identity lookups.

Bug fixed: old_state in MemoryStatusHistory must be NOT NULL per DB constraint; used MemoryState.deleted as the sentinel value for new additions (consistent with existing add_memories behavior).

Verified:

- session_start is tool #1 in the tool list
- Returns correct user profile + related_work (em dash Oden memory surfaces as top result for Oden email copy query, score 0.35)
- Session note "Michael Dunn started working on: ..." stored in DB
- Unknown user returns matched:false + empty related_work
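The identity-resolution step (exact email match, then a fuzzy name fallback) can be sketched with an in-memory user list. Everything here is hypothetical: `resolve_user`, the user record fields, and the choice to match the input string against names are assumptions, and the case-insensitive substring test only emulates what an SQL `ILIKE '%term%'` query would do.

```python
from typing import Dict, List


def resolve_user(users: List[Dict], user_email: str) -> Dict:
    """Resolve a user by exact email, falling back to a case-insensitive name match."""
    needle = user_email.strip().lower()

    # 1. Exact (case-insensitive) email match.
    for u in users:
        if u["email"].lower() == needle:
            return {"matched": True, "user_email": u["email"], "name": u["name"], "user_id": u["id"]}

    # 2. Fallback: substring match on the display name, emulating ILIKE '%term%'.
    if needle:
        for u in users:
            if needle in u["name"].lower():
                return {"matched": True, "user_email": u["email"], "name": u["name"], "user_id": u["id"]}

    return {"matched": False, "message": f"No user found for '{user_email}'"}
```

Returning the same dictionary shape for both match paths lets the caller treat the result uniformly and branch only on `matched`.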

taranjeet added 3 commits June 20, 2023 14:42

Add simple app functionality

468db83

This commit enables anyone to create an app and add 3 types of data sources:

* pdf file
* youtube video
* website

It exposes a function called query, which first retrieves similar docs from the vector db and then passes them to the LLM to get the final answer.

Add import in embedchain init file

d2da80f

Chunkers: Refactor each chunker & add base class

4329caa

Adds a base chunker from which any chunker can inherit. Existing chunkers are refactored to inherit from this base chunker.
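A base-class arrangement like the one this commit describes could be sketched as follows. This is an illustrative design, not the code from the commit: `BaseChunker`, `PlainTextChunker`, the `load` hook, and the fixed-size splitting are all assumptions.

```python
from abc import ABC, abstractmethod
from typing import List


class BaseChunker(ABC):
    """Shared chunking logic; subclasses only decide how to turn a source into text."""

    def __init__(self, chunk_size: int = 500):
        if chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        self.chunk_size = chunk_size

    @abstractmethod
    def load(self, source: str) -> str:
        """Return the raw text for a source (file path, URL, or literal text)."""

    def create_chunks(self, source: str) -> List[str]:
        """Split the loaded text into consecutive fixed-size pieces."""
        text = self.load(source)
        return [text[i:i + self.chunk_size] for i in range(0, len(text), self.chunk_size)]


class PlainTextChunker(BaseChunker):
    """Hypothetical subclass: treats the source as literal text and normalizes whitespace."""

    def load(self, source: str) -> str:
        return " ".join(source.split())
```

With this shape, a chunker for a new data source only overrides `load`, while the splitting behavior stays in one place.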

taranjeet merged commit 0fc960e into master Jun 20, 2023

taranjeet deleted the add-simple-app branch June 20, 2023 11:11

raghavtyagii pushed a commit to raghavtyagii/embedchain that referenced this pull request Sep 25, 2023

Added Deserialization/Serialization Documentation mem0ai#1

c7f6eaf

BadBanj0 mentioned this pull request Jun 10, 2025

Enable metadata filtering in graph #2387

Open

merlinfrombelgium pushed a commit to merlinfrombelgium/mem0 that referenced this pull request Jul 4, 2025

Merge pull request mem0ai#1 from embedchain/add-simple-app

427db2c

Add simple app & query functionality

xiangnuans added a commit to xiangnuans/mem0 that referenced this pull request Dec 5, 2025

Merge pull request mem0ai#1 from dongshu2013/feat/deepinfra

f325320

add openrouter and deepinfra llm

ywmail added a commit to ywmail/mem0 that referenced this pull request Jan 19, 2026

chore: [Description] test

9bb1fca

[TicketNo.] US20251217073371 mem0ai#1 [Binary Source] NA

This was referenced Feb 17, 2026

OpenClaw plugin: fix auto-recall injection and auto-capture message drop #4064

Closed

OpenClaw plugin: fix auto-recall injection and auto-capture message drop #4065

Merged

SeeTheRianBow mentioned this pull request Feb 28, 2026

Ollama embedder uses deprecated embeddings() API — should use embed() with input: field #4155

Closed

utkarsh240799 mentioned this pull request Mar 9, 2026

fix(ts-sdk): replace sqlite3 with better-sqlite3 to fix native binding resolution #4270

Merged

7 tasks

alvinttang mentioned this pull request Mar 14, 2026

feat: add MiniMax as a supported LLM provider (M2.7 default) #4321

Open

kartik-mem0 mentioned this pull request Mar 17, 2026

docs: add MiroFish integration and swarm memory cookbook documentation #4373

Merged

taranjeet merged 3 commits into master from add-simple-app
