Add simple app & query functionality #1

Merged
taranjeet merged 3 commits into master from add-simple-app
Jun 20, 2023

@taranjeet
Member

This enables anyone to create an app and add 3 types of data sources:

• pdf file
• youtube video
• website

It exposes a function called `query`, which first retrieves similar documents from the vector DB and then passes them to the LLM to get the final answer.
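A minimal sketch of that retrieve-then-answer flow, for readers skimming the thread. This is not the code merged here: the `retrieve` and `llm` callables, the `top_k` parameter, and the prompt wording are all assumptions standing in for the real vector DB lookup and LLM call.

```python
from typing import Callable, Sequence


def query(
    question: str,
    retrieve: Callable[[str, int], Sequence[str]],  # hypothetical vector DB lookup
    llm: Callable[[str], str],                      # hypothetical LLM call
    top_k: int = 1,
) -> str:
    """Fetch the most similar documents, then ask the LLM to answer from them."""
    # Step 1: similarity search against the vector DB.
    docs = retrieve(question, top_k)

    # Step 2: put the retrieved text into the prompt as context.
    context = "\n".join(docs)
    prompt = (
        "Use the following context to answer the question.\n"
        f"Context: {context}\n"
        f"Question: {question}\n"
        "Answer:"
    )

    # Step 3: the LLM produces the final answer.
    return llm(prompt)
```

Passing the two steps in as callables keeps the sketch independent of any particular vector store or model provider.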

This commit enables anyone to create an app and add 3 types of data
sources:

* pdf file
* youtube video
* website

It exposes a function called query which first gets similar docs from
the vector db and then passes them to the LLM to get the final answer.
Adds a base chunker from which any chunker can inherit.
Existing chunkers are refactored to inherit from this base
chunker.
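A rough sketch of what a shared base chunker can look like. The class names, the `create_chunks` method, the chunk sizes, and the fixed-size splitter are hypothetical and chosen for illustration; the commit itself is the source of truth for the real interface.

```python
from typing import Callable, Dict, List


class BaseChunker:
    """Shared chunking logic; subclasses only choose how the text is split."""

    def __init__(self, splitter: Callable[[str], List[str]]):
        self.splitter = splitter

    def create_chunks(self, text: str, source: str) -> List[Dict[str, str]]:
        # Split the text and attach the source to every non-empty chunk.
        return [
            {"text": chunk, "source": source}
            for chunk in self.splitter(text)
            if chunk.strip()
        ]


def fixed_size_splitter(size: int) -> Callable[[str], List[str]]:
    """Return a splitter that cuts text into pieces of at most `size` characters."""
    def split(text: str) -> List[str]:
        return [text[i:i + size] for i in range(0, len(text), size)]
    return split


class PdfFileChunker(BaseChunker):
    def __init__(self) -> None:
        super().__init__(fixed_size_splitter(1000))  # assumed chunk size


class WebsiteChunker(BaseChunker):
    def __init__(self) -> None:
        super().__init__(fixed_size_splitter(500))  # assumed chunk size
```

With this shape, adding a chunker for a new data source is a matter of subclassing and picking a splitter.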
@taranjeet taranjeet merged commit 0fc960e into master Jun 20, 2023
@taranjeet taranjeet deleted the add-simple-app branch June 20, 2023 11:11
raghavtyagii pushed a commit to raghavtyagii/embedchain that referenced this pull request Sep 25, 2023
merlinfrombelgium pushed a commit to merlinfrombelgium/mem0 that referenced this pull request Jul 4, 2025
Add simple app & query functionality
xiangnuans added a commit to xiangnuans/mem0 that referenced this pull request Dec 5, 2025
ywmail added a commit to ywmail/mem0 that referenced this pull request Jan 19, 2026
[TicketNo.] US20251217073371 mem0ai#1
[Binary Source] NA
utkarsh240799 added a commit that referenced this pull request Mar 13, 2026
- Add user identity to extraction preamble so memories are attributed to
  the correct user instead of cross-referencing cached patterns (OPE-6 #1)
- Skip mem0.add() when no user messages remain after noise filtering,
  avoiding wasted API calls on assistant-only payloads (OPE-6 #2)
- Raise auto-recall threshold to 0.6 (vs 0.5 for explicit search) and
  add dynamic thresholding that drops memories below 50% of the top
  result's score to reduce irrelevant context injection (OPE-6 #3)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
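The thresholding described in the third bullet can be sketched roughly as follows. The function name, the result shape (`{"memory", "score"}` dicts), and the way the two cutoffs are combined are assumptions; only the 0.6 floor and the 50%-of-top rule come from the commit message.

```python
from typing import Dict, List


def filter_recalled(
    results: List[Dict],
    min_score: float = 0.6,       # absolute auto-recall threshold
    relative_floor: float = 0.5,  # fraction of the top result's score
) -> List[Dict]:
    """Keep only memories that clear both the absolute and the relative cutoff."""
    if not results:
        return []
    top = max(r["score"] for r in results)
    # A memory must reach the absolute floor and at least
    # `relative_floor` of the best score to be injected as context.
    cutoff = max(min_score, relative_floor * top)
    return [r for r in results if r["score"] >= cutoff]
```

One thing the sketch makes visible: if scores are bounded by 1.0, half of the top score can never exceed 0.5, so the 0.6 floor always dominates. The relative rule only bites when the absolute floor is lower or the score scale is larger.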
utkarsh240799 added a commit that referenced this pull request Mar 16, 2026
utkarsh240799 added a commit that referenced this pull request Mar 16, 2026
VictorECDSA added a commit to VictorECDSA/mem0 that referenced this pull request Mar 23, 2026
…operations

# Fixes mem0ai#4490: Preserve original `actor_id` during memory UPDATE operations

Fixes mem0ai#4490

## Summary

This PR attempts to fix an issue where `actor_id` metadata gets overwritten during UPDATE operations, breaking actor-level memory isolation in multi-actor scenarios.

I've tested this fix locally and it seems to resolve the problem, but I would greatly appreciate maintainer review to ensure it aligns with the project's design goals.

**Changes**:
- Modified `_update_memory()` in `mem0/memory/main.py` (lines 1251-1252 and 2347-2349)
- Changed conditional preservation to unconditional preservation of original `actor_id`

## Problem Statement

Please see issue mem0ai#4490 for full details on the problem.

**Quick summary**: When using `metadata={"actor_id": ...}` with shared `user_id` in multi-actor scenarios, the `actor_id` field appears to be overwritten when a different actor triggers an UPDATE event:

```python
# Actor A creates memory
m.add([{"role": "user", "content": "I am player #1"}],
      user_id="team", metadata={"actor_id": "Alice"})
# ✓ Memory created with actor_id="Alice"

# Actor B updates A's memory
m.add([{"role": "user", "content": "Player #1 is a good person"}],
      user_id="team", metadata={"actor_id": "Bob"})
# ❌ Memory updated but actor_id becomes "Bob"

# Query fails
m.search(query="", filters={"actor_id": "Alice"})  # Returns empty!
```
```

**Impact**:
1. Cannot query by original creator after UPDATE
2. Memory ownership tracking lost
3. Actor isolation fails in shared `user_id` scenarios

## Changes Made

### Code Changes

**File**: `mem0/memory/main.py`

**Location**: 
- Lines 1251-1252 (sync version of `_update_memory`)
- Lines 2347-2348 (async version of `_update_memory`)

**Before**:
```python
if "actor_id" not in new_metadata and "actor_id" in existing_memory.payload:
    new_metadata["actor_id"] = existing_memory.payload["actor_id"]
```

**After**:
```python
if "actor_id" in existing_memory.payload:
    new_metadata["actor_id"] = existing_memory.payload["actor_id"]
```

**Rationale**: Based on my investigation, it appears the condition check always evaluates to `False` because `new_metadata` contains the current actor's ID (passed from line 1668/1662). Removing this condition seems to make `actor_id` preservation work as (I believe) intended.

This change treats `actor_id` as **"memory owner"** rather than **"last updater"**. The history table already tracks all contributors via `db.add_history()`, so update history is preserved.
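To make the rationale concrete without a running mem0 instance, here is a dictionary-only sketch of the two conditions. The helper names are hypothetical and plain dicts stand in for `new_metadata` and `existing_memory.payload`; it only models the `actor_id` branch, not the rest of `_update_memory`.

```python
from typing import Dict


def merge_conditional(new_metadata: Dict, existing_payload: Dict) -> Dict:
    """Old behavior: restore actor_id only if the update did not supply one."""
    merged = dict(new_metadata)
    if "actor_id" not in merged and "actor_id" in existing_payload:
        merged["actor_id"] = existing_payload["actor_id"]
    return merged


def merge_unconditional(new_metadata: Dict, existing_payload: Dict) -> Dict:
    """Proposed behavior: the original actor_id always wins."""
    merged = dict(new_metadata)
    if "actor_id" in existing_payload:
        merged["actor_id"] = existing_payload["actor_id"]
    return merged


existing = {"actor_id": "Alice", "data": "I am player #1"}
incoming = {"actor_id": "Bob"}  # the updating actor's metadata

print(merge_conditional(incoming, existing)["actor_id"])    # Bob
print(merge_unconditional(incoming, existing)["actor_id"])  # Alice
```

Because the incoming metadata carries the updating actor's ID, the first branch of the old condition is never true, so the stored owner is overwritten.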

## Testing

### Test Methodology

I've attempted to validate this fix using a monkey patch approach that applies the proposed change to my local codebase without modifying source files. This allowed me to test the behavior before submitting the PR.

### Before Fix (Bug Demonstration)

**Test scenario**: Alice creates a memory, Bob updates it with different `actor_id`

**Result**:
```
=== State AFTER Bob's UPDATE ===
  Event: UPDATE
  This actor_id: 'Bob'  ← Overwritten!

ISOLATION TEST:
  Query Alice's memories: 0 found
  ❌ BUG: Alice's memory lost!
```

### After Fix (Monkey Patch Validation)

**Applied patch**:
```python
from mem0.memory.main import Memory as MemoryClass

def _patched_update_memory(self, memory_id, data, existing_embeddings, metadata=None):
    # ... (identical to original except actor_id handling)
    
    # ===== FIX =====
    if "actor_id" in existing_memory.payload:
        new_metadata["actor_id"] = existing_memory.payload["actor_id"]
    # ===============
    
    # ... (rest unchanged)

MemoryClass._update_memory = _patched_update_memory
```

**Result**:
```
=== State AFTER Bob's UPDATE ===
  Event: UPDATE
  This actor_id: 'Alice'  ← Preserved!

ISOLATION TEST:
  Query Alice's memories: 1 found
  ✅ SUCCESS: Alice's memory preserved!
```

### Key Verification Points

✅ UPDATE event correctly triggered (not ADD)  
✅ Memory content correctly updated  
✅ `actor_id` preserved as original creator  
✅ Query filtering by `actor_id` works correctly  
✅ No side effects on other metadata fields

## Backward Compatibility

To the best of my understanding, this change should not introduce breaking changes:
- Existing code should continue to work without modification
- API signature remains unchanged
- Only affects behavior in multi-actor UPDATE scenarios
- History table continues to track all contributors

However, I may be missing some edge cases, so maintainer review would be greatly appreciated!

### Behavior Change Details

**Before this fix:**
When calling `add()` with a different `actor_id` that triggers an UPDATE event, the `actor_id` would be overwritten to the new actor.

**After this fix:**
The original `actor_id` (memory creator) is preserved during UPDATE operations.

**Scenarios NOT affected:**
- ✅ Calling `update()` method directly (already preserves `actor_id` correctly)
- ✅ Updating memories without `actor_id` field (behavior unchanged)
- ✅ Adding new memories (no UPDATE event triggered)

**If you were relying on UPDATE to change `actor_id`:**
You should use this pattern instead:
```python
# Delete the old memory
m.delete(memory_id)

# Add a new memory with new actor
m.add(..., metadata={"actor_id": "new_actor"})
```

This change aligns with the semantic meaning of `actor_id` as the **memory creator** (immutable), not the **last updater** (tracked in history table).

## Additional Context

### Design Consistency

This fix aligns with how other session identifiers are handled:

```python
# Lines 1245-1250 already preserve these whenever they are absent from new_metadata:
if "user_id" not in new_metadata and "user_id" in existing_memory.payload:
    new_metadata["user_id"] = existing_memory.payload["user_id"]
if "agent_id" not in new_metadata and "agent_id" in existing_memory.payload:
    new_metadata["agent_id"] = existing_memory.payload["agent_id"]
if "run_id" not in new_metadata and "run_id" in existing_memory.payload:
    new_metadata["run_id"] = existing_memory.payload["run_id"]
```

The fix makes `actor_id` follow the same preservation pattern, but removes the always-false condition.

### Alternative Approaches I Considered

I also thought about these alternatives, but decided against them (open to feedback though!):

1. **Track both `created_by` and `updated_by`**
   - Would require more complex schema changes
   - Less backward-compatible
   - Might not be necessary since history table already tracks updates

2. **Make preservation configurable**
   - Would add API complexity
   - I couldn't think of a clear use case for wanting `actor_id` to change
   - Seems to go against the semantic meaning of "actor"

If either of these approaches seems better, I'm happy to revise the PR!

### Affected Implementations

- ✅ Sync: `_update_memory()` (lines 1251-1252)
- ✅ Async: `async _update_memory()` (lines 2347-2349)

## Checklist

- [x] Code changes implemented (both sync and async)
- [x] Tested with before/after comparison (monkey patch validation)
- [x] Verified backward compatibility (to the best of my understanding)
- [x] No breaking changes to existing API (as far as I can tell)
- [x] Documented in PR description
- [ ] Unit tests added (awaiting maintainer guidance on test location/structure)

---

**Environment**:
- Mem0 version: v1.0.7
- Python: 3.13
- Vector Store: Qdrant (local)

Happy to make any adjustments based on maintainer feedback. Thanks for the great library!
MDGreyMatter pushed a commit to Go-Grey-Matter/ai-memory-system that referenced this pull request Mar 31, 2026
## 307 Redirect Fixes
- memories.py: Added @router.get(""), @router.post(""), @router.delete("")
  aliases alongside "/" root routes so Next.js proxy never gets 307'd
- config.py: Added @router.get(""), @router.put(""), @router.patch("")
  aliases for config root routes
- (stats.py and apps.py were already fixed in Task #1)

## Vector Encoding Pipeline Fix
- Root cause: Supabase unreachable from Replit via IPv6; SUPABASE_CONNECTION_STRING
  had [YOUR-PASSWORD] placeholder
- Enabled pgvector extension in Replit's built-in PostgreSQL
- Updated memory.py to prefer DATABASE_URL (pgvector) over SUPABASE_CONNECTION_STRING
- Added fastembed as free local embedder (no OpenAI quota needed):
  - _build_fastembed_embedder_config factory added
  - _EMBEDDER_DEFAULT_DIMS mapping for dynamic dim selection
  - _get_embedder_dims() helper function
  - Restructured get_default_memory_config() to detect embedder first
  - EMBEDDER_PROVIDER=fastembed env var set (shared)
- Patched mem0 FastEmbedEmbedding.embed() in main.py to return Python
  lists (.tolist()) instead of numpy arrays (psycopg2 compat fix)
- fastembed added to start.sh pip install

## Model Updates
- Updated deprecated claude-3-5-sonnet-20240620 → claude-3-haiku-20240307
  in categorization.py, memory.py, config.py, default_config.json, config.json

## Verified End-to-End
- POST /api/v1/memories returns 200 with memory object (no 307)
- Memory appears in filter/list endpoints
- Backend logs: fastembed → pgvector (384-dim) → stored successfully
- Anthropic LLM connected (200 OK) for inference
MDGreyMatter pushed a commit to Go-Grey-Matter/ai-memory-system that referenced this pull request Mar 31, 2026
## 307 Redirect Fixes
- memories.py: Added @router.get(""), @router.post(""), @router.delete("")
  aliases alongside "/" root routes to prevent Next.js proxy from getting 307'd
- config.py: Added @router.get(""), @router.put(""), @router.patch("") aliases
  (stats.py and apps.py were already fixed in Task #1)

## Error Handling Fix
- create_memory now raises HTTPException(503) when memory client unavailable
  and HTTPException(500) when vector store operation fails, instead of silently
  returning {"error": ...} dict with a 200 status

## Vector Encoding Pipeline
- Supabase is unreachable from Replit (host resolves to IPv6 only)
- SUPABASE_CONNECTION_STRING had [YOUR-PASSWORD] placeholder
- Solution: enabled pgvector extension in Replit's built-in PostgreSQL,
  updated memory.py to prefer DATABASE_URL (pgvector) over Supabase
- Added fastembed as free local embedder (no API key needed):
  * _build_fastembed_embedder_config factory
  * _EMBEDDER_DEFAULT_DIMS dict for dynamic embedding dimension selection
  * _get_embedder_dims() helper
  * Restructured get_default_memory_config() to detect embedder first
  * EMBEDDER_PROVIDER=fastembed set as shared env var
- Patched mem0 FastEmbedEmbedding.embed() at startup to return Python lists
  instead of numpy arrays (required for psycopg2 pgvector compatibility)
- fastembed added to start.sh pip install line

## Verified End-to-End
- POST /api/v1/memories: 200 OK, memory stored in pgvector (384-dim)
- GET /api/v1/memories: 200 OK, returns stored memories
- POST /api/v1/memories/filter: 200 OK
- GET /api/v1/stats and GET /api/v1/apps: 200 OK
- No 307 redirects on any route
- LLM model name kept as original (claude-3-5-sonnet-20240620) per task scope
MDGreyMatter pushed a commit to Go-Grey-Matter/ai-memory-system that referenced this pull request Mar 31, 2026
New tool: session_start (mcp_server.py)
- Inserted before get_current_user so it appears first in the tool list
- Parameters: user_email (str), first_message (str)
- Does three things atomically in one round-trip:
  1. Resolves user identity: exact email match → fuzzy ILIKE name fallback
     Returns {matched, user_email, name, user_id} or {matched:false, message}
  2. Searches all team memories (cross-user, no user_id filter, limit=5)
     using first_message as the query. Returns related_work list so Claude
     can surface teammate progress to the user before responding.
  3. Stores "{display_name} started working on: {first_message}" as an
     exact memory (infer=False) under the resolved user, including DB sync
     to openmemory.memories + memory_status_history.

Tool description instructs Claude to:
- Call session_start as the VERY FIRST tool in every new conversation
- MUST surface non-empty related_work to the user before responding
- Use judgment about relevance; not every hit needs mentioning
- Fall back gracefully if the tool errors

Updated get_current_user description to defer to session_start for
conversation starts; use get_current_user only for mid-conversation
identity lookups.

Bug fixed: old_state in MemoryStatusHistory must be NOT NULL per DB
constraint; used MemoryState.deleted as the sentinel value for new
additions (consistent with existing add_memories behavior).

Verified:
- session_start is tool #1 in the tool list
- Returns correct user profile + related_work (em dash Oden memory
  surfaces as top result for Oden email copy query, score 0.35)
- Session note "Michael Dunn started working on: ..." stored in DB
- Unknown user returns matched:false + empty related_work
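The identity-resolution step (exact email match, then a fuzzy name fallback) might look roughly like this in plain Python. This is a sketch over an in-memory list, not the `mcp_server.py` code. The `resolve_user` name, the user record shape, and the choice to match the email's local part against the name are assumptions; a case-insensitive substring test stands in for SQL `ILIKE`.

```python
from typing import Dict, List


def _matched(user: Dict[str, str]) -> Dict[str, object]:
    return {
        "matched": True,
        "user_email": user["email"],
        "name": user["name"],
        "user_id": user["id"],
    }


def resolve_user(users: List[Dict[str, str]], user_email: str) -> Dict[str, object]:
    """Exact email match first, then a case-insensitive name match as fallback."""
    email = user_email.strip()

    # Step 1: exact email match.
    for user in users:
        if user["email"] == email:
            return _matched(user)

    # Step 2: fallback, emulating `name ILIKE '%<pattern>%'`.
    # Assumption: the pattern is the local part of the email address.
    pattern = email.split("@", 1)[0].lower()
    if pattern:
        for user in users:
            if pattern in user["name"].lower():
                return _matched(user)

    return {"matched": False, "message": f"No user found for {user_email!r}"}
```

Returning a `matched` flag instead of raising lets the caller continue with an empty `related_work` list for unknown users, which is the behavior listed under "Verified".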
MDGreyMatter pushed a commit to Go-Grey-Matter/ai-memory-system that referenced this pull request Mar 31, 2026
MDGreyMatter pushed a commit to Go-Grey-Matter/ai-memory-system that referenced this pull request Mar 31, 2026