fix(oss): normalize malformed LLM fact output before embedding #4224
Merged
kartik-mem0 merged 5 commits into mem0ai:main on Mar 18, 2026
…facts
When smaller LLMs return facts as objects ({"fact": "..."}) instead of
plain strings, embedding_model.embed() crashes with
'dict' object has no attribute 'replace'. This test confirms the bug
by mocking the LLM to return dict-shaped facts and asserting the
failure at mem0/memory/main.py:473.
Port of TypeScript FactRetrievalSchema to Python.
Normalizes facts that smaller LLMs return as {"fact": "..."} or
{"text": "..."} objects back into plain strings before embedding.
After parsing LLM JSON response, normalize facts before passing them to the embedding model. Fixes 'list'/'dict' object has no attribute 'replace' when smaller LLMs return malformed fact objects.
Tests plain strings, {"fact": ...}, {"text": ...}, mixed lists,
and empty string filtering.
kartik-mem0
approved these changes
Mar 18, 2026
Contributor
kartik-mem0
left a comment
thank you for your contribution @amahuli03
the solution lgtm!
jamebobob
pushed a commit
to jamebobob/mem0-vigil-recall
that referenced
this pull request
Mar 29, 2026
…i#4224) Co-authored-by: kartik-mem0 <kartik.labhshetwar@mem0.ai>
Description
When calling POST /api/v1/memories/ with infer=true, smaller LLMs (e.g. Ollama's llama3.1:8b) intermittently return facts as objects ({"fact": "..."} or {"text": "..."}) instead of plain strings. These non-string values are passed directly to embedding_model.embed(), which calls .replace("\n", " ") on them, causing AttributeError: 'list' object has no attribute 'replace'.
Fixes #4100
Root cause
The fact extraction path in mem0/memory/main.py parses the LLM's JSON response and passes each fact directly to the embedding model without validating its type.
The prompt asks the LLM to return {"facts": ["string1", "string2"]}, but smaller models don't reliably follow this format. Instead they return structures like:
{"facts": [{"fact": "User likes Python"}, {"text": "User is a developer"}]}
This is a known issue: the TypeScript SDK already fixed it in db15d5c6 by adding a FactRetrievalSchema that normalizes these malformed shapes before embedding. See mem0-ts/src/oss/src/prompts/index.ts. The Python SDK was missing equivalent validation.
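The failure mode described above can be reproduced in isolation. This is only a sketch: `embed` and `embed_all` are hypothetical stand-ins for the embedder and the extraction path, and the only behavior assumed is that the embedder calls `.replace("\n", " ")` on its input.

```python
import json


def embed(text):
    # Hypothetical stand-in for the embedder: it assumes `text` is a str.
    return text.replace("\n", " ")


def embed_all(response):
    # No type check between JSON parsing and embedding.
    return [embed(fact) for fact in response["facts"]]


# Shape the prompt asks for: a list of plain strings.
well_formed = json.loads('{"facts": ["User likes Python"]}')
# Shape some smaller models return instead: a list of objects.
malformed = json.loads('{"facts": [{"fact": "User likes Python"}]}')

print(embed_all(well_formed))  # ['User likes Python']

try:
    embed_all(malformed)
    error = None
except AttributeError as exc:
    error = str(exc)

print(error)  # 'dict' object has no attribute 'replace'
```

A fact that arrives as a nested list instead of a dict would fail the same way, with 'list' in place of 'dict' in the message.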
Fix
This PR ports that validation to Python by adding normalize_facts() in mem0/memory/utils.py, which handles:
- {"fact": "..."} objects (extracts the fact value)
- {"text": "..."} objects (extracts the text value)
- other non-string values (converted with str())
It is called in both the sync and async _add_to_vector_store paths immediately after JSON parsing, before facts reach the embedder.
Related issues: #3439, #3238
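For readers who want a feel for what such a normalizer does, here is a rough sketch. It is not the code merged in this PR. The key precedence ("fact" before "text"), the whitespace stripping, and the handling of `None` and of dicts with neither key are all assumptions made for illustration.

```python
def normalize_facts(raw_facts):
    """Return a list of non-empty fact strings from loosely shaped LLM output."""
    normalized = []
    for item in raw_facts or []:
        if isinstance(item, dict):
            # Assumed precedence: "fact" first, then "text"; neither -> dropped.
            value = item.get("fact", item.get("text", ""))
        else:
            value = item
        if value is None:
            continue
        if not isinstance(value, str):
            # Anything else is coerced to a string so the embedder can handle it.
            value = str(value)
        value = value.strip()
        if value:  # filter out empty strings
            normalized.append(value)
    return normalized


mixed = [
    "User likes Python",
    {"fact": "User is a developer"},
    {"text": "User lives in Pune"},
    "",
    42,
]
print(normalize_facts(mixed))
# ['User likes Python', 'User is a developer', 'User lives in Pune', '42']
```

Running the normalizer right after JSON parsing keeps the embedder's string-only assumption intact without touching the embedder itself.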
How Has This Been Tested?
hatch run pytest tests/test_memory.py: 17/17 pass.
Tests added:
- test_add_infer_with_malformed_llm_facts: mocks the LLM returning dict-shaped facts with infer=True and confirms no AttributeError is raised.
- normalize_facts unit tests: plain strings, {"fact": ...}, {"text": ...}, mixed lists, empty string filtering.
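The mock-based test described above might look roughly like the following. This is a sketch under stated assumptions, not the test added in the PR: `add_memory` is a hypothetical, heavily simplified stand-in for the real add path, and the `generate_response` and `embed` attributes on the mocks are illustrative names.

```python
import json
from unittest.mock import MagicMock


def add_memory(llm, embedder, message):
    # Hypothetical, simplified add path: parse, normalize, then embed.
    raw = json.loads(llm.generate_response(message))["facts"]
    facts = []
    for item in raw:
        if isinstance(item, dict):
            item = item.get("fact", item.get("text", ""))
        if item:
            facts.append(str(item))
    return [embedder.embed(fact) for fact in facts]


def test_add_with_malformed_llm_facts():
    llm = MagicMock()
    # The mocked LLM returns dict-shaped facts instead of plain strings.
    llm.generate_response.return_value = json.dumps(
        {"facts": [{"fact": "User likes Python"}, {"text": "User is a developer"}]}
    )
    embedder = MagicMock()
    # Mimic an embedder that would crash on non-string input.
    embedder.embed.side_effect = lambda text: text.replace("\n", " ")

    result = add_memory(llm, embedder, "hello")

    assert result == ["User likes Python", "User is a developer"]
    assert embedder.embed.call_count == 2
```

Giving the mocked embedder a `side_effect` that calls `.replace` means the test fails with the original AttributeError if normalization is ever skipped.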