Skip to content

fix(oss): normalize malformed LLM fact output before embedding#4224

Merged
kartik-mem0 merged 5 commits intomem0ai:mainfrom
amahuli03:4100/post-memories-with-infer-true
Mar 18, 2026
Merged

fix(oss): normalize malformed LLM fact output before embedding#4224
kartik-mem0 merged 5 commits intomem0ai:mainfrom
amahuli03:4100/post-memories-with-infer-true

Conversation

@amahuli03
Copy link
Copy Markdown
Contributor

Description

When calling POST /api/v1/memories/ with infer=true, smaller LLMs (e.g. Ollama's llama3.1:8b) intermittently return facts as objects ({"fact": "..."} or {"text": "..."}) instead of plain strings. These non-string values are passed directly to embedding_model.embed(), which calls .replace("\n", " ") on them, causing AttributeError: 'list' object has no attribute 'replace'.

Fixes #4100

Root cause

The fact extraction path in mem0/memory/main.py parses the LLM's JSON response and passes each fact directly to the embedding model without validating its type:

new_retrieved_facts = json.loads(response)["facts"]
# ...
for new_mem in new_retrieved_facts:
    messages_embeddings = self.embedding_model.embed(new_mem, "add")  # assumes string

The prompt asks the LLM to return {"facts": ["string1", "string2"]}, but smaller models don't reliably follow this format. Instead they return structures like:

{"facts": [{"fact": "User likes Python"}, {"text": "User is a developer"}]}

This is a known issue — the TypeScript SDK already fixed it in db15d5c6 by adding a FactRetrievalSchema that normalizes these malformed shapes before embedding. See mem0-ts/src/oss/src/prompts/index.ts. The Python SDK was missing equivalent validation.

Fix

This PR ports that validation to Python by adding normalize_facts() in mem0/memory/utils.py, which handles:

  • Plain strings (passthrough)
  • {"fact": "..."} objects (extracts the fact value)
  • {"text": "..."} objects (extracts the text value)
  • Other types (converts via str())
  • Empty strings (filtered out)

It is called in both the sync and async _add_to_vector_store paths immediately after JSON parsing, before facts reach the embedder.

Related issues: #3439, #3238

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

Please delete options that are not relevant.

  • Unit Test
    hatch run pytest tests/test_memory.py — 17/17 pass.

Tests added:

  • Reproduction test (test_add_infer_with_malformed_llm_facts): mocks LLM returning dict-shaped facts with infer=True, confirms no AttributeError.
  • normalize_facts unit tests: plain strings, {"fact": ...}, {"text": ...}, mixed lists, empty string filtering

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have checked my code and corrected any misspellings

Maintainer Checklist

amahuli03 and others added 5 commits March 5, 2026 11:41
…facts

When smaller LLMs return facts as objects ({"fact": "..."}) instead of
plain strings, embedding_model.embed() crashes with
'dict' object has no attribute 'replace'. This test confirms the bug
by mocking the LLM to return dict-shaped facts and asserting the
failure at mem0/memory/main.py:473.
Port of TypeScript FactRetrievalSchema to Python.
Normalizes facts that smaller LLMs return as {"fact": "..."} or
{"text": "..."} objects back into plain strings before embedding.
After parsing LLM JSON response, normalize facts before passing them
to the embedding model. Fixes 'list'/'dict' object has no attribute
'replace' when smaller LLMs return malformed fact objects.
Tests plain strings, {"fact": ...}, {"text": ...}, mixed lists,
and empty string filtering.
@kartik-mem0 kartik-mem0 self-requested a review March 18, 2026 14:42
Copy link
Copy Markdown
Contributor

@kartik-mem0 kartik-mem0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for your contribution @amahuli03

the solution lgtm!

@kartik-mem0 kartik-mem0 merged commit 577a5a2 into mem0ai:main Mar 18, 2026
8 checks passed
jamebobob pushed a commit to jamebobob/mem0-vigil-recall that referenced this pull request Mar 29, 2026
…i#4224)

Co-authored-by: kartik-mem0 <kartik.labhshetwar@mem0.ai>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

OpenMemory infer=true fails with 'list' object has no attribute 'replace' (Qdrant path)

2 participants