
fix(oss): validate LLM fact output via FactRetrievalSchema before embedding #4083

Merged
deshraj merged 1 commit into mem0ai:main from mgoulart:fix/ollama-embed-non-string-facts
Feb 23, 2026

Conversation

@mgoulart
Contributor

Summary

Fixes #4081

addToVectorStore passes raw LLM-extracted facts directly to embedder.embed() without validating that they are strings. Local or smaller LLMs (e.g. llama3.1:8b via Ollama) sometimes return facts as objects instead of plain strings:

// Expected
{"facts": ["User prefers dark mode", "Name is Alice"]}

// What llama3.1:8b sometimes returns
{"facts": [{"fact": "User prefers dark mode"}, {"fact": "Name is Alice"}]}

This causes Ollama's Go server to crash with:

ResponseError: json: cannot unmarshal object into Go struct field EmbeddingRequest.prompt of type string

Approach

FactRetrievalSchema already exists in prompts/index.ts but was never used at the parse site. This PR:

  1. prompts/index.ts — Extends FactRetrievalSchema with a z.union transform that accepts string | { fact: string } | { text: string } and normalizes to string[], filtering out empty strings.

  2. memory/index.ts — Replaces raw JSON.parse().facts with FactRetrievalSchema.parse() so the schema is actually used.

  3. embeddings/ollama.ts — One-line safety net (typeof text === "string" ? text : JSON.stringify(text)) in case non-string values reach the embedder from other callers.
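As a dependency-free sketch, this is the normalization the Zod union transform performs (the actual PR expresses it with z.union and .transform inside FactRetrievalSchema; the function names here are illustrative, not the real mem0 code):

```typescript
// Sketch of the fact-normalization logic. Each raw item may be a plain
// string or a { fact } / { text } wrapper object from a smaller LLM.
type RawFact = string | { fact: string } | { text: string };

function normalizeFact(item: unknown): string | null {
  if (typeof item === "string") return item;
  if (item && typeof item === "object") {
    const o = item as Record<string, unknown>;
    if (typeof o.fact === "string") return o.fact; // { fact: "..." } shape
    if (typeof o.text === "string") return o.text; // { text: "..." } shape
  }
  return null; // unrecognized shape is dropped
}

function normalizeFacts(items: unknown[]): string[] {
  return items
    .map(normalizeFact)
    .filter((s): s is string => s !== null && s.length > 0); // drop empties
}
```

Unrecognized shapes and empty strings are silently dropped rather than rejected, matching the PR's goal of keeping memory.add() resilient to malformed model output.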

Test plan

  • memory.add() with llama3.1:8b returning [{ fact: "..." }] — normalizes and stores correctly
  • memory.add() with models returning ["string"] — no behavior change
  • Zod .parse() catches completely invalid JSON and falls through to facts = []
  • Empty facts are filtered out and don't enter the vector store
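The fall-through behavior in the last two bullets can be sketched as follows (function name and structure hypothetical, not the actual memory/index.ts code):

```typescript
// Hypothetical sketch of the parse site: validate the LLM response and
// fall through to an empty fact list on invalid output.
function extractFacts(llmResponse: string): string[] {
  try {
    const parsed = JSON.parse(llmResponse); // throws on invalid JSON
    const items: unknown[] = Array.isArray(parsed?.facts) ? parsed.facts : [];
    return items
      .map((it) => {
        if (typeof it === "string") return it;
        const o = it as Record<string, unknown> | null;
        if (o && typeof o.fact === "string") return o.fact;
        if (o && typeof o.text === "string") return o.text;
        return "";
      })
      .filter((s) => s.length > 0); // empty facts never reach the vector store
  } catch {
    return []; // completely invalid JSON falls through to facts = []
  }
}
```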

🤖 Generated with Claude Code

…edding

Local/smaller LLMs (e.g. llama3.1:8b via Ollama) sometimes return facts
as objects ({ fact: "..." }) instead of plain strings. addToVectorStore
passed these directly to the embedder, crashing Ollama's Go server:

  ResponseError: json: cannot unmarshal object into Go struct
    field EmbeddingRequest.prompt of type string

Fix: use the existing (but unused) FactRetrievalSchema with a z.union
transform to accept string | { fact } | { text } shapes and normalize
to string[]. Add a one-line safety net in OllamaEmbedder.embed().

Fixes mem0ai#4081

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@CLAassistant

CLAassistant commented Feb 20, 2026

CLA assistant check
All committers have signed the CLA.

@mem0-bot
Contributor

mem0-bot commented Feb 23, 2026

Great fix for a real production issue! 👍

Thank you for tackling this crash with local LLMs. The approach is solid and the implementation is clean.

What I like:

  • Smart use of existing infrastructure: Leveraging the already-defined FactRetrievalSchema that wasn't being used
  • Robust normalization: The Zod union with transforms elegantly handles the common malformed shapes from smaller LLMs
  • Defense in depth: Adding the safety net in OllamaEmbedder is good defensive programming
  • Backward compatibility: Existing working code continues to work unchanged
  • Clear problem statement: The issue description and reproduction steps are excellent

Code quality observations:

mem0-ts/src/oss/src/prompts/index.ts: The factItem union type is well-designed. The transformation logic cleanly extracts text from {fact: string} and {text: string} objects while preserving plain strings.

mem0-ts/src/oss/src/memory/index.ts: Good replacement of raw JSON parsing with proper schema validation. Error handling maintains existing behavior.

mem0-ts/src/oss/src/embeddings/ollama.ts: The defensive JSON.stringify() fallback is reasonable, though for complex objects it might produce less meaningful embeddings than extracting a text field would.
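For reference, the coercion described above can be sketched like this (function name assumed for illustration; the PR inlines the ternary in OllamaEmbedder.embed()):

```typescript
// Illustrative one-line safety net: coerce any non-string value to a
// string before it is sent as the embedding prompt, so Ollama's Go
// server never receives an object where a string is expected.
function toEmbeddingPrompt(text: unknown): string {
  return typeof text === "string" ? text : JSON.stringify(text);
}
```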

The fix is focused, low-risk, and addresses the root cause appropriately. Nice work! 🚀

@deshraj deshraj merged commit db15d5c into mem0ai:main Feb 23, 2026
1 of 2 checks passed
jamebobob pushed a commit to jamebobob/mem0-vigil-recall that referenced this pull request Mar 29, 2026


Development

Successfully merging this pull request may close these issues.

addToVectorStore passes raw LLM output (may be object, not string) to embedder.embed() — crashes Ollama with 'cannot unmarshal object'

3 participants