vllm provider drops response_format in generate_response, causing JSON parsing failures in memory extraction/update #4607
Component
Ollama / Local Models
Description
Summary
The vllm LLM provider accepts a response_format argument in generate_response(...), but it never forwards that argument into the request params actually sent to the OpenAI-compatible client.
As a result, mem0 assumes the model is constrained to return JSON while the backend model is in fact only guided by prompt text. This can produce invalid/non-JSON outputs and downstream parsing failures during memory extraction/update.
Steps to Reproduce
I am using:
- mem0 with provider: "vllm"
- local OpenAI-compatible vLLM endpoint
- local embedding model
- local Qdrant path-based vector storage
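For reference, the setup above corresponds roughly to a config like the following. This is a hedged sketch: the exact config key names, model names, and paths are illustrative placeholders and may differ from your mem0 version, not values taken from the report.

```python
# Illustrative mem0-style config for the setup described above.
# Key names, model names, and paths are placeholders; check the mem0 docs
# for the exact schema of your installed version.
config = {
    "llm": {
        "provider": "vllm",  # the provider exhibiting the bug
        "config": {
            "model": "my-local-model",              # placeholder model name
            "openai_base_url": "http://localhost:8000/v1",  # assumed local vLLM endpoint
        },
    },
    "embedder": {
        "provider": "huggingface",  # placeholder local embedding provider
        "config": {"model": "sentence-transformers/all-MiniLM-L6-v2"},
    },
    "vector_store": {
        "provider": "qdrant",
        "config": {"path": "/tmp/qdrant"},  # path-based local storage
    },
}

# from mem0 import Memory
# m = Memory.from_config(config)
```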
A conversational correction such as:
- previous memory inferred: user prefers fast-paced sci-fi movies
- user later says: actually I prefer slow-paced healing movies; I still dislike horror, but prefer healing movies over sci-fi
can trigger the memory update stage to fail with:
Invalid JSON response: Expecting value: line 1 column 1 (char 0)
This appears consistent with response_format not being enforced at the provider layer.
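The error text matches the standard json.JSONDecodeError raised when a model replies with prose instead of JSON. A minimal stand-in (the reply string here is invented, not an actual model output) reproduces the exact message:

```python
import json

# A typical unconstrained model reply when response_format is not enforced
# at the provider layer (invented example text, not a captured response).
raw_reply = "Sure! Here is the updated memory list: ..."

try:
    json.loads(raw_reply)
except json.JSONDecodeError as e:
    # Matches the failure reported above.
    print(f"Invalid JSON response: {e}")
    # → Invalid JSON response: Expecting value: line 1 column 1 (char 0)
```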
Expected Behavior
If generate_response(..., response_format=...) is called, the vllm provider should forward response_format into the request params, similar to other providers.
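A minimal sketch of the expected forwarding behavior (function and parameter names are illustrative, not mem0's actual internals):

```python
from typing import Any, Optional

def build_request_params(
    model: str,
    messages: list,
    response_format: Optional[dict] = None,
    **kwargs: Any,
) -> dict:
    """Build OpenAI-compatible chat params, forwarding response_format.

    Hypothetical helper showing the fix: the constraint is passed through
    to the request instead of being silently dropped.
    """
    params = {"model": model, "messages": messages, **kwargs}
    if response_format is not None:
        params["response_format"] = response_format
    return params

params = build_request_params(
    "my-model",
    [{"role": "user", "content": "Return JSON."}],
    response_format={"type": "json_object"},
)
print("response_format" in params)  # → True
```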
Actual Behavior
Invalid JSON response: Expecting value: line 1 column 1 (char 0)
Environment
- mem0 version: 1.0.7
- Python/Node version: Python 3.10
- OS: Ubuntu 24.04