
vllm provider drops response_format in generate_response, causing JSON parsing failures in memory extraction/update #4607

@mqqqqj

Component

Ollama / Local Models

Description

Summary

The vllm LLM provider accepts a response_format argument in generate_response(...), but it does not forward that argument into the actual request params sent to the OpenAI-compatible client.

Because of that, mem0 can assume the model is constrained to return JSON while the backend model is in fact only guided by prompt text. This can produce invalid or non-JSON outputs and cause downstream parsing failures during memory extraction/update.
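The downstream failure mode is ordinary JSON parsing of a free-form model reply. A minimal illustration (the reply string here is made up, but the resulting error message matches the one reported below):

```python
import json

# If response_format is not enforced, the model may reply with prose instead
# of a JSON object. Parsing such a reply fails at the very first character.
try:
    json.loads("Sure! Here is the updated memory: ...")
except json.JSONDecodeError as e:
    print(e)  # Expecting value: line 1 column 1 (char 0)
```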

Steps to Reproduce

I am using:

  • mem0 with provider: "vllm"
  • local OpenAI-compatible vLLM endpoint
  • local embedding model
  • local Qdrant path-based vector storage

A conversational correction such as:

  • previous memory inferred: user prefers fast-paced sci-fi movies
  • user later says: actually I prefer slow-paced healing movies; I still dislike horror, but prefer healing movies over sci-fi

can trigger the memory update stage to fail with:

  Invalid JSON response: Expecting value: line 1 column 1 (char 0)

This appears consistent with response_format not being enforced at the provider layer.
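For reference, a setup like the one above can be sketched as a mem0 dict config. The overall layout follows mem0's documented config shape, but the specific key names, model names, URL, and path here are placeholders/assumptions, not the exact values used to reproduce the bug:

```python
# Hedged sketch of a config resembling the reproduction setup.
# All concrete values (and some key names) are illustrative assumptions.
config = {
    "llm": {
        "provider": "vllm",
        "config": {
            "openai_base_url": "http://localhost:8000/v1",  # assumed local vLLM endpoint
            "model": "Qwen/Qwen2.5-7B-Instruct",            # placeholder model name
        },
    },
    "embedder": {
        "provider": "huggingface",  # assumed local embedding provider
        "config": {"model": "BAAI/bge-small-en-v1.5"},  # placeholder embedder
    },
    "vector_store": {
        "provider": "qdrant",
        "config": {"path": "/tmp/qdrant"},  # path-based local Qdrant storage
    },
}

# With mem0 installed, this would typically be used as:
#   from mem0 import Memory
#   m = Memory.from_config(config)
```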

Expected Behavior

If generate_response(..., response_format=...) is called, the vllm provider should forward response_format into the request params, similar to other providers.
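The expected forwarding can be sketched as follows. `build_request_params` and its signature are hypothetical stand-ins for the provider's internal param construction, not mem0's actual code; the point is only that `response_format` must end up in the params passed to the OpenAI-compatible client:

```python
# Hypothetical sketch: how an OpenAI-compatible provider typically builds
# request params, forwarding response_format only when the caller supplies it.
def build_request_params(model, messages, response_format=None, **kwargs):
    params = {"model": model, "messages": messages, **kwargs}
    if response_format is not None:
        # e.g. {"type": "json_object"} to constrain the model to JSON output
        params["response_format"] = response_format
    return params

# The client call would then actually carry the constraint:
#   client.chat.completions.create(**params)
```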

Actual Behavior

Invalid JSON response: Expecting value: line 1 column 1 (char 0)

Environment

  • mem0 version: 1.0.7
  • Python version: 3.10
  • OS: Ubuntu 24.04
