
vllm provider drops response_format in generate_response, causing JSON parsing failures in memory extraction/update #4607

@mqqqqj

Component

Ollama / Local Models

Description

Summary

The vllm LLM provider accepts a response_format argument in generate_response(...), but it does not forward that argument into the actual request params sent to the OpenAI-compatible client.

Because of that, mem0 can assume the model is constrained to return JSON while the backend model is in fact only guided by prompt text. This can produce invalid or non-JSON outputs and cause downstream parsing failures during memory extraction/update.
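The downstream failure mode is ordinary JSON parsing of a free-form model reply. A minimal illustration (the reply string here is made up, but the resulting error message matches the one reported below):

```python
import json

# If response_format is not enforced, the model may reply with prose instead
# of a JSON object. Parsing such a reply fails at the very first character.
try:
    json.loads("Sure! Here is the updated memory: ...")
except json.JSONDecodeError as e:
    print(e)  # Expecting value: line 1 column 1 (char 0)
```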

Steps to Reproduce

I am using:

  • mem0 with provider: "vllm"
  • local OpenAI-compatible vLLM endpoint
  • local embedding model
  • local Qdrant path-based vector storage

A conversational correction such as:

  • previous memory inferred: user prefers fast-paced sci-fi movies
  • user later says: actually I prefer slow-paced healing movies; I still dislike horror, but prefer healing movies over sci-fi

can trigger the memory update stage to fail with:

  Invalid JSON response: Expecting value: line 1 column 1 (char 0)

This appears consistent with response_format not being enforced at the provider layer.
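For reference, a setup like the one above can be sketched as a mem0 dict config. The overall layout follows mem0's documented config shape, but the specific key names, model names, URL, and path here are placeholders/assumptions, not the exact values used to reproduce the bug:

```python
# Hedged sketch of a config resembling the reproduction setup.
# All concrete values (and some key names) are illustrative assumptions.
config = {
    "llm": {
        "provider": "vllm",
        "config": {
            "openai_base_url": "http://localhost:8000/v1",  # assumed local vLLM endpoint
            "model": "Qwen/Qwen2.5-7B-Instruct",            # placeholder model name
        },
    },
    "embedder": {
        "provider": "huggingface",  # assumed local embedding provider
        "config": {"model": "BAAI/bge-small-en-v1.5"},  # placeholder embedder
    },
    "vector_store": {
        "provider": "qdrant",
        "config": {"path": "/tmp/qdrant"},  # path-based local Qdrant storage
    },
}

# With mem0 installed, this would typically be used as:
#   from mem0 import Memory
#   m = Memory.from_config(config)
```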

Expected Behavior

If generate_response(..., response_format=...) is called, the vllm provider should forward response_format into the request params, similar to other providers.
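The expected forwarding can be sketched as follows. `build_request_params` and its signature are hypothetical stand-ins for the provider's internal param construction, not mem0's actual code; the point is only that `response_format` must end up in the params passed to the OpenAI-compatible client:

```python
# Hypothetical sketch: how an OpenAI-compatible provider typically builds
# request params, forwarding response_format only when the caller supplies it.
def build_request_params(model, messages, response_format=None, **kwargs):
    params = {"model": model, "messages": messages, **kwargs}
    if response_format is not None:
        # e.g. {"type": "json_object"} to constrain the model to JSON output
        params["response_format"] = response_format
    return params

# The client call would then actually carry the constraint:
#   client.chat.completions.create(**params)
```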

Actual Behavior

Invalid JSON response: Expecting value: line 1 column 1 (char 0)

Environment

  • mem0 version: 1.0.7
  • Python version: 3.10
  • OS: Ubuntu 24.04
