Skip to content

Add OpenVLA model support#29738

Draft
yongming-qin wants to merge 2 commits into
vllm-project:mainfrom
yongming-qin:support-openvla-v2
Draft

Add OpenVLA model support#29738
yongming-qin wants to merge 2 commits into
vllm-project:mainfrom
yongming-qin:support-openvla-v2

Conversation

@yongming-qin

@yongming-qin yongming-qin commented Nov 30, 2025

Copy link
Copy Markdown
Contributor

Purpose

Add support for OpenVLA model in vLLM. OpenVLA is a vision-language-action model that uses timm-based vision backbones (Prismatic architecture) with LLM backbones for action prediction tasks. This implementation follows the same pattern as DeepSeek-VL2, which also uses timm for vision processing.

This PR adds:

  • Model executor implementation (vllm/model_executor/models/openvla.py)
  • Configuration class (vllm/transformers_utils/configs/openvla.py)
  • Processor class (vllm/transformers_utils/processors/openvla.py)

The implementation supports:

  • Single and fused vision backbones using timm ViT models
  • Multiple LLM backbones (Llama-2, Mistral, Phi-3)
  • Image embedding insertion via prompt updates
  • Tensor parallelism support for vision backbone

FIX #14739

Test Plan

  1. Basic inference test:

    vllm serve openvla/openvla-7b --trust-remote-code
  2. Test with image input:

    • Use OpenAI-compatible API to send requests with image data
    • Verify image embeddings are correctly processed and inserted
  3. Compare outputs with HuggingFace implementation:

    • Run inference on same inputs with both vLLM and HF implementations
    • Verify output logits/tokens match

Test Result

[To be filled after testing]

@yongming-qin

Copy link
Copy Markdown
Contributor Author

Note: Currenly the model can be loaded by vllm and we can use it to process image + text instruction. However, the results of vllm and Transformers are different. Comments and collaboration are welcome.

@mergify

mergify Bot commented Dec 8, 2025

Copy link
Copy Markdown
Contributor

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @yongming-qin.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

…vision.

Signed-off-by: Luke <yq0536@gmail.com>
…penvla-7b

Signed-off-by: Luke <yq0536@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

new-model Requests to new models

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[New Model]:Can you support the VLA series models? For example, openVLA.

1 participant