Switch OpenVLA to PromptReplacement with dedicated chat template by mgehre-amd · Pull Request #27 · mgehre-amd/vllm

mgehre-amd · 2026-03-26T11:07:56Z

@mkorhone, I played a bit with your PR today to understand why the changes in vllm/benchmarks are necessary. It looks like they aren't needed if we use PromptReplacement and add a proper template_openvla.jinja that generates the prompt in the order we need.
This seems to produce the right output, both via

# 1. Start the server:
export TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1
export FLASH_ATTENTION_TRITON_AMD_ENABLE=TRUE
vllm serve openvla/openvla-7b \
    --trust-remote-code --dtype bfloat16 --max-model-len 512 \
    --enforce-eager --mm-processor-cache-gb 0

# 2. In another terminal, send a chat completion request with an image.
#    The inline Python must start at column 0, otherwise python3 -c fails
#    with "IndentationError: unexpected indent".
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openvla/openvla-7b",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,'"$(python3 -c "
import base64, io; from PIL import Image
img = Image.open('/scratch/mgehre/openvla/sample_data/bridge_sample_01.jpg').convert('RGB').resize((224,224), Image.BILINEAR)
buf = io.BytesIO(); img.save(buf, format='PNG'); print(base64.b64encode(buf.getvalue()).decode())
")"'"}},
        {"type": "text", "text": "In: What action should the robot take to put the blue cube on the right side of the table on top of the rectangular block?\nOut: "}
      ]
    }],
    "max_tokens": 7,
    "temperature": 0.0
  }' | python3 -m json.tool

and via vllm-bench.py. It would also let us drop all changes to non-OpenVLA-specific files from your PR.
Please check whether that makes sense. I'm also not 100% sure that I validated the right thing.
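For anyone who prefers to build the request outside the shell, here is a minimal Python sketch of the same JSON body. The helper name `build_openvla_request` is hypothetical, and the image bytes are a stand-in: in practice they would be the 224x224 PNG produced by the Pillow snippet in the curl command. The sketch only constructs the payload and does not contact a server.

```python
import base64
import json


def build_openvla_request(png_bytes: bytes, instruction: str) -> dict:
    """Build the JSON body for POST /v1/chat/completions (hypothetical helper)."""
    image_b64 = base64.b64encode(png_bytes).decode("ascii")
    # Same "In: ... Out: " prompt format as in the curl example.
    prompt = f"In: What action should the robot take to {instruction}?\nOut: "
    return {
        "model": "openvla/openvla-7b",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
                {"type": "text", "text": prompt},
            ],
        }],
        "max_tokens": 7,
        "temperature": 0.0,
    }


# Stand-in bytes; replace with the real PNG-encoded 224x224 image.
body = build_openvla_request(
    b"\x89PNG-placeholder",
    "put the blue cube on the right side of the table on top of the rectangular block",
)
print(json.dumps(body, indent=2)[:300])
```

The resulting dict can be sent with any HTTP client to the server started in step 1.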

Switch OpenVLA from PromptInsertion to PromptReplacement so the chat completions API can correctly place image tokens. PromptInsertion required get_placeholder_str to return None, which prevented the chat API from knowing where to insert image placeholders.
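The difference between the two strategies can be illustrated with a toy model on plain token-id lists. This is not vLLM's implementation: the function names, the BOS id of 1, the made-up text ids, and the insertion position (directly after BOS) are all assumptions for illustration. Only the target id 32000 comes from this PR.

```python
BOS_ID = 1              # assumption: Llama-style BOS token id
IMAGE_TOKEN_ID = 32000  # target token id named in this PR description


def apply_insertion(prompt_ids, num_image_tokens):
    """Insertion-style: the prompt carries no marker, so the position is
    fixed by the processor (assumed here: directly after BOS)."""
    if not prompt_ids or prompt_ids[0] != BOS_ID:
        raise ValueError("expected the prompt to start with BOS")
    return prompt_ids[:1] + [IMAGE_TOKEN_ID] * num_image_tokens + prompt_ids[1:]


def apply_replacement(prompt_ids, num_image_tokens):
    """Replacement-style: every marker token already present in the prompt
    is expanded into num_image_tokens placeholder tokens."""
    out = []
    for tok in prompt_ids:
        if tok == IMAGE_TOKEN_ID:
            out.extend([IMAGE_TOKEN_ID] * num_image_tokens)
        else:
            out.append(tok)
    return out


text_ids = [500, 501]  # made-up ids standing in for the instruction text
inserted = apply_insertion([BOS_ID] + text_ids, num_image_tokens=3)
replaced = apply_replacement([BOS_ID, IMAGE_TOKEN_ID] + text_ids, num_image_tokens=3)
print(inserted == replaced)
```

Both paths end with the same token sequence in this toy setup. The practical difference is that the replacement path relies on a marker in the prompt text, which is something a chat template can emit.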

PromptReplacement uses token 32000 as the target token. The new template_openvla.jinja chat template:

- Emits {{ bos_token }} (needed because the chat path uses add_special_tokens=False)
- Outputs the image placeholder token for image content parts
- Enforces image-before-text ordering regardless of the order of content parts sent by the client, since OpenVLA requires the image tokens to follow BOS
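The ordering behaviour described above can be sketched in plain Python. This is an illustration of the intended logic, not the contents of template_openvla.jinja: `render_prompt` is a hypothetical function, `"<s>"` assumes a Llama-style BOS string, and `IMAGE_PLACEHOLDER` is a stand-in because the real placeholder text is not shown here.

```python
BOS_TOKEN = "<s>"                          # assumption: Llama-style BOS string
IMAGE_PLACEHOLDER = "<image-placeholder>"  # hypothetical stand-in for the real token


def render_prompt(content_parts: list) -> str:
    """Emit BOS, then one placeholder per image part, then all text parts,
    regardless of the order in which the client sent them."""
    images = [p for p in content_parts if p.get("type") in ("image", "image_url")]
    texts = [p["text"] for p in content_parts if p.get("type") == "text"]
    return BOS_TOKEN + IMAGE_PLACEHOLDER * len(images) + "".join(texts)


image_part = {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}
text_part = {"type": "text", "text": "In: What action should the robot take?\nOut: "}

# Client order does not matter: both calls render the same prompt.
print(render_prompt([image_part, text_part]) == render_prompt([text_part, image_part]))
```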

Also reverts benchmark files to match target branch.

mkorhone · 2026-03-27T01:37:09Z

Thank you for these suggestions! Going down the PromptReplacement path dramatically improved things. I have integrated your changes into my branch, and listed you as a co-author.

mgehre-amd requested a review from mkorhone March 26, 2026 11:07

mkorhone force-pushed the mkorhone/merge_openvla_pr branch 3 times, most recently from 8f3105e to a0f3634 March 26, 2026 21:50

mgehre-amd closed this Mar 26, 2026

mkorhone mentioned this pull request Mar 27, 2026

Add OpenVLA (Vision-Language-Action) model support #17

Merged

mgehre-amd wants to merge 0 commits into mkorhone/merge_openvla_pr from matthias/openvla-prompt-replacement
