This repository was archived by the owner on May 5, 2026. It is now read-only.

Switch OpenVLA to PromptReplacement with dedicated chat template #27

Closed
mgehre-amd wants to merge 0 commits into mkorhone/merge_openvla_pr from matthias/openvla-prompt-replacement

Conversation

@mgehre-amd mgehre-amd commented Mar 26, 2026

Owner

@mkorhone, I played with your PR today to understand why the changes in vllm/benchmarks are necessary. It looks like they aren't needed if we use PromptReplacement and add a proper template_openvla.jinja that generates the prompt in the order we need.
This appears to produce the right output, both via

# 1. Start the vLLM server:
export TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1
export FLASH_ATTENTION_TRITON_AMD_ENABLE=TRUE
vllm serve openvla/openvla-7b \
    --trust-remote-code --dtype bfloat16 --max-model-len 512 \
    --enforce-eager --mm-processor-cache-gb 0

# 2. In another terminal, send a chat completion request with an image:
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openvla/openvla-7b",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,'"$(python3 -c "
import base64, io; from PIL import Image
img = Image.open('/scratch/mgehre/openvla/sample_data/bridge_sample_01.jpg').convert('RGB').resize((224,224), Image.BILINEAR)
buf = io.BytesIO(); img.save(buf, format='PNG'); print(base64.b64encode(buf.getvalue()).decode())
")"'"}},
        {"type": "text", "text": "In: What action should the robot take to put the blue cube on the right side of the table on top of the rectangular block?\nOut: "}
      ]
    }],
    "max_tokens": 7,
    "temperature": 0.0
  }' | python3 -m json.tool

and via vllm-bench.py. It would also remove all changes to non-OpenVLA-specific files from your PR.
Please check whether that makes sense. I'm also not 100% sure that I validated the right thing.
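
For checking the same request from a script instead of curl, here is a minimal Python sketch. It assumes the server started above is listening on localhost:8000; the helper names build_chat_request and send_chat_request are made up for illustration, and the image is expected to be PNG bytes that have already been resized.

```python
import base64
import json
import urllib.request


def build_chat_request(image_png: bytes, instruction: str,
                       model: str = "openvla/openvla-7b") -> dict:
    """Build the same JSON body as the curl command: one image part, one text part."""
    data_url = "data:image/png;base64," + base64.b64encode(image_png).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": data_url}},
                {"type": "text", "text": f"In: {instruction}\nOut: "},
            ],
        }],
        "max_tokens": 7,     # same limit as the curl request
        "temperature": 0.0,  # greedy decoding
    }


def send_chat_request(body: dict,
                      url: str = "http://localhost:8000/v1/chat/completions") -> dict:
    """POST the body to a running `vllm serve` instance and return the parsed reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Calling send_chat_request(build_chat_request(png_bytes, "What action should the robot take to ...?")) should be equivalent to the curl invocation, provided png_bytes holds the encoded image.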

Switch OpenVLA from PromptInsertion to PromptReplacement so the chat completions API can correctly place image tokens. PromptInsertion required get_placeholder_str to return None, which prevented the chat API from knowing where to insert image placeholders.
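
To illustrate the idea, here is a toy sketch of what a replacement-style prompt update does conceptually: every occurrence of a target token in the tokenized prompt is expanded into a run of image-feature slots. This is not vLLM's implementation. The function name and the number of image tokens are assumptions chosen for illustration; only the target token ID 32000 comes from this description.

```python
from typing import List, Tuple

PLACEHOLDER_ID = 32000   # target token ID named in this PR description
NUM_IMAGE_TOKENS = 256   # assumption for illustration only


def expand_placeholders(
    token_ids: List[int],
    placeholder_id: int = PLACEHOLDER_ID,
    num_image_tokens: int = NUM_IMAGE_TOKENS,
) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Replace each placeholder token with `num_image_tokens` copies of itself.

    Returns the expanded token list and the (start, length) range of every
    image span, i.e. where image embeddings would later be written.
    """
    expanded: List[int] = []
    ranges: List[Tuple[int, int]] = []
    for tok in token_ids:
        if tok == placeholder_id:
            ranges.append((len(expanded), num_image_tokens))
            expanded.extend([placeholder_id] * num_image_tokens)
        else:
            expanded.append(tok)
    return expanded, ranges
```

Because the target token is present in the prompt text itself, the position of the image is determined by where the chat template emits it, which is what the insertion-based approach could not express through the chat API.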

PromptReplacement uses a placeholder token (token ID 32000) as the target. The new template_openvla.jinja chat template:

  • Emits {{ bos_token }} (needed because chat path uses add_special_tokens=False)
  • Outputs the PromptReplacement target token for image content parts
  • Enforces image-before-text ordering regardless of client content part order, since OpenVLA requires image tokens after BOS
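
The three template behaviours listed above can be sketched in plain Python as follows. This mirrors the described logic only and is not the contents of template_openvla.jinja. The function name, the "<s>" BOS string, and the "<image-placeholder>" string are hypothetical stand-ins.

```python
from typing import Dict, List


def render_openvla_prompt(
    content_parts: List[Dict],
    bos_token: str = "<s>",                         # assumed BOS string
    image_placeholder: str = "<image-placeholder>"  # hypothetical stand-in
) -> str:
    """Render BOS, then one placeholder per image, then all text parts.

    Images are always emitted before text, whatever order the client used.
    """
    num_images = sum(
        1 for part in content_parts if part.get("type") in ("image", "image_url")
    )
    text = "".join(
        part["text"] for part in content_parts if part.get("type") == "text"
    )
    return bos_token + image_placeholder * num_images + text
```

With this ordering, a client that sends the text part before the image part still gets a prompt that starts with BOS followed by the image placeholder.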

Also reverts benchmark files to match target branch.

@mgehre-amd mgehre-amd requested a review from mkorhone March 26, 2026 11:07
@mkorhone mkorhone force-pushed the mkorhone/merge_openvla_pr branch 3 times, most recently from 8f3105e to a0f3634 March 26, 2026 21:50
@mgehre-amd mgehre-amd closed this Mar 26, 2026
@mkorhone

Collaborator

Thank you for these suggestions! Going down the PromptReplacement path dramatically improved things. I have integrated your changes into my branch, and listed you as a co-author.
