[Bugfix] Enable audio transcription endpoint for Gemma 4#43609

Open
SoluMilken wants to merge 4 commits into
vllm-project:mainfrom
SoluMilken:fix/gemma4-audio-transcriptions

@SoluMilken SoluMilken commented May 25, 2026

Contributor

Purpose

Fix #40994.

Test Plan

  1. Launch a server

    python -m vllm.entrypoints.openai.api_server --model google/gemma-4-E2B-it --tensor-parallel-size 2 --enforce-eager --port 7788

  2. Call transcription API

    curl http://localhost:7788/v1/audio/transcriptions -F "file=@./test_en.wav" -F "model=google/gemma-4-E2B-it" -F "language=en"
    
  3. Call translation API

    curl http://localhost:7788/v1/audio/translations -F "file=@./test_zh.wav" -F "model=google/gemma-4-E2B-it" -F "language=zh" -F "to_language=en" -F "response_format=text"
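
For anyone who wants to script steps 2 and 3 instead of using curl, here is a minimal standard-library sketch of the same multipart requests. The endpoint paths and form fields (`file`, `model`, `language`, `to_language`, `response_format`) are taken from the curl commands above; the helper names `encode_multipart`, `build_audio_request`, and `post_audio` are hypothetical and not part of vLLM or this PR. It assumes the server from step 1 is listening on port 7788.

```python
import urllib.request
import uuid


def encode_multipart(fields, file_name, file_bytes, boundary=None):
    """Encode text fields plus one "file" part as multipart/form-data."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(f"--{boundary}\r\n".encode())
        parts.append(
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode()
        )
        parts.append(f"{value}\r\n".encode())
    # The audio payload goes in the "file" field, as in `-F "file=@..."`.
    parts.append(f"--{boundary}\r\n".encode())
    parts.append(
        f'Content-Disposition: form-data; name="file"; '
        f'filename="{file_name}"\r\n'.encode()
    )
    parts.append(b"Content-Type: application/octet-stream\r\n\r\n")
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_audio_request(base_url, endpoint, file_name, file_bytes, **fields):
    """Build a POST request for /v1/audio/<endpoint> without sending it.

    `endpoint` is "transcriptions" or "translations", matching the curl calls.
    """
    body, content_type = encode_multipart(fields, file_name, file_bytes)
    return urllib.request.Request(
        f"{base_url}/v1/audio/{endpoint}",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )


def post_audio(request, timeout=60):
    """Send the request and return the raw response body as text.

    The body format depends on `response_format`, so it is returned
    undecoded beyond UTF-8 rather than parsed here.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

To mirror step 3, one could read `./test_zh.wav` as bytes, pass them to `build_audio_request("http://localhost:7788", "translations", "test_zh.wav", data, model="google/gemma-4-E2B-it", language="zh", to_language="en", response_format="text")`, and hand the result to `post_audio`. Keeping request construction separate from sending makes the form fields easy to inspect without a running server.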
    

Test Result

Call transcription API

[Image: transcription API result]

Call translation API

[Image: translation API result]

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

mergify Bot commented May 25, 2026

Contributor

Documentation preview: https://vllm--43609.org.readthedocs.build/en/43609/

@mergify mergify Bot added the documentation Improvements or additions to documentation label May 25, 2026
@SoluMilken SoluMilken changed the title Enable Gemma4 audio transcription endpoint [BugFix] Enable Gemma4 audio transcription endpoint May 25, 2026
@mergify mergify Bot added the bug Something isn't working label May 25, 2026
@SoluMilken SoluMilken changed the title [BugFix] Enable Gemma4 audio transcription endpoint [Bugfix] Enable audio transcription endpoint for Gemma 4 May 25, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request implements speech-to-text capabilities for the Gemma 4 model, including transcription and translation support. The changes involve updating the supported models documentation, adding a new test file for transcription, and implementing the SupportsTranscription interface in the Gemma 4 model executor. Review feedback highlighted a recurring typo where the end-of-turn token was incorrectly written as <turn|> instead of <|turn|>, which needs to be corrected in both the model implementation and the associated tests to ensure proper tokenizer behavior.

Comment thread vllm/model_executor/models/gemma4_mm.py
Comment thread tests/model_executor/test_gemma4_transcription.py Outdated
Comment thread tests/model_executor/test_gemma4_transcription.py Outdated
@SoluMilken SoluMilken force-pushed the fix/gemma4-audio-transcriptions branch from e6b6bca to 492d3ed Compare May 26, 2026 16:11
@SoluMilken SoluMilken marked this pull request as ready for review May 26, 2026 16:13
@DarkLight1337 DarkLight1337 requested review from Isotr0py and ywang96 May 26, 2026 16:14
@mergify mergify Bot added the multi-modality Related to multi-modality (#4194) label May 26, 2026

@Isotr0py Isotr0py left a comment

Member

Overall this looks reasonable, but I would like @NickLucche to take a second look too.

@Isotr0py Isotr0py added the ready ONLY add when PR is ready to merge/full CI is needed label May 26, 2026
@SoluMilken

Contributor Author

Thanks @DarkLight1337 for helping request the right reviewers, and a huge thanks to @Isotr0py for the super quick review! 🙌
I'm heading to bed now, but I'll check on the CI results in the morning. Thanks everyone! 🌙

@SoluMilken SoluMilken force-pushed the fix/gemma4-audio-transcriptions branch 2 times, most recently from 0c9b311 to c6edc42 Compare May 27, 2026 12:53
@SoluMilken SoluMilken mentioned this pull request May 27, 2026
Co-authored-by: OpenAI Codex

Signed-off-by: SoluMilken <ypiheyn.imm02g@g2.nctu.edu.tw>
@SoluMilken SoluMilken force-pushed the fix/gemma4-audio-transcriptions branch from c6edc42 to 24652b4 Compare May 27, 2026 16:36
@SoluMilken

Contributor Author

I am trying to fix the failing CI job buildkite/ci/pr/basic-models-tests-extra-initialization-2 in a separate PR, #43831.

mergify Bot commented Jun 3, 2026

Contributor

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @SoluMilken.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label Jun 3, 2026
Labels

  • bug: Something isn't working
  • documentation: Improvements or additions to documentation
  • multi-modality: Related to multi-modality (#4194)
  • needs-rebase
  • ready: ONLY add when PR is ready to merge/full CI is needed

Development

Successfully merging this pull request may close these issues.

[Bug]: vllm does not expose /v1/audio/transcriptions for google/gemma-4-E4B-it

2 participants