[Bugfix] Fix Qwen3-VL timestamp mismatch when using num_frames without fps#36136
Conversation
There was a problem hiding this comment.
Code Review
This pull request addresses a bug in Qwen3-VL and Qwen3.5-VL related to a timestamp mismatch when num_frames is specified without fps. The changes ensure that if num_frames is provided, it is used directly for timestamp calculations, which aligns with the behavior of the underlying Hugging Face processor and resolves the AssertionError. The implementation is clean and directly fixes the issue by passing num_frames through to the timestamp generation logic. The changes look correct and I have no further feedback.
5add9ee to
896d89b
Compare
Isotr0py
left a comment
There was a problem hiding this comment.
Thanks, can you add a processor test under tests/models/multimodal/processing to avoid regression?
Head branch was pushed to by a user without write access
58b6d2f to
14d7a3d
Compare
|
Hi @DarkLight1337, I checked the failing CI — the |
|
Hi @OiPunk, the pre-commit checks have failed. Please run: uv pip install pre-commit
pre-commit install
pre-commit run --all-filesThen, commit the changes and push to your branch. For future commits, Tip Is
|
00b007f to
b7ba987
Compare
CI Failure AnalysisFailing check: Verdict: Unrelated flaky test — no code changes needed. Evidence:
|
|
@DarkLight1337 I've investigated the failing CI. The |
…t fps When `num_frames` is provided via `mm_processor_kwargs` without `fps`, `_get_video_second_idx` was falling back to the default fps (2) to compute the number of frames for timestamp calculation. However, the HF processor uses `num_frames` directly (since `num_frames` and `fps` are mutually exclusive in `Qwen3VLVideoProcessor.sample_frames`). This caused a mismatch between the computed timestamps length and the actual `video_grid_thw` frame count, triggering an assertion error. Pass `num_frames` through to `_get_video_second_idx` and use it directly when provided, mirroring the HF processor's behavior. Also explicitly set `fps=None` when `num_frames` is specified to prevent HF's `BaseVideoProcessor.preprocess()` from filling in the class default (`fps=2`) via `setdefault()`. Also adds processor regression tests for Qwen3-VL num_frames timestamp handling, with video metadata aligned to ndarray bounds. Fixes vllm-project#35909 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: OiPunk <codingpunk@gmail.com>
b7ba987 to
df3f36d
Compare
…t fps (vllm-project#36136) Signed-off-by: OiPunk <codingpunk@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…t fps (vllm-project#36136) Signed-off-by: OiPunk <codingpunk@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…t fps (vllm-project#36136) Signed-off-by: OiPunk <codingpunk@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…t fps (vllm-project#36136) Signed-off-by: OiPunk <codingpunk@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…t fps (vllm-project#36136) Signed-off-by: OiPunk <codingpunk@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…t fps (vllm-project#36136) Signed-off-by: OiPunk <codingpunk@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…t fps (vllm-project#36136) Signed-off-by: OiPunk <codingpunk@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…t fps (vllm-project#36136) Signed-off-by: OiPunk <codingpunk@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
Summary
Fixes #35909.
When
num_framesis provided viamm_processor_kwargswithoutfps,_get_video_second_idxwas falling back to the default fps (2) to compute the number of frames for timestamp calculation. However, HF'sQwen3VLVideoProcessor.sample_framestreatsnum_framesandfpsas mutually exclusive — whennum_framesis provided, it uses it directly. This caused a mismatch between the computed timestamps length and the actualvideo_grid_thwframe count, triggering:Changes:
sampled_num_framesparameter to_get_video_second_idxinQwen3VLProcessingInfonum_framesis explicitly provided, use it directly instead of computing from fps, mirroring HF's behaviornum_framesfrommm_processor_kwargsthrough to_get_video_second_idxin_call_hf_processorThis fix also applies to Qwen3.5-VL which inherits
Qwen3VLProcessingInfo.Test plan
mm_processor_kwargs={"num_frames": 32, "do_sample_frames": True}tollm.chat()with a video input on Qwen3-VLnum_framesprovided)num_framesandfpsraises an error in HF processor (existing behavior, unchanged)