Add video input support for Gemma 4 12B #1292

Merged
Blaizzy merged 2 commits into Blaizzy:main from lucasnewman:gemma4-unified-video-processor
Jun 4, 2026

Conversation

@lucasnewman

Collaborator

Resolves #1277

python -m mlx_vlm.generate \
  --model mlx-community/gemma-4-12B-it-bf16 \
  --video examples/videos/car_video.mp4 \
  --fps 1 \
  --prompt "Describe this video in one concise paragraph. Focus on the visible scene and action." \
  --max-tokens 96 \
  --temperature 0

Output:

A high-angle shot shows a black minivan and a silver sedan parked in a lot, where a man in a grey shirt opens the minivan's door and then steps out, while a red car is parked in the background.
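
For anyone scripting this check rather than typing it by hand, here is a minimal sketch that assembles the same invocation from Python. It uses only the flags shown in the command above; the helper name `build_generate_command` is hypothetical and not part of mlx-vlm, and the sketch only builds and prints the command rather than running it.

```python
import shlex
import sys


def build_generate_command(video_path, prompt, fps=1, max_tokens=96, temperature=0):
    """Assemble the argument list for the mlx_vlm.generate call shown in this PR.

    Hypothetical helper: only the flags that appear in the PR description are used.
    """
    return [
        sys.executable, "-m", "mlx_vlm.generate",
        "--model", "mlx-community/gemma-4-12B-it-bf16",
        "--video", video_path,
        "--fps", str(fps),
        "--prompt", prompt,
        "--max-tokens", str(max_tokens),
        "--temperature", str(temperature),
    ]


cmd = build_generate_command(
    "examples/videos/car_video.mp4",
    "Describe this video in one concise paragraph. "
    "Focus on the visible scene and action.",
)
# Print a shell-quoted form of the command for inspection.
print(shlex.join(cmd))
# To actually run it (requires mlx-vlm installed and the model available),
# pass the list to subprocess.run(cmd, check=True).
```

Passing the arguments as a list avoids shell-quoting problems with the long prompt string.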

@lucasnewman lucasnewman requested a review from Blaizzy June 4, 2026 22:49
@Blaizzy Blaizzy merged commit 526c210 into Blaizzy:main Jun 4, 2026
1 check passed
Vect0rM pushed a commit to AtomicBot-ai/mlx-vlm that referenced this pull request Jun 5, 2026

Development

Successfully merging this pull request may close these issues.

Gemma 4 12B Unified: vision_embedder fails with layer_norm size mismatch (6912) on --video/--image — preprocessor does not patchify raw pixels
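
The linked issue says the preprocessor did not patchify raw pixels. As a rough illustration of what patchifying means in general, the sketch below splits an image into non-overlapping square patches and flattens each one into a vector. This is not the mlx-vlm preprocessor: the function name, patch size, and memory layout are all assumptions chosen for illustration. Note that a 48 x 48 patch with 3 channels would flatten to 48 * 48 * 3 = 6912 values, which matches the number in the issue title, but whether that is the model's actual configuration is a guess.

```python
def patchify(image, patch_size):
    """Split an H x W x C image (nested lists) into flattened, non-overlapping patches.

    Hypothetical illustration only. Returns patches in row-major order; each
    patch is a flat list of patch_size * patch_size * C values.
    """
    height = len(image)
    width = len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            flat = []
            # Walk the pixels of this patch row by row, appending all channels.
            for row in range(top, top + patch_size):
                for col in range(left, left + patch_size):
                    flat.extend(image[row][col])
            patches.append(flat)
    return patches


# Tiny 4 x 4 RGB image whose first two channels encode (row, column).
image = [[[r, c, 0] for c in range(4)] for r in range(4)]
patches = patchify(image, patch_size=2)
print(len(patches), len(patches[0]))  # 4 12
```

A layer that expects flattened patch vectors of a fixed length will reject input that is still laid out as raw pixels, which is the general kind of size mismatch the issue title describes.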

2 participants