Skip to content

fix: force gemma4_unified detection as VLM#1744

Merged
jundot merged 1 commit into
jundot:mainfrom
FaisalFehad:fix/gemma4-unified-vlm-support
Jun 10, 2026
Merged

fix: force gemma4_unified detection as VLM#1744
jundot merged 1 commit into
jundot:mainfrom
FaisalFehad:fix/gemma4-unified-vlm-support

Conversation

@FaisalFehad

@FaisalFehad FaisalFehad commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Summary

Forces gemma4_unified detection as VLM in model discovery.

Problem

gemma4_unified models were detected as LLM when their config.json lacked vision_config (e.g., some quantized variants). Since unified models always include vision+audio capabilities by definition, they should never fall back to LLM.

Changes

omlx/model_discovery.py — Force gemma4_unified → VLM detection regardless of vision_config presence.

tests/test_model_discovery.py — Updated test_detect_text_only_gemma4_unified_as_llmtest_detect_gemma4_unified_without_vision_config_as_vlm since gemma4_unified is always VLM.

Notes

  • The feature extractor mapping in vlm.py is not changedGemma4UnifiedAudioFeatureExtractor correctly resolves to mlx_vlm.models.gemma4_unified.processing_gemma4_unified.Gemma4UnifiedAudioFeatureExtractor in the pinned mlx-vlm (commit 54c9a11). That class differs from Gemma4AudioFeatureExtractor (which produces mel spectrograms) — the unified extractor outputs raw waveform chunks with audio_samples_per_token=640.
  • gemma4_unified is an architecture type (not a size label). Currently only the 12B uses it; future models with the same unified audio+vision architecture would also use this type.
  • All 125 model discovery tests pass.

@FaisalFehad FaisalFehad changed the title fix: gemma4_unified VLM detection and feature extractor mapping fix: add support for Gemma 4 12B IT (gemma4_unified) Jun 9, 2026
@jundot

jundot commented Jun 9, 2026

Copy link
Copy Markdown
Owner

Thanks for the PR. The model discovery change makes sense to me, but I do not think the feature extractor mapping should change.

Gemma4UnifiedAudioFeatureExtractor does exist in the Gemma 4 unified processor path, and current mlx-vlm has it here:
https://github.com/Blaizzy/mlx-vlm/blob/c9a6743e5f059be36362d8e4a78171bbf91f18bc/mlx_vlm/models/gemma4_unified/processing_gemma4_unified.py#L251

It is not equivalent to Gemma4AudioFeatureExtractor; the unified extractor uses different defaults and output shape for unified audio inputs. Remapping the unified type to the Gemma4 extractor can break gemma4_unified audio requests.

I can merge the narrower discovery fix once that mapping change is removed or covered by a runtime-compatible test.

@FaisalFehad FaisalFehad force-pushed the fix/gemma4-unified-vlm-support branch from 001769d to 5c9d897 Compare June 9, 2026 06:24
@FaisalFehad FaisalFehad changed the title fix: add support for Gemma 4 12B IT (gemma4_unified) fix: force gemma4_unified detection as VLM Jun 9, 2026
@FaisalFehad FaisalFehad force-pushed the fix/gemma4-unified-vlm-support branch from 5c9d897 to 389ffff Compare June 9, 2026 06:44
@FaisalFehad

FaisalFehad commented Jun 9, 2026

Copy link
Copy Markdown
Contributor Author

Thank you @jundot I have updated the PR.

Gemma4 unified models always include vision+audio capabilities, so they
should never fall back to LLM detection even when vision_config is absent
from config.json (e.g. some quantized variants).

- Add special case in detect_model_type for gemma4_unified → VLM
- Update test to reflect that gemma4_unified is always VLM
@FaisalFehad FaisalFehad force-pushed the fix/gemma4-unified-vlm-support branch from 389ffff to 396c752 Compare June 9, 2026 07:00
@jundot

jundot commented Jun 10, 2026

Copy link
Copy Markdown
Owner

Thanks for updating this. The narrowed discovery-only change looks right: gemma4_unified should stay on the mlx-vlm path even when quantized configs omit vision_config, and the unified audio feature extractor mapping is preserved.

CI is green, so this looks good to me, and I am going to merge it.

@jundot jundot merged commit 4b71249 into jundot:main Jun 10, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants