fix(studio/mmproj): block cross-family projectors in flat local GGUF dirs (#5347) by Anai-Guo · Pull Request #5350 · unslothai/unsloth

Anai-Guo · 2026-05-09T19:24:36Z

Summary

When a flat local GGUF directory holds several unrelated models with their own mmproj siblings, detect_mmproj_file() walked the directory and returned the first projector it found. For the layout in #5347:

E:\llm_models\
├── Qwen3.5-9B-Q4_K_M.gguf       ← user picks this
├── Qwen3.5-9B-BF16-mmproj.gguf  ← correct projector
└── gemma-4-26B-A4B-it.mmproj-q8_0.gguf

llama-server was launched with --mmproj /workspace/llm-models/gemma-4-26B-A4B-it.mmproj-q8_0.gguf, which fails to load and surfaces as a confusing crash that makes valid local GGUFs look broken.

Approach

Two-stage disambiguation, applied only when path is a .gguf file (we know the target model):

Family-token filter. A projector whose filename family token (qwen, gemma, llama, mistral, phi, deepseek, internvl, minicpm, llava, glm, yi, command-r, molmo, pixtral, smolvlm, moondream, granite, ovis) disagrees with the target model's family is dropped. Candidates whose family is not recognised (e.g. the HF-convention mmproj-F16.gguf) are kept — they could legitimately belong to any model.
Longest-stem-prefix tie-break. Among same-family candidates, prefer the one whose stem shares the longest prefix with the model (e.g. for Qwen3.5-35B-A3B-UD-Q4_K_L.gguf, Qwen3.5-35B-A3B-BF16-mmproj.gguf beats Qwen3.5-9B-BF16-mmproj.gguf).

If every candidate is dropped by rule (1), return None rather than fall back to an arbitrary mismatched projector — better to launch without --mmproj than with the wrong one.

When path is a directory (no model name to compare against) the historical "first candidate" behaviour is preserved.

Backwards compatibility

Single same-family projector → unchanged.
HF-style mmproj-F16.gguf next to a single weight → unchanged (kept by the no-recognised-family clause).
The search_root walk for snapshot layouts (snapshot/BF16/foo.gguf + snapshot/mmproj-BF16.gguf) is preserved.
Directory-as-path callers → unchanged.

Tests

studio/backend/tests/test_detect_mmproj_file.py covers:

test_returns_none_when_no_mmproj
test_single_matching_family_mmproj_picked
test_hf_style_unprefixed_mmproj_still_works
test_blocks_single_cross_family_projector (the Unsloth Studio incorrectly auto-selects unrelated mmproj from flat local GGUF directory #5347 core case)
test_picks_matching_family_among_mixed_candidates
test_prefers_longest_prefix_within_same_family
test_unrecognised_family_does_not_break_detection
test_directory_path_returns_first_candidate
test_search_root_walk_still_works

Files changed

studio/backend/utils/models/model_config.py — _MODEL_FAMILY_TOKENS, _detect_family_token, _shared_prefix_len, updated detect_mmproj_file.
studio/backend/tests/test_detect_mmproj_file.py — new.

🤖 Generated with Claude Code

…dirs (unslothai#5347) When a flat local GGUF directory holds several unrelated models with their own mmproj siblings, detect_mmproj_file() returned the first projector it walked into. For the layout reported in unslothai#5347 (Qwen weights + a Gemma mmproj in the same dir) that meant llama-server was launched with --mmproj pointing at the Gemma projector, which fails to load and surfaces as a confusing crash. Disambiguation rules: - Drop candidates whose family token (qwen/gemma/llama/mistral/phi/...) disagrees with the model's family. Candidates with no recognised family token (e.g. the HF-convention 'mmproj-F16.gguf') are kept. - Among same-family candidates, prefer the one whose stem shares the longest prefix with the model (Qwen3.5-9B mmproj beats Qwen3.5-35B mmproj for a Qwen3.5-9B model). - If every candidate is dropped, return None — better than attaching a wrong projector and getting a server-launch failure. Tests cover the cross-family block, multi-candidate prefix tie-break, HF-convention 'mmproj-F16.gguf', unrecognised families, and the existing search_root walk.

for more information, see https://pre-commit.ci

gemini-code-assist

Code Review

This pull request enhances the detect_mmproj_file function to more accurately identify multi-modal projector files by implementing family-based filtering and longest-prefix matching. These changes prevent the incorrect attachment of projectors in directories containing multiple unrelated models. A new test suite is included to verify the disambiguation logic across various scenarios. The review feedback points out that the current substring-based family detection is prone to false positives for short tokens and recommends using regular expressions with word boundaries for better precision.

gemini-code-assist · 2026-05-09T19:30:58Z

+def _detect_family_token(filename: str) -> Optional[str]:
+    """Return the first known family token found in ``filename``, or None."""
+    name = filename.lower()
+    for token in _MODEL_FAMILY_TOKENS:
+        if token in name:
+            return token
+    return None


The current substring matching ('token in name') is susceptible to false positives, especially for short family tokens like 'phi', 'yi', or 'glm'. For example, 'phi' would match a model named 'graphite-7b.gguf', and 'yi' would match 'playing-7b.gguf'.

Using a regex with word/separator boundaries would make this detection much more robust.

Suggested change

def _detect_family_token(filename: str) -> Optional[str]:

"""Return the first known family token found in ``filename``, or None."""

name = filename.lower()

for token in _MODEL_FAMILY_TOKENS:

if token in name:

return token

return None

def _detect_family_token(filename: str) -> Optional[str]:

"""Return the first known family token found in filename, or None."""

name = filename.lower()

for token in _MODEL_FAMILY_TOKENS:

# Use boundaries to avoid false positives on short tokens (e.g. 'phi' in 'graphite')

pattern = rf"(?:^|[^a-z0-9]){_re.escape(token)}(?:$|[^a-z0-9])"

if _re.search(pattern, name):

return token

return None

References

Centralize recurring or complex logical checks into a single helper function and reuse it across the codebase to ensure consistency and simplify maintenance.

…er guard Tighten the family-token detector to match only on word boundaries so substring collisions stop tagging false families: phi no longer matches sapphire, yi no longer matches yip, mimo no longer matches mimosa, and mistral does not bleed into ministral/magistral/devstral. Pick the token whose first occurrence is leftmost in the filename rather than the first hit in tuple order, so merge models disambiguate predictably (llama-phi tags llama; phi-llama tags phi). Expand _MODEL_FAMILY_TOKENS with the families an audit of the unsloth HF org turned up that the previous list missed: devstral, ministral, magistral (Mistral-derivative naming), nemotron, kimi, nanonets, cosmos, mimo, apriel, lfm. Without these, a flat local GGUF directory containing one of these weights plus an unrelated renamed projector still hit the original unslothai#5347 failure. Add mmproj_matches_model_family() and call it at the llama-server launch site in core/inference/llama_cpp.py. detect_mmproj_file already drops cross-family candidates at discovery time, but mmproj_path can also reach the launcher via config injection or future overrides; this guard keeps those paths from silently loading a known-wrong projector. Tests: 12 new cases covering substring rejection, leftmost-position selection, new family tokens, a new flat-dir Nemotron + Gemma rejection case, and the launcher-level guard. All 21 detect_mmproj_file tests and the existing 106 llama_cpp tests pass.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 21ddfbae50

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-14T12:30:00Z

+    best = max(
+        family_filtered,
+        key = lambda c: (_shared_prefix_len(model_stem, c.stem.lower()), -len(c.stem)),


Handle mmproj-prefixed candidates in prefix ranking

When same-family projectors are named with the common mmproj-<model>-... prefix, _shared_prefix_len(model_stem, c.stem.lower()) is 0 for every candidate because the candidate stem starts with mmproj, so the tie-breaker falls back to the shortest projector name. In a flat dir with Qwen2-VL-72B-...gguf, mmproj-Qwen2-VL-72B-...gguf, and mmproj-Qwen2-VL-7B-...gguf, this can select the 7B projector even if the 72B projector was first and would have been returned before; consider stripping the mmproj prefix or otherwise comparing against the model-bearing part of the projector name.

Useful? React with 👍 / 👎.

for more information, see https://pre-commit.ci

Real Unsloth vision GGUFs carry rich identity metadata that has been ignored by the discovery path. Every projector under the unsloth org has general.type='mmproj' plus general.base_model.0.repo_url pointing at the same upstream HF repo as its weight, and the equivalent basename, base_model.0.name, and base_model.0.organization fields. A flat-dir mismatch is therefore decidable from the headers alone, no matter how the user has renamed the files. Add utils/models/gguf_metadata.py with read_gguf_general_metadata(): a fast (~30 ms) header walk that pulls only the general.* string fields and skips everything else, cached by (resolved path, mtime_ns, size). Mirrors the parser shape already used by LlamaCppBackend._read_gguf_metadata so the format handling is consistent. is_mmproj_by_metadata() returns True/False/None from general.type, and pairing_score() returns 100 for an exact base_model URL match, 80 for basename plus organization match, 60 for basename only, -1 for definitive metadata disagreement, and 0 when neither side has enough metadata to decide. Rewire detect_mmproj_file() to a two-stage selector: 1. Detect projectors via metadata (general.type) when present, else fall back to the filename substring heuristic. This recovers headerless projectors AND projectors whose name does not contain 'mmproj' but whose header advertises one. 2. Score each candidate against the weight via pairing_score. Drop candidates with score -1 (definitive metadata disagreement). For candidates with score 0 (no usable metadata) fall back to the existing filename family-token check, dropping recognised-family mismatches. Pick the survivor with the highest (score, longest_prefix, -len(stem)) tuple, so a metadata URL match always wins over a filename-prefix match. Tests: 16 new cases. tests/test_gguf_metadata.py covers the parser (missing file, non-GGUF, string extraction, walking past arrays and uint32s, cache invalidation by mtime/size) and the score helpers. tests/test_detect_mmproj_file.py adds end-to-end cases that synthesise real on-disk GGUF headers: URL match wins over a longer-prefix sibling, URL mismatch returns None even when filenames match, a projector named 'vision-projector.gguf' is still discovered via general.type, and a 100-score header match outranks a near-perfect filename prefix on a headerless candidate. All 75 tests across detect_mmproj_file, gguf_metadata, llama_cpp load progress, cached gguf routes, trained model scan, and vision cache pass.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9bfba3d1ea

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-14T23:33:00Z

+    if w_base and p_base and w_org and p_org:
+        if w_base.lower() == p_base.lower() and w_org.lower() == p_org.lower():
+            return 80


Reject mismatched size labels before basename fallback

When GGUFs have general.basename/organization but lack general.base_model.0.repo_url, this treats a projector with the same family basename as a confident match even if general.size_label differs. For families that publish multiple sizes under the same basename (for example a 7B and 72B Qwen/VL projector with generic mmproj-F16.gguf names), a flat directory can still attach the wrong projector instead of dropping it, which is the failure mode this change is trying to prevent.

Useful? React with 👍 / 👎.

changes Trim verbose explanations to one-line statements of intent. The behaviour is unchanged: 161 tests across detect_mmproj_file, gguf_metadata, llama_cpp_load_progress (+ matrix), llama_server_args, llama_cpp_cache_aware_disk_check, trained_model_scan, and vision_cache all pass.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fc6fd0cd62

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-15T02:12:32Z

+    t = meta.get("general.type")
+    if t is None:
+        return None
+    return t.lower() == "mmproj"


Allow clip-vision projectors through metadata check

For mmproj GGUFs whose header uses a projector-specific general.type such as clip-vision, this returns False, and detect_mmproj_file then disables the filename fallback even if the file is named mmproj-*.gguf. In that layout a valid local vision companion is never added to candidates, so the model is imported/served as text-only despite the mmproj sibling being present; consider only treating known primary model types as definitive negatives, or accepting projector types/general.architecture == clip here.

Useful? React with 👍 / 👎.

Trim the docstring and the dir-walking block comments inside detect_mmproj_file to one-liners. Behaviour unchanged; 44 mmproj + gguf_metadata + llama_cpp_load_progress tests pass.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 910ebd482b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-15T02:58:01Z

+            by_meta = is_mmproj_by_metadata(meta)
+            if by_meta is True or (by_meta is None and _is_mmproj(f.name)):
+                seen_resolved.add(resolved)
+                candidates.append(resolved)


Keep symlink names when ranking projector candidates

For symlinked projector files, appending the resolved target here makes the later family filter and prefix tie-break use the target basename instead of the user-visible mmproj filename. In flat local dirs that use readable symlinks to opaque blobs, e.g. Qwen...gguf plus both Qwen...mmproj.gguf and gemma...mmproj.gguf pointing at digest-named files, _detect_family_token(c.name) sees no family for either candidate and the new cross-family guard/prefix ranking can still pick the wrong projector. Keep the original path for filename matching and only use the resolved path for dedup/returning.

Useful? React with 👍 / 👎.

The eviction branch popped exactly one entry when len >= max, so the cache size could only converge to the cap when entries were added slowly enough for natural growth. After a sandbox sim that reduced the cap mid-run, len stayed above the cap because each insert popped one and added one. Switch to a while loop so we evict until len is strictly below the cap before inserting. Steady-state behaviour at the default 4096 ceiling is unchanged.

danielhanchen · 2026-05-15T03:30:43Z

Thanks for the PR, appreciate it!

Anai-Guo requested review from danielhanchen and rolandtannous as code owners May 9, 2026 19:24

[pre-commit.ci] auto fixes from pre-commit.com hooks

80220df

for more information, see https://pre-commit.ci

gemini-code-assist Bot reviewed May 9, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed May 14, 2026

View reviewed changes

danielhanchen and others added 3 commits May 14, 2026 05:30

Merge branch 'main' into fix-mmproj-cross-family-5347

2b2dcc6

[pre-commit.ci] auto fixes from pre-commit.com hooks

f938ac0

for more information, see https://pre-commit.ci

chatgpt-codex-connector Bot reviewed May 14, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed May 15, 2026

View reviewed changes

studio/mmproj: shorten remaining detect_mmproj_file body comments

910ebd4

Trim the docstring and the dir-walking block comments inside detect_mmproj_file to one-liners. Behaviour unchanged; 44 mmproj + gguf_metadata + llama_cpp_load_progress tests pass.

chatgpt-codex-connector Bot reviewed May 15, 2026

View reviewed changes

danielhanchen merged commit 63c6750 into unslothai:main May 15, 2026
21 of 27 checks passed

danielhanchen mentioned this pull request May 15, 2026

studio/mmproj: skip unwanted GGUF values via seek instead of read #5431

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(studio/mmproj): block cross-family projectors in flat local GGUF dirs (#5347)#5350

fix(studio/mmproj): block cross-family projectors in flat local GGUF dirs (#5347)#5350
danielhanchen merged 9 commits into
unslothai:mainfrom
Anai-Guo:fix-mmproj-cross-family-5347

Anai-Guo commented May 9, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 9, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 14, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 14, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 15, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 15, 2026

Uh oh!

danielhanchen commented May 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

Anai-Guo commented May 9, 2026

Summary

Approach

Backwards compatibility

Tests

Files changed

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 9, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 14, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 14, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 15, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 15, 2026

Choose a reason for hiding this comment

Uh oh!

danielhanchen commented May 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants