Skip to content

fix(studio/mmproj): block cross-family projectors in flat local GGUF dirs (#5347)#5350

Merged
danielhanchen merged 9 commits into
unslothai:mainfrom
Anai-Guo:fix-mmproj-cross-family-5347
May 15, 2026
Merged

fix(studio/mmproj): block cross-family projectors in flat local GGUF dirs (#5347)#5350
danielhanchen merged 9 commits into
unslothai:mainfrom
Anai-Guo:fix-mmproj-cross-family-5347

Conversation

@Anai-Guo

@Anai-Guo Anai-Guo commented May 9, 2026

Copy link
Copy Markdown
Contributor

Summary

Fixes #5347.

When a flat local GGUF directory holds several unrelated models with their own mmproj siblings, detect_mmproj_file() walked the directory and returned the first projector it found. For the layout in #5347:

E:\llm_models\
├── Qwen3.5-9B-Q4_K_M.gguf       ← user picks this
├── Qwen3.5-9B-BF16-mmproj.gguf  ← correct projector
└── gemma-4-26B-A4B-it.mmproj-q8_0.gguf

llama-server was launched with --mmproj /workspace/llm-models/gemma-4-26B-A4B-it.mmproj-q8_0.gguf, which fails to load and surfaces as a confusing crash that makes valid local GGUFs look broken.

Approach

Two-stage disambiguation, applied only when path is a .gguf file (we know the target model):

  1. Family-token filter. A projector whose filename family token (qwen, gemma, llama, mistral, phi, deepseek, internvl, minicpm, llava, glm, yi, command-r, molmo, pixtral, smolvlm, moondream, granite, ovis) disagrees with the target model's family is dropped. Candidates whose family is not recognised (e.g. the HF-convention mmproj-F16.gguf) are kept — they could legitimately belong to any model.
  2. Longest-stem-prefix tie-break. Among same-family candidates, prefer the one whose stem shares the longest prefix with the model (e.g. for Qwen3.5-35B-A3B-UD-Q4_K_L.gguf, Qwen3.5-35B-A3B-BF16-mmproj.gguf beats Qwen3.5-9B-BF16-mmproj.gguf).

If every candidate is dropped by rule (1), return None rather than fall back to an arbitrary mismatched projector — better to launch without --mmproj than with the wrong one.

When path is a directory (no model name to compare against) the historical "first candidate" behaviour is preserved.

Backwards compatibility

  • Single same-family projector → unchanged.
  • HF-style mmproj-F16.gguf next to a single weight → unchanged (kept by the no-recognised-family clause).
  • The search_root walk for snapshot layouts (snapshot/BF16/foo.gguf + snapshot/mmproj-BF16.gguf) is preserved.
  • Directory-as-path callers → unchanged.

Tests

studio/backend/tests/test_detect_mmproj_file.py covers:

  • test_returns_none_when_no_mmproj
  • test_single_matching_family_mmproj_picked
  • test_hf_style_unprefixed_mmproj_still_works
  • test_blocks_single_cross_family_projector (the Unsloth Studio incorrectly auto-selects unrelated mmproj from flat local GGUF directory #5347 core case)
  • test_picks_matching_family_among_mixed_candidates
  • test_prefers_longest_prefix_within_same_family
  • test_unrecognised_family_does_not_break_detection
  • test_directory_path_returns_first_candidate
  • test_search_root_walk_still_works

Files changed

  • studio/backend/utils/models/model_config.py_MODEL_FAMILY_TOKENS, _detect_family_token, _shared_prefix_len, updated detect_mmproj_file.
  • studio/backend/tests/test_detect_mmproj_file.py — new.

🤖 Generated with Claude Code

…dirs (unslothai#5347)

When a flat local GGUF directory holds several unrelated models with their
own mmproj siblings, detect_mmproj_file() returned the first projector it
walked into. For the layout reported in unslothai#5347 (Qwen weights + a Gemma
mmproj in the same dir) that meant llama-server was launched with
--mmproj pointing at the Gemma projector, which fails to load and surfaces
as a confusing crash.

Disambiguation rules:
- Drop candidates whose family token (qwen/gemma/llama/mistral/phi/...)
  disagrees with the model's family. Candidates with no recognised
  family token (e.g. the HF-convention 'mmproj-F16.gguf') are kept.
- Among same-family candidates, prefer the one whose stem shares the
  longest prefix with the model (Qwen3.5-9B mmproj beats Qwen3.5-35B
  mmproj for a Qwen3.5-9B model).
- If every candidate is dropped, return None — better than attaching
  a wrong projector and getting a server-launch failure.

Tests cover the cross-family block, multi-candidate prefix tie-break,
HF-convention 'mmproj-F16.gguf', unrecognised families, and the
existing search_root walk.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enhances the detect_mmproj_file function to more accurately identify multi-modal projector files by implementing family-based filtering and longest-prefix matching. These changes prevent the incorrect attachment of projectors in directories containing multiple unrelated models. A new test suite is included to verify the disambiguation logic across various scenarios. The review feedback points out that the current substring-based family detection is prone to false positives for short tokens and recommends using regular expressions with word boundaries for better precision.

Comment on lines +941 to +947
def _detect_family_token(filename: str) -> Optional[str]:
"""Return the first known family token found in ``filename``, or None."""
name = filename.lower()
for token in _MODEL_FAMILY_TOKENS:
if token in name:
return token
return None

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The current substring matching ('token in name') is susceptible to false positives, especially for short family tokens like 'phi', 'yi', or 'glm'. For example, 'phi' would match a model named 'graphite-7b.gguf', and 'yi' would match 'playing-7b.gguf'.

Using a regex with word/separator boundaries would make this detection much more robust.

Suggested change
def _detect_family_token(filename: str) -> Optional[str]:
"""Return the first known family token found in ``filename``, or None."""
name = filename.lower()
for token in _MODEL_FAMILY_TOKENS:
if token in name:
return token
return None
def _detect_family_token(filename: str) -> Optional[str]:
"""Return the first known family token found in filename, or None."""
name = filename.lower()
for token in _MODEL_FAMILY_TOKENS:
# Use boundaries to avoid false positives on short tokens (e.g. 'phi' in 'graphite')
pattern = rf"(?:^|[^a-z0-9]){_re.escape(token)}(?:$|[^a-z0-9])"
if _re.search(pattern, name):
return token
return None
References
  1. Centralize recurring or complex logical checks into a single helper function and reuse it across the codebase to ensure consistency and simplify maintenance.

…er guard

Tighten the family-token detector to match only on word boundaries so
substring collisions stop tagging false families: phi no longer matches
sapphire, yi no longer matches yip, mimo no longer matches mimosa, and
mistral does not bleed into ministral/magistral/devstral. Pick the token
whose first occurrence is leftmost in the filename rather than the first
hit in tuple order, so merge models disambiguate predictably (llama-phi
tags llama; phi-llama tags phi).

Expand _MODEL_FAMILY_TOKENS with the families an audit of the unsloth
HF org turned up that the previous list missed: devstral, ministral,
magistral (Mistral-derivative naming), nemotron, kimi, nanonets, cosmos,
mimo, apriel, lfm. Without these, a flat local GGUF directory containing
one of these weights plus an unrelated renamed projector still hit the
original unslothai#5347 failure.

Add mmproj_matches_model_family() and call it at the llama-server launch
site in core/inference/llama_cpp.py. detect_mmproj_file already drops
cross-family candidates at discovery time, but mmproj_path can also reach
the launcher via config injection or future overrides; this guard keeps
those paths from silently loading a known-wrong projector.

Tests: 12 new cases covering substring rejection, leftmost-position
selection, new family tokens, a new flat-dir Nemotron + Gemma rejection
case, and the launcher-level guard. All 21 detect_mmproj_file tests and
the existing 106 llama_cpp tests pass.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 21ddfbae50

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +1156 to +1158
best = max(
family_filtered,
key = lambda c: (_shared_prefix_len(model_stem, c.stem.lower()), -len(c.stem)),

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Handle mmproj-prefixed candidates in prefix ranking

When same-family projectors are named with the common mmproj-<model>-... prefix, _shared_prefix_len(model_stem, c.stem.lower()) is 0 for every candidate because the candidate stem starts with mmproj, so the tie-breaker falls back to the shortest projector name. In a flat dir with Qwen2-VL-72B-...gguf, mmproj-Qwen2-VL-72B-...gguf, and mmproj-Qwen2-VL-7B-...gguf, this can select the 7B projector even if the 72B projector was first and would have been returned before; consider stripping the mmproj prefix or otherwise comparing against the model-bearing part of the projector name.

Useful? React with 👍 / 👎.

danielhanchen and others added 3 commits May 14, 2026 05:30
Real Unsloth vision GGUFs carry rich identity metadata that has been
ignored by the discovery path. Every projector under the unsloth org
has general.type='mmproj' plus general.base_model.0.repo_url pointing
at the same upstream HF repo as its weight, and the equivalent
basename, base_model.0.name, and base_model.0.organization fields. A
flat-dir mismatch is therefore decidable from the headers alone, no
matter how the user has renamed the files.

Add utils/models/gguf_metadata.py with read_gguf_general_metadata():
a fast (~30 ms) header walk that pulls only the general.* string
fields and skips everything else, cached by (resolved path, mtime_ns,
size). Mirrors the parser shape already used by
LlamaCppBackend._read_gguf_metadata so the format handling is
consistent.

is_mmproj_by_metadata() returns True/False/None from general.type,
and pairing_score() returns 100 for an exact base_model URL match,
80 for basename plus organization match, 60 for basename only, -1
for definitive metadata disagreement, and 0 when neither side has
enough metadata to decide.

Rewire detect_mmproj_file() to a two-stage selector:
  1. Detect projectors via metadata (general.type) when present, else
     fall back to the filename substring heuristic. This recovers
     headerless projectors AND projectors whose name does not contain
     'mmproj' but whose header advertises one.
  2. Score each candidate against the weight via pairing_score. Drop
     candidates with score -1 (definitive metadata disagreement). For
     candidates with score 0 (no usable metadata) fall back to the
     existing filename family-token check, dropping recognised-family
     mismatches. Pick the survivor with the highest (score,
     longest_prefix, -len(stem)) tuple, so a metadata URL match
     always wins over a filename-prefix match.

Tests: 16 new cases. tests/test_gguf_metadata.py covers the parser
(missing file, non-GGUF, string extraction, walking past arrays and
uint32s, cache invalidation by mtime/size) and the score helpers.
tests/test_detect_mmproj_file.py adds end-to-end cases that synthesise
real on-disk GGUF headers: URL match wins over a longer-prefix
sibling, URL mismatch returns None even when filenames match, a
projector named 'vision-projector.gguf' is still discovered via
general.type, and a 100-score header match outranks a near-perfect
filename prefix on a headerless candidate.

All 75 tests across detect_mmproj_file, gguf_metadata, llama_cpp
load progress, cached gguf routes, trained model scan, and vision
cache pass.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9bfba3d1ea

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +270 to +272
if w_base and p_base and w_org and p_org:
if w_base.lower() == p_base.lower() and w_org.lower() == p_org.lower():
return 80

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reject mismatched size labels before basename fallback

When GGUFs have general.basename/organization but lack general.base_model.0.repo_url, this treats a projector with the same family basename as a confident match even if general.size_label differs. For families that publish multiple sizes under the same basename (for example a 7B and 72B Qwen/VL projector with generic mmproj-F16.gguf names), a flat directory can still attach the wrong projector instead of dropping it, which is the failure mode this change is trying to prevent.

Useful? React with 👍 / 👎.

 changes

Trim verbose explanations to one-line statements of intent. The
behaviour is unchanged: 161 tests across detect_mmproj_file,
gguf_metadata, llama_cpp_load_progress (+ matrix), llama_server_args,
llama_cpp_cache_aware_disk_check, trained_model_scan, and vision_cache
all pass.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fc6fd0cd62

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

t = meta.get("general.type")
if t is None:
return None
return t.lower() == "mmproj"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Allow clip-vision projectors through metadata check

For mmproj GGUFs whose header uses a projector-specific general.type such as clip-vision, this returns False, and detect_mmproj_file then disables the filename fallback even if the file is named mmproj-*.gguf. In that layout a valid local vision companion is never added to candidates, so the model is imported/served as text-only despite the mmproj sibling being present; consider only treating known primary model types as definitive negatives, or accepting projector types/general.architecture == clip here.

Useful? React with 👍 / 👎.

Trim the docstring and the dir-walking block comments inside
detect_mmproj_file to one-liners. Behaviour unchanged; 44 mmproj +
gguf_metadata + llama_cpp_load_progress tests pass.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 910ebd482b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

by_meta = is_mmproj_by_metadata(meta)
if by_meta is True or (by_meta is None and _is_mmproj(f.name)):
seen_resolved.add(resolved)
candidates.append(resolved)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep symlink names when ranking projector candidates

For symlinked projector files, appending the resolved target here makes the later family filter and prefix tie-break use the target basename instead of the user-visible mmproj filename. In flat local dirs that use readable symlinks to opaque blobs, e.g. Qwen...gguf plus both Qwen...mmproj.gguf and gemma...mmproj.gguf pointing at digest-named files, _detect_family_token(c.name) sees no family for either candidate and the new cross-family guard/prefix ranking can still pick the wrong projector. Keep the original path for filename matching and only use the resolved path for dedup/returning.

Useful? React with 👍 / 👎.

The eviction branch popped exactly one entry when len >= max, so the
cache size could only converge to the cap when entries were added
slowly enough for natural growth. After a sandbox sim that reduced
the cap mid-run, len stayed above the cap because each insert popped
one and added one. Switch to a while loop so we evict until len is
strictly below the cap before inserting. Steady-state behaviour at
the default 4096 ceiling is unchanged.
@danielhanchen

Copy link
Copy Markdown
Member

Thanks for the PR, appreciate it!

@danielhanchen danielhanchen merged commit 63c6750 into unslothai:main May 15, 2026
21 of 27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unsloth Studio incorrectly auto-selects unrelated mmproj from flat local GGUF directory

2 participants