fix(studio/mmproj): block cross-family projectors in flat local GGUF dirs (#5347)#5350
Conversation
…dirs (unslothai#5347) When a flat local GGUF directory holds several unrelated models with their own mmproj siblings, detect_mmproj_file() returned the first projector it walked into. For the layout reported in unslothai#5347 (Qwen weights + a Gemma mmproj in the same dir) that meant llama-server was launched with --mmproj pointing at the Gemma projector, which fails to load and surfaces as a confusing crash. Disambiguation rules: - Drop candidates whose family token (qwen/gemma/llama/mistral/phi/...) disagrees with the model's family. Candidates with no recognised family token (e.g. the HF-convention 'mmproj-F16.gguf') are kept. - Among same-family candidates, prefer the one whose stem shares the longest prefix with the model (Qwen3.5-9B mmproj beats Qwen3.5-35B mmproj for a Qwen3.5-9B model). - If every candidate is dropped, return None — better than attaching a wrong projector and getting a server-launch failure. Tests cover the cross-family block, multi-candidate prefix tie-break, HF-convention 'mmproj-F16.gguf', unrecognised families, and the existing search_root walk.
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Code Review
This pull request enhances the detect_mmproj_file function to more accurately identify multi-modal projector files by implementing family-based filtering and longest-prefix matching. These changes prevent the incorrect attachment of projectors in directories containing multiple unrelated models. A new test suite is included to verify the disambiguation logic across various scenarios. The review feedback points out that the current substring-based family detection is prone to false positives for short tokens and recommends using regular expressions with word boundaries for better precision.
| def _detect_family_token(filename: str) -> Optional[str]: | ||
| """Return the first known family token found in ``filename``, or None.""" | ||
| name = filename.lower() | ||
| for token in _MODEL_FAMILY_TOKENS: | ||
| if token in name: | ||
| return token | ||
| return None |
There was a problem hiding this comment.
The current substring matching ('token in name') is susceptible to false positives, especially for short family tokens like 'phi', 'yi', or 'glm'. For example, 'phi' would match a model named 'graphite-7b.gguf', and 'yi' would match 'playing-7b.gguf'.
Using a regex with word/separator boundaries would make this detection much more robust.
| def _detect_family_token(filename: str) -> Optional[str]: | |
| """Return the first known family token found in ``filename``, or None.""" | |
| name = filename.lower() | |
| for token in _MODEL_FAMILY_TOKENS: | |
| if token in name: | |
| return token | |
| return None | |
| def _detect_family_token(filename: str) -> Optional[str]: | |
| """Return the first known family token found in filename, or None.""" | |
| name = filename.lower() | |
| for token in _MODEL_FAMILY_TOKENS: | |
| # Use boundaries to avoid false positives on short tokens (e.g. 'phi' in 'graphite') | |
| pattern = rf"(?:^|[^a-z0-9]){_re.escape(token)}(?:$|[^a-z0-9])" | |
| if _re.search(pattern, name): | |
| return token | |
| return None |
References
- Centralize recurring or complex logical checks into a single helper function and reuse it across the codebase to ensure consistency and simplify maintenance.
…er guard Tighten the family-token detector to match only on word boundaries so substring collisions stop tagging false families: phi no longer matches sapphire, yi no longer matches yip, mimo no longer matches mimosa, and mistral does not bleed into ministral/magistral/devstral. Pick the token whose first occurrence is leftmost in the filename rather than the first hit in tuple order, so merge models disambiguate predictably (llama-phi tags llama; phi-llama tags phi). Expand _MODEL_FAMILY_TOKENS with the families an audit of the unsloth HF org turned up that the previous list missed: devstral, ministral, magistral (Mistral-derivative naming), nemotron, kimi, nanonets, cosmos, mimo, apriel, lfm. Without these, a flat local GGUF directory containing one of these weights plus an unrelated renamed projector still hit the original unslothai#5347 failure. Add mmproj_matches_model_family() and call it at the llama-server launch site in core/inference/llama_cpp.py. detect_mmproj_file already drops cross-family candidates at discovery time, but mmproj_path can also reach the launcher via config injection or future overrides; this guard keeps those paths from silently loading a known-wrong projector. Tests: 12 new cases covering substring rejection, leftmost-position selection, new family tokens, a new flat-dir Nemotron + Gemma rejection case, and the launcher-level guard. All 21 detect_mmproj_file tests and the existing 106 llama_cpp tests pass.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 21ddfbae50
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| best = max( | ||
| family_filtered, | ||
| key = lambda c: (_shared_prefix_len(model_stem, c.stem.lower()), -len(c.stem)), |
There was a problem hiding this comment.
Handle mmproj-prefixed candidates in prefix ranking
When same-family projectors are named with the common mmproj-<model>-... prefix, _shared_prefix_len(model_stem, c.stem.lower()) is 0 for every candidate because the candidate stem starts with mmproj, so the tie-breaker falls back to the shortest projector name. In a flat dir with Qwen2-VL-72B-...gguf, mmproj-Qwen2-VL-72B-...gguf, and mmproj-Qwen2-VL-7B-...gguf, this can select the 7B projector even if the 72B projector was first and would have been returned before; consider stripping the mmproj prefix or otherwise comparing against the model-bearing part of the projector name.
Useful? React with 👍 / 👎.
for more information, see https://pre-commit.ci
Real Unsloth vision GGUFs carry rich identity metadata that has been
ignored by the discovery path. Every projector under the unsloth org
has general.type='mmproj' plus general.base_model.0.repo_url pointing
at the same upstream HF repo as its weight, and the equivalent
basename, base_model.0.name, and base_model.0.organization fields. A
flat-dir mismatch is therefore decidable from the headers alone, no
matter how the user has renamed the files.
Add utils/models/gguf_metadata.py with read_gguf_general_metadata():
a fast (~30 ms) header walk that pulls only the general.* string
fields and skips everything else, cached by (resolved path, mtime_ns,
size). Mirrors the parser shape already used by
LlamaCppBackend._read_gguf_metadata so the format handling is
consistent.
is_mmproj_by_metadata() returns True/False/None from general.type,
and pairing_score() returns 100 for an exact base_model URL match,
80 for basename plus organization match, 60 for basename only, -1
for definitive metadata disagreement, and 0 when neither side has
enough metadata to decide.
Rewire detect_mmproj_file() to a two-stage selector:
1. Detect projectors via metadata (general.type) when present, else
fall back to the filename substring heuristic. This recovers
headerless projectors AND projectors whose name does not contain
'mmproj' but whose header advertises one.
2. Score each candidate against the weight via pairing_score. Drop
candidates with score -1 (definitive metadata disagreement). For
candidates with score 0 (no usable metadata) fall back to the
existing filename family-token check, dropping recognised-family
mismatches. Pick the survivor with the highest (score,
longest_prefix, -len(stem)) tuple, so a metadata URL match
always wins over a filename-prefix match.
Tests: 16 new cases. tests/test_gguf_metadata.py covers the parser
(missing file, non-GGUF, string extraction, walking past arrays and
uint32s, cache invalidation by mtime/size) and the score helpers.
tests/test_detect_mmproj_file.py adds end-to-end cases that synthesise
real on-disk GGUF headers: URL match wins over a longer-prefix
sibling, URL mismatch returns None even when filenames match, a
projector named 'vision-projector.gguf' is still discovered via
general.type, and a 100-score header match outranks a near-perfect
filename prefix on a headerless candidate.
All 75 tests across detect_mmproj_file, gguf_metadata, llama_cpp
load progress, cached gguf routes, trained model scan, and vision
cache pass.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9bfba3d1ea
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if w_base and p_base and w_org and p_org: | ||
| if w_base.lower() == p_base.lower() and w_org.lower() == p_org.lower(): | ||
| return 80 |
There was a problem hiding this comment.
Reject mismatched size labels before basename fallback
When GGUFs have general.basename/organization but lack general.base_model.0.repo_url, this treats a projector with the same family basename as a confident match even if general.size_label differs. For families that publish multiple sizes under the same basename (for example a 7B and 72B Qwen/VL projector with generic mmproj-F16.gguf names), a flat directory can still attach the wrong projector instead of dropping it, which is the failure mode this change is trying to prevent.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fc6fd0cd62
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| t = meta.get("general.type") | ||
| if t is None: | ||
| return None | ||
| return t.lower() == "mmproj" |
There was a problem hiding this comment.
Allow clip-vision projectors through metadata check
For mmproj GGUFs whose header uses a projector-specific general.type such as clip-vision, this returns False, and detect_mmproj_file then disables the filename fallback even if the file is named mmproj-*.gguf. In that layout a valid local vision companion is never added to candidates, so the model is imported/served as text-only despite the mmproj sibling being present; consider only treating known primary model types as definitive negatives, or accepting projector types/general.architecture == clip here.
Useful? React with 👍 / 👎.
Trim the docstring and the dir-walking block comments inside detect_mmproj_file to one-liners. Behaviour unchanged; 44 mmproj + gguf_metadata + llama_cpp_load_progress tests pass.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 910ebd482b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| by_meta = is_mmproj_by_metadata(meta) | ||
| if by_meta is True or (by_meta is None and _is_mmproj(f.name)): | ||
| seen_resolved.add(resolved) | ||
| candidates.append(resolved) |
There was a problem hiding this comment.
Keep symlink names when ranking projector candidates
For symlinked projector files, appending the resolved target here makes the later family filter and prefix tie-break use the target basename instead of the user-visible mmproj filename. In flat local dirs that use readable symlinks to opaque blobs, e.g. Qwen...gguf plus both Qwen...mmproj.gguf and gemma...mmproj.gguf pointing at digest-named files, _detect_family_token(c.name) sees no family for either candidate and the new cross-family guard/prefix ranking can still pick the wrong projector. Keep the original path for filename matching and only use the resolved path for dedup/returning.
Useful? React with 👍 / 👎.
The eviction branch popped exactly one entry when len >= max, so the cache size could only converge to the cap when entries were added slowly enough for natural growth. After a sandbox sim that reduced the cap mid-run, len stayed above the cap because each insert popped one and added one. Switch to a while loop so we evict until len is strictly below the cap before inserting. Steady-state behaviour at the default 4096 ceiling is unchanged.
|
Thanks for the PR, appreciate it! |
Summary
Fixes #5347.
When a flat local GGUF directory holds several unrelated models with their own
mmprojsiblings,detect_mmproj_file()walked the directory and returned the first projector it found. For the layout in #5347:llama-serverwas launched with--mmproj /workspace/llm-models/gemma-4-26B-A4B-it.mmproj-q8_0.gguf, which fails to load and surfaces as a confusing crash that makes valid local GGUFs look broken.Approach
Two-stage disambiguation, applied only when
pathis a.gguffile (we know the target model):qwen,gemma,llama,mistral,phi,deepseek,internvl,minicpm,llava,glm,yi,command-r,molmo,pixtral,smolvlm,moondream,granite,ovis) disagrees with the target model's family is dropped. Candidates whose family is not recognised (e.g. the HF-conventionmmproj-F16.gguf) are kept — they could legitimately belong to any model.Qwen3.5-35B-A3B-UD-Q4_K_L.gguf,Qwen3.5-35B-A3B-BF16-mmproj.ggufbeatsQwen3.5-9B-BF16-mmproj.gguf).If every candidate is dropped by rule (1), return
Nonerather than fall back to an arbitrary mismatched projector — better to launch without--mmprojthan with the wrong one.When
pathis a directory (no model name to compare against) the historical "first candidate" behaviour is preserved.Backwards compatibility
mmproj-F16.ggufnext to a single weight → unchanged (kept by the no-recognised-family clause).search_rootwalk for snapshot layouts (snapshot/BF16/foo.gguf+snapshot/mmproj-BF16.gguf) is preserved.Tests
studio/backend/tests/test_detect_mmproj_file.pycovers:test_returns_none_when_no_mmprojtest_single_matching_family_mmproj_pickedtest_hf_style_unprefixed_mmproj_still_workstest_blocks_single_cross_family_projector(the Unsloth Studio incorrectly auto-selects unrelated mmproj from flat local GGUF directory #5347 core case)test_picks_matching_family_among_mixed_candidatestest_prefers_longest_prefix_within_same_familytest_unrecognised_family_does_not_break_detectiontest_directory_path_returns_first_candidatetest_search_root_walk_still_worksFiles changed
studio/backend/utils/models/model_config.py—_MODEL_FAMILY_TOKENS,_detect_family_token,_shared_prefix_len, updateddetect_mmproj_file.studio/backend/tests/test_detect_mmproj_file.py— new.🤖 Generated with Claude Code