[Studio][Optimization]Add vision detection cache to is_vision_model()#4853
Conversation
…bprocess spawns is_vision_model() is called 4-5 times per training run for the same model with zero caching. For transformers 5.x models, each call spawns a full subprocess (~6s each). This adds a module-level _vision_detection_cache dict following the same pattern as the existing _audio_detection_cache used by detect_audio_type(). The function is refactored into a thin cache wrapper around _is_vision_model_uncached(), saving ~12s per training run.
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a00536ca04
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if model_name in _vision_detection_cache: | ||
| return _vision_detection_cache[model_name] | ||
|
|
||
| result = _is_vision_model_uncached(model_name, hf_token) | ||
| _vision_detection_cache[model_name] = result |
There was a problem hiding this comment.
Include auth context in vision cache key
Caching only by model_name makes is_vision_model return stale/incorrect results when auth differs between calls. A common path is an unauthenticated probe that falls back to False (for gated/private models), then a later authenticated call still returns that cached False without re-checking; this is now process-wide because /config calls is_vision_model(model_name) without passing hf_token (studio/backend/routes/models.py), while other flows call is_vision_model(..., hf_token=...) (for example ModelConfig.from_identifier). That can misclassify gated vision models and disable vision-specific behavior until restart.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Code Review
This pull request introduces a caching mechanism for vision model detection in is_vision_model to prevent redundant subprocess spawns and HuggingFace API calls. A new test suite is also added to verify the caching behavior. Feedback focuses on improving the cache key by including the hf_token to correctly handle gated models and normalizing local paths to ensure cache consistency. Additionally, the tests should be updated to reflect these changes in the caching logic.
| # Cache vision detection results per session to avoid repeated subprocess spawns | ||
| _vision_detection_cache: Dict[str, bool] = {} |
There was a problem hiding this comment.
The cache key should include the hf_token to correctly handle gated models. If a check is first performed without a token (or with an invalid one) and fails, caching the False result under just the model_name will prevent subsequent attempts with a valid token from succeeding. This pattern is already used in is_embedding_model (line 1165). Note that updating this will require corresponding changes in the test suite to use tuple keys.
| # Cache vision detection results per session to avoid repeated subprocess spawns | |
| _vision_detection_cache: Dict[str, bool] = {} | |
| # Cache vision detection results per session to avoid repeated subprocess spawns. | |
| # Keyed by (model_name, hf_token) to handle gated models correctly. | |
| _vision_detection_cache: Dict[Tuple[str, Optional[str]], bool] = {} |
References
- A simple dictionary cache without locks is acceptable for idempotent operations where the risk of benign race conditions and negligible memory growth is acceptable.
| if model_name in _vision_detection_cache: | ||
| return _vision_detection_cache[model_name] | ||
|
|
||
| result = _is_vision_model_uncached(model_name, hf_token) | ||
| _vision_detection_cache[model_name] = result | ||
| return result |
There was a problem hiding this comment.
Incorporate the hf_token into the cache key and normalize local paths to ensure cache consistency and correctness. This avoids the issue where a failed initial check (e.g., due to missing authentication) permanently caches a False result for a gated model, and ensures that equivalent local paths (e.g., with or without trailing slashes) hit the same cache entry.
# Normalize local paths to ensure cache hits for equivalent paths
normalized_name = normalize_path(model_name) if is_local_path(model_name) else model_name
cache_key = (normalized_name, hf_token)
if cache_key in _vision_detection_cache:
return _vision_detection_cache[cache_key]
result = _is_vision_model_uncached(normalized_name, hf_token)
_vision_detection_cache[cache_key] = result
return resultReferences
- A simple dictionary cache without locks is acceptable for idempotent operations where the risk of benign race conditions and negligible memory growth is acceptable.
Cache key is now (model_name, hf_token) instead of just model_name. This prevents stale False results when an unauthenticated probe for a gated model is followed by an authenticated call.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0478d78790
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| return _vision_detection_cache[cache_key] | ||
|
|
||
| result = _is_vision_model_uncached(model_name, hf_token) | ||
| _vision_detection_cache[cache_key] = result |
There was a problem hiding this comment.
Avoid caching fallback False after detection errors
is_vision_model now caches every returned value, but _is_vision_model_uncached collapses transient failures (HF API errors, subprocess timeout, etc.) to False; this means one temporary outage permanently classifies that model as non-vision for the rest of the process and can disable vision training/inference paths until restart. This regression is introduced by caching the first result unconditionally, so failures should not be cached the same way as successful detections.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: efacb85dbd
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| model_name: Model identifier (HF repo or local path) | ||
| hf_token: Optional HF token for accessing gated/private models | ||
| """ | ||
| cache_key = (model_name, hf_token) |
There was a problem hiding this comment.
Key vision cache by effective auth context
is_vision_model caches by (model_name, hf_token) only, but _is_vision_model_uncached uses load_model_config(..., use_auth=True, token=hf_token), so hf_token=None still depends on mutable default auth (e.g., HF_TOKEN/cached login). In flows that first probe without auth and later set auth in-process, token-less call sites (e.g., trainer/model routes that call is_vision_model(model_name) without passing hf_token) will keep hitting the stale (model_name, None) entry and can keep gated vision models classified as non-vision until restart.
Useful? React with 👍 / 👎.
…ilures - Normalize model names in cache key using resolve_cached_repo_id_case() to avoid duplicate entries for different casings of the same HF repo (aligns with case normalization from #4822) - Return None instead of False on transient failures (network errors, subprocess timeouts, HF API issues) so the cache layer can distinguish "definitely not a vision model" from "failed to check" - Only cache definitive True/False results; transient failures are retried on the next call instead of being permanently locked in as False
danielhanchen
left a comment
There was a problem hiding this comment.
Thank you for the PR! The goal of this PR is to cache is_vision_model() results to avoid redundant subprocess spawns and HuggingFace API calls. As a summary, this PR adds a module-level _vision_detection_cache dict, wraps is_vision_model() as a cache layer, and moves the detection logic into _is_vision_model_uncached().
I ran this through 10 independent reviewers (7 automated + 3 manual). Here is the consensus:
| Reviewers | Severity | Finding |
|---|---|---|
| 10/10 | High | Transient failures (subprocess timeout, HF API down, network errors) return False which was being unconditionally cached -- permanently misclassifying VLMs until process restart |
| 5/10 | High | Cache key used raw model_name without case normalization, conflicting with the case resolution pattern from #4822 (Org/Model vs org/model would create duplicate cache entries) |
| 7/10 | Medium | Cache key for hf_token=None callers depends on ambient HF auth state (pre-existing issue, not introduced by this PR) |
| 1/10 | Low | Plaintext HF token stored as dict key (consistent with existing _embedding_detection_cache, not a regression) |
I pushed fixes for the two real issues directly:
-
Transient failure caching: Changed
_is_vision_model_subprocessand_is_vision_model_uncachedto returnOptional[bool]--Noneon transient failures,True/Falsefor definitive results. The cache wrapper only caches non-None results, so failed lookups are retried on the next call. -
Cache key normalization: Added
resolve_cached_repo_id_case()for HF repo names andnormalize_path()for local paths in the cache key, consistent with the case normalization from #4822.
Separated the test file into #4855 to keep this PR focused on the implementation change.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 03fc678d3f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| normalized = normalize_path(model_name) | ||
| else: | ||
| normalized = resolve_cached_repo_id_case(model_name) | ||
| cache_key = (normalized, hf_token) |
There was a problem hiding this comment.
Include default auth state in vision cache key
cache_key = (normalized, hf_token) treats every token-less call as the same context, but _is_vision_model_uncached uses load_model_config(..., use_auth=True, token=None), which depends on mutable default HF auth (HF_TOKEN/cached login). If an unauthenticated token-less probe caches False, later token-less calls after auth is set will keep returning that stale value and can disable vision paths for gated models. Fresh evidence: trainer.py sets os.environ["HF_TOKEN"] before calling is_vision_model(model_name) without passing hf_token (pre_detect_and_load_tokenizer, lines 176-193).
Useful? React with 👍 / 👎.
…ation - Subprocess non-zero exit, JSON errors, and general exceptions return False (deterministic, cached) instead of None (retryable). Only subprocess.TimeoutExpired returns None since timeouts are transient. - Wrap cache key normalization in try/except so resolve_cached_repo_id_case or normalize_path failures fall back to raw model_name instead of crashing callers.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a caching mechanism for vision model detection to avoid redundant subprocess spawns and API calls. It refactors the detection logic into cached and uncached components, handling transient failures by returning None to allow retries, while caching definitive results. Feedback suggests improving the cache key normalization by handling shorthand HuggingFace model identifiers and refining exception handling in the uncached detection function to distinguish between transient and deterministic failures for better caching efficiency.
| try: | ||
| if is_local_path(model_name): | ||
| normalized = normalize_path(model_name) | ||
| else: | ||
| normalized = resolve_cached_repo_id_case(model_name) | ||
| except Exception: | ||
| normalized = model_name | ||
| cache_key = (normalized, hf_token) | ||
|
|
||
| if cache_key in _vision_detection_cache: | ||
| return _vision_detection_cache[cache_key] | ||
|
|
||
| result = _is_vision_model_uncached(model_name, hf_token) |
There was a problem hiding this comment.
The normalization logic for the cache key is missing the 'unsloth/' prefix addition for shorthand HuggingFace model identifiers (e.g., 'llama-3' vs 'unsloth/llama-3'). This can lead to duplicate cache entries and redundant detection calls for the same model. Additionally, the broad exception handler has been updated to include a logging statement to avoid silent failures, as per repository rules.
| try: | |
| if is_local_path(model_name): | |
| normalized = normalize_path(model_name) | |
| else: | |
| normalized = resolve_cached_repo_id_case(model_name) | |
| except Exception: | |
| normalized = model_name | |
| cache_key = (normalized, hf_token) | |
| if cache_key in _vision_detection_cache: | |
| return _vision_detection_cache[cache_key] | |
| result = _is_vision_model_uncached(model_name, hf_token) | |
| try: | |
| if is_local_path(model_name): | |
| normalized = normalize_path(model_name) | |
| else: | |
| # Add unsloth/ prefix for shorthand HF models to ensure cache consistency | |
| temp_name = model_name | |
| if "/" not in temp_name: | |
| temp_name = f"unsloth/{temp_name}" | |
| normalized = resolve_cached_repo_id_case(temp_name) | |
| except Exception: | |
| import logging | |
| logging.getLogger(__name__).debug("Failed to normalize model name", exc_info=True) | |
| normalized = model_name | |
| cache_key = (normalized, hf_token) | |
| if cache_key in _vision_detection_cache: | |
| return _vision_detection_cache[cache_key] | |
| result = _is_vision_model_uncached(normalized, hf_token) |
References
- Avoid using broad, silent exception handlers like except Exception: pass. Instead, log the exception, even if at a debug level, to aid in future debugging.
| except Exception as e: | ||
| logger.warning(f"Could not determine if {model_name} is vision model: {e}") | ||
| return False | ||
| return None |
There was a problem hiding this comment.
In '_is_vision_model_uncached', all exceptions return 'None'. While safe for transient errors, it prevents caching deterministic failures (e.g., 'Model not found'). This causes every subsequent call for an invalid model name to trigger a new, failing HuggingFace API call. To improve efficiency, consider returning 'False' for clearly deterministic errors, matching the intent of the subprocess path.
References
- A simple dictionary cache without locks is acceptable for idempotent operations where the risk of benign race conditions and negligible memory growth is acceptable.
…ementation - Remove unused _make_config() helper function (dead code) - Fix test_exception_result_cached to actually exercise the exception path by mocking load_model_config to raise OSError instead of using side_effect=[False] which only tested normal False returns - Include model_config.py implementation (cache + _is_vision_model_uncached) so tests can run standalone without depending on PR #4853 - Cache key uses (model_name, hf_token) tuple for gated model correctness
… safety, token security - All subprocess failure paths now return None (transient) instead of False, preventing permanent misclassification of VLMs after temporary HF/auth/network errors - Use SHA256 fingerprint for hf_token in cache key instead of raw bearer token - Add threading.Lock with double-checked locking to prevent thundering herd of concurrent subprocess spawns for the same uncached model - Distinguish permanent failures (RepositoryNotFoundError, GatedRepoError, ValueError) from transient ones in _is_vision_model_uncached - Pass resolved/normalized model name to detection (not just cache key) - Log normalization fallback at debug level instead of silent swallow - Thread hf_token through callers in routes/models.py and trainer.py that previously omitted it
- Move detection computation outside the lock to avoid serializing long-running subprocess spawns (60s timeout) and HF API calls across all concurrent model checks. Lock is now only held for cache writes. - Use full SHA256 digest for token fingerprint instead of truncated 16-char prefix to eliminate collision risk.
- Add fallback import path for RepositoryNotFoundError/GatedRepoError from huggingface_hub.utils (older hub versions) when .errors is not available - Use sentinel-based dict.get() for single atomic cache read instead of two-step in/[] pattern (future-proof for no-GIL runtimes)
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 915fea7fca
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| exc, | ||
| ) | ||
| resolved_name = model_name | ||
| cache_key = (resolved_name, _token_fingerprint(hf_token)) |
There was a problem hiding this comment.
Include ambient HF auth state in vision cache key
The new cache key only uses (resolved_model_name, token_fingerprint(hf_token)), so all token-less calls share the same entry even though detection with hf_token=None still depends on mutable ambient auth (HF_TOKEN/cached login). That means an early unauthenticated token-less probe can cache False and later token-less probes after auth changes will keep returning the stale value until process restart. Fresh evidence: there are still token-less call paths (routes/models.py check_vision_model and core/export/export.py load_checkpoint), so this regression remains observable despite the new token plumbing in other routes.
Useful? React with 👍 / 👎.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request implements a thread-safe caching mechanism for vision model detection to reduce overhead from redundant subprocess spawns and Hugging Face API calls. It introduces token fingerprinting for secure cache keys and refactors the detection logic to differentiate between permanent failures, which are cached, and transient errors, which are not. Feedback suggests simplifying the complex exception handling logic used to categorize these errors to improve code maintainability.
| try: | ||
| from huggingface_hub.errors import RepositoryNotFoundError, GatedRepoError | ||
| except ImportError: | ||
| try: | ||
| from huggingface_hub.utils import ( | ||
| RepositoryNotFoundError, | ||
| GatedRepoError, | ||
| ) | ||
| except ImportError: | ||
| RepositoryNotFoundError = GatedRepoError = None | ||
| if RepositoryNotFoundError is not None and isinstance( | ||
| e, (RepositoryNotFoundError, GatedRepoError) | ||
| ): | ||
| return False | ||
| if isinstance(e, (ValueError, json.JSONDecodeError)): | ||
| return False | ||
| return None |
…unslothai#4853) * Add vision detection cache to is_vision_model() to avoid redundant subprocess spawns is_vision_model() is called 4-5 times per training run for the same model with zero caching. For transformers 5.x models, each call spawns a full subprocess (~6s each). This adds a module-level _vision_detection_cache dict following the same pattern as the existing _audio_detection_cache used by detect_audio_type(). The function is refactored into a thin cache wrapper around _is_vision_model_uncached(), saving ~12s per training run. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Include hf_token in vision cache key for gated model correctness Cache key is now (model_name, hf_token) instead of just model_name. This prevents stale False results when an unauthenticated probe for a gated model is followed by an authenticated call. * Remove test file from main PR - will be submitted separately * Fix vision cache: normalize model names and skip caching transient failures - Normalize model names in cache key using resolve_cached_repo_id_case() to avoid duplicate entries for different casings of the same HF repo (aligns with case normalization from unslothai#4822) - Return None instead of False on transient failures (network errors, subprocess timeouts, HF API issues) so the cache layer can distinguish "definitely not a vision model" from "failed to check" - Only cache definitive True/False results; transient failures are retried on the next call instead of being permanently locked in as False * Refine failure handling: cache deterministic failures, guard normalization - Subprocess non-zero exit, JSON errors, and general exceptions return False (deterministic, cached) instead of None (retryable). Only subprocess.TimeoutExpired returns None since timeouts are transient. - Wrap cache key normalization in try/except so resolve_cached_repo_id_case or normalize_path failures fall back to raw model_name instead of crashing callers. * Harden vision detection cache: fix transient failure handling, thread safety, token security - All subprocess failure paths now return None (transient) instead of False, preventing permanent misclassification of VLMs after temporary HF/auth/network errors - Use SHA256 fingerprint for hf_token in cache key instead of raw bearer token - Add threading.Lock with double-checked locking to prevent thundering herd of concurrent subprocess spawns for the same uncached model - Distinguish permanent failures (RepositoryNotFoundError, GatedRepoError, ValueError) from transient ones in _is_vision_model_uncached - Pass resolved/normalized model name to detection (not just cache key) - Log normalization fallback at debug level instead of silent swallow - Thread hf_token through callers in routes/models.py and trainer.py that previously omitted it * Refine lock strategy and token fingerprint - Move detection computation outside the lock to avoid serializing long-running subprocess spawns (60s timeout) and HF API calls across all concurrent model checks. Lock is now only held for cache writes. - Use full SHA256 digest for token fingerprint instead of truncated 16-char prefix to eliminate collision risk. * Fix huggingface_hub import fallback and use atomic cache read - Add fallback import path for RepositoryNotFoundError/GatedRepoError from huggingface_hub.utils (older hub versions) when .errors is not available - Use sentinel-based dict.get() for single atomic cache read instead of two-step in/[] pattern (future-proof for no-GIL runtimes) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Summary
_vision_detection_cachedict tois_vision_model()following the exact same pattern as the existing_audio_detection_cacheused bydetect_audio_type()is_vision_model()into a thin cache wrapper around_is_vision_model_uncached()— detection logic is completely unchangedTest plan
test_vision_cache.pycovering cache hits, misses, False caching, subprocess path, exception handling, direct detection path, audio exclusion, and token handlingunsloth/Qwen3.5-2B) — verify"checking vision via subprocess"appears once per process instead of 2+