studio: reuse HF cached repo casing to prevent duplicate downloads#4822
Conversation
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Code Review
This pull request introduces a case-resolution utility for Hugging Face repository IDs to avoid duplicate model downloads on case-sensitive systems. By checking the local cache for existing variants before defaulting to lowercasing, the system preserves user intent and optimizes storage. Changes span the inference worker, model routes, and loading utilities, supported by new unit tests. Feedback was provided regarding a broad exception handler in the path utility that lacks logging, which could hinder troubleshooting.
Tests for resolve_cached_repo_id_case and get_model_config case resolution belong in their own PR to keep this change focused on the runtime fix.
- Check exact-case path before memo to ensure a newly-appeared exact match always wins over a previously memoized variant - Validate memoized entries still exist on disk before returning them to prevent stale results when cache dirs are deleted/recreated
danielhanchen
left a comment
There was a problem hiding this comment.
Thank you for the PR! The goal of this PR is to prevent duplicate HF model downloads caused by mixed-casing repo IDs between Unsloth Studio and other HuggingFace tools. As a summary, this PR replaces the previous unconditional lowercasing of remote model identifiers with a cache-aware resolver that reuses the exact casing already present in the local HF cache, and also preserves the original casing when stripping -bnb-4bit suffixes in the core loader.
I ran 7 parallel independent reviews + 3 additional independent reviewers (10 total), covering the runtime logic, memoization correctness, cross-platform behavior (macOS/Windows case-insensitive FS, WSL), suffix stripping edge cases, and backwards compatibility with transformers 4.57.6 / 5.4.0 and TRL 0.22.2 / 0.27.1.
| Reviewers | Severity | Finding |
|---|---|---|
| 7/10 | Fixed | Stale memoization: variant hits were memoized without revalidation, so deleted/recreated cache dirs returned stale results in the same process |
| 2/10 | Not a bug | Path.exists() exact-hit check on case-insensitive FS (macOS/Windows) -- this actually produces correct behavior on those platforms |
| 2/10 | P2/wontfix | Extra cache-dir scan in get_cache_path -- acceptable overhead for infrequent calls |
| 1/10 | False positive | -- to / replacement corrupts repo IDs with -- -- HF repo IDs never contain -- |
| 1/10 | False positive (0.01% edge) | Lexicographic tie-break could pick empty cache variant -- requires two case-variant dirs, one empty, which is extremely unlikely in practice |
The stale memoization issue (7/10 consensus) was the only real bug. I have already pushed a fix that:
- Checks the exact-case path before the memo so newly-appeared exact matches always win
- Validates memoized entries still exist on disk before returning them
- Drops stale memo entries and re-scans on the next call
I also moved the test files to a separate PR (#4823) to keep this one focused on the runtime changes.
Concrete suggestions for each finding below.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0073692dee
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| resolved_name = resolve_cached_repo_id_case(model_name) | ||
| model_cache_name = resolved_name.replace("/", "--") |
There was a problem hiding this comment.
Skip case resolution for local filesystem paths
get_cache_path now unconditionally calls resolve_cached_repo_id_case, but that resolver treats any string containing / as a Hub repo ID; local paths like foo/bar (which are valid local selections when the directory exists) now go through case-insensitive HF cache matching. In the detect_audio_type -> _detect_audio_from_tokenizer -> get_cache_path path, this can resolve a local model name to a models--... cache directory and read tokenizer metadata from the wrong source, producing incorrect capability detection for local models while also adding an unnecessary full cache scan.
Useful? React with 👍 / 👎.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request implements a case-resolution mechanism for Hugging Face repository IDs to prevent redundant model downloads caused by local file system case-sensitivity. It introduces the resolve_cached_repo_id_case utility with memoization and updates various model loading paths to preserve original casing while interacting with the local cache. Review feedback suggests refining the repository ID reconstruction logic to handle names containing double dashes, extending resolution to single-word identifiers, using more specific directory checks, and removing redundant fallback logic in path construction.
| repo_part = entry.name[len("models--") :] | ||
| if not repo_part: | ||
| continue | ||
| candidates.append(repo_part.replace("--", "/")) |
There was a problem hiding this comment.
Using .replace("--", "/") to reconstruct the repository ID is incorrect because it replaces all occurrences of --, including those that were part of the original namespace or repository name (e.g., my--org/model would incorrectly become my/org/model). Instead, the repository ID should be reconstructed by splitting the folder name based on the structure of the requested model_name.
| candidates.append(repo_part.replace("--", "/")) | |
| parts = model_name.split("/") | |
| resolved_parts = [] | |
| current_pos = 0 | |
| for p in parts: | |
| resolved_parts.append(repo_part[current_pos : current_pos + len(p)]) | |
| current_pos += len(p) + 2 | |
| candidates.append("/".join(resolved_parts)) |
| """ | ||
| _CACHE_CASE_RESOLUTION_STATS["calls"] += 1 | ||
|
|
||
| if not model_name or "/" not in model_name: |
There was a problem hiding this comment.
The check for a slash in the model_name is unnecessarily restrictive. Hugging Face repository IDs can be single-word identifiers (e.g., gpt2, bert-base-uncased) which are stored as models--gpt2 in the cache. Removing this restriction allows the resolver to handle casing issues for these models as well.
| if not model_name or "/" not in model_name: | |
| if not model_name: |
| # Always check the exact-case path first so a newly-appeared exact match | ||
| # wins over any previously memoized variant. | ||
| exact_path = cache_dir / expected_dir | ||
| if exact_path.exists(): |
There was a problem hiding this comment.
| repo_dir = get_cache_path(model_name) | ||
| if repo_dir is None: | ||
| repo_dir_name = f"models--{model_name.replace('/', '--')}" | ||
| repo_dir = cache_dir / repo_dir_name |
There was a problem hiding this comment.
This fallback logic is redundant. get_cache_path(model_name) already performs case resolution and checks for the existence of the repository directory (including the exact-case fallback). If it returns None, the directory does not exist in the cache, and manually constructing the path here will not change that. You can simplify this by using a single assignment with a fallback to ensure repo_dir is a valid Path object for the subsequent .exists() check.
| repo_dir = get_cache_path(model_name) | |
| if repo_dir is None: | |
| repo_dir_name = f"models--{model_name.replace('/', '--')}" | |
| repo_dir = cache_dir / repo_dir_name | |
| repo_dir = get_cache_path(model_name) or (cache_dir / f"models--{model_name.replace('/', '--')}") |
|
/gemini review |
- Use .is_dir() instead of .exists() for exact-case cache check (cache entries are always directories) - Remove redundant fallback in _detect_audio_from_tokenizer since get_cache_path already handles case resolution and returns None when the model is not cached
There was a problem hiding this comment.
Code Review
This pull request implements a case-resolution mechanism for Hugging Face repository IDs to prevent duplicate downloads by matching requested names against existing local cache entries. It replaces forced lowercasing with a memoized resolution utility and updates model loading paths across the backend. Feedback highlights an opportunity to simplify redundant path checks in model_config.py and suggests extending case resolution to support top-level model identifiers by removing the mandatory slash check.
| if repo_dir is None: | ||
| repo_dir_name = f"models--{model_name.replace('/', '--')}" | ||
| repo_dir = cache_dir / repo_dir_name | ||
| if repo_dir.exists(): |
There was a problem hiding this comment.
This manual path construction and existence check is redundant. get_cache_path(model_name) already performs these checks, including the case-resolution logic and the exact-match fallback. If get_cache_path returns None, it means the model directory (including the original casing) does not exist in the cache.
| if repo_dir is None: | |
| repo_dir_name = f"models--{model_name.replace('/', '--')}" | |
| repo_dir = cache_dir / repo_dir_name | |
| if repo_dir.exists(): | |
| if repo_dir: | |
| if not model_name or "/" not in model_name: | ||
| _CACHE_CASE_RESOLUTION_STATS["fallbacks"] += 1 | ||
| return model_name |
There was a problem hiding this comment.
The requirement for a slash (/) in the model name prevents case resolution for top-level Hugging Face models (e.g., gpt2, bert-base-uncased). While most models are namespaced, supporting top-level models would ensure consistent behavior across all Hugging Face identifiers. Removing this check is safe as is_local_path is typically called before this resolver.
| if not model_name or "/" not in model_name: | |
| _CACHE_CASE_RESOLUTION_STATS["fallbacks"] += 1 | |
| return model_name | |
| if not model_name: | |
| _CACHE_CASE_RESOLUTION_STATS["fallbacks"] += 1 | |
| return model_name |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request implements a case-resolution mechanism for Hugging Face repository IDs to avoid redundant downloads caused by case-sensitivity differences between local storage and the HF Hub. It introduces the resolve_cached_repo_id_case utility, which scans the local cache for existing variants, and updates model loading paths in the backend and Unsloth modules to utilize this resolution. Review feedback suggests optimizing the resolution function by using os.scandir for performance, extending support to official models that lack slashes in their names, and improving memoization efficiency by using normalized keys.
| def resolve_cached_repo_id_case(model_name: str, use_memo: bool = True) -> str: | ||
| """Resolve repo_id to the exact casing already present in local HF cache. | ||
|
|
||
| Policy: prefer the requested/canonical repo_id, but if a case-variant already | ||
| exists in local HF cache, reuse that exact cached spelling. This avoids | ||
| duplicate downloads while preserving user intent whenever possible. | ||
| """ | ||
| _CACHE_CASE_RESOLUTION_STATS["calls"] += 1 | ||
|
|
||
| if not model_name or "/" not in model_name: | ||
| _CACHE_CASE_RESOLUTION_STATS["fallbacks"] += 1 | ||
| return model_name | ||
|
|
||
| cache_dir = _hf_hub_cache_dir() | ||
| if not cache_dir.exists(): | ||
| _CACHE_CASE_RESOLUTION_STATS["fallbacks"] += 1 | ||
| return model_name | ||
|
|
||
| expected_dir = f"models--{model_name.replace('/', '--')}" | ||
|
|
||
| # Always check the exact-case path first so a newly-appeared exact match | ||
| # wins over any previously memoized variant. | ||
| exact_path = cache_dir / expected_dir | ||
| if exact_path.is_dir(): | ||
| if use_memo: | ||
| _CACHE_CASE_RESOLUTION_MEMO[model_name] = model_name | ||
| _CACHE_CASE_RESOLUTION_STATS["exact_hits"] += 1 | ||
| return model_name | ||
|
|
||
| # Validate memoized entries still exist on disk before returning them. | ||
| # This prevents stale results when cache dirs are deleted/recreated. | ||
| if use_memo: | ||
| cached = _CACHE_CASE_RESOLUTION_MEMO.get(model_name) | ||
| if cached is not None: | ||
| cached_path = cache_dir / f"models--{cached.replace('/', '--')}" | ||
| if cached_path.is_dir(): | ||
| _CACHE_CASE_RESOLUTION_STATS["memo_hits"] += 1 | ||
| return cached | ||
| # Stale entry -- drop it and re-scan below. | ||
| _CACHE_CASE_RESOLUTION_MEMO.pop(model_name, None) | ||
|
|
||
| expected_lower = expected_dir.lower() | ||
| try: | ||
| candidates: list[str] = [] | ||
| for entry in cache_dir.iterdir(): | ||
| if not entry.is_dir(): | ||
| continue | ||
| if entry.name.lower() != expected_lower: | ||
| continue | ||
| if not entry.name.startswith("models--"): | ||
| continue | ||
| repo_part = entry.name[len("models--") :] | ||
| if not repo_part: | ||
| continue | ||
| candidates.append(repo_part.replace("--", "/")) | ||
|
|
||
| if candidates: | ||
| # Deterministic tie-break if multiple case variants coexist. | ||
| resolved = sorted(candidates)[0] | ||
| if len(candidates) > 1: | ||
| _CACHE_CASE_RESOLUTION_STATS["tie_breaks"] += 1 | ||
| _CACHE_CASE_RESOLUTION_STATS["variant_hits"] += 1 | ||
| if use_memo: | ||
| _CACHE_CASE_RESOLUTION_MEMO[model_name] = resolved | ||
| return resolved | ||
| except Exception as exc: | ||
| _CACHE_CASE_RESOLUTION_STATS["errors"] += 1 | ||
| logger.debug(f"Could not resolve cached repo_id case for '{model_name}': {exc}") | ||
|
|
||
| _CACHE_CASE_RESOLUTION_STATS["fallbacks"] += 1 | ||
| return model_name |
There was a problem hiding this comment.
The resolve_cached_repo_id_case function can be improved in several ways:
- Support for official models: The check
"/" not in model_nameprevents casing resolution for official models (e.g.,gpt2,bert-base-uncased), which are also stored in the HF cache with themodels--prefix. Removing this check ensures all remote models benefit from casing resolution. - Memoization Efficiency: Normalizing the memoization key to lowercase ensures that subsequent requests with different casing for the same model ID (e.g.,
Org/Repovsorg/repo) hit the cache correctly. - Performance: Using
os.scandirinstead ofPath.iterdiris more efficient as it avoids creatingPathobjects for every entry in the cache directory, which can be significant if the cache contains many models.
def resolve_cached_repo_id_case(model_name: str, use_memo: bool = True) -> str:
"""Resolve repo_id to the exact casing already present in local HF cache.
Policy: prefer the requested/canonical repo_id, but if a case-variant already
exists in local HF cache, reuse that exact cached spelling. This avoids
duplicate downloads while preserving user intent whenever possible.
"""
_CACHE_CASE_RESOLUTION_STATS["calls"] += 1
if not model_name:
_CACHE_CASE_RESOLUTION_STATS["fallbacks"] += 1
return model_name
cache_dir = _hf_hub_cache_dir()
if not cache_dir.exists():
_CACHE_CASE_RESOLUTION_STATS["fallbacks"] += 1
return model_name
expected_dir = f"models--{model_name.replace('/', '--')}"
expected_lower = expected_dir.lower()
memo_key = model_name.lower()
# Always check the exact-case path first so a newly-appeared exact match
# wins over any previously memoized variant.
exact_path = cache_dir / expected_dir
if exact_path.is_dir():
if use_memo:
_CACHE_CASE_RESOLUTION_MEMO[memo_key] = model_name
_CACHE_CASE_RESOLUTION_STATS["exact_hits"] += 1
return model_name
# Validate memoized entries still exist on disk before returning them.
if use_memo:
cached = _CACHE_CASE_RESOLUTION_MEMO.get(memo_key)
if cached is not None:
cached_path = cache_dir / f"models--{cached.replace('/', '--')}"
if cached_path.is_dir():
_CACHE_CASE_RESOLUTION_STATS["memo_hits"] += 1
return cached
# Stale entry -- drop it and re-scan below.
_CACHE_CASE_RESOLUTION_MEMO.pop(memo_key, None)
try:
candidates: list[str] = []
# Use os.scandir for better performance on large cache directories
with os.scandir(cache_dir) as it:
for entry in it:
if not entry.is_dir():
continue
if entry.name.lower() == expected_lower:
# Extract the repo part (everything after 'models--')
repo_part = entry.name[8:]
candidates.append(repo_part.replace("--", "/"))
if candidates:
# Deterministic tie-break if multiple case variants coexist.
resolved = sorted(candidates)[0]
if len(candidates) > 1:
_CACHE_CASE_RESOLUTION_STATS["tie_breaks"] += 1
_CACHE_CASE_RESOLUTION_STATS["variant_hits"] += 1
if use_memo:
_CACHE_CASE_RESOLUTION_MEMO[memo_key] = resolved
return resolved
except Exception as exc:
_CACHE_CASE_RESOLUTION_STATS["errors"] += 1
logger.debug(f"Could not resolve cached repo_id case for '{model_name}': {exc}")
_CACHE_CASE_RESOLUTION_STATS["fallbacks"] += 1
return model_nameReferences
- Avoid using broad, silent exception handlers like except Exception: pass. Instead, log the exception, even if at a debug level, to aid in future debugging.
- A simple dictionary cache without locks is acceptable for idempotent operations where the risk of benign race conditions and negligible memory growth is acceptable.
…#4853) * Add vision detection cache to is_vision_model() to avoid redundant subprocess spawns is_vision_model() is called 4-5 times per training run for the same model with zero caching. For transformers 5.x models, each call spawns a full subprocess (~6s each). This adds a module-level _vision_detection_cache dict following the same pattern as the existing _audio_detection_cache used by detect_audio_type(). The function is refactored into a thin cache wrapper around _is_vision_model_uncached(), saving ~12s per training run. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Include hf_token in vision cache key for gated model correctness Cache key is now (model_name, hf_token) instead of just model_name. This prevents stale False results when an unauthenticated probe for a gated model is followed by an authenticated call. * Remove test file from main PR - will be submitted separately * Fix vision cache: normalize model names and skip caching transient failures - Normalize model names in cache key using resolve_cached_repo_id_case() to avoid duplicate entries for different casings of the same HF repo (aligns with case normalization from #4822) - Return None instead of False on transient failures (network errors, subprocess timeouts, HF API issues) so the cache layer can distinguish "definitely not a vision model" from "failed to check" - Only cache definitive True/False results; transient failures are retried on the next call instead of being permanently locked in as False * Refine failure handling: cache deterministic failures, guard normalization - Subprocess non-zero exit, JSON errors, and general exceptions return False (deterministic, cached) instead of None (retryable). Only subprocess.TimeoutExpired returns None since timeouts are transient. - Wrap cache key normalization in try/except so resolve_cached_repo_id_case or normalize_path failures fall back to raw model_name instead of crashing callers. * Harden vision detection cache: fix transient failure handling, thread safety, token security - All subprocess failure paths now return None (transient) instead of False, preventing permanent misclassification of VLMs after temporary HF/auth/network errors - Use SHA256 fingerprint for hf_token in cache key instead of raw bearer token - Add threading.Lock with double-checked locking to prevent thundering herd of concurrent subprocess spawns for the same uncached model - Distinguish permanent failures (RepositoryNotFoundError, GatedRepoError, ValueError) from transient ones in _is_vision_model_uncached - Pass resolved/normalized model name to detection (not just cache key) - Log normalization fallback at debug level instead of silent swallow - Thread hf_token through callers in routes/models.py and trainer.py that previously omitted it * Refine lock strategy and token fingerprint - Move detection computation outside the lock to avoid serializing long-running subprocess spawns (60s timeout) and HF API calls across all concurrent model checks. Lock is now only held for cache writes. - Use full SHA256 digest for token fingerprint instead of truncated 16-char prefix to eliminate collision risk. * Fix huggingface_hub import fallback and use atomic cache read - Add fallback import path for RepositoryNotFoundError/GatedRepoError from huggingface_hub.utils (older hub versions) when .errors is not available - Use sentinel-based dict.get() for single atomic cache read instead of two-step in/[] pattern (future-proof for no-GIL runtimes) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>
…nslothai#4822) * fix(studio): reuse HF cached repo casing to prevent duplicate downloads * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Move cache case resolution tests to separate PR Tests for resolve_cached_repo_id_case and get_model_config case resolution belong in their own PR to keep this change focused on the runtime fix. * fix(studio): debug-log HF_HUB_CACHE fallback in path_utils * Fix stale memoization in resolve_cached_repo_id_case - Check exact-case path before memo to ensure a newly-appeared exact match always wins over a previously memoized variant - Validate memoized entries still exist on disk before returning them to prevent stale results when cache dirs are deleted/recreated * Minor cleanups for cache case resolution - Use .is_dir() instead of .exists() for exact-case cache check (cache entries are always directories) - Remove redundant fallback in _detect_audio_from_tokenizer since get_cache_path already handles case resolution and returns None when the model is not cached --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>
…ai#4823) Tests for resolve_cached_repo_id_case and get_model_config case resolution, separated from the runtime changes in PR unslothai#4822.
…unslothai#4853) * Add vision detection cache to is_vision_model() to avoid redundant subprocess spawns is_vision_model() is called 4-5 times per training run for the same model with zero caching. For transformers 5.x models, each call spawns a full subprocess (~6s each). This adds a module-level _vision_detection_cache dict following the same pattern as the existing _audio_detection_cache used by detect_audio_type(). The function is refactored into a thin cache wrapper around _is_vision_model_uncached(), saving ~12s per training run. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Include hf_token in vision cache key for gated model correctness Cache key is now (model_name, hf_token) instead of just model_name. This prevents stale False results when an unauthenticated probe for a gated model is followed by an authenticated call. * Remove test file from main PR - will be submitted separately * Fix vision cache: normalize model names and skip caching transient failures - Normalize model names in cache key using resolve_cached_repo_id_case() to avoid duplicate entries for different casings of the same HF repo (aligns with case normalization from unslothai#4822) - Return None instead of False on transient failures (network errors, subprocess timeouts, HF API issues) so the cache layer can distinguish "definitely not a vision model" from "failed to check" - Only cache definitive True/False results; transient failures are retried on the next call instead of being permanently locked in as False * Refine failure handling: cache deterministic failures, guard normalization - Subprocess non-zero exit, JSON errors, and general exceptions return False (deterministic, cached) instead of None (retryable). Only subprocess.TimeoutExpired returns None since timeouts are transient. - Wrap cache key normalization in try/except so resolve_cached_repo_id_case or normalize_path failures fall back to raw model_name instead of crashing callers. * Harden vision detection cache: fix transient failure handling, thread safety, token security - All subprocess failure paths now return None (transient) instead of False, preventing permanent misclassification of VLMs after temporary HF/auth/network errors - Use SHA256 fingerprint for hf_token in cache key instead of raw bearer token - Add threading.Lock with double-checked locking to prevent thundering herd of concurrent subprocess spawns for the same uncached model - Distinguish permanent failures (RepositoryNotFoundError, GatedRepoError, ValueError) from transient ones in _is_vision_model_uncached - Pass resolved/normalized model name to detection (not just cache key) - Log normalization fallback at debug level instead of silent swallow - Thread hf_token through callers in routes/models.py and trainer.py that previously omitted it * Refine lock strategy and token fingerprint - Move detection computation outside the lock to avoid serializing long-running subprocess spawns (60s timeout) and HF API calls across all concurrent model checks. Lock is now only held for cache writes. - Use full SHA256 digest for token fingerprint instead of truncated 16-char prefix to eliminate collision risk. * Fix huggingface_hub import fallback and use atomic cache read - Add fallback import path for RepositoryNotFoundError/GatedRepoError from huggingface_hub.utils (older hub versions) when .errors is not available - Use sentinel-based dict.get() for single atomic cache read instead of two-step in/[] pattern (future-proof for no-GIL runtimes) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Related to #2592
Some Studio and loader paths lowercased remote model repo IDs.
HuggingFace accepts that for API calls, but the on-disk cache folder name follows the exact string you used when downloading.
Mixed casing from Unsloth vs Transformers/other tools then showed up as two
models--…dirs for the same repo and triggered duplicate downloads.Studio backend:
Stop forcing lowercase in
ModelConfig/ UI selection; resolve to an existing cached repo-id casing when a case-only variant is already on disk; normalize early inget_model_configso vision/embedding/audio checks use the resolved id.Paths:
Add
resolve_cached_repo_id_case+ use it fromget_cache_path; optional memoization for hits only (no memo on cache miss so a later download is still picked up).Inference worker:
Use the same resolver when watching
models--…/blobsfor download progress.Core loader:
Preserve casing when stripping
-bnb-4bitsuffixes on the AMD no-prequant path; return the user’s explicit-bnb-4bitid casing fromget_model_namemapping.Tests moved to #4823*
Note: Dataset repos can have the same class of issue; this PR only handles model repo IDs. Follow-up can mirror the same pattern for
repo_type=datasetif we want parity.