Checklist
Describe the bug
I started sglang on my newly downloaded model, and much to my surprise sglang deleted the entire model from disk. I believe this is an uncommon corner case introduced by PR #13729 (cc @alisonshao).
I am using sglang==0.5.6.post1, transformers==5.0.0rc0, huggingface-hub==1.2.1. I first used .venv/bin/hf download QuantTrio/DeepSeek-V3.2-AWQ, then I launched sglang with
.venv/bin/python -m sglang.launch_server --model QuantTrio/DeepSeek-V3.2-AWQ --served-model-name QuantTrio/DeepSeek-V3.2-AWQ --host localhost --port 8000 --mem-fraction-static 0.95 --sleep-on-idle --tp=4 --context-length 32768 --attention-backend flashinfer --chunked-prefill-size 8192 --enable-mixed-chunk --cuda-graph-max-bs 1 --model-loader-extra-config '{"enable_multithread_load": true, "num_threads": 8}'
which produced this log:
...
[2025-12-09 14:26:48 TP0] Init torch distributed ends. mem usage=0.28 GB
[2025-12-09 14:26:48 TP2] Init torch distributed ends. mem usage=0.28 GB
[2025-12-09 14:26:48 TP1] Init torch distributed ends. mem usage=0.28 GB
[2025-12-09 14:26:48 TP3] Init torch distributed ends. mem usage=0.28 GB
[2025-12-09 14:26:49 TP1] Ignore import error when loading sglang.srt.models.mindspore: name 'ms' is not defined
[2025-12-09 14:26:49 TP3] Ignore import error when loading sglang.srt.models.mindspore: name 'ms' is not defined
[2025-12-09 14:26:49 TP0] Ignore import error when loading sglang.srt.models.mindspore: name 'ms' is not defined
[2025-12-09 14:26:49 TP2] Ignore import error when loading sglang.srt.models.mindspore: name 'ms' is not defined
[2025-12-09 14:26:49 TP1] Load weight begin. avail mem=94.10 GB
[2025-12-09 14:26:49 TP0] Load weight begin. avail mem=94.10 GB
[2025-12-09 14:26:49 TP3] Load weight begin. avail mem=94.10 GB
[2025-12-09 14:26:49 TP2] Load weight begin. avail mem=94.10 GB
[2025-12-09 14:26:50 TP0] Shared experts fusion optimization enabled.
[2025-12-09 14:26:50 TP0] Corrupted safetensors file detected: /home/kmod/.cache/huggingface/hub/models--QuantTrio--DeepSeek-V3.2-AWQ/snapshots/340023cb6036c97c5c664ac944300e9d2b1a3f2e/model-00008-of-00121.safetensors - SafetensorError: Error while deserializing header: invalid JSON in header: EOF while parsing a value at line 1 column 0
[2025-12-09 14:26:50 TP0] Found 1 corrupted file(s) for QuantTrio/DeepSeek-V3.2-AWQ: Corrupted shard files: ['model-00008-of-00121.safetensors']. Will selectively clean and re-download only these files.
[2025-12-09 14:26:50 TP0] Removed corrupted symlink: model-00008-of-00121.safetensors
[2025-12-09 14:26:50 TP0] Removed corrupted blob: 071a3348289365c723f283ce58c412d17b5b46184e03c653cf2032465d7aa31b
[2025-12-09 14:26:50 TP0] Removed 1 corrupted file(s) for QuantTrio/DeepSeek-V3.2-AWQ. These will be re-downloaded on next load.
[2025-12-09 14:26:50 TP0] HTTP Request: GET https://huggingface.co/api/models/QuantTrio/DeepSeek-V3.2-AWQ "HTTP/1.1 200 OK"
[2025-12-09 14:26:50 TP2] Removing entire cache for QuantTrio/DeepSeek-V3.2-AWQ at /home/kmod/.cache/huggingface/hub/models--QuantTrio--DeepSeek-V3.2-AWQ. Reason: Missing 1 file(s) from index model.safetensors.index.json: ['model-00008-of-00121.safetensors']
[2025-12-09 14:26:50 TP0] HTTP Request: GET https://huggingface.co/api/models/QuantTrio/DeepSeek-V3.2-AWQ/tree/main?recursive=false&expand=false "HTTP/1.1 200 OK"
[2025-12-09 14:26:50 TP0] Using model weights format ['*.safetensors']
[2025-12-09 14:26:50 TP0] HTTP Request: GET https://huggingface.co/api/models/QuantTrio/DeepSeek-V3.2-AWQ/revision/main "HTTP/1.1 200 OK"
[2025-12-09 14:26:50 TP0] HTTP Request: HEAD https://huggingface.co/QuantTrio/DeepSeek-V3.2-AWQ/resolve/340023cb6036c97c5c664ac944300e9d2b1a3f2e/model-00008-of-00121.safetensors "HTTP/1.1 302 Found"
[2025-12-09 14:26:50 TP0] HTTP Request: HEAD https://huggingface.co/QuantTrio/DeepSeek-V3.2-AWQ/resolve/340023cb6036c97c5c664ac944300e9d2b1a3f2e/model-00041-of-00121.safetensors "HTTP/1.1 302 Found"
[2025-12-09 14:26:51 TP0] HTTP Request: GET https://huggingface.co/api/models/QuantTrio/DeepSeek-V3.2-AWQ/xet-read-token/340023cb6036c97c5c664ac944300e9d2b1a3f2e "HTTP/1.1 200 OK"
[2025-12-09 14:26:51 TP0] HTTP Request: GET https://huggingface.co/api/models/QuantTrio/DeepSeek-V3.2-AWQ/xet-read-token/340023cb6036c97c5c664ac944300e9d2b1a3f2e "HTTP/1.1 200 OK"
[2025-12-09 14:27:07 TP2] Failed to remove corrupted cache directory /home/kmod/.cache/huggingface/hub/models--QuantTrio--DeepSeek-V3.2-AWQ: [Errno 39] Directory not empty: 'blobs'. Manual cleanup may be required.
[2025-12-09 14:27:07 TP3] Removing entire cache for QuantTrio/DeepSeek-V3.2-AWQ at /home/kmod/.cache/huggingface/hub/models--QuantTrio--DeepSeek-V3.2-AWQ. Reason: Incomplete download detected (2 incomplete files)
[2025-12-09 14:27:07 TP3] Successfully removed corrupted cache directory
[2025-12-09 14:27:08 TP1] HTTP Request: GET https://huggingface.co/api/models/QuantTrio/DeepSeek-V3.2-AWQ "HTTP/1.1 200 OK"
[2025-12-09 14:27:08 TP3] HTTP Request: GET https://huggingface.co/api/models/QuantTrio/DeepSeek-V3.2-AWQ "HTTP/1.1 200 OK"
[2025-12-09 14:27:08 TP2] HTTP Request: GET https://huggingface.co/api/models/QuantTrio/DeepSeek-V3.2-AWQ "HTTP/1.1 200 OK"
[2025-12-09 14:27:08 TP2] HTTP Request: GET https://huggingface.co/api/models/QuantTrio/DeepSeek-V3.2-AWQ/tree/main?recursive=false&expand=false "HTTP/1.1 200 OK"
[2025-12-09 14:27:08 TP3] HTTP Request: GET https://huggingface.co/api/models/QuantTrio/DeepSeek-V3.2-AWQ/tree/main?recursive=false&expand=false "HTTP/1.1 200 OK"
[2025-12-09 14:27:09 TP1] HTTP Request: GET https://huggingface.co/api/models/QuantTrio/DeepSeek-V3.2-AWQ/tree/main?recursive=false&expand=false "HTTP/1.1 200 OK"
[2025-12-09 14:27:20 TP0] Scheduler hit an exception: Traceback (most recent call last):
[snip]
File "/home/kmod/ai/.venv/lib/python3.12/site-packages/huggingface_hub/_snapshot_download.py", line 429, in _inner_hf_hub_download
hf_hub_download( # type: ignore
File "/home/kmod/ai/.venv/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py", line 89, in _inner_fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/home/kmod/ai/.venv/lib/python3.12/site-packages/huggingface_hub/file_download.py", line 1024, in hf_hub_download
return _hf_hub_download_to_cache_dir(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/kmod/ai/.venv/lib/python3.12/site-packages/huggingface_hub/file_download.py", line 1240, in _hf_hub_download_to_cache_dir
_download_to_tmp_and_move(
File "/home/kmod/ai/.venv/lib/python3.12/site-packages/huggingface_hub/file_download.py", line 1864, in _download_to_tmp_and_move
xet_get(
File "/home/kmod/ai/.venv/lib/python3.12/site-packages/huggingface_hub/file_download.py", line 588, in xet_get
download_files(
RuntimeError: Data processing error: CAS service error : IO Error: No such file or directory (os error 2)
One thing to note: this safetensors file is 3 GiB and my internet connection is 50 MiB/s, so a full redownload would take roughly a minute; it is not possible that the file was actually fully redownloaded in the 16 s that elapsed. I suspect the 16 s mostly reflects the time it took to delete the ~360 GiB of weights.
It looks like one of the blobs was corrupted, and sglang's intention was to delete and redownload just that blob. But this interacts poorly, to put it mildly, with the "delete the entire model if any files are missing" logic. Even though the "delete the model if any files are missing" check comes before the "delete any corrupted files" check, multiple workers execute these checks. I can see that the validation logic is synchronized and the model download logic is synchronized, but the issue seems to be that they use separate lock invocations, and even separate locks.
It seems what happened was:
- TP0 acquired the validation lock, saw that the file was corrupted, deleted it, released the validation lock, and started redownloading the file
- TP2 immediately acquired the validation lock, saw that a file was missing from the index, and deleted the whole directory
My guess is that the easiest and safest fix would be to expand the lock inside download_weights_from_hf() to cover the entire function.
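To make that concrete, here is a rough sketch of the idea, not sglang's actual implementation: the use of snapshot_download, the lock-file layout, and the validation loop are all assumptions on my part, and download_weights_from_hf() is the only name taken from the real code. The point is simply that validation, cleanup, and (re)download all happen inside one critical section.

```python
import os
from filelock import FileLock                     # already a huggingface_hub dependency
from huggingface_hub import snapshot_download
from safetensors import safe_open


def download_weights_from_hf(repo_id: str, cache_dir: str) -> str:
    """Sketch: validate, clean, and (re)download under ONE critical section."""
    lock_path = os.path.join(cache_dir, repo_id.replace("/", "--") + ".lock")

    # Every TP rank takes the same lock and keeps it until the snapshot is
    # complete again, so no rank can observe a half-cleaned cache and decide
    # the whole directory needs to go.
    with FileLock(lock_path):
        snapshot_dir = snapshot_download(repo_id, cache_dir=cache_dir,
                                         allow_patterns=["*.safetensors", "*.json"])

        # Remove shards whose safetensors header cannot be parsed.
        for name in os.listdir(snapshot_dir):
            if not name.endswith(".safetensors"):
                continue
            path = os.path.join(snapshot_dir, name)
            try:
                with safe_open(path, framework="pt"):
                    pass
            except Exception:
                blob = os.path.realpath(path)     # the blob behind the symlink
                os.remove(path)                   # drop the symlink in snapshots/
                os.remove(blob)                   # drop the blob itself

        # Re-resolve the snapshot while still holding the lock; hf_hub only
        # re-downloads the files that are now missing.
        return snapshot_download(repo_id, cache_dir=cache_dir,
                                 allow_patterns=["*.safetensors", "*.json"])
```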
Also, for what it's worth, automatically calling shutil.rmtree (i.e. rm -rf) on a user's machine seems excessively dangerous, especially when the data was provided by the user. In this case it was a public model that could be redownloaded, but this feels worryingly close to deleting a custom local model. Also, the rm -rf $DIR/../.. pattern in _cleanup_corrupted_model_cache() is only one directory-layout change away from deleting a much larger portion of the user's filesystem, as in the classic horror stories.
I think it would be preferable to simply alert the user that the cache is unrecoverable and recommend that they delete it themselves. This code path should execute only extremely rarely, which means both that it shouldn't put much burden on users and that it won't get the battle testing that I think is necessary for an automated destructive action like this.
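For comparison, a minimal sketch of that non-destructive alternative (again, not sglang's code; _cleanup_corrupted_model_cache() is the only name taken from the actual codebase, and the function below is hypothetical): sanity-check that the path really is a hub cache entry, then refuse and tell the user exactly what to run.

```python
import os


def report_corrupted_model_cache(cache_path: str, reason: str) -> None:
    """Instead of shutil.rmtree, refuse and tell the user what to delete."""
    cache_path = os.path.realpath(cache_path)

    # Sanity check: only ever point the user at a real hub cache entry
    # (models--<org>--<name>), never at an arbitrary parent directory.
    if os.path.basename(cache_path).startswith("models--"):
        hint = f"rm -rf '{cache_path}'"
    else:
        hint = "inspect the directory manually before deleting anything"

    raise RuntimeError(
        f"The Hugging Face cache at {cache_path} appears to be corrupted "
        f"({reason}) and cannot be repaired automatically. "
        f"Please remove it yourself and re-download the model, e.g.: {hint}"
    )
```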
Reproduction
.venv/bin/hf download QuantTrio/DeepSeek-V3.2-AWQ
# simulated data corruption:
dd if=/dev/urandom conv=notrunc of=~/.cache/huggingface/hub/models--QuantTrio--DeepSeek-V3.2-AWQ/snapshots/340023cb6036c97c5c664ac944300e9d2b1a3f2e/model-00008-of-00121.safetensors bs=1M count=16
.venv/bin/python -m sglang.launch_server --model QuantTrio/DeepSeek-V3.2-AWQ --served-model-name QuantTrio/DeepSeek-V3.2-AWQ --host localhost --port 8000 --mem-fraction-static 0.95 --sleep-on-idle --tp=4 --context-length 32768 --attention-backend flashinfer --chunked-prefill-size 8192 --enable-mixed-chunk --cuda-graph-max-bs 1 --model-loader-extra-config '{"enable_multithread_load": true, "num_threads": 8}'
Environment
Python: 3.12.3 (main, Nov 6 2025, 13:44:16) [GCC 13.3.0]
CUDA available: True
GPU 0,1,2,3: NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation Edition
GPU 0,1,2,3 Compute Capability: 12.0
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.8, V12.8.93
CUDA Driver Version: 580.95.05
PyTorch: 2.9.1+cu128
sglang: 0.5.6.post1
sgl_kernel: 0.3.19
flashinfer_python: 0.5.3
flashinfer_cubin: 0.5.3
flashinfer_jit_cache: Module Not Found
triton: 3.5.1
transformers: 5.0.0rc0
torchao: 0.9.0
numpy: 2.2.6
aiohttp: 3.13.2
fastapi: 0.123.9
hf_transfer: 0.1.9
huggingface_hub: 1.2.1
interegular: 0.3.3
modelscope: 1.32.0
orjson: 3.11.4
outlines: 0.1.11
packaging: 25.0
psutil: 7.1.3
pydantic: 2.12.5
python-multipart: 0.0.20
pyzmq: 27.1.0
uvicorn: 0.38.0
uvloop: 0.22.1
vllm: 0.12.0
xgrammar: 0.1.27
openai: 2.6.1
tiktoken: 0.12.0
anthropic: 0.71.0
litellm: Module Not Found
decord2: 2.0.0
NVIDIA Topology:
GPU0 GPU1 GPU2 GPU3 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X NODE NODE NODE 0-127 0 N/A
GPU1 NODE X NODE NODE 0-127 0 N/A
GPU2 NODE NODE X NODE 0-127 0 N/A
GPU3 NODE NODE NODE X 0-127 0 N/A
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
ulimit soft: 1024