
common : add standard Hugging Face cache support #20775

Merged

angt merged 9 commits into ggml-org:master from angt:common-add-standard-hugging-face-cache-support on Mar 24, 2026

@angt

@angt angt commented Mar 19, 2026

Member
  • Use HF API to find all files
  • Migrate all manifests to hugging face cache at startup
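For context on what "the Hugging Face cache" means here: huggingface_hub stores each repo in a folder named `models--<org>--<name>` under the hub cache root, with `blobs/`, `refs/` and `snapshots/<commit>/` inside it. The sketch below is a minimal C++17 illustration of mapping a repo id to that layout, assuming the environment-variable precedence documented by huggingface_hub (`HF_HUB_CACHE`, then `HF_HOME/hub`, then `XDG_CACHE_HOME/huggingface/hub`, then `~/.cache/huggingface/hub`). The helper names are hypothetical; this is not the code added by the PR.

```cpp
#include <cstdlib>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: resolve the hub cache root using the precedence documented
// by huggingface_hub (the legacy HUGGINGFACE_HUB_CACHE variable is ignored here).
fs::path hf_hub_cache_root() {
    if (const char * p = std::getenv("HF_HUB_CACHE"))   return fs::path(p);
    if (const char * p = std::getenv("HF_HOME"))        return fs::path(p) / "hub";
    if (const char * p = std::getenv("XDG_CACHE_HOME")) return fs::path(p) / "huggingface" / "hub";
    const char * home = std::getenv("HOME");             // POSIX-only fallback in this sketch
    return fs::path(home ? home : ".") / ".cache" / "huggingface" / "hub";
}

// Hypothetical helper: "org/name" -> "models--org--name", the per-repo folder name.
std::string hf_repo_folder(const std::string & repo_id) {
    std::string out = "models--";
    for (char c : repo_id) {
        if (c == '/') out += "--";
        else          out += c;
    }
    return out;
}

// Files are exposed as <root>/models--org--name/snapshots/<commit>/<filename>,
// usually as symlinks into the repo's blobs/ folder.
fs::path hf_snapshot_file(const fs::path & root, const std::string & repo_id,
                          const std::string & commit, const std::string & filename) {
    return root / hf_repo_folder(repo_id) / "snapshots" / commit / filename;
}
```

Because every tool that follows this layout computes the same paths, a file downloaded once can be found by any of them without a tool-specific manifest.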

@angt angt requested a review from a team as a code owner March 19, 2026 21:10
@angt

angt commented Mar 19, 2026

Member Author

WARNING: Do not test this without taking care of your cache first, or you'll regret it. There is no going back. 😬

@angt angt force-pushed the common-add-standard-hugging-face-cache-support branch 7 times, most recently from 3638b70 to 77ff285 Compare March 20, 2026 07:36
@ggerganov

Member

Is it going to correctly handle repos that require an HF token (e.g. gated, private)? I think it will back out of the migration for that specific manifest, correct?

@angt

angt commented Mar 20, 2026

Member Author

> Is it going to correctly handle repos that require an HF token (e.g. gated, private)? I think it will back out of the migration for that specific manifest, correct?

I need to test this. I'm afraid that without the token there is no way to migrate correctly.

@ggerganov

Member

>> Is it going to correctly handle repos that require an HF token (e.g. gated, private)? I think it will back out of the migration for that specific manifest, correct?

> I need to test this. I'm afraid that without the token there is no way to migrate correctly.

So is the current logic that the migration will only happen if an HF token is provided? I think that makes sense.

- Use HF API to find all files
- Migrate all manifests to hugging face cache at startup

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
@angt angt force-pushed the common-add-standard-hugging-face-cache-support branch from 77ff285 to 6fd16ba Compare March 20, 2026 14:11
@julien-c

Contributor

tried it locally, worked well!

number of models in cache: 9
   1. bartowski/Qwen2.5-7B-Instruct-GGUF:Q4_K_M
   2. ggml-org/Nemotron-Nano-3-30B-A3B-GGUF:Q4_K_M
   3. ggml-org/gemma-3-1b-it-GGUF:Q4_K_M
   4. ggml-org/gemma-3-4b-it-GGUF:Q4_K_M
   5. ggml-org/gpt-oss-20b-GGUF:MXFP4
   6. lmstudio-ai/gemma-2b-it-GGUF:Q4_K_M
   7. lmstudio-ai/gemma-2b-it-GGUF:Q8_0
   8. unsloth/Qwen3.5-4B-GGUF:Q4_K_XL
   9. unsloth/Qwen3.5-9B-GGUF:Q4_K_XL
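Each entry in the listing above has the form `<org>/<repo>:<quant>`. As a minimal sketch of how such an entry splits into a repository id and a quant tag, assuming the tag is everything after the first `:` that follows the `/` (the struct and function are hypothetical, not llama.cpp's API):

```cpp
#include <optional>
#include <string>

// Hypothetical type for the "<org>/<repo>:<quant>" form printed in the listing.
struct hf_model_ref {
    std::string repo;   // e.g. "ggml-org/gemma-3-1b-it-GGUF"
    std::string quant;  // e.g. "Q4_K_M"; empty when no tag was given
};

// Split at the first ':' after the '/', rejecting inputs without an org or a repo name.
std::optional<hf_model_ref> parse_model_ref(const std::string & s) {
    const auto slash = s.find('/');
    if (slash == std::string::npos || slash == 0) return std::nullopt;
    const auto colon = s.find(':', slash);
    hf_model_ref ref;
    ref.repo  = s.substr(0, colon); // whole string when there is no ':'
    ref.quant = colon == std::string::npos ? std::string() : s.substr(colon + 1);
    if (ref.repo.size() == slash + 1) return std::nullopt; // "org/" has no repo name
    return ref;
}
```

For example, `parse_model_ref("ggml-org/gpt-oss-20b-GGUF:MXFP4")` yields repo `ggml-org/gpt-oss-20b-GGUF` and quant `MXFP4`, while an entry without a tag keeps an empty `quant`.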

Comment thread common/hf-cache.h Outdated
Comment thread common/download.h Outdated
Comment thread common/download.cpp Outdated
angt added 2 commits March 20, 2026 17:25
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
@angt angt requested a review from a team as a code owner March 20, 2026 18:06
@github-actions github-actions Bot added python python script changes server labels Mar 20, 2026

@ggerganov ggerganov left a comment

Member


One failure case I can think of: the user keeps the current llama.cpp cache on a larger, separate disk from the one holding the HF cache. The migration would then move files from the larger disk to the smaller one, which might fill up in the process. But I don't think we have a way to prevent that if we want the migration to be automatic.
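One way a migration step could at least avoid filling the target disk mid-copy is sketched below, under stated assumptions (the helper is hypothetical, not the PR's code): try `std::filesystem::rename` first, which succeeds without copying when source and destination are on the same filesystem, and only when it fails with `std::errc::cross_device_link` compare the file size against `std::filesystem::space(...).available` on the destination before copying.

```cpp
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: returns true only if the entry ended up at dst and src is gone.
bool move_file_checked(const fs::path & src, const fs::path & dst) {
    std::error_code ec;
    fs::create_directories(dst.parent_path(), ec);
    if (ec) return false;

    // Same filesystem: rename is cheap and needs no extra space.
    fs::rename(src, dst, ec);
    if (!ec) return true;
    if (ec != std::errc::cross_device_link) return false;

    // Different filesystem: refuse to copy if the destination lacks room.
    const auto size = fs::file_size(src, ec); // fails for directories, so they are not copied
    if (ec) return false;
    const auto info = fs::space(dst.parent_path(), ec);
    if (ec || info.available < size) return false;

    fs::copy_file(src, dst, fs::copy_options::overwrite_existing, ec);
    if (ec) {
        fs::remove(dst, ec); // drop a partial copy
        return false;
    }
    fs::remove(src, ec);
    return !ec;
}
```

The free-space check is only a snapshot (other writers can still consume space before the copy finishes), and it does not address the underlying concern that the migrated cache ends up on the smaller disk; it only turns a mid-copy failure into an up-front refusal.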

@ggerganov ggerganov requested a review from ngxson March 20, 2026 18:45
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
Comment thread common/download.h Outdated
Comment thread common/download.cpp
@angt angt force-pushed the common-add-standard-hugging-face-cache-support branch 2 times, most recently from 62bcccb to 5d0c722 Compare March 22, 2026 09:02
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
Comment thread common/hf-cache.cpp
Comment thread common/download.h
Comment thread common/hf-cache.cpp
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
@angt

angt commented Mar 23, 2026

Member Author

Let's go @ngxson @ggerganov ?

@angt angt merged commit 8c7957c into ggml-org:master Mar 24, 2026
51 checks passed
@ggerganov

Member

I think we should add some prominent notification on the top of the README.md about this change - I expect there to be some level of confusion from the migration.

@angt

angt commented Mar 24, 2026

Member Author

Do you want a bigger WARNING with an explanation when the files are migrated for the first time?

@ggerganov

Member

I guess a warning in the logs might be useful too.

@angt

angt commented Mar 24, 2026

Member Author

see #20935

@CISC

CISC commented Mar 24, 2026

Member

@angt

angt commented Mar 24, 2026

Member Author

@CISC

CISC commented Mar 24, 2026

Member

https://github.com/ggml-org/llama.cpp/actions/runs/23476321133/job/68309759940

This was not tested in the PR?

Sanitizer jobs (and server-metal, which is also generally failing now) have only been triggered manually outside of master since #20546.

@angt

angt commented Mar 24, 2026

Member Author

Thanks, I missed that.

@angt

angt commented Mar 24, 2026

Member Author

see #20946

@wbste

wbste commented Mar 25, 2026


Yikes! Any way to disable this or hide models from HF_HOME? I'm seeing EVERYTHING in my cache, even files llama.cpp can't run (safetensors, .gitattributes, etc.). I purposely keep my GGUF files in another location so I don't mix HF stuff with llama.cpp. Any thoughts on how to handle this?
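One possible way a cache listing could restrict itself to runnable models is to keep only `.gguf` files that sit under a `snapshots/` directory, which skips `blobs/`, `refs/`, safetensors and dotfiles. This is a hypothetical sketch, not something the PR or its follow-ups are known to implement:

```cpp
#include <algorithm>
#include <filesystem>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical filter: a path qualifies only if it is a .gguf file located
// somewhere under a "snapshots" directory.
bool is_cached_gguf(const fs::path & p) {
    if (p.extension() != ".gguf") return false; // ".gitattributes" has an empty extension
    for (const auto & part : p.parent_path()) {
        if (part == "snapshots") return true;
    }
    return false;
}

// Walk a hub cache root and collect matching files, sorted for stable output.
std::vector<fs::path> list_cached_gguf(const fs::path & hub_root) {
    std::vector<fs::path> out;
    std::error_code ec;
    for (fs::recursive_directory_iterator it(hub_root, ec), end; !ec && it != end; it.increment(ec)) {
        std::error_code fe;
        // is_regular_file follows the snapshot symlinks into blobs/.
        if (is_cached_gguf(it->path()) && it->is_regular_file(fe)) {
            out.push_back(it->path());
        }
    }
    std::sort(out.begin(), out.end());
    return out;
}
```

Filtering by extension is a heuristic: it hides non-GGUF repos from a listing but does not move or separate any files on disk.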

@WhyNotHugo

Contributor

Regresses: #21280

Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
* common : add standard Hugging Face cache support

- Use HF API to find all files
- Migrate all manifests to hugging face cache at startup

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* Check with the quant tag

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* Cleanup

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* Improve error handling and report API errors

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* Restore common_cached_model_info and align mmproj filtering

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* Prefer main when getting cached ref

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* Use cached files when HF API fails

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* Use final_path..

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* Check all inputs

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

---------

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
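Two of the commits above, "Prefer main when getting cached ref" and "Use cached files when HF API fails", concern resolving a model offline. In the standard cache layout, `refs/<name>` inside a repo folder holds a commit hash and `snapshots/<hash>/` holds that revision's files. A minimal sketch of such a fallback, assuming that layout (the helpers are hypothetical, not the PR's implementation):

```cpp
#include <algorithm>
#include <cctype>
#include <filesystem>
#include <fstream>
#include <optional>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: read refs/<ref> and return snapshots/<hash> if it exists locally.
std::optional<fs::path> snapshot_for_ref(const fs::path & repo_dir, const std::string & ref) {
    std::ifstream in(repo_dir / "refs" / ref);
    std::string commit;
    if (!in || !std::getline(in, commit)) return std::nullopt;
    while (!commit.empty() && std::isspace(static_cast<unsigned char>(commit.back()))) {
        commit.pop_back(); // tolerate a trailing newline or CR
    }
    if (commit.empty()) return std::nullopt;
    const fs::path snap = repo_dir / "snapshots" / commit;
    std::error_code ec;
    if (!fs::is_directory(snap, ec)) return std::nullopt;
    return snap;
}

// Offline resolution: prefer refs/main, then try any other top-level ref, sorted by name.
std::optional<fs::path> cached_snapshot(const fs::path & repo_dir) {
    if (auto snap = snapshot_for_ref(repo_dir, "main")) return snap;
    std::vector<std::string> names;
    std::error_code ec;
    for (fs::directory_iterator it(repo_dir / "refs", ec), end; !ec && it != end; it.increment(ec)) {
        std::error_code fe;
        if (it->is_regular_file(fe)) names.push_back(it->path().filename().string());
    }
    std::sort(names.begin(), names.end());
    for (const auto & name : names) {
        if (auto snap = snapshot_for_ref(repo_dir, name)) return snap;
    }
    return std::nullopt;
}
```

Preferring `main` keeps offline behavior in line with what an online request for the default branch would return; other refs are only a last resort.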
rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request May 1, 2026
julien-c added a commit to huggingface/hub-docs that referenced this pull request May 4, 2026
…maBarn (#2453)

Move llama.cpp and HuggingFaceModelDownloader under a new Applications
table, add LlamaBarn, and replace the "Work in progress" note for
llama.cpp with a link to ggml-org/llama.cpp#20775 which added standard
Hugging Face cache support.

Co-authored-by: julien-agent <Agents+cyolo@huggingface.co>
ljubomirj pushed a commit to ljubomirj/llama.cpp that referenced this pull request May 6, 2026
my-other-github-account pushed a commit to my-other-github-account/llama.cpp that referenced this pull request May 15, 2026
my-other-github-account pushed a commit to my-other-github-account/llama.cpp that referenced this pull request May 15, 2026
fewtarius pushed a commit to fewtarius/llama.cpp that referenced this pull request May 30, 2026

Labels

examples python python script changes server


7 participants