common : add standard Hugging Face cache support by angt · Pull Request #20775 · ggml-org/llama.cpp

angt · 2026-03-19T21:10:57Z

Use HF API to find all files
Migrate all manifests to hugging face cache at startup
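
For readers unfamiliar with the target layout, here is a minimal sketch of how the standard Hugging Face cache names things on disk, following the convention used by the `huggingface_hub` Python library. It is not taken from this PR's C++ code; the helper names, the commit hash and the file name below are hypothetical.

```python
from pathlib import Path


def repo_folder_name(repo_id: str, repo_type: str = "model") -> str:
    """Return the cache folder name for a repo, e.g. 'models--org--name'."""
    return "--".join([f"{repo_type}s", *repo_id.split("/")])


def snapshot_file(hub_dir: Path, repo_id: str, commit: str, filename: str) -> Path:
    """Path under which one file of a given revision is exposed in the cache."""
    return hub_dir / repo_folder_name(repo_id) / "snapshots" / commit / filename


if __name__ == "__main__":
    # Default hub location when no cache-related environment variable is set.
    hub = Path("~/.cache/huggingface/hub").expanduser()
    # "0123abc" and "model-Q4_K_M.gguf" are placeholders, not real values.
    print(snapshot_file(hub, "ggml-org/gemma-3-1b-it-GGUF", "0123abc", "model-Q4_K_M.gguf"))
```

In that layout, the entries under `snapshots/` are normally symlinks into a sibling `blobs/` directory, and `refs/` maps branch names such as `main` to commit hashes.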

angt · 2026-03-19T21:13:32Z

WARNING: Do not test this without taking care of your cache first, or you'll regret it. There is no going back. 😬

ggerganov · 2026-03-20T08:42:07Z

Will it correctly handle repos that require an HF token (e.g. gated, private)? I think it will back out of the migration for that specific manifest, correct?

angt · 2026-03-20T08:53:23Z

I need to test this. I'm afraid that without the token there is no way to migrate correctly.

ggerganov · 2026-03-20T08:56:49Z

So is the current logic that the migration will only happen if an HF token is provided? I think that makes sense.

julien-c · 2026-03-20T15:26:35Z

tried it locally, worked well!

number of models in cache: 9
   1. bartowski/Qwen2.5-7B-Instruct-GGUF:Q4_K_M
   2. ggml-org/Nemotron-Nano-3-30B-A3B-GGUF:Q4_K_M
   3. ggml-org/gemma-3-1b-it-GGUF:Q4_K_M
   4. ggml-org/gemma-3-4b-it-GGUF:Q4_K_M
   5. ggml-org/gpt-oss-20b-GGUF:MXFP4
   6. lmstudio-ai/gemma-2b-it-GGUF:Q4_K_M
   7. lmstudio-ai/gemma-2b-it-GGUF:Q8_0
   8. unsloth/Qwen3.5-4B-GGUF:Q4_K_XL
   9. unsloth/Qwen3.5-9B-GGUF:Q4_K_XL

ggerganov

One failure case I can think of is when the user keeps the current llama.cpp cache on a larger, separate disk from the one holding the HF cache. The migration would then move files from the larger disk to the smaller one, which might fill up in the process. But I don't think we have a way to prevent that if we want the migration to be automatic.
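
To make the concern concrete, here is a minimal sketch of a pre-flight check a user could run by hand before letting the migration happen. It is an illustration only: nothing in this thread indicates the PR performs such a check, and the function names are hypothetical.

```python
import os
import shutil
from pathlib import Path


def same_filesystem(a: Path, b: Path) -> bool:
    """True when both existing paths live on the same device."""
    return os.stat(a).st_dev == os.stat(b).st_dev


def tree_size(root: Path) -> int:
    """Total size in bytes of regular files under root (symlinks are skipped)."""
    return sum(
        p.lstat().st_size
        for p in root.rglob("*")
        if p.is_file() and not p.is_symlink()
    )


def migration_fits(src: Path, dst: Path) -> bool:
    """Conservative check that moving src's content next to dst will not fill the disk."""
    if same_filesystem(src, dst):
        # A move within one filesystem is a rename and needs no extra space.
        return True
    return tree_size(src) <= shutil.disk_usage(dst).free
```

Both paths must already exist. The check is only a snapshot in time, since free space can change between the check and the move.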

angt · 2026-03-23T21:29:36Z

Let's go, @ngxson @ggerganov?

ggerganov · 2026-03-24T06:37:53Z

I think we should add a prominent notice at the top of the README.md about this change. I expect the migration to cause some confusion.

angt · 2026-03-24T06:56:07Z

Do you want a bigger WARNING with an explanation when the files are migrated for the first time?

ggerganov · 2026-03-24T07:09:32Z

I guess a warning in the logs might be useful too.

angt · 2026-03-24T08:09:23Z

see #20935

CISC · 2026-03-24T08:44:52Z

https://github.com/ggml-org/llama.cpp/actions/runs/23476321133/job/68309759940

angt · 2026-03-24T09:16:43Z

This was not tested in the PR?

CISC · 2026-03-24T09:23:16Z

Sanitizer jobs (and server-metal, which is also generally failing now) have been manual outside of master since #20546.

angt · 2026-03-24T09:26:14Z

Thanks, I missed that.

angt · 2026-03-24T09:59:19Z

see #20946

wbste · 2026-03-25T14:40:01Z

Yikes! Any way to disable this or hide models from HF_HOME? I'm seeing EVERYTHING in my cache, even stuff llama.cpp can't run (Safetensors, Gitattributes, etc.). I purposely keep my GGUF files in another location so I don't mix HF stuff with llama.cpp. Thoughts on how to handle this?
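
As an illustration of the filtering being asked for, here is a minimal sketch that walks a Hugging Face hub cache directory and reports only the repos that contain GGUF files. It assumes the `models--org--name/snapshots/<commit>/...` layout used by `huggingface_hub`; the function name is hypothetical and this is not how llama.cpp itself lists models.

```python
from pathlib import Path


def gguf_repos(hub_dir: Path) -> dict[str, list[str]]:
    """Map repo id -> sorted GGUF file paths (relative to their snapshot)."""
    found: dict[str, list[str]] = {}
    for repo_dir in sorted(hub_dir.glob("models--*")):
        # Naive reversal of the folder name; a repo id that itself contains
        # "--" cannot be recovered unambiguously this way.
        repo_id = "/".join(repo_dir.name.split("--")[1:])
        names = sorted(
            {
                p.relative_to(snap).as_posix()
                for snap in repo_dir.glob("snapshots/*")
                for p in snap.rglob("*.gguf")
            }
        )
        if names:
            found[repo_id] = names
    return found
```

Repos that hold only Safetensors or other non-GGUF files produce no entry, which is the behaviour the comment above asks for.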

WhyNotHugo · 2026-04-01T21:52:47Z

Regresses: #21280

* common : add standard Hugging Face cache support
  - Use HF API to find all files
  - Migrate all manifests to hugging face cache at startup
  Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Check with the quant tag
  Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Cleanup
  Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Improve error handling and report API errors
  Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Restore common_cached_model_info and align mmproj filtering
  Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Prefer main when getting cached ref
  Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Use cached files when HF API fails
  Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Use final_path..
  Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Check all inputs
  Signed-off-by: Adrien Gallouët <angt@huggingface.co>
---------
Signed-off-by: Adrien Gallouët <angt@huggingface.co>

…maBarn (#2453) Move llama.cpp and HuggingFaceModelDownloader under a new Applications table, add LlamaBarn, and replace the "Work in progress" note for llama.cpp with a link to ggml-org/llama.cpp#20775 which added standard Hugging Face cache support. Co-authored-by: julien-agent <Agents+cyolo@huggingface.co>

angt requested a review from a team as a code owner March 19, 2026 21:10

github-actions Bot added the examples label Mar 19, 2026

angt force-pushed the common-add-standard-hugging-face-cache-support branch 7 times, most recently from 3638b70 to 77ff285 Compare March 20, 2026 07:36

common : add standard Hugging Face cache support

6fd16ba

- Use HF API to find all files
- Migrate all manifests to hugging face cache at startup
Signed-off-by: Adrien Gallouët <angt@huggingface.co>

angt force-pushed the common-add-standard-hugging-face-cache-support branch from 77ff285 to 6fd16ba Compare March 20, 2026 14:11

ggerganov reviewed Mar 20, 2026

Comment thread common/hf-cache.h Outdated

Comment thread common/download.h Outdated

Comment thread common/download.cpp Outdated

angt added 2 commits March 20, 2026 17:25

Check with the quant tag

e915644

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

Cleanup

b6c7bcf

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

angt requested a review from a team as a code owner March 20, 2026 18:06

github-actions Bot added python python script changes server labels Mar 20, 2026

ggerganov approved these changes Mar 20, 2026

ggerganov requested a review from ngxson March 20, 2026 18:45

Improve error handling and report API errors

e404f6a

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

ngxson reviewed Mar 20, 2026

Comment thread common/download.h Outdated

Comment thread common/download.cpp

loci-dev mentioned this pull request Mar 21, 2026

UPSTREAM PR #20775: common : add standard Hugging Face cache support auroralabs-loci/llama.cpp#1278

Open

angt force-pushed the common-add-standard-hugging-face-cache-support branch 2 times, most recently from 62bcccb to 5d0c722 Compare March 22, 2026 09:02

Restore common_cached_model_info and align mmproj filtering

77fa9a9

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

ngxson reviewed Mar 22, 2026

Comment thread common/hf-cache.cpp

Comment thread common/download.h

Comment thread common/hf-cache.cpp

Check all inputs

3645fee

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

ngxson approved these changes Mar 23, 2026

angt merged commit 8c7957c into ggml-org:master Mar 24, 2026
51 checks passed

wbste mentioned this pull request Mar 25, 2026

Misc. bug: HF_HOME Huggingface cache lists all content and LLAMA_CACHE non-functional #20994

Closed

bachp mentioned this pull request Mar 25, 2026

Misc. bug: Unable to start llama-server as systemd service without specifying huggingface cache options #20952

Closed

Beinsezii mentioned this pull request Mar 26, 2026

Misc. bug: New HF Cache Picks imatrix GGUFs #21014

Closed

hmblair mentioned this pull request Mar 26, 2026

Misc. bug: llama-server fails to load multi-shard GGUF from HF cache (selects wrong shard) #21016

Closed

fanshi1028 mentioned this pull request Apr 5, 2026

Misc. bug: Or feature? HF_HUB_CACHE doesn't work. #21456

Closed

danchev mentioned this pull request Apr 7, 2026

Feature Request: tool to list and delete cached models #16393

Open

4 tasks

julien-c mentioned this pull request May 4, 2026

docs(local-cache): split table into Libraries / Applications, add LlamaBarn huggingface/hub-docs#2453

Merged

1 task
