
Add model metadata loading from huggingface for use with tests requiring real model data #19796

Merged
CISC merged 6 commits into ggml-org:master from bartowski1182:model-data
Feb 28, 2026

Conversation

@bartowski1182 (Contributor) commented Feb 22, 2026

This is based on the work from huggingface here:

https://github.com/huggingface/huggingface.js/tree/main/packages/gguf

The idea is to partially load GGUF models from huggingface, fetching just enough bytes to read the metadata.

The intention is to use this data in realistic unit tests for llama-quant.cpp, but it can be used by anyone needing real model data.

To build:

cmake --build build --target test-gguf-model-data

To run the included test:

./build/bin/test-gguf-model-data
=== test-gguf-model-data ===
gguf_fetch: downloading 2097152 bytes from Qwen3-0.6B-Q8_0.gguf
gguf_fetch: downloading 4194304 bytes from Qwen3-0.6B-Q8_0.gguf
gguf_fetch: downloading 8388608 bytes from Qwen3-0.6B-Q8_0.gguf
gguf_fetch: cache write OK (/home/colin/.cache/llama.cpp/gguf-headers//ggml-org_Qwen3-0.6B-GGUF--Qwen3-0.6B-Q8_0.gguf.partial, 8388608 bytes)
Architecture: qwen3
n_embd:       1024
n_ff:         3072
n_vocab:      151936
n_layer:      28
n_head:       16
n_head_kv:    8
n_expert:     0
n_embd_head_k:128
n_embd_head_v:128
tensors:      311
gguf_fetch: loaded from cache: /home/colin/.cache/llama.cpp/gguf-headers//ggml-org_Qwen3-0.6B-GGUF--Qwen3-0.6B-Q8_0.gguf.partial
gguf_fetch: downloading 2097152 bytes from GLM-4.6V-Q8_0-00001-of-00003.gguf
gguf_fetch: downloading 4194304 bytes from GLM-4.6V-Q8_0-00001-of-00003.gguf
gguf_fetch: downloading 8388608 bytes from GLM-4.6V-Q8_0-00001-of-00003.gguf
gguf_fetch: downloading 16777216 bytes from GLM-4.6V-Q8_0-00001-of-00003.gguf
gguf_fetch: cache write OK (/home/colin/.cache/llama.cpp/gguf-headers//ggml-org_GLM-4.6V-GGUF--GLM-4.6V-Q8_0-00001-of-00003.gguf.partial, 16777216 bytes)
gguf_fetch: split model with 3 shards, fetching remaining 2...
gguf_fetch: downloading 2097152 bytes from GLM-4.6V-Q8_0-00002-of-00003.gguf
gguf_fetch: cache write OK (/home/colin/.cache/llama.cpp/gguf-headers//ggml-org_GLM-4.6V-GGUF--GLM-4.6V-Q8_0-00002-of-00003.gguf.partial, 2097152 bytes)
gguf_fetch: downloading 2097152 bytes from GLM-4.6V-Q8_0-00003-of-00003.gguf
gguf_fetch: cache write OK (/home/colin/.cache/llama.cpp/gguf-headers//ggml-org_GLM-4.6V-GGUF--GLM-4.6V-Q8_0-00003-of-00003.gguf.partial, 2097152 bytes)
Architecture: glm4moe
n_embd:       4096
n_ff:         10944
n_vocab:      151552
n_layer:      46
n_head:       96
n_head_kv:    8
n_expert:     128
n_embd_head_k:128
n_embd_head_v:128
tensors:      780
=== ALL TESTS PASSED ===

The partial model data is cached locally for faster subsequent runs.

A unit test is provided as a usage example for future tests.

AI was used to help with the porting process and writing the unit tests

@github-actions bot added the testing (Everything test related) label Feb 22, 2026
@bartowski1182 (Contributor, Author)

@JohannesGaessler

Based on your comment here: #19378 (comment), you may be interested in this.

@JohannesGaessler (Contributor)

My PoC is in #19802 . I think it will be sufficient to test the model arch (with/without optional tensors) since that will completely determine how the compute graphs are constructed. If at all possible I want these tests to run offline and with toy data and to require minimal resources. The only thing that I wouldn't be covering is tensors of a specific size where I think it makes more sense to utilize test-backend-ops.

@bartowski1182 (Contributor, Author)

Ah okay nice!

I'll take a look in the morning to see if I can use that for my use case as well

I do care about tensor sizes just because of the fallback tensors, for completeness and all, but maybe I can find a way around that

@bartowski1182 (Contributor, Author)

Noticed this broke when the model had an mmproj or was split; should be fixed now. It grabs metadata from each of the split files to build the full tensor list. Confirmed working with ggml-org/GLM-4.6V-GGUF (and added to the test case).

@pwilkin (Member) left a comment

Looks good to me; the only thing I'm a little worried about is the overhead of yet another test requiring internet connectivity in CI, which already sometimes fails due to network errors. @CISC maybe you have an opinion here?

@bartowski1182 (Contributor, Author)

Is there any way I can remove it from CI? The test here is a bit more of a PoC; it's nice to have, but not strictly needed, and it will serve as a guideline for using the model loading in future tests.

With that said, since I'll be using this for a future test that I want to add, it may be worth considering the network impact. I could also remove those from CI when I get around to them so they're only run by the people opening PRs in the affected area?

@CISC (Member) commented Feb 25, 2026

It's not part of any CI as it's under the label model, which is not run.

@bartowski1182 (Contributor, Author)

Ah great! In that case let me know if you need anything updated, otherwise would love to get this merged so I can start work on some new test coverage :)

@CISC (Member) commented Feb 25, 2026

> Ah great! In that case let me know if you need anything updated, otherwise would love to get this merged so I can start work on some new test coverage :)

As it is right now, it's non-functional if SSL support is not compiled in, so unless there's a plan to make it work with local files I think you should just remove all the conditionals in the code and instead build it conditionally on CPPHTTPLIB_OPENSSL_SUPPORT instead of cpp-httplib target.

@bartowski1182 (Contributor, Author)

> As it is right now, it's non-functional if SSL support is not compiled in, so unless there's a plan to make it work with local files I think you should just remove all the conditionals in the code and instead build it conditionally on CPPHTTPLIB_OPENSSL_SUPPORT instead of cpp-httplib target.

Hmm, that's a good point. It's halfway to working with local models (because of the caching it does), but for now I think I'll make it build conditionally, leave the code as-is, and look to improve it with explicit local support later.
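In the test source itself, the gate could look like the sketch below. `CPPHTTPLIB_OPENSSL_SUPPORT` is the macro cpp-httplib defines when built with OpenSSL; the body here is illustrative, not the PR's actual code:

```cpp
#include <cstdio>

// Sketch only: gate the whole test on cpp-httplib's SSL macro rather than
// on the mere presence of the cpp-httplib target, as suggested in review.
#ifdef CPPHTTPLIB_OPENSSL_SUPPORT
int main() {
    std::printf("SSL available: fetching model metadata over https\n");
    return 0;
}
#else
int main() {
    std::printf("built without SSL support, test skipped\n");
    return 0;
}
#endif
```

The PR ultimately took the stricter route of not compiling the test at all when SSL support is absent, which avoids shipping a binary that can only report "skipped".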

@bartowski1182 (Contributor, Author) commented Feb 25, 2026

Okay, I had to get funky with the CMakeLists.txt (TIL about get_target_property), but it accomplishes what we're after: it now only compiles gguf-model-data when SSL support is present.

Tested with

cmake -B build-nossl -DLLAMA_OPENSSL=OFF

and when building all targets, the test-gguf-model-data binary no longer appears

Everything still works as expected when compiled with SSL support.

@bartowski1182 (Contributor, Author)

@CISC if you could do the honours (assuming you're happy with the compile change)? :)

@CISC (Member) left a comment

Sorry, I have formatting nits, there are quite a few of these. :)

@bartowski1182 (Contributor, Author)

oh man, sorry you had to go and do all those by hand D:

is there a preferred C++ formatter/linter so I can avoid causing you all that work in the future??

@CISC (Member) commented Feb 26, 2026

> oh man, sorry you had to go and do all those by hand D:

LOL, I only did a few as an example. :)

> is there a preferred C++ formatter/linter so I can avoid causing you all that work in the future??

clang-format, though it is not perfectly aligned with our preferred formatting, so sometimes you will have to make manual adjustments; see the coding guidelines, though the important part is:

> Try to follow the existing patterns in the code (indentation, spaces, etc.).

@bartowski1182 (Contributor, Author) commented Feb 26, 2026

> LOL, I only did a few as an example. :)

Still too many, I don't like wasting your time ;P

Think I got them all..

I'll take a closer look at the coding guidelines; most of the C++ I've looked at in this repo is from llama-quant.cpp, where single-line if and for statements are quite common (tbh I prefer the multi-line ones).

I'll grab that formatter for all my future contributions :)

@CISC (Member) commented Feb 26, 2026

> > LOL, I only did a few as an example. :)

> Still too many, I don't like wasting your time ;P

No worries.

> Think I got them all..

Yep, looks good, thanks!

> I'll take a closer look at the coding guidelines, most of the C++ I've looked at in this repo is from llama-quant.cpp where the single-line if and for statements are quite common (tbh I prefer the multi-line ones)

Yeah, probably the worst one to use as an example. :D

@CISC (Member) commented Feb 26, 2026

> It's not part of any CI as it's under the label model, which is not run.

Actually, I'm wrong, these tests are actually run on non-low-perf ggml-ci runners.

@bartowski1182 (Contributor, Author)

I could change it to metadata perhaps? Assuming it's trivial to add new labels

@CISC (Member) commented Feb 26, 2026

> I could change it to metadata perhaps? Assuming it's trivial to add new labels

I think it's fine, these runners are supposed to do tests like this, and it's not that much data anyway.

Edit: Also, it looks like ~~most~~ all of them have openssl disabled, so it won't run anyway.

@CISC CISC merged commit d979f2b into ggml-org:master Feb 28, 2026
77 of 78 checks passed
bartowski1182 added a commit to bartowski1182/llama.cpp that referenced this pull request Mar 2, 2026
* Add model metadata loading from huggingface for use with other tests

* Add incremental chunking instead of full redownload, fix caching issue and add warning when it fails

* Add support for split models, load metadata from each individual split file, also avoid mmproj

* Code cleanup, revert incremental downloading

* Only compile when cpp-httplib has SSL support

* Fix formatting
ArberSephirotheca pushed a commit to ArberSephirotheca/llama.cpp that referenced this pull request Mar 3, 2026
Ethan-a2 pushed a commit to Ethan-a2/llama.cpp that referenced this pull request Mar 20, 2026

Labels: testing (Everything test related)


4 participants