
Add model metadata loading from huggingface for use with tests requiring real model data #19796

Merged
CISC merged 6 commits into ggml-org:master from bartowski1182:model-data
Feb 28, 2026

Conversation

@bartowski1182 (Contributor) commented Feb 22, 2026

This is based on the work from huggingface here:

https://github.com/huggingface/huggingface.js/tree/main/packages/gguf

The idea is to partially load GGUF models from huggingface, fetching just enough bytes to read the metadata.

The intention is to use this data in realistic unit tests for llama-quant.cpp, but it can be used by anyone needing real model data.

To build:

cmake --build build --target test-gguf-model-data

To run the included test:

./build/bin/test-gguf-model-data
=== test-gguf-model-data ===
gguf_fetch: downloading 2097152 bytes from Qwen3-0.6B-Q8_0.gguf
gguf_fetch: downloading 4194304 bytes from Qwen3-0.6B-Q8_0.gguf
gguf_fetch: downloading 8388608 bytes from Qwen3-0.6B-Q8_0.gguf
gguf_fetch: cache write OK (/home/colin/.cache/llama.cpp/gguf-headers//ggml-org_Qwen3-0.6B-GGUF--Qwen3-0.6B-Q8_0.gguf.partial, 8388608 bytes)
Architecture: qwen3
n_embd:       1024
n_ff:         3072
n_vocab:      151936
n_layer:      28
n_head:       16
n_head_kv:    8
n_expert:     0
n_embd_head_k:128
n_embd_head_v:128
tensors:      311
gguf_fetch: loaded from cache: /home/colin/.cache/llama.cpp/gguf-headers//ggml-org_Qwen3-0.6B-GGUF--Qwen3-0.6B-Q8_0.gguf.partial
gguf_fetch: downloading 2097152 bytes from GLM-4.6V-Q8_0-00001-of-00003.gguf
gguf_fetch: downloading 4194304 bytes from GLM-4.6V-Q8_0-00001-of-00003.gguf
gguf_fetch: downloading 8388608 bytes from GLM-4.6V-Q8_0-00001-of-00003.gguf
gguf_fetch: downloading 16777216 bytes from GLM-4.6V-Q8_0-00001-of-00003.gguf
gguf_fetch: cache write OK (/home/colin/.cache/llama.cpp/gguf-headers//ggml-org_GLM-4.6V-GGUF--GLM-4.6V-Q8_0-00001-of-00003.gguf.partial, 16777216 bytes)
gguf_fetch: split model with 3 shards, fetching remaining 2...
gguf_fetch: downloading 2097152 bytes from GLM-4.6V-Q8_0-00002-of-00003.gguf
gguf_fetch: cache write OK (/home/colin/.cache/llama.cpp/gguf-headers//ggml-org_GLM-4.6V-GGUF--GLM-4.6V-Q8_0-00002-of-00003.gguf.partial, 2097152 bytes)
gguf_fetch: downloading 2097152 bytes from GLM-4.6V-Q8_0-00003-of-00003.gguf
gguf_fetch: cache write OK (/home/colin/.cache/llama.cpp/gguf-headers//ggml-org_GLM-4.6V-GGUF--GLM-4.6V-Q8_0-00003-of-00003.gguf.partial, 2097152 bytes)
Architecture: glm4moe
n_embd:       4096
n_ff:         10944
n_vocab:      151552
n_layer:      46
n_head:       96
n_head_kv:    8
n_expert:     128
n_embd_head_k:128
n_embd_head_v:128
tensors:      780
=== ALL TESTS PASSED ===

The partial model data is cached locally for faster subsequent runs.

A unit test is provided as a usage example for future tests.

AI was used to help with the porting process and writing the unit tests

@github-actions bot added the testing (Everything test related) label Feb 22, 2026
@bartowski1182 (Contributor, Author)

@JohannesGaessler

Based on your comment here: #19378 (comment), you may be interested in this.

@JohannesGaessler (Contributor)

My PoC is in #19802 . I think it will be sufficient to test the model arch (with/without optional tensors) since that will completely determine how the compute graphs are constructed. If at all possible I want these tests to run offline and with toy data and to require minimal resources. The only thing that I wouldn't be covering is tensors of a specific size where I think it makes more sense to utilize test-backend-ops.

@bartowski1182 (Contributor, Author)

Ah okay nice!

I'll take a look in the morning to see if I can use that for my use case as well

I do care about tensor sizes just because of the fallback tensors, for completeness and all, but maybe I can find a way around that

@bartowski1182 (Contributor, Author)

Noticed this broke when the model had an mmproj or was split; should be fixed now. It grabs metadata from each of the split files to build the full tensor list. Confirmed working with ggml-org/GLM-4.6V-GGUF (and added to the test case).

@pwilkin (Member) left a comment

Looks good to me; the only thing I'm a little worried about is the overhead of yet another test requiring internet connectivity in CI, which already sometimes fails due to network errors. @CISC maybe you have an opinion here?

@bartowski1182 (Contributor, Author)

Is there any way I can remove it from CI? The test here is a bit more of a PoC; it's nice to have, but not strictly needed, and it will serve as a guideline for using the model loading in future tests.

With that said, since I'll be using this for a future test that I want to add, it may be worth considering the network impact. I could also remove those from CI when I get around to them so they're only run by the people opening PRs in the affected area?

@CISC (Member) commented Feb 25, 2026

It's not part of any CI as it's under the label model, which is not run.

@bartowski1182 (Contributor, Author)

Ah great! In that case let me know if you need anything updated, otherwise would love to get this merged so I can start work on some new test coverage :)

@CISC (Member) commented Feb 25, 2026

> Ah great! In that case let me know if you need anything updated, otherwise would love to get this merged so I can start work on some new test coverage :)

As it is right now, it's non-functional if SSL support is not compiled in, so unless there's a plan to make it work with local files I think you should just remove all the conditionals in the code and instead build it conditionally on CPPHTTPLIB_OPENSSL_SUPPORT instead of cpp-httplib target.

@bartowski1182 (Contributor, Author)

> As it is right now, it's non-functional if SSL support is not compiled in, so unless there's a plan to make it work with local files I think you should just remove all the conditionals in the code and instead build it conditionally on CPPHTTPLIB_OPENSSL_SUPPORT instead of cpp-httplib target.

Hmm, that's a good point. It's halfway to working with local models (because of the caching it does), but for now I think I'll make it build conditionally, leave the code as-is, and look to improve it with explicit local support later.
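In the test source itself, the gate could look like the sketch below. `CPPHTTPLIB_OPENSSL_SUPPORT` is the macro cpp-httplib defines when built with OpenSSL; the body here is illustrative, not the PR's actual code:

```cpp
#include <cstdio>

// Sketch only: gate the whole test on cpp-httplib's SSL macro rather than
// on the mere presence of the cpp-httplib target, as suggested in review.
#ifdef CPPHTTPLIB_OPENSSL_SUPPORT
int main() {
    std::printf("SSL available: fetching model metadata over https\n");
    return 0;
}
#else
int main() {
    std::printf("built without SSL support, test skipped\n");
    return 0;
}
#endif
```

The PR ultimately took the stricter route of not compiling the test at all when SSL support is absent, which avoids shipping a binary that can only report "skipped".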

@bartowski1182 (Contributor, Author) commented Feb 25, 2026

Okay, I had to get funky with the CMakeLists.txt (TIL about get_target_property), but it accomplishes what we're after: it now only compiles gguf-model-data when SSL support is present.

Tested with

cmake -B build-nossl -DLLAMA_OPENSSL=OFF

and when building all targets, the test-gguf-model-data binary no longer appears

Everything still works as expected when compiled with SSL support.

@bartowski1182 (Contributor, Author)

@CISC if you could do the honours (assuming you're happy with the compile change)? :)

@CISC (Member) left a comment

Sorry, I have formatting nits, there are quite a few of these. :)

@bartowski1182 (Contributor, Author)

oh man, sorry you had to go and do all those by hand D:

is there a preferred C++ formatter/linter so I can avoid causing you all that work in the future??

@CISC (Member) commented Feb 26, 2026

> oh man, sorry you had to go and do all those by hand D:

LOL, I only did a few as an example. :)

> is there a preferred C++ formatter/linter so I can avoid causing you all that work in the future??

clang-format, though it is not perfectly aligned with our preferred formatting, so sometimes you will have to make manual adjustments; see the coding guidelines, though the important part is:

> Try to follow the existing patterns in the code (indentation, spaces, etc.).

@bartowski1182 (Contributor, Author) commented Feb 26, 2026

> LOL, I only did a few as an example. :)

Still too many, I don't like wasting your time ;P

Think I got them all..

I'll take a closer look at the coding guidelines; most of the C++ I've looked at in this repo is from llama-quant.cpp, where single-line if and for statements are quite common (tbh I prefer the multi-line ones).

I'll grab that formatter for all my future contributions :)

@CISC (Member) commented Feb 26, 2026

> > LOL, I only did a few as an example. :)

> Still too many, I don't like wasting your time ;P

No worries.

> Think I got them all..

Yep, looks good, thanks!

> I'll take a closer look at the coding guidelines, most of the C++ I've looked at in this repo is from llama-quant.cpp where the single-line if and for statements are quite common (tbh I prefer the multi-line ones)

Yeah, probably the worst one to use as an example. :D

@CISC (Member) commented Feb 26, 2026

> It's not part of any CI as it's under the label model, which is not run.

Actually, I'm wrong, these tests are actually run on non-low-perf ggml-ci runners.

@bartowski1182 (Contributor, Author)

I could change it to metadata perhaps? Assuming it's trivial to add new labels

@CISC (Member) commented Feb 26, 2026

> I could change it to metadata perhaps? Assuming it's trivial to add new labels

I think it's fine, these runners are supposed to do tests like this, and it's not that much data anyway.

Edit: Also, it looks like ~~most~~ all of them have openssl disabled, so it won't run anyway.

@CISC CISC merged commit d979f2b into ggml-org:master Feb 28, 2026
77 of 78 checks passed
bartowski1182 added a commit to bartowski1182/llama.cpp that referenced this pull request Mar 2, 2026
* Add model metadata loading from huggingface for use with other tests

* Add incremental chunking instead of full redownload, fix caching issue and add warning when it fails

* Add support for split models, load metadata from each individual split file, also avoid mmproj

* Code cleanup, revert incremental downloading

* Only compile when cpp-httplib has SSL support

* Fix formatting
ArberSephirotheca pushed a commit to ArberSephirotheca/llama.cpp that referenced this pull request Mar 3, 2026
Ethan-a2 pushed a commit to Ethan-a2/llama.cpp that referenced this pull request Mar 20, 2026

Labels: testing (Everything test related)


4 participants