Add model metadata loading from huggingface for use with tests requiring real model data #19796
CISC merged 6 commits into ggml-org:master from
Based on your comment here: #19378 (comment), you may be interested in this
My PoC is in #19802. I think it will be sufficient to test the model arch (with/without optional tensors), since that completely determines how the compute graphs are constructed. If at all possible I want these tests to run offline, with toy data, and with minimal resources. The only thing that I wouldn't be covering is tensors of a specific size, where I think it makes more sense to utilize
Ah okay, nice! I'll take a look in the morning to see if I can use that for my use case as well. I do care about tensor sizes because of the fallback tensors, for completeness and all, but maybe I can find a way around that.
Noticed this broke when the model had an mmproj or was split. It should be fixed now: it grabs metadata from each of the split files to build the full tensor list. Confirmed working with ggml-org/GLM-4.6V-GGUF (and added to the test case).
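As a rough illustration of the split handling described above, here is a minimal sketch that expands one split file name into all of its siblings and filters out mmproj files. It assumes the usual `<prefix>-00001-of-0000N.gguf` split naming and that mmproj files can be recognized by `mmproj` in the file name; the helper names `expand_split_names` and `looks_like_mmproj` are hypothetical and are not taken from this PR.

```cpp
#include <cstdio>
#include <regex>
#include <string>
#include <vector>

// Assumption: multimodal projector files carry "mmproj" somewhere in their name.
static bool looks_like_mmproj(const std::string & name) {
    return name.find("mmproj") != std::string::npos;
}

// If `name` follows the "<prefix>-00001-of-0000N.gguf" split convention,
// return the names of all N split files; otherwise return just { name }.
static std::vector<std::string> expand_split_names(const std::string & name) {
    static const std::regex re(R"((.*)-(\d{5})-of-(\d{5})\.gguf)");
    std::smatch m;
    if (!std::regex_match(name, m, re)) {
        return { name };
    }
    const int n_split = std::stoi(m[3].str());
    std::vector<std::string> out;
    for (int i = 1; i <= n_split; ++i) {
        char suffix[32];
        std::snprintf(suffix, sizeof(suffix), "-%05d-of-%05d.gguf", i, n_split);
        out.push_back(m[1].str() + suffix);
    }
    return out;
}
```

Each returned name would then be fetched separately, and the per-file tensor lists merged into one.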
Is there any way I can remove it from CI? The test here is a bit more of a PoC; it's nice to have but not strictly needed, and will serve as a guideline for using the model loading in future tests. With that said, since I'll be using this for a future test that I want to add, it may be worth considering the network impact. I can also remove those from CI when I get around to them, so they're only run by the people opening PRs in the affected area?
Ah great! In that case let me know if you need anything updated, otherwise would love to get this merged so I can start work on some new test coverage :)
As it is right now, it's non-functional if SSL support is not compiled in, so unless there's a plan to make it work with local files I think you should just remove all the conditionals in the code and instead build it conditionally on
Hmm, that's a good point. It's halfway to working with local models (because of the caching it does), but I think for now I'll make it build conditionally, leave the code as-is, and look to improve it with explicit local support later.
Okay had to get funky with the CMakeLists.txt (TIL about Tested with and when building all targets, the Everything still works when compiled with SSL support as expected
@CISC if you could do the honours (assuming you're happy with the compile change)? :)
CISC left a comment
Sorry, I have formatting nits, there are quite a few of these. :)
Oh man, sorry you had to go and do all those by hand D: Is there a preferred C++ formatter/linter so I can avoid causing you all that work in the future?
LOL, I only did a few as an example. :)
Still too many, I don't like wasting your time ;P Think I got them all. I'll take a closer look at the coding guidelines; most of the C++ I've looked at in this repo is from llama-quant.cpp where the single-line I'll grab that formatter for all my future contributions :)
No worries.
Yep, looks good, thanks!
Yeah, probably the worst one to use as an example. :D
Actually, I'm wrong, these tests are actually run on non-low-perf
I could change it to
I think it's fine, these runners are supposed to do tests like this, and it's not that much data anyway. Edit: Also, it looks like
* Add model metadata loading from huggingface for use with other tests
* Add incremental chunking instead of full redownload, fix caching issue and add warning when it fails
* Add support for split models, load metadata from each individual split file, also avoid mmproj
* Code cleanup, revert incremental downloading
* Only compile when cpp-httplib has SSL support
* Fix formatting
This is based on the work from huggingface here:
https://github.com/huggingface/huggingface.js/tree/main/packages/gguf
The idea is to partially load GGUF models from huggingface, fetching just enough of each file to get the metadata
The intention is to use this data in realistic unit tests for llama-quant.cpp, but it can be used by anyone needing real model data
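To illustrate why a partial download is enough, here is a minimal sketch of parsing the fixed-size start of a GGUF file (version 2 or later): a 4-byte `GGUF` magic, a 32-bit version, a 64-bit tensor count and a 64-bit metadata key/value count, all little-endian. This is not the code from this PR; the names `gguf_header_sketch` and `parse_gguf_header` are hypothetical. The variable-length key/value pairs and tensor infos follow these 24 bytes, so a loader would typically request an initial byte range and ask for more only if parsing runs past the end of what it has.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Fixed-size prefix of a GGUF (v2+) file. Hypothetical struct for illustration.
struct gguf_header_sketch {
    uint32_t version;
    uint64_t n_tensors;
    uint64_t n_kv;
};

// magic (4) + version (4) + tensor count (8) + KV count (8)
constexpr size_t GGUF_FIXED_HEADER_SIZE = 4 + 4 + 8 + 8;

// Read a little-endian unsigned integer of sizeof(T) bytes.
template <typename T>
static T read_le(const uint8_t * p) {
    T v = 0;
    for (size_t i = 0; i < sizeof(T); ++i) {
        v |= static_cast<T>(p[i]) << (8 * i);
    }
    return v;
}

// Returns false if the buffer is too short or does not start with "GGUF".
static bool parse_gguf_header(const std::vector<uint8_t> & buf, gguf_header_sketch & out) {
    if (buf.size() < GGUF_FIXED_HEADER_SIZE) {
        return false;
    }
    if (std::memcmp(buf.data(), "GGUF", 4) != 0) {
        return false;
    }
    out.version   = read_le<uint32_t>(buf.data() + 4);
    out.n_tensors = read_le<uint64_t>(buf.data() + 8);
    out.n_kv      = read_le<uint64_t>(buf.data() + 16);
    return true;
}
```

The tensor and KV counts from this prefix tell the caller how many variable-length entries still need to be read before the metadata is complete.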
To build:
To run the included test:
Caches the model locally for faster subsequent usages
Unit test provided as an example for usage in future tests
AI was used to help with the porting process and writing the unit tests