Add Q1_0 as new GGUF type #2077
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 492aa15.
|
Remember to update your GGUFs' file type to the new enum (maybe possible to do directly in the HF GGUF Editor?).
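To check which file type value a GGUF currently carries before editing it, something like the following could work. This is a minimal sketch, assuming a little-endian GGUF v2/v3 header layout and the standard `general.file_type` metadata key; the helper names (`read_file_type` and friends) are hypothetical and not part of any library.

```python
import struct
from io import BytesIO

# struct formats for the fixed-width GGUF metadata value types, keyed by type id
_SCALAR_FMT = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
               6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
_STRING, _ARRAY = 8, 9


def _read(stream, fmt):
    """Read one fixed-width value from the stream."""
    size = struct.calcsize(fmt)
    data = stream.read(size)
    if len(data) != size:
        raise ValueError("unexpected end of GGUF data")
    return struct.unpack(fmt, data)[0]


def _read_string(stream):
    """GGUF strings are a uint64 length followed by UTF-8 bytes."""
    length = _read(stream, "<Q")
    return stream.read(length).decode("utf-8")


def _read_value(stream, value_type):
    """Read one metadata value, recursing into arrays."""
    if value_type == _STRING:
        return _read_string(stream)
    if value_type == _ARRAY:
        item_type = _read(stream, "<I")
        count = _read(stream, "<Q")
        return [_read_value(stream, item_type) for _ in range(count)]
    return _read(stream, _SCALAR_FMT[value_type])


def read_file_type(stream):
    """Return the general.file_type value, or None if the key is absent."""
    if stream.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    _read(stream, "<I")             # version (assumed 2 or 3)
    _read(stream, "<Q")             # tensor count, unused here
    kv_count = _read(stream, "<Q")  # number of metadata key/value pairs
    for _ in range(kv_count):
        key = _read_string(stream)
        value = _read_value(stream, _read(stream, "<I"))
        if key == "general.file_type":
            return value
    return None
```

Calling `read_file_type(open(path, "rb"))` on a local file would return the integer stored in the header, which can then be compared against the enum value expected upstream.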
|
Fixed ordering. @CISC good point, which enum is this based on? I can probably edit and re-upload the models; I don't see an option to edit in the UI. For this one, Q1_0 with group size 128 was 41 in our fork, which I originally made the models from.
Yes.
If you click the GGUF file there should be an edit link at the top.
yep, when you go to https://huggingface.co/prism-ml/Bonsai-8B-gguf/blob/main/Bonsai-8B.gguf, you should see
|
For the HF model page GGUF section to work, could you also rename Bonsai-8B.gguf? The HF model page uses a valid GGUF type name suffix to present the available GGUF quants on the page. Example: unsloth/GLM-4.7-Flash-GGUF
|
|
Oh nice, good feature for the GGUF editor. I had to go to the GGUF file itself; I was looking somewhere else. Edited to 40 for all 3 Q1_0 GGUFs we have. For adding the Q1_0 suffix: oh, so the website might use the suffix name to decide the type? In that case I can try uploading the same copy as
if it's just a file rename, most clients won't re-download an identical file (at least the HF clients shouldn't – https://huggingface.co/docs/hub/local-cache)
|
Thanks, this is good. I will look into renaming the models then. I need to update our demo code and give a heads-up to a few apps that are hosting the model, to make sure the file name is not hardcoded. Otherwise it should be okay. There is a Test / Browser CI job failing; not sure if the issue is caused by this PR.
|
One of the benefits of following this "GGUF type suffix" naming standard is that, if your repository has multiple quants, it makes it possible to select one from your llama-server/ollama/etc. Example: https://huggingface.co/unsloth/GLM-4.7-Flash-GGUF?local-app=llama.cpp
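As a rough illustration of why the suffix matters, a client could recover the quant label from the file name with a pattern match. This is a simplified sketch, not the actual Hub logic: the `KNOWN_QUANTS` list is a hypothetical subset, `quant_from_filename` is a made-up helper, and `Bonsai-8B-Q1_0.gguf` is only an example of a name carrying the suffix.

```python
import re

# Hypothetical subset of GGUF quant type names; the real list is longer.
KNOWN_QUANTS = ["Q1_0", "Q4_K_M", "Q8_0", "F16"]

# Match a known quant name directly before the .gguf extension,
# preceded by a separator such as "-", "." or "_".
QUANT_SUFFIX = re.compile(
    r"[-._](?P<quant>" + "|".join(re.escape(q) for q in KNOWN_QUANTS) + r")\.gguf$",
    re.IGNORECASE,
)


def quant_from_filename(filename):
    """Return the quant label encoded in the file name, or None if there is none."""
    match = QUANT_SUFFIX.search(filename)
    return match.group("quant").upper() if match else None
```

With this pattern a name without a suffix, such as `Bonsai-8B.gguf`, yields no label at all, which is why a page or client relying on the suffix cannot list it as a selectable quant.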
|
is not related
|
@mishig25 good idea, thanks. We might just upload a duplicate file with the -Q1_0 suffix so we don't break current apps that are using the old name. Will do more testing after these changes are deployed.
|
@khosravipasha everything is deployed to prod from the HF side. Once you "upload a duplicate file with -Q1_0 suffix", you will see the image below on your model page 🙌
|
This adds support for Q1_0, a newly added GGUF type.
We just released 3 models in this format as 1-bit Bonsai (see more info here).
This PR makes the GGUF tab on the Hugging Face model page show the correct naming:
https://huggingface.co/prism-ml/Bonsai-8B-gguf
PR that merged Q1_0: ggml-org/llama.cpp#21273
Note
Low Risk
Low risk: adds a new GGUF quantization enum/value mapping and updates ordering/regex inputs; main risk is mis-numbering or ordering causing incorrect quant label parsing or size calculations.
Overview
Adds support for the newly introduced GGUF Q1_0 1-bit quantization. This extends quant metadata to include a human-readable description/source link and a bits-per-weight size calculation, and updates the tasks-side GGUF quant enums and quant ordering lists so Q1_0 is recognized when parsing/labeling and when selecting the nearest available quant.
Reviewed by Cursor Bugbot for commit a4007ba. Bugbot is set up for automated code reviews on this repo.
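The bits-per-weight calculation and the nearest-quant selection mentioned in the overview can be sketched roughly as follows. Both parts rest on assumptions: the block layout (one 16-bit scale per group of 128 one-bit weights) is inferred from the group size mentioned in the thread, and `QUANT_ORDER` is a hypothetical, shortened ordering. The distance-based `nearest_quant` is an illustrative stand-in, not the selection logic this PR touches.

```python
def bits_per_weight(group_size, bits_per_value, scale_bits=16):
    """Average storage per weight for a block-quantized format.

    Assumes each block stores `group_size` values of `bits_per_value` bits
    plus one shared scale of `scale_bits` bits (an assumed layout).
    """
    return (group_size * bits_per_value + scale_bits) / group_size


# Hypothetical ordering, highest precision first; the real list is longer.
QUANT_ORDER = ["F16", "Q8_0", "Q4_K_M", "Q2_K", "Q1_0"]


def nearest_quant(target, available, order=QUANT_ORDER):
    """Pick the available quant closest to `target` in the ordering list."""
    target_idx = order.index(target)
    candidates = [q for q in available if q in order]
    if not candidates:
        return None
    # Ties go to whichever candidate appears first in `available`.
    return min(candidates, key=lambda q: abs(order.index(q) - target_idx))
```

Under the assumed layout, `bits_per_weight(128, 1)` gives 1.125 bits per weight. A quant that is missing from the ordering list is never considered by `nearest_quant`, which is the kind of gap that adding Q1_0 to the lists closes.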