Add Q1_0 as new GGUF type#2077

Merged
mishig25 merged 2 commits into
huggingface:mainfrom
PrismML-Eng:q1-hf
Apr 8, 2026

@khosravipasha

@khosravipasha khosravipasha commented Apr 6, 2026

Contributor

This adds support for Q1_0, a newly added GGUF type.
We just released 3 models in this format as 1-bit Bonsai (see more info here).

This PR makes the correct naming show up on the GGUF tab of the Hugging Face website:

https://huggingface.co/prism-ml/Bonsai-8B-gguf

Screenshot 2026-04-06 at 14 16 31

PR that merged Q1_0: ggml-org/llama.cpp#21273


Note

Low Risk
Low risk: adds a new GGUF quantization enum/value mapping and updates ordering/regex inputs; main risk is mis-numbering or ordering causing incorrect quant label parsing or size calculations.

Overview
Adds support for the newly introduced GGUF Q1_0 1-bit quantization.

This extends quant metadata to include a human-readable description/source link and a bits-per-weight size calculation, and updates the tasks-side GGUF quant enums and quant ordering lists so Q1_0 is recognized when parsing/labeling and when selecting the nearest available quant.
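
A bits-per-weight figure for a block-quantized type can be sketched as below. This is only an illustration, not the code from this PR: the group size of 128 comes from the discussion further down, while the single 16-bit scale per group and the function names `bitsPerWeight` and `estimateBytes` are assumptions made for the example.

```typescript
// Hypothetical helper: bits per weight for a block-quantized format.
// Assumed layout (not confirmed by this PR): each block stores `groupSize`
// weights at `bitsPerValue` bits each, plus one shared scale of `scaleBits` bits.
function bitsPerWeight(groupSize: number, bitsPerValue: number, scaleBits: number): number {
	return (groupSize * bitsPerValue + scaleBits) / groupSize;
}

// Rough size in bytes of the quantized weights alone (no metadata, no
// tensors kept at higher precision).
function estimateBytes(numWeights: number, bpw: number): number {
	return Math.ceil((numWeights * bpw) / 8);
}

// Assumption: 128 one-bit weights share one 16-bit scale.
const q1_0Bpw = bitsPerWeight(128, 1, 16);
console.log(`Q1_0 bits per weight (under the assumed layout): ${q1_0Bpw}`);
console.log(`Approx. bytes for 8e9 weights: ${estimateBytes(8e9, q1_0Bpw)}`);
```

Under these assumptions the per-group scale adds a small fixed overhead on top of the 1 bit per weight, which is why the size calculation cannot simply use the nominal bit width.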

Reviewed by Cursor Bugbot for commit a4007ba.

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 492aa15.

Comment thread packages/tasks/src/gguf.ts Outdated
@CISC

CISC commented Apr 6, 2026

Contributor

Remember to update your GGUFs filetype to the new enum (maybe possible to do directly in the HF GGUF Editor?).

@khosravipasha

khosravipasha commented Apr 6, 2026

Contributor Author

Fixed ordering.

@CISC good point, which enum is this based on, llama_ftype? https://github.com/ggml-org/llama.cpp/blob/d0a6dfeb28a09831d904fc4d910ddb740da82834/include/llama.h#L116

I can edit and re-upload models probably, don't see an option to edit in the UI.

In our fork, which I originally made the models from, Q1_0 with group size 128 was type 41 (general.file_type | 41). In llama.cpp main it ended up at 40, since we removed the extra type and only went with Q1_0.
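
A quick way to spot a stale `general.file_type` value like the one described above is to map the number back to a name. The sketch below is hypothetical and deliberately partial: the value 40 for Q1_0 is taken from this thread, the other entries are long-standing `llama_ftype` values, and the names `FILE_TYPE_NAMES` and `describeFileType` are made up for illustration.

```typescript
// Hypothetical, partial map from general.file_type numbers to quant names.
// 40 -> Q1_0 is the value reported in this thread for llama.cpp main;
// the other entries are a small subset of the llama_ftype enum.
const FILE_TYPE_NAMES = new Map<number, string>([
	[0, "F32"],
	[1, "F16"],
	[2, "Q4_0"],
	[7, "Q8_0"],
	[40, "Q1_0"],
]);

// Returns a readable name, or flags the number as unknown so that a value
// written by a fork (such as 41 here) stands out.
function describeFileType(fileType: number): string {
	return FILE_TYPE_NAMES.get(fileType) ?? `unknown (${fileType})`;
}

console.log(describeFileType(40));
console.log(describeFileType(41));
```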

@CISC

CISC commented Apr 7, 2026

Contributor

@CISC good point, which enum is this based on, llama_ftype? https://github.com/ggml-org/llama.cpp/blob/d0a6dfeb28a09831d904fc4d910ddb740da82834/include/llama.h#L116

Yes.

I can edit and re-upload models probably, don't see an option to edit in the UI.

If you click the GGUF file there should be an edit link at the top.

@mishig25

mishig25 commented Apr 7, 2026

Collaborator

If you click the GGUF file there should be an edit link at the top.

yep, when you go to https://huggingface.co/prism-ml/Bonsai-8B-gguf/blob/main/Bonsai-8B.gguf, you should see GGUF Editor

@mishig25

mishig25 commented Apr 7, 2026

Collaborator

for the hf model page gguf section to work, could you also rename Bonsai-8B.gguf to Bonsai-8B-Q1_0.gguf

hf model page uses a valid gguf type name suffix to present the available gguf quants on the page. Example: unsloth/GLM-4.7-Flash-GGUF

https://huggingface.co/unsloth/GLM-4.7-Flash-GGUF https://huggingface.co/unsloth/GLM-4.7-Flash-GGUF/tree/main
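
To illustrate why the suffix matters, here is a minimal sketch of how a page could pull a quant label out of a file name. It is not the Hub's actual implementation: the label list `KNOWN_QUANTS`, the regex, and the function `quantLabelFromFilename` are all hypothetical, and a real list would cover every GGUF type.

```typescript
// Hypothetical, partial list of quant labels recognized in file names.
const KNOWN_QUANTS = ["Q1_0", "Q4_0", "Q4_K_M", "Q8_0", "F16", "BF16"];

// A label counts only when it follows a separator ('-', '.', '_') and is
// followed by '.' (the extension) or '-' (for example a shard suffix).
const QUANT_RE = new RegExp(`[-._](${KNOWN_QUANTS.join("|")})(?=[-.])`, "i");

// Returns the upper-cased quant label, or undefined when the name has none.
function quantLabelFromFilename(filename: string): string | undefined {
	const match = QUANT_RE.exec(filename);
	return match?.[1]?.toUpperCase();
}

console.log(quantLabelFromFilename("Bonsai-8B-Q1_0.gguf"));
console.log(quantLabelFromFilename("Bonsai-8B.gguf"));
```

With a scheme like this, `Bonsai-8B.gguf` carries no label to extract, while `Bonsai-8B-Q1_0.gguf` does, which matches the rename requested above.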

@khosravipasha

khosravipasha commented Apr 7, 2026

Contributor Author

Oh nice, good feature for the GGUF editor. I had to go to the GGUF file itself; I was looking somewhere else. Edited to 40 for all 3 Q1_0 GGUFs we have.

Is adding the Q1_0 suffix (Bonsai-8B-Q1_0.gguf) required? We only have one format at the moment.
The model has been downloaded a lot already and some apps are using it, so I don't want to break things. I can add the correct suffix with our next releases.

Oh, so the website might use the suffix name to decide the type? In that case I can try uploading the same copy as Bonsai-8B-Q1_0.gguf, something like that. Or rename it and notify people to update their model URL/name.

@julien-c

julien-c commented Apr 7, 2026

Member

The model has been downloaded a lot already and some apps are using it, so I don't want to break things.

if it's just a file rename most clients won't re-download an identical file (at least the HF clients shouldn't – https://huggingface.co/docs/hub/local-cache)

@khosravipasha

Contributor Author

Thanks, this is good. I will look into renaming the models then. I need to update our demo code and give a heads-up to a few apps that are hosting the model, to make sure the file name is not hardcoded. Otherwise it should be okay.

There is a Test / Browser CI failing, not sure if issue caused by this PR.

@mishig25

mishig25 commented Apr 8, 2026

Collaborator

one of the benefits of following this "gguf type suffix" naming standard is that, if your repository has multiple quants, it makes it possible to select one from your llama-server/ollama/etc

example: https://huggingface.co/unsloth/GLM-4.7-Flash-GGUF?local-app=llama.cpp

Screenshot 2026-04-08 at 10 45 54 AM

@mishig25

mishig25 commented Apr 8, 2026

Collaborator

There is a Test / Browser CI failing, not sure if issue caused by this PR.

is not related

@mishig25 mishig25 merged commit e24d628 into huggingface:main Apr 8, 2026
5 of 6 checks passed
@khosravipasha

Contributor Author

@mishig25 good idea, thanks. We might just upload a duplicate file with the -Q1_0 suffix so as not to break current apps that are using the old name. Will do more testing after these changes are deployed.

@mishig25

Collaborator

@khosravipasha everything is deployed to prod from hf side. Once you "upload a duplicate file with -Q1_0 suffix", you will see the image below on your model page 🙌

image

4 participants