Skip to content

Loading GGUF support#2

Draft
LysandreJik wants to merge 37 commits intomainfrom
gguf-support
Draft

Loading GGUF support#2
LysandreJik wants to merge 37 commits intomainfrom
gguf-support

Conversation

@LysandreJik
Copy link
Owner

WIP

Copy link
Owner Author

@LysandreJik LysandreJik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok good first PR! Let's clean it up a bit, I want to take a look at the changes in the from_pretrained method to clean things up a bit as its currently doing a lot of changes in several places

Also I wonder if we can't change the loading methods to also only return the metadata and not tensors in some situations. As that read is done sequentially, and the config and tokenizer only need metadata, we could save a bunch of time by not requiring tensors load

Comment on lines 1402 to 1405
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to rename that/make it clearer

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These modifications should live under integrations as well

Comment on lines 1410 to 1411
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to make that better as well

Comment on lines 176 to 201
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(younes) IMO this and the method below are doing basically the same thing as _gguf_parse_value and load_gguf_checkpoint_in_pytorch_model.

We should clean that up

Comment on lines 3141 to 3172
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would likely send that all to another method to take care of that to not add too much code to the already bloated from_pretrained

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense! Done !

Comment on lines 3224 to 3196
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once the file is loaded in a pt state dict, quantization cannot be applied?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(happy to not support that for now, indeed)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be complicated as we would only support the quant schemes that do not require data calibration and would require many patches and if/else checks everywhere :/

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

berk

Co-authored-by: Younes Belkada <younesbelkada@gmail.com>
Co-authored-by: 99991 <99991@users.noreply.github.com>
@LysandreJik LysandreJik changed the base branch from main to rename_ex_file April 19, 2024 15:41
@LysandreJik LysandreJik changed the base branch from rename_ex_file to main April 19, 2024 15:41
@99991
Copy link

99991 commented Apr 22, 2024

I'm happy to see my code live in 🤗 transformers!

I added support for Q2_K, Q3_K and Q5_K yesterday. Feel free to copy as well.

99991/pygguf@a417edb

LysandreJik pushed a commit that referenced this pull request Feb 20, 2025
* gptqmodel

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update readme

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* gptqmodel need use checkpoint_format (#1)

* gptqmodel need use checkpoint_format

* fix quantize

* Update quantization_config.py

* Update quantization_config.py

* Update quantization_config.py

---------

Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>

* Revert quantizer_gptq.py (#2)

* revert quantizer_gptq.py change

* pass **kwargs

* limit gptqmodel and optimum version

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix warning

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix version check

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* revert unrelated changes

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* enable gptqmodel tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix requires gptq

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* Fix Transformer compat (#3)

* revert quantizer_gptq.py change

* pass **kwargs

* add meta info

* cleanup

* cleanup

* Update quantization_config.py

* hf_select_quant_linear pass checkpoint_format and meta

* fix GPTQTestCUDA

* Update test_gptq.py

* gptqmodel.hf_select_quant_linear() now does not select ExllamaV2

* cleanup

* add backend

* cleanup

* cleanup

* no need check exllama version

* Update quantization_config.py

* lower checkpoint_format and backend

* check none

* cleanup

* Update quantization_config.py

* fix self.use_exllama == False

* spell

* fix unittest

* fix unittest

---------

Co-authored-by: LRL <lrl@lbx.dev>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format again

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update gptqmodel version (huggingface#6)

* update gptqmodel version

* update gptqmodel version

* fix unit test (huggingface#5)

* update gptqmodel version

* update gptqmodel version

* "not self.use_exllama" is not equivalent to "self.use_exllama==False"

* fix unittest

* update gptqmodel version

* backend is loading_attibutes (huggingface#7)

* fix format and tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix memory check

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix device mismatch

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix result check

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* Update src/transformers/quantizers/quantizer_gptq.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update src/transformers/quantizers/quantizer_gptq.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update src/transformers/quantizers/quantizer_gptq.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* update tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* review: update docs (huggingface#10)

* review: update docs (huggingface#12)

* review: update docs

* fix typo

* update tests for gptqmodel

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update document (huggingface#9)

* update overview.md

* cleanup

* Update overview.md

* Update overview.md

* Update overview.md

* update gptq.md

* Update gptq.md

* Update gptq.md

* Update gptq.md

* Update gptq.md

* Update gptq.md

* Update gptq.md

---------

Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>

* typo

* doc note for asymmetric quant

* typo with apple silicon(e)

* typo for marlin

* column name revert: review

* doc rocm support

* Update docs/source/en/quantization/gptq.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/gptq.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/gptq.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/gptq.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/overview.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/overview.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: LRL-ModelCloud <165116337+LRL-ModelCloud@users.noreply.github.com>
Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
Co-authored-by: ZX-ModelCloud <165115237+ZX-ModelCloud@users.noreply.github.com>
Co-authored-by: LRL <lrl@lbx.dev>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
LysandreJik pushed a commit that referenced this pull request Feb 20, 2025
* Resolve vptq conflict

* Rename spqr package to spqr_quant

* Get rid of aqlm mention

* Start working on tests

* Resolve ruff code checks

* Ruff format

* Isort

* Test updates

* Add gpu tag

* Rename to modules_to_not_convert

* Config update

* Docs and config update

* Docs and config update

* Update to update_torch_dtype

* spqr config parameter validation

* Ruff update

* Apply ruff fixes

* Test fixes

* Ruff update

* Mark tests as @slow again; Ruff; Docstring update

* Ruff

* Remove absolute path

* Resolve typo

* Remove redundandt log

* Check accelerate/spqr availability

* Ruff fix

* Check if the config contains proper shapes

* Ruff test

* Documentation update

* overview update

* Ruff checks

* Ruff code quality

* Make style

* Update docs/source/en/quantization/spqr.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update spqr.md

* Enable gptqmodel (huggingface#35012)

* gptqmodel

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update readme

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* gptqmodel need use checkpoint_format (#1)

* gptqmodel need use checkpoint_format

* fix quantize

* Update quantization_config.py

* Update quantization_config.py

* Update quantization_config.py

---------

Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>

* Revert quantizer_gptq.py (#2)

* revert quantizer_gptq.py change

* pass **kwargs

* limit gptqmodel and optimum version

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix warning

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix version check

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* revert unrelated changes

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* enable gptqmodel tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix requires gptq

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* Fix Transformer compat (#3)

* revert quantizer_gptq.py change

* pass **kwargs

* add meta info

* cleanup

* cleanup

* Update quantization_config.py

* hf_select_quant_linear pass checkpoint_format and meta

* fix GPTQTestCUDA

* Update test_gptq.py

* gptqmodel.hf_select_quant_linear() now does not select ExllamaV2

* cleanup

* add backend

* cleanup

* cleanup

* no need check exllama version

* Update quantization_config.py

* lower checkpoint_format and backend

* check none

* cleanup

* Update quantization_config.py

* fix self.use_exllama == False

* spell

* fix unittest

* fix unittest

---------

Co-authored-by: LRL <lrl@lbx.dev>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format again

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update gptqmodel version (huggingface#6)

* update gptqmodel version

* update gptqmodel version

* fix unit test (huggingface#5)

* update gptqmodel version

* update gptqmodel version

* "not self.use_exllama" is not equivalent to "self.use_exllama==False"

* fix unittest

* update gptqmodel version

* backend is loading_attibutes (huggingface#7)

* fix format and tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix memory check

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix device mismatch

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix result check

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* Update src/transformers/quantizers/quantizer_gptq.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update src/transformers/quantizers/quantizer_gptq.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update src/transformers/quantizers/quantizer_gptq.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* update tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* review: update docs (huggingface#10)

* review: update docs (huggingface#12)

* review: update docs

* fix typo

* update tests for gptqmodel

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update document (huggingface#9)

* update overview.md

* cleanup

* Update overview.md

* Update overview.md

* Update overview.md

* update gptq.md

* Update gptq.md

* Update gptq.md

* Update gptq.md

* Update gptq.md

* Update gptq.md

* Update gptq.md

---------

Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>

* typo

* doc note for asymmetric quant

* typo with apple silicon(e)

* typo for marlin

* column name revert: review

* doc rocm support

* Update docs/source/en/quantization/gptq.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/gptq.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/gptq.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/gptq.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/overview.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/overview.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: LRL-ModelCloud <165116337+LRL-ModelCloud@users.noreply.github.com>
Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
Co-authored-by: ZX-ModelCloud <165115237+ZX-ModelCloud@users.noreply.github.com>
Co-authored-by: LRL <lrl@lbx.dev>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Fix : Nemotron Processor in GGUF conversion (huggingface#35708)

* fixing nemotron processor

* make style

* Update docs/source/en/quantization/spqr.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Add missing TOC to doc

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: LRL-ModelCloud <165116337+LRL-ModelCloud@users.noreply.github.com>
Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
Co-authored-by: ZX-ModelCloud <165115237+ZX-ModelCloud@users.noreply.github.com>
Co-authored-by: LRL <lrl@lbx.dev>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants