[Feature request] Support GPTQ quantization

I have a GPTQ Llama model that I downloaded from TheBloke, and it's already 4-bit quantized. I have to pass False for the load_in_4bit parameter of:
```
model, tokenizer = FastLlamaModel.from_pretrained(
```
because if I don't, I get an error thrown saying:
```
The model is already quantized with gptq. You can't quantize it again with bitsandbytes
```
But if I pass False for load_in_4bit, this code leaves bnb_config as None:
```
        bnb_config = None
        if load_in_4bit:
            bnb_config = BitsAndBytesConfig(
                load_in_4bit              = True,
                bnb_4bit_use_double_quant = True,
                bnb_4bit_quant_type       = "nf4",
                bnb_4bit_compute_dtype    = dtype,
            )
```
and that makes quantization_config be None as well:
```
quantization_config = bnb_config,
```
and that crashes here:
```
        if hasattr(self, "quantization_config"):
            output["quantization_config"] = (
                self.quantization_config.to_dict()
```
with the error message:
```
'NoneType' object has no attribute 'to_dict'
```
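To show what I think is going on, here is a minimal sketch that reproduces the same failure without any real library code. `FakeQuantConfig` and `FakeModelConfig` are hypothetical stand-ins I made up for illustration, not actual transformers or unsloth classes. The point is that `hasattr` is true even when the attribute is set to None, so a check on the value itself avoids the crash:
```python
class FakeQuantConfig:
    """Hypothetical stand-in for a real quantization config object."""

    def to_dict(self):
        return {"quant_method": "gptq", "bits": 4}


class FakeModelConfig:
    """Hypothetical stand-in for a model config that serializes itself."""

    def __init__(self, quantization_config=None):
        # The attribute always exists, even when its value is None.
        self.quantization_config = quantization_config

    def to_dict_unguarded(self):
        output = {}
        # Same shape as the check quoted above: hasattr() is True for None too.
        if hasattr(self, "quantization_config"):
            output["quantization_config"] = self.quantization_config.to_dict()
        return output

    def to_dict_guarded(self):
        output = {}
        # Serialize only when a config object is actually present.
        quant = getattr(self, "quantization_config", None)
        if quant is not None:
            output["quantization_config"] = quant.to_dict()
        return output


config = FakeModelConfig(quantization_config=None)

error_message = None
try:
    config.to_dict_unguarded()
except AttributeError as err:
    # Same kind of error as in the traceback above.
    error_message = str(err)

# The guarded version simply skips the missing config.
guarded = config.to_dict_guarded()
```
This only illustrates the None handling. Whether the real fix belongs in the config serialization or in the code that passes `quantization_config = bnb_config` is something I can't tell from here.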
So I'm not sure how to LoRA train this llama model.  Any thoughts?
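
One possible direction, as a rough sketch only: detect that the checkpoint is already quantized and then avoid forwarding any quantization setting at all, instead of forwarding None. This assumes the checkpoint's config.json carries a `quantization_config` entry with a `quant_method` field, which is what I see in GPTQ checkpoints but have not verified in general. `detect_quant_method` and `build_load_kwargs` are hypothetical helper names, not existing functions in any library:
```python
import json


def detect_quant_method(config_json_text):
    """Return the quant method named in a config.json string, or None."""
    config = json.loads(config_json_text)
    # Treat a missing or null entry the same way: no pre-quantization.
    quant = config.get("quantization_config") or {}
    return quant.get("quant_method")


def build_load_kwargs(config_json_text, want_4bit=True):
    """Hypothetical helper: decide which quantization kwargs to forward."""
    kwargs = {}
    method = detect_quant_method(config_json_text)
    if method is None and want_4bit:
        # Unquantized checkpoint: asking for bitsandbytes 4-bit makes sense.
        kwargs["load_in_4bit"] = True
    # Pre-quantized checkpoint (e.g. "gptq"): forward nothing, so no explicit
    # quantization_config=None ever reaches the loader.
    return kwargs


# Example config.json contents (made up for illustration).
gptq_config = '{"model_type": "llama", "quantization_config": {"quant_method": "gptq", "bits": 4}}'
plain_config = '{"model_type": "llama"}'

gptq_kwargs = build_load_kwargs(gptq_config)
plain_kwargs = build_load_kwargs(plain_config)
```
With the made-up configs above, the GPTQ case yields an empty kwargs dict and the plain case yields `load_in_4bit = True`. I don't know whether the rest of the loading path would work with a GPTQ model after that, so this is only about the None issue.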


[Feature request] Support GPTQ quantization #39
