Skip to content

[Feature request] Support GPTQ quantization #39

@araleza

Description

@araleza

So I have a GPTQ llama model I downloaded (from TheBloke), and it's already 4 bit quantized. I have to pass in False for the load_in_4bit parameter of:

model, tokenizer = FastLlamaModel.from_pretrained(

because if I don't, I get an error thrown saying:

The model is already quantized with gptq. You can't quantize it again with bitsandbytes

But, if I pass in False for load_in_4bit, this code makes bnb_config be None:

        bnb_config = None
        if load_in_4bit:
            bnb_config = BitsAndBytesConfig(
                load_in_4bit              = True,
                bnb_4bit_use_double_quant = True,
                bnb_4bit_quant_type       = "nf4",
                bnb_4bit_compute_dtype    = dtype,
            )

and that makes quantization_config be None as well:

quantization_config = bnb_config,

and that crashes here:

        if hasattr(self, "quantization_config"):
            output["quantization_config"] = (
                self.quantization_config.to_dict()

with the error message:

'NoneType' object has no attribute 'to_dict'

So I'm not sure how to LoRA train this llama model. Any thoughts?

Metadata

Metadata

Assignees

No one assigned

    Labels

    feature requestFeature request pending on roadmaphelp wantedHelp from the OSS community wanted!on roadmapFeature request on roadmap

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions