Hi Tim,
Thanks for your awesome work!
I'm using your method to load the largest BLOOM model (176B parameters) onto a single node with 8 GPUs:
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "bloom",
    device_map="auto",
    load_in_8bit=True,
)
This code works for all the smaller BLOOM models, e.g. bloom-7b1. However, when loading bloom (176B) I get the error "8-bit operations on bitsandbytes are not supported under CPU!":
File "/opt/conda/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 463, in from_pretrained
return model_class.from_pretrained(
File "/opt/conda/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2182, in from_pretrained
raise ValueError("8-bit operations on `bitsandbytes` are not supported under CPU!")
ValueError: 8-bit operations on `bitsandbytes` are not supported under CPU!
In my understanding, this happens because some modules of the model are automatically placed on the CPU, which doesn't happen with the smaller models. Is there a way to force the model to load onto the GPUs only? Or do you have any advice on how to work around this error? Thanks!!
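In case it helps clarify what I'm after: one thing I was considering is passing a `max_memory` map to `from_pretrained` that gives the CPU a zero budget, so the "auto" device map would have to place every module on the 8 GPUs. I'm not sure this is the right knob, and the per-GPU budget below is just a guess for my node:

```python
# Hypothetical workaround (untested): build a max_memory map that
# leaves no room on CPU, so device_map="auto" spreads all modules
# across the 8 GPUs. The "80GiB" per-GPU budget is an assumption
# for my hardware, not a recommended value.
max_memory = {i: "80GiB" for i in range(8)}
max_memory["cpu"] = "0GiB"  # forbid CPU offload

# I would then pass it like this:
# model = AutoModelForCausalLM.from_pretrained(
#     "bloom",
#     device_map="auto",
#     load_in_8bit=True,
#     max_memory=max_memory,
# )
print(max_memory)
```

Would something along these lines be expected to work, or does the auto device map decide CPU placement for another reason?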
Tianwei