System Info
Hi,
I am using a Llama model and wanted to wrap it in a `pipeline`, but it throws an error when the pipeline is being built.
Does anyone have a solution to this?
Thank you!
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Model:

```python
from typing import Any, List, Mapping, Optional

from langchain.llms.base import LLM
from transformers import AutoModelForCausalLM, pipeline

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map='auto',
    load_in_8bit=True,
    max_memory=max_memory,
)
```

LLM class:

```python
class CustomLLM(LLM):
    pipeline = pipeline(
        "text-generation",
        tokenizer=tokenizer,
        model=model,
        device="cuda:0",
    )

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        prompt_length = len(prompt)
        response = self.pipeline(prompt, max_new_tokens=num_output)[0]["generated_text"]
        # only return newly generated tokens
        return response[prompt_length:]

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        return {"name_of_model": self.model_name}

    @property
    def _llm_type(self) -> str:
        return "custom"
```
Expected behavior
```
   1879     # Checks if the model has been loaded in 8-bit
   1880     if getattr(self, "is_loaded_in_8bit", False):
-> 1881         raise ValueError(
   1882             ".to is not supported for 8-bit models. Please use the model as it is, since the"
   1883             " model has already been set to the correct devices and casted to the correct `dtype`."
```
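The traceback makes the cause clear: passing `device="cuda:0"` to `pipeline(...)` makes the pipeline call `model.to(device)`, and models loaded with `load_in_8bit=True` refuse `.to()` because `device_map='auto'` has already placed them. Below is a minimal self-contained sketch of that guard (a hypothetical stand-in, not the actual transformers code) together with the usual workaround:

```python
# Hypothetical stand-in that reproduces the guard shown in the traceback.
# This is NOT the real transformers implementation, only an illustration.
class FakeEightBitModel:
    is_loaded_in_8bit = True  # set by from_pretrained(load_in_8bit=True)

    def to(self, device):
        # Mirrors the check at modeling_utils.py lines 1879-1883 above
        if getattr(self, "is_loaded_in_8bit", False):
            raise ValueError(
                ".to is not supported for 8-bit models. Please use the model as it is,"
                " since the model has already been set to the correct devices."
            )
        return self


model = FakeEightBitModel()
try:
    model.to("cuda:0")  # what pipeline(..., device="cuda:0") effectively triggers
except ValueError as err:
    print(err)

# Workaround: when the model was loaded with device_map='auto' and
# load_in_8bit=True, omit the `device` argument entirely so the pipeline
# never calls .to() on the already-placed model:
#     pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
```

In short, device placement is owned by `device_map` for quantized models, so the `device` argument to `pipeline` should simply be dropped.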