[BUG] MP-sharded checkpoint loading does not work for models except BLOOM #2442

@pai4451

Description

Describe the bug

We currently want to run inference on the EleutherAI/gpt-j-6B model with tensor parallelism on multiple GPUs, similar to what is done for the BLOOM model. But the way DeepSpeed inference saves and loads pre-sharded checkpoints does not seem consistent or general enough to cover other models.

To Reproduce

I used the DeepSpeed inference script for BLOOM and modified lines 140-141 to

model = GPTJForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B", revision="float16", torch_dtype=torch.float16, low_cpu_mem_usage=True
)

and line 100 to

model = deepspeed.init_inference(
    model,
    mp_size=world_size,
    base_dir=repo_root,
    dtype=getattr(torch, infer_dtype),
    save_mp_checkpoint_path=<some path to save mp checkpoint>,
    **kwargs,
)

After the first run, on my 2x A6000 server, I was able to get the tensor-parallel sharded checkpoints under <some path to save mp checkpoint>, together with a configuration file ds_inference_config.json, shown below:

{"type": "ds_model",
"base_dir": <some path to save mp checkpoint>, 
"checkpoints": {"non-tp":["non-tp.pt"], "tp":["tp_00_00.pt", "tp_01_00.pt", "tp_00_01.pt", "tp_01_01.pt", 
    "tp_00_02.pt", "tp_01_02.pt", "tp_00_03.pt", "tp_01_03.pt", "tp_00_04.pt", "tp_01_04.pt",
    , "tp_00_05.pt", "tp_01_05.pt", "tp_00_06.pt", "tp_01_06.pt", "tp_01_07.pt", "tp_01_07.pt"]},
"version": 1.0, 
"parallelization": "tp", 
"tp_size": 2,
"dtype": "float16}

For the second run, I undo the changes to lines 140-141 as well as save_mp_checkpoint_path, and instead pass checkpoint=<some path to save mp checkpoint>/ds_inference_config.json to deepspeed.init_inference. This is the standard way of loading the pre-sharded model for BLOOM, and it speeds up the loading process; my second-run setup is sketched below.
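A minimal sketch of that second run, mirroring the meta-tensor pattern from the BLOOM inference script; checkpoint_dir and world_size are placeholder names I introduce here:

import os

import deepspeed
import torch
from transformers import AutoConfig, AutoModelForCausalLM

# checkpoint_dir stands in for <some path to save mp checkpoint>.
checkpoint_dir = "<some path to save mp checkpoint>"
world_size = int(os.getenv("WORLD_SIZE", "1"))

# Build the model on meta tensors so no real weights are loaded here;
# DeepSpeed should materialize them from the pre-sharded checkpoints.
config = AutoConfig.from_pretrained("EleutherAI/gpt-j-6B")
with deepspeed.OnDevice(dtype=torch.float16, device="meta"):
    model = AutoModelForCausalLM.from_config(config, torch_dtype=torch.float16)

model = deepspeed.init_inference(
    model,
    mp_size=world_size,
    dtype=torch.float16,
    checkpoint=os.path.join(checkpoint_dir, "ds_inference_config.json"),
    replace_with_kernel_inject=True,
)

However, this raises the following error: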

AssertionError: ds_model checkpoint type is not supported

This assertion comes from the code DeepSpeed inference uses to load the JSON checkpoint description:

https://github.com/microsoft/DeepSpeed/blob/a5248643571581c6ca3b453a3b9d01ac157ac00a/deepspeed/runtime/state_dict_factory.py#L34

https://github.com/microsoft/DeepSpeed/blob/a5248643571581c6ca3b453a3b9d01ac157ac00a/deepspeed/runtime/state_dict_factory.py#L42
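Paraphrasing those lines (illustrative only, not the verbatim source): the JSON path special-cases the BLOOM type, while every other type, including "ds_model", falls through to the generic loader factory, which only knows how to build a Megatron loader and asserts on anything else:

# Rough paraphrase of the dispatch in state_dict_factory.py at the linked
# commit; see the links above for the real source.
def get_sd_loader(ckpt_list, checkpoint_engine, sd_type='Megatron', version=None):
    if sd_type == 'Megatron':
        return MegatronSDLoader(ckpt_list, version, checkpoint_engine)
    # "ds_model" (and anything else unrecognized) lands here:
    assert False, '{} checkpoint type is not supported'.format(sd_type)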

I also tried changing the type in ds_inference_config.json to BLOOM, since BLOOM and Megatron are the only checkpoint types supported via JSON, but this time the following line causes an error instead, presumably because the BLOOM loading path looks up parameters that do not exist in GPT-J, leaving the module reference as None:

https://github.com/microsoft/DeepSpeed/blob/a5248643571581c6ca3b453a3b9d01ac157ac00a/deepspeed/module_inject/load_checkpoint.py#L199

AttributeError: 'NoneType' object has no attribute 'is_meta'

Is the pre-sharded checkpoint loading feature limited to the BLOOM model? How can I use tensor parallelism to split a single model across multiple GPUs?
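For reference, the first run itself (plain tensor parallelism with kernel injection, no pre-sharded loading) does complete and produces the shards; it is only the fast checkpoint=ds_inference_config.json path that fails. A minimal sketch of that working path, with the script name, prompt, and world_size handling as illustrative placeholders:

# Sketch of the working first-run path: every rank loads the full fp16
# weights and DeepSpeed's kernel injection shards them across GPUs at init.
# Launch with the deepspeed launcher, e.g.: deepspeed --num_gpus 2 run_gptj.py
import os

import deepspeed
import torch
from transformers import AutoTokenizer, GPTJForCausalLM

world_size = int(os.getenv("WORLD_SIZE", "1"))

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = GPTJForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B", revision="float16",
    torch_dtype=torch.float16, low_cpu_mem_usage=True,
)

engine = deepspeed.init_inference(
    model,
    mp_size=world_size,
    dtype=torch.float16,
    replace_with_kernel_inject=True,
)
model = engine.module

inputs = tokenizer("DeepSpeed is", return_tensors="pt").to(torch.cuda.current_device())
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))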

Similar threads:
#2379
#2132
