Describe the bug
Similar to #2113, this bug relates to garbage output when using multi-GPU inference. In that issue @RezaYazdaniAminabadi made a fix (#2198) that resolved the problem for GPT Neo 2.7B; after building from master I can confirm it fixed multi-GPU inference for GPT Neo 2.7B. For GPT-J, however, the issue remains:
Output from 2 3090s for GPT-J:
[{'generated_text': 'DeepSpeed is,: to,,/ &.. by and.. a\n.. and- and.. the,,\n of\n [.,.\n:, &-. and a- the,\n\n). the'}]
Meanwhile, output from 1 3090 for GPT-J:
[{'generated_text': 'DeepSpeed is a leading deep learning framework designed for distributed training and inference on heterogeneous accelerators and CPUs. Our paper (https://arxiv.org/abs/1811.11540) describes an optimized deep architecture and inference engine and'}]
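Since do_sample=True makes every run stochastic, a deterministic greedy run is a cleaner way to show the divergence. A minimal sketch, reusing the generator from the repro script below (do_sample=False and max_new_tokens are standard transformers generation arguments):

# Greedy decoding is deterministic, so the 1-GPU and 2-GPU runs should
# produce identical text if tensor-parallel inference is working correctly.
output = generator("DeepSpeed is", do_sample=False, max_new_tokens=50)
print(output)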
To Reproduce
Steps to reproduce the behavior:
- Install DeepSpeed from source at master
- pip install transformers
- Run the script below with 2 GPUs to get garbage output
- Run it with 1 GPU to get good output
import os

import deepspeed
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_name = 'EleutherAI/gpt-j-6B'
# model_name = 'EleutherAI/gpt-neo-2.7B'  # works on 2 GPUs after #2198

local_rank = int(os.getenv('LOCAL_RANK', '0'))
world_size = int(os.getenv('WORLD_SIZE', '1'))

# Load the model once in fp16 and hand it to the pipeline directly,
# so the 6B checkpoint is not loaded a second time inside pipeline().
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(model_name)
generator = pipeline('text-generation', model=model, tokenizer=tokenizer,
                     device=local_rank)

# Shard the model across world_size GPUs with DeepSpeed kernel injection.
generator.model = deepspeed.init_inference(generator.model,
                                           mp_size=world_size,
                                           dtype=torch.half,
                                           replace_method='auto',
                                           replace_with_kernel_inject=True)

string = generator("DeepSpeed is", do_sample=True, min_length=50)
if not torch.distributed.is_initialized() or torch.distributed.get_rank() == 0:
    print(string)
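A small debug variant that drops the rank-0 guard and prints from every rank shows whether both ranks emit the same garbled text or diverge:

# Print from every rank instead of only rank 0, to compare outputs per GPU.
rank = torch.distributed.get_rank() if torch.distributed.is_initialized() else 0
print(f"rank {rank}: {string}")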
Expected behavior
I would expect coherent output, like the single-GPU result above.
ds_report output
ds_report
DeepSpeed C++/CUDA extension op report
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
JIT compiled ops requires ninja
ninja .................. [OKAY]
op name ................ installed .. compatible
cpu_adam ............... [YES] ...... [OKAY]
cpu_adagrad ............ [YES] ...... [OKAY]
fused_adam ............. [YES] ...... [OKAY]
fused_lamb ............. [YES] ...... [OKAY]
sparse_attn ............ [YES] ...... [OKAY]
transformer ............ [YES] ...... [OKAY]
stochastic_transformer . [YES] ...... [OKAY]
async_io ............... [YES] ...... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [YES] ...... [OKAY]
transformer_inference .. [YES] ...... [OKAY]
DeepSpeed general environment info:
torch install path ............... ['/root/anaconda3/envs/gpt/lib/python3.9/site-packages/torch']
torch version .................... 1.12.0
torch cuda version ............... 11.3
torch hip version ................ None
nvcc version ..................... 11.3
deepspeed install path ........... ['/root/anaconda3/envs/gpt/lib/python3.9/site-packages/deepspeed']
deepspeed info ................... 0.7.1+7d8ad45, 7d8ad45, master
deepspeed wheel compiled w. ...... torch 1.12, cuda 11.3
System info (please complete the following information):
- OS: Ubuntu 20.04
- GPU count and types: 2 3090s
- Interconnects: single machine, 2 3090s
- Python version: 3.9.13
I am using a docker container with NVIDIA CUDA already set up as the base image.
Launcher context
deepspeed --num_gpus 2 infer.py
deepspeed --num_gpus 1 infer.py
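A quick experiment to rule out faulty peer-to-peer transfers between the two 3090s (NCCL_P2P_DISABLE is a standard NCCL environment variable; whether it changes the output here is untested):

NCCL_P2P_DISABLE=1 deepspeed --num_gpus 2 infer.py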
Docker context
Are you using a specific docker image that you can share?
nvidia/cuda:11.3.1-devel-ubuntu20.04
I then build the Python packages into the container.
Additional context
NA