Describe the bug
When padding is applied (i.e., the attention mask masks out part of the sequence), the output of the DeepSpeed transformer kernel differs noticeably from the output of HuggingFace's BertLayer.
To Reproduce
# ds_config, all_weight, all_bias, and bert_config are constructed elsewhere;
# see the setup sketch below the output for one plausible reconstruction.
ds_layer = DeepSpeedTransformerLayer(ds_config, all_weight, all_bias).cuda()
bert_layer = BertLayer(bert_config).cuda()

data = torch.rand((batch_size, seq_length, hidden_size), dtype=torch.float32).cuda()

# Additive attention mask: 0 on valid positions, -10000 on padded ones.
# The second half of every sequence is treated as padding.
mask = torch.ones((batch_size, 1, 1, seq_length), dtype=torch.float32) * -10000
mask[:, :, :, : seq_length // 2] = 0.0
# mask[:, :, :, : seq_length] = 0.0  # with no padding, the outputs are basically the same
mask = mask.cuda()

if fp16:
    data = data.half()
    ds_layer = ds_layer.half()
    bert_layer = bert_layer.half()

ds_output = ds_layer(data, mask)
bert_output = bert_layer(data, mask)  # BertLayer returns a tuple; [0] is the hidden states

max_diff = torch.max(torch.abs(bert_output[0] - ds_output))
mean_diff = torch.mean(torch.abs(bert_output[0] - ds_output))
print(f"max_diff: {max_diff}")
print(f"mean_diff: {mean_diff}")
Output:
max_diff: 0.09228515625
mean_diff: 0.0198822021484375
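For context: the snippet above omits how ds_config, all_weight, and all_bias are built. Below is a minimal sketch of one plausible setup, modeled on DeepSpeed's transformer kernel unit tests; the shapes, config values, and parameter ordering here are assumptions, not the exact code from this report.

import torch
from transformers import BertConfig
from transformers.models.bert.modeling_bert import BertLayer
from deepspeed.ops.transformer import (DeepSpeedTransformerConfig,
                                       DeepSpeedTransformerLayer)

# Assumed shapes; the report does not state the real values.
batch_size, seq_length, hidden_size, heads = 8, 128, 1024, 16
fp16 = True

# Disable dropout so the two layers are directly comparable.
bert_config = BertConfig(hidden_size=hidden_size,
                         num_attention_heads=heads,
                         intermediate_size=4 * hidden_size,
                         hidden_dropout_prob=0.0,
                         attention_probs_dropout_prob=0.0)
bert_layer = BertLayer(bert_config).cuda()

ds_config = DeepSpeedTransformerConfig(batch_size=batch_size,
                                       hidden_size=hidden_size,
                                       intermediate_size=4 * hidden_size,
                                       heads=heads,
                                       attn_dropout_ratio=0.0,
                                       hidden_dropout_ratio=0.0,
                                       num_hidden_layers=1,
                                       initializer_range=0.02,
                                       fp16=fp16,
                                       pre_layer_norm=False)  # HF BertLayer is post-LayerNorm

# Copy the HuggingFace parameters into the flat lists the DeepSpeed layer
# accepts, in the order used by DeepSpeed's unit tests: q, k, v,
# attention output dense, attention LayerNorm, intermediate dense,
# output dense, output LayerNorm.
attn = bert_layer.attention
all_weight = [attn.self.query.weight, attn.self.key.weight, attn.self.value.weight,
              attn.output.dense.weight, attn.output.LayerNorm.weight,
              bert_layer.intermediate.dense.weight,
              bert_layer.output.dense.weight, bert_layer.output.LayerNorm.weight]
all_bias = [attn.self.query.bias, attn.self.key.bias, attn.self.value.bias,
            attn.output.dense.bias, attn.output.LayerNorm.bias,
            bert_layer.intermediate.dense.bias,
            bert_layer.output.dense.bias, bert_layer.output.LayerNorm.bias]

With dropout disabled and identical parameters in both modules, any gap beyond fp16 rounding should come from the kernels themselves.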
Expected behavior
The DeepSpeed transformer kernel output should match HuggingFace's BertLayer output when padding is applied, just as it does without padding.
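To narrow this down, one hypothetical check (reusing the variables from the repro above) is whether the difference is confined to the padded second half of the sequence or also shows up on the valid positions:

valid = seq_length // 2
diff = torch.abs(bert_output[0] - ds_output)
print(f"max diff on valid positions:  {diff[:, :valid].max()}")
print(f"max diff on padded positions: {diff[:, valid:].max()}")

A mismatch only on the padded positions would be harmless (those outputs are discarded downstream), while a mismatch on valid positions would indicate the kernel applies the attention mask differently from HuggingFace.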
ds_report output
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
cpu_adagrad ............ [YES] ...... [OKAY]
fused_adam ............. [YES] ...... [OKAY]
fused_lamb ............. [YES] ...... [OKAY]
sparse_attn ............ [YES] ...... [OKAY]
transformer ............ [YES] ...... [OKAY]
stochastic_transformer . [YES] ...... [OKAY]
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [YES] ...... [NO]
transformer_inference .. [YES] ...... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [YES] ...... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/usr/local/anaconda3/lib/python3.7/site-packages/torch']
torch version .................... 1.10.0a0+git36449ea
torch cuda version ............... 11.1
nvcc version ..................... 11.1
deepspeed install path ........... ['/usr/local/anaconda3/lib/python3.7/site-packages/deepspeed']
deepspeed info ................... 0.5.8, unknown, unknown
deepspeed wheel compiled w. ...... torch 1.10, cuda 11.1