
[BUG] transformer kernel: results misaligned with huggingface due to padding #1967

@dancingpipi

Description

Describe the bug
When padding is applied (i.e. part of the sequence is masked out), the output of the DeepSpeed transformer kernel differs from the output of HuggingFace's BertLayer.

To Reproduce

import torch
from deepspeed.ops.transformer import DeepSpeedTransformerLayer
from transformers.models.bert.modeling_bert import BertLayer

# ds_config, all_weight, all_bias, bert_config and the
# batch_size / seq_length / hidden_size / fp16 settings are defined elsewhere
ds_layer = DeepSpeedTransformerLayer(ds_config, all_weight, all_bias).cuda()
bert_layer = BertLayer(bert_config).cuda()

data = torch.rand((batch_size, seq_length, hidden_size), dtype=torch.float32).cuda()
# additive attention mask: first half of the sequence attended, second half padded out
mask = torch.ones((batch_size, 1, 1, seq_length), dtype=torch.float32) * -10000
mask[:, :, :, : seq_length // 2] = 0.0
# mask[:, :, :, : seq_length] = 0.0    # with no padding, the outputs are basically the same
mask = mask.cuda()

if fp16:
    data = data.half()
    ds_layer = ds_layer.half()
    bert_layer = bert_layer.half()

ds_output = ds_layer(data, mask)
bert_output = bert_layer(data, mask)

max_diff = torch.max(torch.abs(bert_output[0] - ds_output))
mean_diff = torch.mean(torch.abs(bert_output[0] - ds_output))
print(f"max_diff: {max_diff}")
print(f"mean_diff: {mean_diff}")

output:
max_diff: 0.09228515625
mean_diff: 0.0198822021484375
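
To narrow down whether the mismatch is confined to the padded positions, the comparison can be restricted to the unmasked tokens. The following is a minimal sketch that is not part of the original report; it reuses the variables from the snippet above and assumes, as the mask construction implies, that only the first seq_length // 2 positions are valid:

valid_len = seq_length // 2                     # positions where the mask is 0.0
bert_valid = bert_output[0][:, :valid_len, :]   # drop the padded tail
ds_valid = ds_output[:, :valid_len, :]
print(f"max_diff (valid tokens): {torch.max(torch.abs(bert_valid - ds_valid))}")
print(f"mean_diff (valid tokens): {torch.mean(torch.abs(bert_valid - ds_valid))}")

If the difference over the valid tokens stays within normal fp16 noise while the full-tensor difference is large, the divergence comes only from the padded positions, which a fused kernel may legitimately compute differently.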

Expected behavior
The results should align with HuggingFace's BertLayer even when padding is applied.
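
Concretely, "aligned" here would mean a check like the following passes (a hypothetical sketch; the tolerances are loose values chosen for fp16, not taken from the report):

assert torch.allclose(bert_output[0], ds_output, rtol=1e-2, atol=1e-2)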

ds_report output

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
cpu_adagrad ............ [YES] ...... [OKAY]
fused_adam ............. [YES] ...... [OKAY]
fused_lamb ............. [YES] ...... [OKAY]
sparse_attn ............ [YES] ...... [OKAY]
transformer ............ [YES] ...... [OKAY]
stochastic_transformer . [YES] ...... [OKAY]
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [YES] ...... [NO]
transformer_inference .. [YES] ...... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [YES] ...... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/usr/local/anaconda3/lib/python3.7/site-packages/torch']
torch version .................... 1.10.0a0+git36449ea
torch cuda version ............... 11.1
nvcc version ..................... 11.1
deepspeed install path ........... ['/usr/local/anaconda3/lib/python3.7/site-packages/deepspeed']
deepspeed info ................... 0.5.8, unknown, unknown
deepspeed wheel compiled w. ...... torch 1.10, cuda 11.1
