Improve BERT-like models performance with better self attention #9124
jplu merged 8 commits into huggingface:master
Conversation
A Python profiling call gives the following improvements:
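For reference, a minimal sketch of this kind of profiling call; the checkpoint and input here are illustrative, not the exact setup used for the numbers above:

```python
import cProfile

from transformers import BertTokenizer, TFBertModel

# Illustrative checkpoint; any BERT-like TF model can be profiled the same way.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = TFBertModel.from_pretrained("bert-base-uncased")
inputs = tokenizer("Profile this forward pass.", return_tensors="tf")

# Profile a single forward pass, sorted by cumulative time.
cProfile.run("model(inputs)", sort="cumtime")
```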
sgugger
left a comment
Thanks for working on this! I just have some cosmetic nits, since I'm annoying and can't read things that are longer than 119 chars ;-)
If you rebase after merging #9120, we can clean up the version test in tf_optimization.
Can we do that in a separate PR that only takes care of TF >= 2.3 compliance, instead?
Why is the assert better than raising the ValueError? I liked the version above better, but mostly because it fits within the 119-char limit. If we keep the assert, could you just split the line to respect that char limit?
I don't mind removing the assert and putting back the ValueError. Will do this!
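For illustration, a sketch of the kind of change being discussed; the exact condition and message are hypothetical, loosely modeled on the head-size check in BERT-like attention layers:

```python
hidden_size, num_attention_heads = 768, 12

# Before: an assert, which is silently stripped when Python runs with -O.
assert hidden_size % num_attention_heads == 0, "hidden size must be a multiple of the number of heads"

# After: an explicit ValueError, with the line split to respect the 119-char limit.
if hidden_size % num_attention_heads != 0:
    raise ValueError(
        f"The hidden size ({hidden_size}) is not a multiple of the number "
        f"of attention heads ({num_attention_heads})"
    )
```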
Done in the last commit!
Same comment as for the other assert.
Done in the last commit!
Same comment as for the other assert.
Done in the last commit!
Why is Longformer not included in the change?
Because I don't know this model well enough, I'd prefer to let @patrickvonplaten handle this one.
The intermediate layers of Longformer are 1-to-1 the same as BERT's, so there should be no problem keeping those lines. I'd be surprised if leaving them in threw an error tbh. Did you try just leaving them? The only difference in Longformer is the self-attention layer, and none of those copy statements concern the self-attention layer, so IMO we should leave the statements and run make fix-copies.
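For context, the copy statements in question are comments of this form, which make fix-copies uses to keep a layer's body in sync with the referenced BERT layer (the class shown is illustrative):

```python
import tensorflow as tf


# Copied from transformers.models.bert.modeling_tf_bert.TFBertIntermediate with Bert->Longformer
class TFLongformerIntermediate(tf.keras.layers.Layer):
    # `make fix-copies` rewrites this body from the BERT layer,
    # substituting "Bert" with "Longformer" everywhere.
    ...
```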
Oh, indeed it works for the intermediate layer! Only the self-attention still needs to be updated accordingly :)
Yes, that I'm happy to do in another PR - it would be amazing if you could open a one-liner issue about it and tag me :-)
patrickvonplaten
left a comment
Awesome! I am sure you already did this, but before merging we should be sure of two things:
- All the slow TF BERT + BERT-like model tests are passing
- "Old" pre-trained models' tf_model.h5 files that were saved with TF < 2.3 can be loaded into the new layer design, and TF1 model (.ckpt) files can be loaded into the new layers as well
I don't really see a reason why either 1) or 2) should not work, but just to be sure it'd be great to test both quickly :-)
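A minimal sketch of such a quick check for point 2); the checkpoint is illustrative, and any tf_model.h5 saved with TF < 2.3 exercises the same path:

```python
from transformers import TFBertModel

# Load a checkpoint whose tf_model.h5 predates this change; weights are
# matched by name, so a clean load (no missing/unexpected weight warnings)
# is a reasonable smoke test of the new layer design.
model = TFBertModel.from_pretrained("bert-base-uncased")
```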
And I think we can leave all the Longformer copy statements -> there shouldn't be a problem :-)
Thanks @patrickvonplaten !!
I haven't tested the tf1 models, you mean testing the …
@jlei2 has confirmed that everything now works as expected in the profiler and benchmark 👍 #6771 (comment)
Yeah, I mean loading a tf …
Ok, as discussed offline, TF1 checkpoints cannot even be loaded into TF2 at the moment (only if one goes through PT), so this PR is good to go for me!
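For reference, a sketch of the "through PT" route; all paths here are hypothetical, and passing a config is assumed to be needed for a TF1 index checkpoint:

```python
from transformers import BertConfig, BertModel, TFBertModel

# Hypothetical paths to a TF1 checkpoint and its config.
config = BertConfig.from_json_file("./tf1_model/bert_config.json")
pt_model = BertModel.from_pretrained("./tf1_model/model.ckpt.index", from_tf=True, config=config)
pt_model.save_pretrained("./converted")

# Reload the converted weights into the new TF2 layer design.
tf_model = TFBertModel.from_pretrained("./converted", from_pt=True)
```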
LysandreJik
left a comment
This is very clean, and the performance improvements are amazing! Thanks for checking that the slow tests pass and that the previous checkpoints can still be loaded.
Great job, thank you for working on this!
Why not use tf.keras.layers.experimental.EinsumDense and keep the copy?
This one is not possible because the input/output shapes would no longer be compatible.
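For context, a sketch of what the experimental layer does, assuming BERT-base sizes: it fuses the dense projection and the reshape to per-head dimensions into a single einsum, so a layer whose input/output shapes don't fit that pattern can't simply be swapped in.

```python
import tensorflow as tf

num_attention_heads, attention_head_size = 12, 64  # BERT-base sizes

# Projects (batch, seq, hidden) directly to (batch, seq, heads, head_size),
# with a bias over the two trailing axes.
query = tf.keras.layers.experimental.EinsumDense(
    equation="abc,cde->abde",
    output_shape=(None, num_attention_heads, attention_head_size),
    bias_axes="de",
)

hidden_states = tf.random.normal((2, 8, 768))  # (batch, seq, hidden)
print(query(hidden_states).shape)  # (2, 8, 12, 64)
```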
What does this PR do?
This PR updates the way we implement the self-attention layers in order to align with the original BERT implementation's performance. Small breaking change: this improvement needs at least TF 2.3. This change has already been discussed with @thomwolf and he agreed, but it still needs the approval of @LysandreJik, @patrickvonplaten and @sgugger.
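Roughly, the new design keeps query/key/value as 4D (batch, seq, heads, head_size) tensors and computes attention scores with a single tf.einsum instead of explicit transposes and matmuls. A simplified sketch of the idea, not the exact code of this PR:

```python
import tensorflow as tf

batch, seq_len, num_heads, head_size = 2, 8, 12, 64

query = tf.random.normal((batch, seq_len, num_heads, head_size))
key = tf.random.normal((batch, seq_len, num_heads, head_size))

# Previous style: transpose to (batch, heads, seq, head_size), then matmul.
scores_old = tf.matmul(
    tf.transpose(query, perm=[0, 2, 1, 3]),
    tf.transpose(key, perm=[0, 2, 1, 3]),
    transpose_b=True,
)

# New style: one einsum over the 4D tensors, no explicit transposes.
scores_new = tf.einsum("aecd,abcd->acbe", key, query)

# Both give (batch, heads, query_pos, key_pos) attention scores.
tf.debugging.assert_near(scores_old, scores_new, atol=1e-5)
```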
@patrickvonplaten I have removed the check_copies comment in the Longformer model because I don't know this model well enough to apply the proper changes; I will apply this update model by model for the ones I know, but can you take this one?
@jlei2 As I'm on Windows and GPU profiling is not yet available in WSL, can you clone this branch and make sure that everything works as expected with your benchmark? Thanks!!
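For anyone wanting to reproduce the numbers, a sketch using the library's benchmarking utilities; the model, batch size, and sequence length are illustrative:

```python
from transformers import TensorFlowBenchmark, TensorFlowBenchmarkArguments

# Benchmark inference of a BERT-base checkpoint at one batch size / sequence length.
args = TensorFlowBenchmarkArguments(
    models=["bert-base-uncased"],
    batch_sizes=[8],
    sequence_lengths=[128],
)
benchmark = TensorFlowBenchmark(args)
results = benchmark.run()
```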
Fixes #6771