[Flax] Adapt Flax models to new structure by patrickvonplaten · Pull Request #9484 · huggingface/transformers

patrickvonplaten · 2021-01-08T16:13:27Z

What does this PR do?

As discussed in #9172, Flax models should follow a design that is as close as possible to the PyTorch models and should therefore define their submodules in def setup(...) instead of using @nn.compact. This PR refactors the model architecture of Bert & Roberta accordingly.

The next step is to add a general flax<>pytorch conversion method, which might require some follow-up changes to the naming of the weights.

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

sgugger

Thanks a lot for cleaning this up! I like the new style!
I left a lot of nits, mostly around naming and style.

src/transformers/models/bert/modeling_flax_bert.py

src/transformers/models/roberta/modeling_flax_roberta.py

LysandreJik

I like that it's defined through setup and through __call__ instead of just through __call__ with nn.compact! It makes it clearer, imo.

Great job, I think it's much more readable now than it was before!

src/transformers/models/bert/modeling_flax_bert.py

tests/test_modeling_flax_common.py

patrickvonplaten · 2021-03-17T21:25:01Z

Will wait until #10775 is merged, then rebase and then merge.

merrymercy · 2021-04-13T13:18:34Z

@patrickvonplaten

I like the new structure but it seems this PR broke the flax example: https://github.com/huggingface/transformers/blob/master/examples/language-modeling/run_mlm_flax.py

This line (transformers/examples/language-modeling/run_mlm_flax.py, line 577 in 896d7be):

    dropout_rate=0.1,

will raise the error

    TypeError: __init__() got an unexpected keyword argument 'dropout_rate'

In addition, this line (transformers/src/transformers/models/bert/modeling_flax_bert.py, line 254 in 896d7be):

    if not deterministic and self.dropout_rate > 0.0:

uses an undefined variable self.dropout_rate.

I think we should add more test cases and make sure the examples are runnable.
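The two failure modes reported above can be reproduced in plain Python without Flax. The sketch below uses stand-in dataclasses (ToyConfig, ToyLayer and failure_modes are hypothetical names, not part of transformers) and assumes only that the refactored module no longer declares a dropout_rate field while one code path still reads it.

```python
from dataclasses import dataclass


@dataclass
class ToyConfig:
    """Hypothetical stand-in for a model config."""
    hidden_dropout_prob: float = 0.1


@dataclass
class ToyLayer:
    """Hypothetical stand-in for a refactored module without a dropout_rate field."""
    config: ToyConfig

    def __call__(self, x, deterministic=True):
        # Stale reference: dropout_rate is no longer a field of this class.
        if not deterministic and self.dropout_rate > 0.0:
            return x * (1.0 - self.dropout_rate)
        return x


def failure_modes():
    """Collect the exception types raised by the two stale usages."""
    errors = []
    try:
        # Mirrors the example script passing a keyword that was removed.
        ToyLayer(config=ToyConfig(), dropout_rate=0.1)
    except TypeError as exc:
        errors.append(type(exc).__name__)

    layer = ToyLayer(config=ToyConfig())
    layer(1.0, deterministic=True)  # passes: the `and` short-circuits
    try:
        layer(1.0, deterministic=False)  # training path reads the missing attribute
    except AttributeError as exc:
        errors.append(type(exc).__name__)
    return errors
```

Because the condition short-circuits when deterministic is true, a test that only runs the inference path would not hit the missing attribute. A test that also calls the model with deterministic=False would.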

merrymercy · 2021-04-13T13:24:13Z

I am very interested in the jax/flax integration. Could you also take a look at my PR, #10796?
If you are open to collaboration and welcome contributions from me, I can contribute more and improve the flax examples.

* Create modeling_flax_eletra with code copied from modeling_flax_bert
* Add ElectraForMaskedLM and ElectraForPretraining
* Add modeling test for Flax electra and fix naming and arg in Flax Electra model
* Add documentation
* Fix code style
* Create modeling_flax_eletra with code copied from modeling_flax_bert
* Add ElectraForMaskedLM and ElectraForPretraining
* Add modeling test for Flax electra and fix naming and arg in Flax Electra model
* Add documentation
* Fix code style
* Fix code quality
* Adjust tol in assert_almost_equal due to very small difference between model output, ranging 0.0010 - 0.0016
* Remove redundant ElectraPooler
* save intermediate
* adapt
* correct bert flax design
* adapt roberta as well
* finish roberta flax
* finish
* apply suggestions
* apply suggestions

Co-authored-by: Chris Nguyen <anhtu2687@gmail.com>

chris-tng and others added 15 commits December 16, 2020 21:51

Create modeling_flax_eletra with code copied from modeling_flax_bert

562b14c

Add ElectraForMaskedLM and ElectraForPretraining

2817788

Add modeling test for Flax electra and fix naming and arg in Flax Electra model

21cba98

Add documentation

2e014c7

Fix code style

86e6ae9

Create modeling_flax_eletra with code copied from modeling_flax_bert

c00fc2f

Add ElectraForMaskedLM and ElectraForPretraining

9fb01dd

Add modeling test for Flax electra and fix naming and arg in Flax Electra model

c6ee070

Add documentation

e034b27

Fix code style

82a8daf

Fix code quality

e811f50

Merge

de83cc4

Adjust tol in assert_almost_equal due to very small difference between model output, ranging 0.0010 - 0.0016

1bf3681

Remove redundant ElectraPooler

a880f25

save intermediate

385f917

patrickvonplaten marked this pull request as draft January 8, 2021 16:13

patrickvonplaten mentioned this pull request Jan 8, 2021

[Flax] Implement FlaxElectraModel, FlaxElectraForMaskedLM, FlaxElectraForPreTraining #9172

Closed

5 tasks

patrickvonplaten added 2 commits March 15, 2021 22:24

adapt

6dde548

Merge remote-tracking branch 'main/master' into save_intermediate_flax_pr_just_in_case

b117255

patrickvonplaten marked this pull request as ready for review March 16, 2021 18:10

correct bert flax design

2f3fe24

patrickvonplaten changed the title from "[WIP] Save intermediate flax pr just in case" to "[WIP] Adapt Flax models to new structure" Mar 16, 2021

adapt roberta as well

67277f7

patrickvonplaten changed the title from "[WIP] Adapt Flax models to new structure" to "[Flax] Adapt Flax models to new structure" Mar 16, 2021

patrickvonplaten added 2 commits March 16, 2021 22:34

finish roberta flax

85a788d

finish

66f8ed5

patrickvonplaten requested review from LysandreJik and sgugger March 16, 2021 20:33

sgugger approved these changes Mar 17, 2021

LysandreJik approved these changes Mar 17, 2021

patrickvonplaten added 2 commits March 18, 2021 00:22

apply suggestions

eaacd92

apply suggestions

a0745a4

Merge branch 'master' of https://github.com/huggingface/transformers into save_intermediate_flax_pr_just_in_case

6818c48

patrickvonplaten merged commit 0b98ca3 into master Mar 18, 2021

patrickvonplaten deleted the save_intermediate_flax_pr_just_in_case branch March 18, 2021 06:44

patrickvonplaten added the Flax label Apr 21, 2021

patrickvonplaten merged 24 commits into master from save_intermediate_flax_pr_just_in_case

5 participants
