
Adding new train_step logic to make things less confusing for users #15994

Merged
Rocketknight1 merged 39 commits into main from tf_train_step_copies on Apr 5, 2022
Conversation

@Rocketknight1
Member

Change to train_step I discussed earlier with @gante, making the new TF approaches less confusing for users.

This PR is extremely experimental right now, and since it touches train_step it could break the entire TF side of the library, so please don't merge it until I'm done!

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Mar 8, 2022

The documentation is not available anymore as the PR was closed or merged.

@Rocketknight1 force-pushed the tf_train_step_copies branch 4 times, most recently from 00c6a43 to 3b34066 on March 29, 2022 at 16:27
@Rocketknight1
Member Author

This should now be ready for review! The main changes are:

  1. model.fit() with the internal loss is now tested correctly
  2. The internal loss will now work no matter where you pass your labels, which should reduce user confusion by about 50%
  3. Attempting to use Keras metrics with the internal loss will now throw an informative error, instead of just creating a huge mess like it used to
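As a rough sketch of point 2 (hypothetical helper and argument names, not the actual implementation): whether Keras hands the labels to train_step inside the input dict or as a separate y tensor, they can be folded into a single dict before the forward pass, so the internal loss finds them either way:

```python
def unify_labels(data, label_name="labels"):
    # Hypothetical helper: Keras may deliver a batch as a dict of inputs
    # (labels included) or as an (inputs, labels) tuple. Normalize both
    # forms into one dict so the model's internal loss can find the labels.
    if isinstance(data, tuple) and len(data) == 2:
        inputs, labels = data
        inputs = dict(inputs)
        inputs[label_name] = labels
        return inputs
    return dict(data)

# Labels passed separately...
batch = unify_labels(({"input_ids": [[1, 2, 3]]}, [[0, 1, 0]]))
# ...or inside the input dict: both end up in the same place.
same = unify_labels({"input_ids": [[1, 2, 3]], "labels": [[0, 1, 0]]})
assert batch == same
```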

Collaborator

@sgugger left a comment


Thanks for working on this!

Contributor

@gante left a comment


The next TF MLE will have an easier time with the loss handling, hopefully 🙏 LGTM!

Asked a few questions, to better understand what's going on in some parts of the code.

@Rocketknight1
Member Author

I see some failing tests which are possibly caused by an outdated version of TF in the Docker image that doesn't understand steps_per_execution. Will investigate tomorrow!

@Rocketknight1
Member Author

Tests are passing and I think we're ready for final review!

I have a couple of things I'm still unsure about, though:

@gante: I removed the expand_1d call in train_step. Keras by default expands 1D input arrays to 2D, e.g. shape (8,) becomes (8, 1). This is not what our models expect at all, and the tests revealed some failures that were fixed by removing it. Can you think of anything that change might break, though?
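To illustrate the behaviour being removed (a standalone NumPy sketch, not the Keras internals): the default 1D-to-2D expansion turns a batch of scalar labels of shape (8,) into (8, 1), which is not the shape the models' loss computation expects:

```python
import numpy as np

labels = np.arange(8)       # batch of 8 scalar class labels, shape (8,)
expanded = labels[:, None]  # what the default 1D -> 2D expansion produces
assert labels.shape == (8,)
assert expanded.shape == (8, 1)  # mismatches models expecting shape (8,)
```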

@sgugger / @LysandreJik: Is there any way to get a list of all possible 'label' column names for our models? This probably seems like a strange question, but the reason I need it is that if the user passes a single tensor as the labels to Keras, I need to make that tensor a key in the input dict, which means figuring out which argument name is the 'label'. Right now I just hardcoded a list of common ones, and then I inspect the function signature to see which of those is present, but it would be a lot better if I had some kind of definitive list.

@sgugger
Collaborator

sgugger commented Mar 31, 2022

We don't have a definitive list and maintaining it is going to be difficult as well with new modalities arriving. Perhaps we should add an attribute label_names to all models to have this info stored somewhere? The Trainer would also benefit from it on the PyTorch side.

Wdyt @LysandreJik ?
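As a sketch of that proposal (hypothetical class names): each model class would carry the information statically, with no signature inspection needed at runtime:

```python
class SketchPreTrainedModel:
    # Hypothetical base class: default label argument name(s)
    label_names = ["labels"]

class SketchQuestionAnsweringModel(SketchPreTrainedModel):
    # QA heads would override the default, since their labels are named differently
    label_names = ["start_positions", "end_positions"]

assert SketchPreTrainedModel.label_names == ["labels"]
assert SketchQuestionAnsweringModel.label_names == ["start_positions", "end_positions"]
```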

@LysandreJik
Member

I wouldn't mind having the models know their label names; otherwise, inspecting the signature and looking for any argument that contains `label` should give the correct information as well. Doing that during inference/training might slow down the forward pass, so if you feel adding a label_names attribute is cleaner, that's fine with me.

@sgugger
Collaborator

sgugger commented Mar 31, 2022

The problem is that some models have labels without `label` in their names (QA models, I'm looking at you!)

@Rocketknight1
Member Author

For TF models trained with Keras, the call (forward) method is usually only run once to build the graph, so expensive Python calls like inspect are totally okay!

That compilation step is really the source of a lot of the design differences between the two frameworks: for example, TF einsum takes a long time to figure out the optimal contraction during compilation and then saves it, so you can get very different runtimes as a result.

@LysandreJik
Member

LysandreJik commented Mar 31, 2022

The problem is that some models have labels without `label` in their names (QA models, I'm looking at you!)

For consistency's sake, shouldn't we rename them to have `label` in their name? (with an appropriate deprecation cycle, of course)
Maybe this is a non-issue if no users have ever been misled, but if it makes everything clearer and helps programmatic handling of labels, it might be worth it.

@sgugger
Collaborator

sgugger commented Mar 31, 2022

I don't think a change to the labels for the QA models is warranted, as we can make an exception for those (for instance, double-check that the class name contains "QuestionAnswering" and that the signature has start_positions and end_positions). We can do the inspection once at init, so it's not a big deal in terms of performance.
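A minimal sketch of that inspection (hypothetical helper and model classes; the actual utility was added separately in #16526 as find_labels):

```python
import inspect

def find_labels_sketch(model_class):
    # Inspect the forward signature once (e.g. at init) and special-case
    # QuestionAnswering heads, whose labels lack "label" in their names.
    params = inspect.signature(model_class.call).parameters
    if "QuestionAnswering" in model_class.__name__:
        return [p for p in params
                if "label" in p or p in ("start_positions", "end_positions")]
    return [p for p in params if "label" in p]

class SketchForSequenceClassification:
    def call(self, input_ids=None, attention_mask=None, labels=None): ...

class SketchForQuestionAnswering:
    def call(self, input_ids=None, start_positions=None, end_positions=None): ...

assert find_labels_sketch(SketchForSequenceClassification) == ["labels"]
assert find_labels_sketch(SketchForQuestionAnswering) == ["start_positions", "end_positions"]
```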

@gante
Contributor

gante commented Mar 31, 2022

@gante: I removed the expand_1d call in train_step. Keras by default expands 1D input arrays to 2D, e.g. shape (8,) becomes (8, 1). This is not what our models expect at all, and the tests revealed some failures that were fixed by removing it. Can you think of anything that change might break, though?

I can't think of any problem. It is also being tested, so if there are problems, we should be able to catch them quickly.

@Rocketknight1
Member Author

For reference, here are all of the arguments for any forward() method in the codebase on a subclass of PreTrainedModel that contain "label" in their name:

{'sentence_order_label', 'mc_labels', 'next_sentence_label', 'obj_labels', 'mask_labels', 'sentence_image_labels', 'class_labels', 'labels', 'entity_labels', 'matched_label'}

@sgugger @LysandreJik None of them look especially wrong to me, so I'm happy to assume this list + {"start_positions", "end_positions"} covers all of the possible labels and just do an inspect call if you think that's a good solution!

@sgugger
Collaborator

sgugger commented Mar 31, 2022

Added a utility in #16526

@Rocketknight1
Member Author

Quick update: This PR is mostly ready, but I'm going to wait for find_labels to be merged, then rebase and use it here.

@Rocketknight1 force-pushed the tf_train_step_copies branch from fefe4c8 to 81ad430 on April 5, 2022 at 12:14
@Rocketknight1
Member Author

Rebased with find_labels and looks good in testing, so I'm going to merge!

@Rocketknight1 merged commit 4354005 into main on Apr 5, 2022
@Rocketknight1 deleted the tf_train_step_copies branch on April 5, 2022 at 13:23
stevhliu pushed a commit to stevhliu/transformers that referenced this pull request on Apr 5, 2022
…uggingface#15994)

* Adding new train_step logic to make things less confusing for users

* DO NOT ASK WHY WE NEED THAT SUBCLASS

* Metrics now working, at least for single-output models with type annotations!

* Updates and TODOs for the new train_step

* Make fixup

* Temporary test workaround until T5 has types

* Temporary test workaround until T5 has types

* I think this actually works! Needs a lot of tests though

* MAke style/quality

* Revert changes to T5 tests

* Deleting the aforementioned unmentionable subclass

* Deleting the aforementioned unmentionable subclass

* Adding a Keras API test

* Style fixes

* Removing unneeded TODO and comments

* Update test_step too

* Stop trying to compute metrics with the dummy_loss, patch up test

* Make style

* make fixup

* Docstring cleanup

* make fixup

* make fixup

* Stop expanding 1D input tensors when using dummy loss

* Adjust T5 test given the new compile()

* make fixup

* Skipping test for convnext

* Removing old T5-specific Keras test now that we have a common one

* make fixup

* make fixup

* Only skip convnext test on CPU

* Update src/transformers/modeling_tf_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Avoiding TF import issues

* make fixup

* Update compile() to support TF 2.3

* Skipping model.fit() on template classes for now

* Skipping model.fit() on template class tests for now

* Replace ad-hoc solution with find_labels

* make fixup

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
stevhliu added a commit that referenced this pull request on Apr 5, 2022
* 📝 add image/vision classification and asr

* 🖍 minor formatting fixes

* Fixed a typo in legacy seq2seq_trainer.py (#16531)

* Add ONNX export for BeiT (#16498)

* Add beit onnx conversion support

* Updated docs

* Added cross reference to ViT ONNX config

* call on_train_end when trial is pruned (#16536)

* Type hints added (#16529)

* Fix Bart type hints (#16297)

* Add type hints to PLBart PyTorch

* Remove pending merge conflicts

* Fix PLBart Type Hints

* Add changes from review

* Add VisualBert type hints (#16544)

* Adding missing type hints for mBART model (PyTorch) (#16429)

* added type hints for mbart tensorflow tf implementation

* Adding missing type hints for mBART model 

Tensorflow Implementation model added with missing type hints

* Missing Type hints - correction

For TF model

* Code fixup using make quality tests

* Hint types - typo error

* make fix-copies and make fixup

* type hints

* updated files

* type hints update

* making dependent modesls coherent

Co-authored-by: matt <rocketknight1@gmail.com>

* Remove MBart subclass of XLMRoberta in tokenzier docs (#16546)

* Remove MBart subclass of XLMRoberta in tokenzier

* Fix style

* Copy docs from MBart50 tokenizer

* Use random_attention_mask for TF tests (#16517)

* use random_attention_mask for TF tests

* Fix for TFCLIP test (for now).

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Improve code example (#16450)

Co-authored-by: Niels Rogge <nielsrogge@nielss-mbp.home>

* Pin tokenizers version <0.13 (#16539)

* Pin tokenizers version <0.13

* Style

* Add code samples for TF speech models (#16494)

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* [FlaxSpeechEncoderDecoder] Fix dtype bug (#16581)

* [FlaxSpeechEncoderDecoder] Fix dtype bug

* more fixes

* Making the impossible to connect error actually report the right URL. (#16446)

* Fix flax import in __init__.py: modeling_xglm -> modeling_flax_xglm (#16556)

* Add utility to find model labels (#16526)

* Add utility to find model labels

* Use it in the Trainer

* Update src/transformers/utils/generic.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Quality

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Enable doc in Spanish (#16518)

* Reorganize doc for multilingual support

* Fix style

* Style

* Toc trees

* Adapt templates

* Add use_auth to load_datasets for private datasets to PT and TF examples (#16521)

* fix formatting and remove use_auth

* Add use_auth_token to Flax examples

* add a test checking the format of `convert_tokens_to_string`'s output (#16540)

* add new tests

* add comment to overridden tests

* TF: Finalize `unpack_inputs`-related changes (#16499)

* Add unpack_inputs to remaining models

* removed kwargs to `call()` in TF models

* fix TF T5 tests

* [SpeechEncoderDecoderModel] Correct Encoder Last Hidden State Output (#16586)

* initialize the default rank set on TrainerState (#16530)

* initialize the default rank set on TrainerState

* fix style

* Trigger doc build

* Fix CI: test_inference_for_pretraining in ViTMAEModelTest (#16591)

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* add a template to add missing tokenization test (#16553)

* add a template to add missing tokenization test

* add cookiecutter setting

* improve doc

* Update templates/adding_a_missing_tokenization_test/README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* made _load_pretrained_model_low_mem static + bug fix (#16548)

* handle torch_dtype in low cpu mem usage (#16580)

* [Doctests] Correct filenaming (#16599)

* [Doctests] Correct filenaming

* improve quicktour

* make style

* Adding new train_step logic to make things less confusing for users (#15994)

* Adding new train_step logic to make things less confusing for users

* DO NOT ASK WHY WE NEED THAT SUBCLASS

* Metrics now working, at least for single-output models with type annotations!

* Updates and TODOs for the new train_step

* Make fixup

* Temporary test workaround until T5 has types

* Temporary test workaround until T5 has types

* I think this actually works! Needs a lot of tests though

* MAke style/quality

* Revert changes to T5 tests

* Deleting the aforementioned unmentionable subclass

* Deleting the aforementioned unmentionable subclass

* Adding a Keras API test

* Style fixes

* Removing unneeded TODO and comments

* Update test_step too

* Stop trying to compute metrics with the dummy_loss, patch up test

* Make style

* make fixup

* Docstring cleanup

* make fixup

* make fixup

* Stop expanding 1D input tensors when using dummy loss

* Adjust T5 test given the new compile()

* make fixup

* Skipping test for convnext

* Removing old T5-specific Keras test now that we have a common one

* make fixup

* make fixup

* Only skip convnext test on CPU

* Update src/transformers/modeling_tf_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Avoiding TF import issues

* make fixup

* Update compile() to support TF 2.3

* Skipping model.fit() on template classes for now

* Skipping model.fit() on template class tests for now

* Replace ad-hoc solution with find_labels

* make fixup

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Adding missing type hints for BigBird model   (#16555)

* added type hints for mbart tensorflow tf implementation

* Adding missing type hints for mBART model 

Tensorflow Implementation model added with missing type hints

* Missing Type hints - correction

For TF model

* Code fixup using make quality tests

* Hint types - typo error

* make fix-copies and make fixup

* type hints

* updated files

* type hints update

* making dependent modesls coherent

* Type hints for BigBird

* removing typos

Co-authored-by: matt <rocketknight1@gmail.com>

* [deepspeed] fix typo, adjust config name (#16597)

* 🖍 apply feedback

Co-authored-by: Cathy <815244047@qq.com>
Co-authored-by: Jim Rohrer <jrohrer1@gmail.com>
Co-authored-by: Ferdinand Schlatt <fschlatt@gmail.com>
Co-authored-by: Dahlbomii <101373053+Dahlbomii@users.noreply.github.com>
Co-authored-by: Gunjan Chhablani <chhablani.gunjan@gmail.com>
Co-authored-by: Rishav Chandra Varma <rishavchandra.v16@iiits.in>
Co-authored-by: matt <rocketknight1@gmail.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Niels Rogge <nielsrogge@nielss-mbp.home>
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: Daniel Stancl <46073029+stancld@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: Karim Foda <35491698+KMFODA@users.noreply.github.com>
Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: Andres Codas <andrescodas@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
Co-authored-by: Francesco Saverio Zuppichini <francesco.zuppichini@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>