
Adding new train_step logic to make things less confusing for users #15994

Merged
Rocketknight1 merged 39 commits into main from tf_train_step_copies on Apr 5, 2022
Conversation

@Rocketknight1
Member

Change to train_step I discussed earlier with @gante, making the new TF approaches less confusing for users.

This PR is extremely experimental right now, and since it touches train_step it could break the entire TF side of the library, so please don't merge it until I'm done!

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Mar 8, 2022

The documentation is not available anymore as the PR was closed or merged.

@Rocketknight1 force-pushed the tf_train_step_copies branch 4 times, most recently from 00c6a43 to 3b34066 on March 29, 2022 at 16:27
@Rocketknight1
Member Author

This should now be ready for review! The main changes are:

  1. model.fit() with the internal loss is now tested correctly
  2. The internal loss will now work no matter where you pass your labels, which should reduce user confusion by about 50%
  3. Attempting to use Keras metrics with the internal loss will now throw an informative error, instead of just creating a huge mess like it used to
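As a rough sketch of point 2 (hypothetical helper and argument names, not the actual implementation): whether Keras hands the labels to train_step inside the input dict or as a separate y tensor, they can be folded into a single dict before the forward pass, so the internal loss finds them either way:

```python
def unify_labels(data, label_name="labels"):
    # Hypothetical helper: Keras may deliver a batch as a dict of inputs
    # (labels included) or as an (inputs, labels) tuple. Normalize both
    # forms into one dict so the model's internal loss can find the labels.
    if isinstance(data, tuple) and len(data) == 2:
        inputs, labels = data
        inputs = dict(inputs)
        inputs[label_name] = labels
        return inputs
    return dict(data)

# Labels passed separately...
batch = unify_labels(({"input_ids": [[1, 2, 3]]}, [[0, 1, 0]]))
# ...or inside the input dict: both end up in the same place.
same = unify_labels({"input_ids": [[1, 2, 3]], "labels": [[0, 1, 0]]})
assert batch == same
```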

Collaborator

@sgugger left a comment


Thanks for working on this!

Contributor

@gante left a comment


The next TF MLE will have an easier time with the loss handling, hopefully 🙏 LGTM!

Asked a few questions, to better understand what's going on in some parts of the code.

@Rocketknight1
Member Author

I see some failing tests which are possibly caused by an outdated version of TF in the Docker image that doesn't understand steps_per_execution. Will investigate tomorrow!

@Rocketknight1
Member Author

Tests are passing and I think we're ready for final review!

I have a couple of things I'm still unsure about, though:

@gante: I removed the expand_1d call in train_step. Keras by default expands 1D input arrays to 2D, e.g. shape (8,) becomes (8, 1). This is not what our models expect at all, and the tests revealed some failures that were fixed by removing it. Can you think of anything that change might break, though?
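To illustrate the behaviour being removed (a standalone NumPy sketch, not the Keras internals): the default 1D-to-2D expansion turns a batch of scalar labels of shape (8,) into (8, 1), which is not the shape the models' loss computation expects:

```python
import numpy as np

labels = np.arange(8)       # batch of 8 scalar class labels, shape (8,)
expanded = labels[:, None]  # what the default 1D -> 2D expansion produces
assert labels.shape == (8,)
assert expanded.shape == (8, 1)  # mismatches models expecting shape (8,)
```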

@sgugger / @LysandreJik: Is there any way to get a list of all possible 'label' column names for our models? This probably seems like a strange question, but the reason I need it is that if the user passes a single tensor as the labels to Keras, I need to make that tensor a key in the input dict, which means figuring out which argument name is the 'label'. Right now I just hardcoded a list of common ones, and then I inspect the function signature to see which of those is present, but it would be a lot better if I had some kind of definitive list.

@sgugger
Collaborator

sgugger commented Mar 31, 2022

We don't have a definitive list and maintaining it is going to be difficult as well with new modalities arriving. Perhaps we should add an attribute label_names to all models to have this info stored somewhere? The Trainer would also benefit from it on the PyTorch side.

Wdyt @LysandreJik ?
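As a sketch of that proposal (hypothetical class names): each model class would carry the information statically, with no signature inspection needed at runtime:

```python
class SketchPreTrainedModel:
    # Hypothetical base class: default label argument name(s)
    label_names = ["labels"]

class SketchQuestionAnsweringModel(SketchPreTrainedModel):
    # QA heads would override the default, since their labels are named differently
    label_names = ["start_positions", "end_positions"]

assert SketchPreTrainedModel.label_names == ["labels"]
assert SketchQuestionAnsweringModel.label_names == ["start_positions", "end_positions"]
```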

@LysandreJik
Member

I wouldn't mind having the models know their label names; otherwise, inspecting the signature and looking for any argument that contains `label` should give the correct information as well. Doing that during inference/training might slow down the forward pass, so if you feel adding a label_names attribute is cleaner, that's fine with me.

@sgugger
Collaborator

sgugger commented Mar 31, 2022

The problem is that some models have labels without `label` in their names (QA models, I'm looking at you!)

@Rocketknight1
Member Author

For TF models trained with Keras, the call (forward) method is usually only run once to build the graph, so expensive Python calls like inspect are totally okay!

That compilation step is really the source of a lot of the design differences between the two frameworks: for example, TF einsum takes a long time to figure out the optimal contraction during compilation and then saves it, so you can get very different runtimes as a result.

@LysandreJik
Member

LysandreJik commented Mar 31, 2022

The problem is that some models have labels without `label` in their names (QA models, I'm looking at you!)

For consistency's sake, shouldn't we rename them to have `label` in their name? (with an appropriate deprecation cycle, of course)
Maybe this is a non-issue if no users have ever been misled, but if it makes everything clearer and helps programmatic handling of labels, it might be worth it.

@sgugger
Collaborator

sgugger commented Mar 31, 2022

I don't think a change to the labels for the QA models is warranted, as we can make an exception for those (for instance, double-check that the class name contains "QuestionAnswering" and that the signature has start_positions and end_positions). We can do the inspection once at init, so it's not a big deal in terms of performance.
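A minimal sketch of that inspection (hypothetical helper and model classes; the actual utility was added separately in #16526 as find_labels):

```python
import inspect

def find_labels_sketch(model_class):
    # Inspect the forward signature once (e.g. at init) and special-case
    # QuestionAnswering heads, whose labels lack "label" in their names.
    params = inspect.signature(model_class.call).parameters
    if "QuestionAnswering" in model_class.__name__:
        return [p for p in params
                if "label" in p or p in ("start_positions", "end_positions")]
    return [p for p in params if "label" in p]

class SketchForSequenceClassification:
    def call(self, input_ids=None, attention_mask=None, labels=None): ...

class SketchForQuestionAnswering:
    def call(self, input_ids=None, start_positions=None, end_positions=None): ...

assert find_labels_sketch(SketchForSequenceClassification) == ["labels"]
assert find_labels_sketch(SketchForQuestionAnswering) == ["start_positions", "end_positions"]
```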

@gante
Contributor

gante commented Mar 31, 2022

@gante: I removed the expand_1d call in train_step. Keras by default expands 1D input arrays to 2D, e.g. shape (8,) becomes (8, 1). This is not what our models expect at all, and the tests revealed some failures that were fixed by removing it. Can you think of anything that change might break, though?

I can't think of any problem. It is also being tested, so if there are problems, we should be able to catch them quickly.

@Rocketknight1
Member Author

For reference, here are all of the arguments for any forward() method in the codebase on a subclass of PreTrainedModel that contain "label" in their name:

{'sentence_order_label', 'mc_labels', 'next_sentence_label', 'obj_labels', 'mask_labels', 'sentence_image_labels', 'class_labels', 'labels', 'entity_labels', 'matched_label'}

@sgugger @LysandreJik None of them look especially wrong to me, so I'm happy to assume this list + {"start_positions", "end_positions"} covers all of the possible labels and just do an inspect call if you think that's a good solution!

@sgugger
Collaborator

sgugger commented Mar 31, 2022

Added a utility in #16526

@Rocketknight1
Member Author

Quick update: This PR is mostly ready, but I'm going to wait for find_labels to be merged, then rebase and use it here.

@Rocketknight1 force-pushed the tf_train_step_copies branch from fefe4c8 to 81ad430 on April 5, 2022 at 12:14
@Rocketknight1
Member Author

Rebased with find_labels and looks good in testing, so I'm going to merge!

@Rocketknight1 merged commit 4354005 into main on Apr 5, 2022
@Rocketknight1 deleted the tf_train_step_copies branch on April 5, 2022 at 13:23
stevhliu pushed a commit to stevhliu/transformers that referenced this pull request on Apr 5, 2022
…uggingface#15994)

* Adding new train_step logic to make things less confusing for users

* DO NOT ASK WHY WE NEED THAT SUBCLASS

* Metrics now working, at least for single-output models with type annotations!

* Updates and TODOs for the new train_step

* Make fixup

* Temporary test workaround until T5 has types

* Temporary test workaround until T5 has types

* I think this actually works! Needs a lot of tests though

* MAke style/quality

* Revert changes to T5 tests

* Deleting the aforementioned unmentionable subclass

* Deleting the aforementioned unmentionable subclass

* Adding a Keras API test

* Style fixes

* Removing unneeded TODO and comments

* Update test_step too

* Stop trying to compute metrics with the dummy_loss, patch up test

* Make style

* make fixup

* Docstring cleanup

* make fixup

* make fixup

* Stop expanding 1D input tensors when using dummy loss

* Adjust T5 test given the new compile()

* make fixup

* Skipping test for convnext

* Removing old T5-specific Keras test now that we have a common one

* make fixup

* make fixup

* Only skip convnext test on CPU

* Update src/transformers/modeling_tf_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Avoiding TF import issues

* make fixup

* Update compile() to support TF 2.3

* Skipping model.fit() on template classes for now

* Skipping model.fit() on template class tests for now

* Replace ad-hoc solution with find_labels

* make fixup

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
stevhliu added a commit that referenced this pull request on Apr 5, 2022
* 📝 add image/vision classification and asr

* 🖍 minor formatting fixes

* Fixed a typo in legacy seq2seq_trainer.py (#16531)

* Add ONNX export for BeiT (#16498)

* Add beit onnx conversion support

* Updated docs

* Added cross reference to ViT ONNX config

* call on_train_end when trial is pruned (#16536)

* Type hints added (#16529)

* Fix Bart type hints (#16297)

* Add type hints to PLBart PyTorch

* Remove pending merge conflicts

* Fix PLBart Type Hints

* Add changes from review

* Add VisualBert type hints (#16544)

* Adding missing type hints for mBART model (PyTorch) (#16429)

* added type hints for mbart tensorflow tf implementation

* Adding missing type hints for mBART model 

Tensorflow Implementation model added with missing type hints

* Missing Type hints - correction

For TF model

* Code fixup using make quality tests

* Hint types - typo error

* make fix-copies and make fixup

* type hints

* updated files

* type hints update

* making dependent modesls coherent

Co-authored-by: matt <rocketknight1@gmail.com>

* Remove MBart subclass of XLMRoberta in tokenzier docs (#16546)

* Remove MBart subclass of XLMRoberta in tokenzier

* Fix style

* Copy docs from MBart50 tokenizer

* Use random_attention_mask for TF tests (#16517)

* use random_attention_mask for TF tests

* Fix for TFCLIP test (for now).

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Improve code example (#16450)

Co-authored-by: Niels Rogge <nielsrogge@nielss-mbp.home>

* Pin tokenizers version <0.13 (#16539)

* Pin tokenizers version <0.13

* Style

* Add code samples for TF speech models (#16494)

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* [FlaxSpeechEncoderDecoder] Fix dtype bug (#16581)

* [FlaxSpeechEncoderDecoder] Fix dtype bug

* more fixes

* Making the impossible to connect error actually report the right URL. (#16446)

* Fix flax import in __init__.py: modeling_xglm -> modeling_flax_xglm (#16556)

* Add utility to find model labels (#16526)

* Add utility to find model labels

* Use it in the Trainer

* Update src/transformers/utils/generic.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Quality

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Enable doc in Spanish (#16518)

* Reorganize doc for multilingual support

* Fix style

* Style

* Toc trees

* Adapt templates

* Add use_auth to load_datasets for private datasets to PT and TF examples (#16521)

* fix formatting and remove use_auth

* Add use_auth_token to Flax examples

* add a test checking the format of `convert_tokens_to_string`'s output (#16540)

* add new tests

* add comment to overridden tests

* TF: Finalize `unpack_inputs`-related changes (#16499)

* Add unpack_inputs to remaining models

* removed kwargs to `call()` in TF models

* fix TF T5 tests

* [SpeechEncoderDecoderModel] Correct Encoder Last Hidden State Output (#16586)

* initialize the default rank set on TrainerState (#16530)

* initialize the default rank set on TrainerState

* fix style

* Trigger doc build

* Fix CI: test_inference_for_pretraining in ViTMAEModelTest (#16591)

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* add a template to add missing tokenization test (#16553)

* add a template to add missing tokenization test

* add cookiecutter setting

* improve doc

* Update templates/adding_a_missing_tokenization_test/README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* made _load_pretrained_model_low_mem static + bug fix (#16548)

* handle torch_dtype in low cpu mem usage (#16580)

* [Doctests] Correct filenaming (#16599)

* [Doctests] Correct filenaming

* improve quicktour

* make style

* Adding new train_step logic to make things less confusing for users (#15994)

* Adding new train_step logic to make things less confusing for users

* DO NOT ASK WHY WE NEED THAT SUBCLASS

* Metrics now working, at least for single-output models with type annotations!

* Updates and TODOs for the new train_step

* Make fixup

* Temporary test workaround until T5 has types

* Temporary test workaround until T5 has types

* I think this actually works! Needs a lot of tests though

* MAke style/quality

* Revert changes to T5 tests

* Deleting the aforementioned unmentionable subclass

* Deleting the aforementioned unmentionable subclass

* Adding a Keras API test

* Style fixes

* Removing unneeded TODO and comments

* Update test_step too

* Stop trying to compute metrics with the dummy_loss, patch up test

* Make style

* make fixup

* Docstring cleanup

* make fixup

* make fixup

* Stop expanding 1D input tensors when using dummy loss

* Adjust T5 test given the new compile()

* make fixup

* Skipping test for convnext

* Removing old T5-specific Keras test now that we have a common one

* make fixup

* make fixup

* Only skip convnext test on CPU

* Update src/transformers/modeling_tf_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Avoiding TF import issues

* make fixup

* Update compile() to support TF 2.3

* Skipping model.fit() on template classes for now

* Skipping model.fit() on template class tests for now

* Replace ad-hoc solution with find_labels

* make fixup

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Adding missing type hints for BigBird model   (#16555)

* added type hints for mbart tensorflow tf implementation

* Adding missing type hints for mBART model 

Tensorflow Implementation model added with missing type hints

* Missing Type hints - correction

For TF model

* Code fixup using make quality tests

* Hint types - typo error

* make fix-copies and make fixup

* type hints

* updated files

* type hints update

* making dependent modesls coherent

* Type hints for BigBird

* removing typos

Co-authored-by: matt <rocketknight1@gmail.com>

* [deepspeed] fix typo, adjust config name (#16597)

* 🖍 apply feedback

Co-authored-by: Cathy <815244047@qq.com>
Co-authored-by: Jim Rohrer <jrohrer1@gmail.com>
Co-authored-by: Ferdinand Schlatt <fschlatt@gmail.com>
Co-authored-by: Dahlbomii <101373053+Dahlbomii@users.noreply.github.com>
Co-authored-by: Gunjan Chhablani <chhablani.gunjan@gmail.com>
Co-authored-by: Rishav Chandra Varma <rishavchandra.v16@iiits.in>
Co-authored-by: matt <rocketknight1@gmail.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Niels Rogge <nielsrogge@nielss-mbp.home>
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: Daniel Stancl <46073029+stancld@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: Karim Foda <35491698+KMFODA@users.noreply.github.com>
Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: Andres Codas <andrescodas@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
Co-authored-by: Francesco Saverio Zuppichini <francesco.zuppichini@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>