Update transformers metadata by sgugger · Pull Request #14724 · huggingface/transformers

sgugger · 2021-12-10T23:24:50Z

What does this PR do?

This PR adds a job that will auto-update the repository transformers-metadata each time it's necessary.

sgugger · 2021-12-10T23:25:14Z

.github/workflows/update_metdata.yml

+
+      - name: Update metadata
+        run: |
+          python utils/update_metadata.py --token ${{ secrets.SYLVAIN_HF_TOKEN }} --commit_sha ${{ github.sha }}


This is a token created just for this ;-)

cc @SBrandeis :)
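For readers following along, here is a minimal sketch of how a script could accept the two flags the workflow step passes (`--token` and `--commit_sha`). Only the flag names come from the diff above; the help strings, the `parse_args` and `commit_message` helpers, and the message format are assumptions for illustration, not the contents of `utils/update_metadata.py`.

```python
import argparse


def parse_args(argv=None):
    """Parse the two flags used in the workflow step (names taken from the diff)."""
    parser = argparse.ArgumentParser(description="Update the transformers-metadata repository.")
    parser.add_argument("--token", type=str, help="Token with write access to the metadata repository.")
    parser.add_argument("--commit_sha", type=str, help="SHA of the commit that triggered the update.")
    return parser.parse_args(argv)


def commit_message(commit_sha=None):
    # Hypothetical message format: the real script may word this differently.
    if commit_sha is None:
        return "Update"
    return f"Update with commit {commit_sha}"


if __name__ == "__main__":
    args = parse_args()
    print(commit_message(args.commit_sha))
```

Passing the token as a command-line argument keeps the script usable locally, while the workflow supplies it from the repository secrets.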

sgugger · 2021-12-10T23:26:49Z

utils/update_metadata.py

+
+
+# Fill this with tuples (pipeline_tag, model_mapping, auto_model)
+PIPELINE_TAGS_AND_AUTO_MODELS = [


This list will need to be updated when we create new pipelines / auto-classes. I will add a script in repo consistency to make sure it's kept up to date (later today or on Monday).
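A rough sketch of what such a consistency check could look like, assuming it compares the mapping names tracked in the list against the mapping names that exist in the auto modules. The two entries shown are illustrative rather than the full list, and `find_untracked_mappings` is a hypothetical helper, not the check that was later added.

```python
# Illustrative entries only, in the (pipeline_tag, model_mapping, auto_model) shape described in the diff.
PIPELINE_TAGS_AND_AUTO_MODELS = [
    ("text-classification", "MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES", "AutoModelForSequenceClassification"),
    ("fill-mask", "MODEL_FOR_MASKED_LM_MAPPING_NAMES", "AutoModelForMaskedLM"),
]


def find_untracked_mappings(known_mapping_names, tracked=PIPELINE_TAGS_AND_AUTO_MODELS):
    """Return the mapping names that exist but are not listed in the tracked tuples (hypothetical helper)."""
    tracked_names = {mapping for _, mapping, _ in tracked}
    return sorted(set(known_mapping_names) - tracked_names)


if __name__ == "__main__":
    known = ["MODEL_FOR_MASKED_LM_MAPPING_NAMES", "MODEL_FOR_QUESTION_ANSWERING_MAPPING_NAMES"]
    missing = find_untracked_mappings(known)
    if missing:
        raise SystemExit(f"Mappings missing from PIPELINE_TAGS_AND_AUTO_MODELS: {missing}")
```

In a repo-consistency job, a non-empty result would fail the check and point at the mapping that still needs a pipeline tag.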

julien-c · 2021-12-13T08:32:44Z

utils/update_metadata.py

+        table = {tags_dataset[i]["model_class"]: (tags_dataset[i]["pipeline_tag"], tags_dataset[i]["auto_class"]) for i in range(len(tags_dataset))}
+        table = update_pipeline_and_auto_class_table(table)
+
+        # Sort the model classes to avoid some nondeterministic updates to create false update commits.
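
To illustrate why the sort matters, here is a small sketch using plain dictionaries in place of the dataset. The record keys match the diff (`model_class`, `pipeline_tag`, `auto_class`); `build_table`, `table_to_sorted_columns`, and the sample records are assumptions for illustration.

```python
def build_table(records):
    """Map each model class to its (pipeline_tag, auto_class) pair, as in the diff."""
    return {r["model_class"]: (r["pipeline_tag"], r["auto_class"]) for r in records}


def table_to_sorted_columns(table):
    """Emit columns ordered by model class so the output does not depend on insertion order."""
    model_classes = sorted(table)
    return {
        "model_class": model_classes,
        "pipeline_tag": [table[m][0] for m in model_classes],
        "auto_class": [table[m][1] for m in model_classes],
    }


# Hypothetical sample records.
records = [
    {"model_class": "BertForMaskedLM", "pipeline_tag": "fill-mask", "auto_class": "AutoModelForMaskedLM"},
    {"model_class": "AlbertForMaskedLM", "pipeline_tag": "fill-mask", "auto_class": "AutoModelForMaskedLM"},
]
columns = table_to_sorted_columns(build_table(records))
```

Because the columns are always emitted in sorted order, two runs over the same content produce identical files, so no commit is created when nothing has actually changed.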


julien-c · 2021-12-13T08:35:09Z

utils/update_metadata.py

+def get_frameworks_table():
+    """
+    Generates a dataframe containing the supported auto classes for each model type, using the content of the auto
+    modules.
+    """


Unlike for model classes, do we guarantee that model_types never disappear?

julien-c

LGTM, let's try to monitor the first few updates!

LysandreJik

Nice, clean usage of Repository too :)

LysandreJik · 2021-12-13T10:14:39Z

utils/update_metadata.py

+def get_frameworks_table():
+    """
+    Generates a dataframe containing the supported auto classes for each model type, using the content of the auto
+    modules.
+    """


A model_type disappearing would be hugely breaking, so should only happen in very extreme cases. I can't think of a situation where we have done it, or where we would do it to an already merged PR.
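
If monitoring for this case is ever wanted, one possible guard is a set difference between the previously published model types and the freshly generated ones. This is only a sketch: `missing_model_types` and the sample values are hypothetical, and nothing in this PR implements such a check.

```python
def missing_model_types(previous, current):
    """Return model types present in the previous table but absent from the new one (hypothetical helper)."""
    return sorted(set(previous) - set(current))


# Hypothetical sample values.
previous_types = ["albert", "bert", "gpt2"]
current_types = ["albert", "bert", "gpt2", "wav2vec2"]
dropped = missing_model_types(previous_types, current_types)
if dropped:
    print(f"Warning: model types no longer present: {dropped}")
```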

LysandreJik · 2021-12-13T10:21:17Z

utils/update_metadata.py

+    processors = {}
+    for t in all_models:
+        if t in transformers_module.models.auto.processing_auto.PROCESSOR_MAPPING_NAMES:
+            processors[t] = "AutoProcessor"
+        elif t in transformers_module.models.auto.tokenization_auto.TOKENIZER_MAPPING_NAMES:
+            processors[t] = "AutoTokenizer"
+        elif t in transformers_module.models.auto.feature_extraction_auto.FEATURE_EXTRACTOR_MAPPING_NAMES:
+            processors[t] = "AutoFeatureExtractor"
+        else:
+            # Default to AutoTokenizer if a model has nothing, for backward compatibility.
+            processors[t] = "AutoTokenizer"
+
+    data["processor"] = [processors[t] for t in all_models]


This is a nice approach
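
The same precedence can be shown as a standalone function. In this sketch the three mapping arguments stand in for `PROCESSOR_MAPPING_NAMES`, `TOKENIZER_MAPPING_NAMES`, and `FEATURE_EXTRACTOR_MAPPING_NAMES`; the function name `pick_processor` and the sample sets are assumptions for illustration.

```python
def pick_processor(model_type, processor_names, tokenizer_names, feature_extractor_names):
    """Resolve the preprocessing class with the precedence used in the diff."""
    if model_type in processor_names:
        return "AutoProcessor"
    if model_type in tokenizer_names:
        return "AutoTokenizer"
    if model_type in feature_extractor_names:
        return "AutoFeatureExtractor"
    # Default to AutoTokenizer if a model has nothing, for backward compatibility.
    return "AutoTokenizer"


# Hypothetical membership sets standing in for the real mapping names.
processor_names = {"speech-model"}
tokenizer_names = {"speech-model", "text-model"}
feature_extractor_names = {"vision-model"}

all_models = ["speech-model", "text-model", "vision-model", "unknown-model"]
processors = [
    pick_processor(t, processor_names, tokenizer_names, feature_extractor_names) for t in all_models
]
```

Checking the processor mapping first means a model that has both a tokenizer and a processor is reported with `AutoProcessor`, the most complete entry point.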

sgugger added 3 commits December 10, 2021 15:26

Wip on metadata update

ca9c86d

Most of the script

347e4d8

Add a job to auto-update the transformers metadata

db9b163

sgugger requested review from LysandreJik and julien-c December 10, 2021 23:24

julien-c reviewed Dec 13, 2021

julien-c approved these changes Dec 13, 2021

LysandreJik approved these changes Dec 13, 2021

Style

438a034

sgugger merged commit 64e92ed into master Dec 13, 2021

sgugger deleted the update_transformers_metadata branch December 13, 2021 16:46
