Update transformers metadata #14724

Merged
sgugger merged 4 commits into master from update_transformers_metadata
Dec 13, 2021

Conversation

Collaborator

@sgugger sgugger commented Dec 10, 2021

What does this PR do?

This PR adds a job that will auto-update the repository transformers-metadata each time it's necessary.


- name: Update metadata
  run: |
    python utils/update_metadata.py --token ${{ secrets.SYLVAIN_HF_TOKEN }} --commit_sha ${{ github.sha }}
Collaborator Author

This is a token created just for this ;-)

Member

cc @SBrandeis :)



# Fill this with tuples (pipeline_tag, model_mapping, auto_model)
PIPELINE_TAGS_AND_AUTO_MODELS = [
Collaborator Author

This list will need to be updated when we create new pipelines / auto-classes. I will add a script in repo consistency to make sure it's kept up to date (later today or on Monday).

table = {tags_dataset[i]["model_class"]: (tags_dataset[i]["pipeline_tag"], tags_dataset[i]["auto_class"]) for i in range(len(tags_dataset))}
table = update_pipeline_and_auto_class_table(table)

# Sort the model classes to avoid some nondeterministic updates to create false update commits.
Member

👍
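A minimal sketch of why the sort in the snippet above matters, assuming the table is written out as JSON. The `serialize_table` helper and the sample rows are illustrative only and are not taken from `utils/update_metadata.py`. Sorting the model classes makes the serialized output independent of insertion order, so an unchanged table cannot produce a spurious commit.

```python
import json


def serialize_table(table):
    """Serialize {model_class: (pipeline_tag, auto_class)} with a stable row order.

    Hypothetical helper for illustration; the real script may serialize differently.
    """
    rows = [
        {"model_class": name, "pipeline_tag": table[name][0], "auto_class": table[name][1]}
        for name in sorted(table)  # sorted keys: output no longer depends on insertion order
    ]
    return json.dumps(rows, indent=2)


# Same content, different insertion order (sample data for illustration).
a = {
    "BertModel": ("feature-extraction", "AutoModel"),
    "AlbertModel": ("feature-extraction", "AutoModel"),
}
b = dict(reversed(list(a.items())))

# Both orderings serialize to identical text, so a diff between runs stays empty.
assert serialize_table(a) == serialize_table(b)
```

Without the `sorted` call, the two dictionaries would serialize their rows in different orders even though they hold the same entries.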

Comment on lines +77 to +81
def get_frameworks_table():
    """
    Generates a dataframe containing the supported auto classes for each model type, using the content of the auto
    modules.
    """
Member

Unlike for model classes, do we guarantee that model_types never disappear?

Member

A model_type disappearing would be hugely breaking, so should only happen in very extreme cases. I can't think of a situation where we have done it, or where we would do it to an already merged PR.

Member

@julien-c julien-c left a comment

LGTM, let's try to monitor the first few updates!

Member

@LysandreJik LysandreJik left a comment

Nice, clean usage of Repository too :)

Comment on lines +122 to +134
processors = {}
for t in all_models:
    if t in transformers_module.models.auto.processing_auto.PROCESSOR_MAPPING_NAMES:
        processors[t] = "AutoProcessor"
    elif t in transformers_module.models.auto.tokenization_auto.TOKENIZER_MAPPING_NAMES:
        processors[t] = "AutoTokenizer"
    elif t in transformers_module.models.auto.feature_extraction_auto.FEATURE_EXTRACTOR_MAPPING_NAMES:
        processors[t] = "AutoFeatureExtractor"
    else:
        # Default to AutoTokenizer if a model has nothing, for backward compatibility.
        processors[t] = "AutoTokenizer"

data["processor"] = [processors[t] for t in all_models]
Member

This is a nice approach
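To make the precedence in the snippet above concrete, here is a small self-contained sketch. It assumes plain sets can stand in for the three `*_MAPPING_NAMES` objects; the `pick_processor` helper and the sample model types are hypothetical and exist only for illustration. The point is the order of the checks: a processor wins over a tokenizer, a tokenizer wins over a feature extractor, and anything unmatched falls back to `AutoTokenizer`.

```python
def pick_processor(model_type, processor_names, tokenizer_names, feature_extractor_names):
    """Return the auto class name for a model type, mirroring the if/elif order above.

    Hypothetical helper; the three *_names arguments stand in for the real mappings.
    """
    if model_type in processor_names:
        return "AutoProcessor"
    if model_type in tokenizer_names:
        return "AutoTokenizer"
    if model_type in feature_extractor_names:
        return "AutoFeatureExtractor"
    # Fallback for model types with no entry anywhere, kept for backward compatibility.
    return "AutoTokenizer"


# Stand-in data (assumed for this example, not read from the library).
processor_names = {"wav2vec2"}
tokenizer_names = {"bert", "wav2vec2"}
feature_extractor_names = {"vit", "wav2vec2"}

all_models = ["bert", "vit", "wav2vec2", "unknown_type"]
processors = {
    t: pick_processor(t, processor_names, tokenizer_names, feature_extractor_names)
    for t in all_models
}

assert processors["wav2vec2"] == "AutoProcessor"      # listed in all three sets, processor wins
assert processors["vit"] == "AutoFeatureExtractor"    # only a feature extractor is listed
assert processors["unknown_type"] == "AutoTokenizer"  # nothing listed, default applies
```

Because the default and the tokenizer branch return the same value, a model type with no entry is indistinguishable from one with a tokenizer in the resulting column.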

@sgugger sgugger merged commit 64e92ed into master Dec 13, 2021
@sgugger sgugger deleted the update_transformers_metadata branch December 13, 2021 16:46