    - name: Update metadata
      run: |
        python utils/update_metadata.py --token ${{ secrets.SYLVAIN_HF_TOKEN }} --commit_sha ${{ github.sha }}
This is a token created just for this ;-)
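For readers following along, the workflow step above passes two flags to the script. A minimal sketch of how such a command line could be parsed with `argparse` is shown here; only the flag names `--token` and `--commit_sha` come from the step itself, while the `parse_args` wrapper, the help strings, and the dummy values are assumptions made for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse the two flags used by the workflow step (hypothetical wrapper)."""
    parser = argparse.ArgumentParser(description="Update the metadata repository.")
    parser.add_argument("--token", type=str, help="Token with write access to the metadata repository.")
    parser.add_argument("--commit_sha", type=str, help="SHA of the commit that triggered the update.")
    return parser.parse_args(argv)


# Dummy values standing in for the GitHub secret and commit SHA.
args = parse_args(["--token", "hf_dummy", "--commit_sha", "abc123"])
# Only report whether a token was supplied; never print the token itself.
print(f"commit_sha={args.commit_sha}, token provided={args.token is not None}")
```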
    # Fill this with tuples (pipeline_tag, model_mapping, auto_model)
    PIPELINE_TAGS_AND_AUTO_MODELS = [
This list will need to be updated when we create new pipelines / auto-classes. I will add a script in repo consistency to make sure it's kept up to date (later today or on Monday).
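A consistency check of the kind mentioned here could look roughly like the sketch below. It assumes that every auto-model mapping can be listed by name; `find_missing_mappings` and the sample data are hypothetical and are not the script that was planned for the repo consistency checks.

```python
# Each entry is (pipeline_tag, model_mapping, auto_model), as in the comment above the list.
# The two entries are illustrative sample data only.
PIPELINE_TAGS_AND_AUTO_MODELS = [
    ("fill-mask", "MODEL_FOR_MASKED_LM_MAPPING_NAMES", "AutoModelForMaskedLM"),
    ("text-classification", "MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES", "AutoModelForSequenceClassification"),
]


def find_missing_mappings(known_mappings, pipeline_tags_and_auto_models):
    """Return, in sorted order, the mapping names that no tuple in the list refers to."""
    covered = {mapping for _, mapping, _ in pipeline_tags_and_auto_models}
    return sorted(set(known_mappings) - covered)


# Hypothetical list of every mapping defined in the auto modules.
all_mappings = [
    "MODEL_FOR_MASKED_LM_MAPPING_NAMES",
    "MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES",
    "MODEL_FOR_TOKEN_CLASSIFICATION_MAPPING_NAMES",
]
missing = find_missing_mappings(all_mappings, PIPELINE_TAGS_AND_AUTO_MODELS)
if missing:
    print("Mappings missing from PIPELINE_TAGS_AND_AUTO_MODELS:", ", ".join(missing))
```

A check like this would fail the consistency job whenever a new auto-class is added without a matching tuple in the list.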
        table = {
            tags_dataset[i]["model_class"]: (tags_dataset[i]["pipeline_tag"], tags_dataset[i]["auto_class"])
            for i in range(len(tags_dataset))
        }
        table = update_pipeline_and_auto_class_table(table)

        # Sort the model classes to avoid some nondeterministic updates to create false update commits.
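The reason for sorting can be illustrated with a small sketch: if the columns are always emitted in sorted key order, two tables with the same content produce identical output regardless of insertion order, so no spurious commit is created. The helper `build_sorted_columns` and the sample entries are assumptions for illustration; only the column names `model_class`, `pipeline_tag`, and `auto_class` come from the diff.

```python
def build_sorted_columns(table):
    """Turn {model_class: (pipeline_tag, auto_class)} into column lists ordered by model class."""
    model_classes = sorted(table)
    return {
        "model_class": model_classes,
        "pipeline_tag": [table[m][0] for m in model_classes],
        "auto_class": [table[m][1] for m in model_classes],
    }


# Same content, different insertion order (sample data only).
first = {
    "BertForMaskedLM": ("fill-mask", "AutoModelForMaskedLM"),
    "AlbertModel": ("feature-extraction", "AutoModel"),
}
second = dict(reversed(list(first.items())))

# Both orderings yield the same columns, so a diff between runs stays empty.
assert build_sorted_columns(first) == build_sorted_columns(second)
```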
    def get_frameworks_table():
        """
        Generates a dataframe containing the supported auto classes for each model type, using the content of the auto
        modules.
        """
Unlike for model classes, do we guarantee that model_types never disappear?
A model_type disappearing would be hugely breaking, so it should only happen in very extreme cases. I can't think of a situation where we have done it, or where we would do it to an already merged PR.
julien-c left a comment
LGTM, let's try to monitor the first few updates!
LysandreJik left a comment
Nice, clean usage of Repository too :)
        processors = {}
        for t in all_models:
            if t in transformers_module.models.auto.processing_auto.PROCESSOR_MAPPING_NAMES:
                processors[t] = "AutoProcessor"
            elif t in transformers_module.models.auto.tokenization_auto.TOKENIZER_MAPPING_NAMES:
                processors[t] = "AutoTokenizer"
            elif t in transformers_module.models.auto.feature_extraction_auto.FEATURE_EXTRACTOR_MAPPING_NAMES:
                processors[t] = "AutoFeatureExtractor"
            else:
                # Default to AutoTokenizer if a model has nothing, for backward compatibility.
                processors[t] = "AutoTokenizer"

        data["processor"] = [processors[t] for t in all_models]
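The precedence in the hunk above (processor first, then tokenizer, then feature extractor, with a tokenizer fallback) can be tried in isolation with the sketch below. The function `pick_processor_class` and the small name sets are hypothetical stand-ins for the real `*_MAPPING_NAMES` objects, so the example runs without `transformers` installed.

```python
def pick_processor_class(model_type, processor_names, tokenizer_names, feature_extractor_names):
    """Return the auto class used to preprocess inputs, following the precedence in the diff."""
    if model_type in processor_names:
        return "AutoProcessor"
    if model_type in tokenizer_names:
        return "AutoTokenizer"
    if model_type in feature_extractor_names:
        return "AutoFeatureExtractor"
    # Default to AutoTokenizer if a model has nothing, for backward compatibility.
    return "AutoTokenizer"


# Toy stand-ins for the mapping names (sample data only).
processor_names = {"wav2vec2"}
tokenizer_names = {"bert", "wav2vec2"}
feature_extractor_names = {"vit", "wav2vec2"}

all_models = ["bert", "vit", "wav2vec2", "some-new-model"]
processors = {
    t: pick_processor_class(t, processor_names, tokenizer_names, feature_extractor_names)
    for t in all_models
}
```

A model type that appears in several mappings resolves to the first match, which is why `AutoProcessor` is checked before the other two.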
What does this PR do?
This PR adds a job that will auto-update the repository transformers-metadata each time it's necessary.