Description
System Info
The box_predictor function recomputes the box bias from scratch every time it is called. The box bias, however, depends only on the shape of the feature map, which is determined by the batch size (which can be broadcast), the number of ViT patches, and the ViT token dimension. It should therefore be precomputed once and reused. This is a very simple fix and should result in a >>10x inference speedup. (https://github.com/huggingface/transformers/blob/ce2e7ef3d96afaf592faf3337b7dd997c7ad4928/src/transformers/models/owlvit/modeling_owlvit.py#L1389C35-L1389C35)
Who can help?
@amyeroberts @pasqualedem @alaradirik
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Steps to reproduce the behavior:
- Add timing code to box_predictor that measures its overall execution time as well as the execution time of each individual line
- Run one of the inference examples
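The timing step above could be set up roughly as follows. This is a minimal sketch using only the standard library. The timed helper, the timings dictionary, and box_predictor_stub are hypothetical names introduced for illustration, and the stub body merely imitates "compute a bias, then add it" rather than reproducing the real box_predictor.

```python
import time
from contextlib import contextmanager

timings = {}  # label -> accumulated seconds


@contextmanager
def timed(label):
    """Accumulate the wall-clock time spent inside the with-block under label."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = timings.get(label, 0.0) + (time.perf_counter() - start)


def box_predictor_stub(num_patches):
    """Hypothetical stand-in for box_predictor, instrumented step by step."""
    with timed("total"):
        with timed("compute_box_bias"):
            # Placeholder for the bias computation being measured.
            bias = [
                (x / num_patches, y / num_patches)
                for y in range(1, num_patches + 1)
                for x in range(1, num_patches + 1)
            ]
        with timed("add_bias"):
            # Placeholder for the remaining, cheaper work.
            out = [bx + by for bx, by in bias]
    return out


out = box_predictor_stub(24)
for label, seconds in timings.items():
    print(f"{label}: {seconds * 1e3:.3f} ms")
```

When timing code that runs on a GPU, keep in mind that CUDA kernels are launched asynchronously, so calling torch.cuda.synchronize() before reading the clock is generally needed for the per-line numbers to be meaningful.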
Expected behavior
compute_box_bias takes a substantial amount of time (due to the normalize_grid_corner_coordinates call), while the time spent on the other lines is negligible.