OWL-ViT box_predictor is wildly inefficient since box bias is not precomputed 

### System Info

The `box_predictor` function recomputes the box bias from scratch every time it is called. However, the box bias depends only on the shape of the feature map, which is determined by the batch size (which can be broadcast), the number of ViT patches, and the ViT token dimension, so it should be precomputed. This is a very simple fix and should result in a >>10x inference speedup. Relevant code: https://github.com/huggingface/transformers/blob/ce2e7ef3d96afaf592faf3337b7dd997c7ad4928/src/transformers/models/owlvit/modeling_owlvit.py#L1389C35-L1389C35
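To illustrate the idea, here is a minimal sketch of caching a shape-dependent bias so it is computed once per grid size rather than on every call. It uses plain Python and `functools.lru_cache` so it runs without PyTorch. The function name `box_bias_for_grid`, the `EPS` constant, and the bias formula are all hypothetical placeholders, not the actual `compute_box_bias` implementation in `transformers`.

```python
import math
from functools import lru_cache

EPS = 1e-4  # assumed small offset that keeps the logarithms finite


def _logit(p: float) -> float:
    """Placeholder transform standing in for the real bias computation."""
    return math.log(p + EPS) - math.log1p(-p + EPS)


@lru_cache(maxsize=None)
def box_bias_for_grid(num_patches: int) -> tuple:
    """Build one (x, y, w, h) bias entry per cell of a num_patches x num_patches grid.

    The result depends only on num_patches, so lru_cache lets repeated calls
    with the same grid size return the stored tuple instead of rebuilding it.
    """
    size_bias = _logit(1.0 / num_patches)
    bias = []
    for row in range(1, num_patches + 1):
        for col in range(1, num_patches + 1):
            bias.append(
                (_logit(col / num_patches), _logit(row / num_patches), size_bias, size_bias)
            )
    return tuple(bias)


first = box_bias_for_grid(24)   # computed on the first call
second = box_bias_for_grid(24)  # served from the cache
print(first is second)
print(box_bias_for_grid.cache_info())
```

In a real model, the same effect could presumably be achieved by computing the bias once at construction time and storing it on the module, or by memoizing on the feature map shape. Either way, the batch dimension does not need to be part of the cache key, since the bias can be broadcast across the batch.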

### Who can help?

@amyeroberts @pasqualedem @alaradirik 

### Information

- [x] The official example scripts
- [ ] My own modified scripts

### Tasks

- [x] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)

### Reproduction

Steps to reproduce the behavior:

1. Add some lines of code to `box_predictor` that time its overall execution and the execution of each individual line
2. Run one of the inference examples
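One possible way to do the timing in step 1 is a small context manager that accumulates wall-clock time per label. This is only a sketch: the `timed` helper and the `timings` dictionary are hypothetical names, not part of `transformers`, and the commented usage lines assume the structure of `box_predictor` described in this issue.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per label (hypothetical helper state).
timings = defaultdict(float)


@contextmanager
def timed(label: str):
    """Add the elapsed time of the wrapped block to timings[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] += time.perf_counter() - start


# Assumed usage inside box_predictor, one wrapper per line of interest:
#     with timed("compute_box_bias"):
#         box_bias = self.compute_box_bias(feature_map)

# Stand-alone demonstration with a dummy workload.
with timed("demo_step"):
    sum(range(100_000))

# Print the labels from slowest to fastest.
for label, seconds in sorted(timings.items(), key=lambda kv: -kv[1]):
    print(f"{label}: {seconds * 1e3:.3f} ms")
```

When running on a GPU, CUDA kernels are launched asynchronously, so calling `torch.cuda.synchronize()` before reading the clock is needed for the per-line numbers to be meaningful.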

### Expected behavior

`compute_box_bias` will take a substantial amount of time (due to the `normalize_grid_corner_coordinates` call), while the other lines will be negligible.

OWL-ViT box_predictor is wildly inefficient since box bias is not precomputed #26099