OWL-ViT box_predictor is wildly inefficient since box bias is not precomputed  #26099

@5hadytru

System Info

The box_predictor function recomputes the box bias from scratch every time it is called. However, the box bias depends only on the shape of the feature map, that is, on the batch size (which can be handled by broadcasting), the number of ViT patches, and the ViT token dimension, so it should be precomputed. This is a very simple fix and will result in a >>10x inference speedup. (https://github.com/huggingface/transformers/blob/ce2e7ef3d96afaf592faf3337b7dd997c7ad4928/src/transformers/models/owlvit/modeling_owlvit.py#L1389C35-L1389C35)
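The precomputation suggested above could look roughly like the following. This is a minimal standalone NumPy sketch, not the transformers implementation: the helper name `cached_box_bias`, the square-grid assumption, and the `eps` value are assumptions made for illustration. It only shows that the bias is a function of the grid size and can therefore be computed once and reused.

```python
from functools import lru_cache

import numpy as np


@lru_cache(maxsize=8)
def cached_box_bias(num_patches: int) -> np.ndarray:
    """Hypothetical helper: box bias for a square num_patches x num_patches grid.

    The result depends only on the grid size, so it is computed once per size
    and served from the cache on every later call.
    """
    # Normalized grid corner coordinates in (0, 1], flattened to (num_patches**2, 2).
    ticks = np.arange(1, num_patches + 1, dtype=np.float32) / num_patches
    xx, yy = np.meshgrid(ticks, ticks)
    coords = np.stack([xx, yy], axis=-1).reshape(-1, 2)
    coords = np.clip(coords, 0.0, 1.0)

    # Inverse sigmoid (logit) of the coordinates; eps keeps the logs finite.
    eps = 1e-4  # assumed value for this sketch
    coord_bias = np.log(coords + eps) - np.log1p(-coords + eps)

    # Size bias: logit of the width of a single patch.
    size = np.full_like(coord_bias, 1.0 / num_patches)
    size_bias = np.log(size + eps) - np.log1p(-size + eps)

    bias = np.concatenate([coord_bias, size_bias], axis=-1)  # (num_patches**2, 4)
    bias.setflags(write=False)  # the cached array is shared, so make it read-only
    return bias
```

Because the returned array has no batch dimension, it can be added to raw box predictions of shape `(batch, num_patches**2, 4)` through ordinary broadcasting, so a single cached copy serves any batch size.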

Who can help?

@amyeroberts @pasqualedem @alaradirik

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Steps to reproduce the behavior:

  1. Add timing code to box_predictor that measures its overall execution time and the execution time of each individual line
  2. Run one of the inference examples
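One way to collect the per-line timings from step 1 is a small context manager built on the standard library. This is only a sketch: the names `timed` and `timings` are hypothetical, and the loop below is a stand-in workload rather than the real compute_box_bias call.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per label (hypothetical global for this sketch).
timings = defaultdict(float)


@contextmanager
def timed(label: str):
    """Add the wall-clock time spent inside the with-block to timings[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] += time.perf_counter() - start


# Wrap each line of interest; the label would name the call being measured.
with timed("compute_box_bias"):
    sum(i * i for i in range(100_000))  # stand-in for the real call

for label, seconds in timings.items():
    print(f"{label}: {seconds * 1e3:.3f} ms")
```

When the model runs on a GPU, kernels are launched asynchronously, so calling `torch.cuda.synchronize()` before reading the clock is needed for the per-line numbers to be meaningful.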

Expected behavior

The timings will show that compute_box_bias takes a substantial amount of time (due to the normalize_grid_corner_coordinates call), while the other lines are negligible.
