
PEEK-Var

exploring the application of pruning methods via PEEK variance to vision transformers

based on prior research from NETS lab

some preliminary results:

the heatmap shows the average attention entropy for a specific patch in a specific encoder layer. a higher value means that patch spreads its attention across many other patches in the image; a lower value means that patch concentrates its attention on a few specific patches.

another observation is that the attention entropy value decreases as the depth in the neural network increases.
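to make the quantity concrete, here is a minimal numpy sketch of how per-patch attention entropy could be computed from one layer's post-softmax attention weights (the `attention_entropy` helper and its shapes are my own illustration, not code from this repo):

```python
import numpy as np

def attention_entropy(attn):
    """per-patch attention entropy for one encoder layer.

    attn: array of shape (heads, patches, patches); each row is an
    attention distribution (softmax output summing to 1).
    returns shape (patches,): the entropy of each query patch's
    attention distribution, averaged over heads. higher = that patch
    attends broadly; lower = it focuses on a few patches.
    """
    eps = 1e-12  # guard against log(0) for exactly-zero weights
    per_head = -(attn * np.log(attn + eps)).sum(axis=-1)  # (heads, patches)
    return per_head.mean(axis=0)

# uniform attention over n patches is the maximum-entropy case, log(n)
n = 4
uniform = np.full((1, n, n), 1.0 / n)
print(attention_entropy(uniform))  # each entry ≈ log(4) ≈ 1.386
```

the two extremes are a useful sanity check: uniform attention gives entropy log(n), while one-hot attention (each patch attending to a single patch) gives entropy near zero.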

[figure: heatmap of average attention entropy per patch, for each encoder layer]

i have also created histograms of the attention entropy distribution for each layer in a small vision transformer encoder. i have noticed that the distribution tends to become bimodal in the last (deepest) layer.
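the per-layer histograms could be produced with a sketch like the following (the `entropy_histograms` helper and the toy shapes are assumptions for illustration; the real attention arrays would come from the model):

```python
import numpy as np

def entropy_histograms(attn_per_layer, bins=30):
    """one histogram of per-patch attention entropies per layer.

    attn_per_layer: list of arrays, each (heads, patches, patches),
    with rows summing to 1 (post-softmax attention).
    returns: list of (counts, bin_edges) pairs from np.histogram.
    """
    eps = 1e-12  # guard against log(0)
    hists = []
    for attn in attn_per_layer:
        # per-patch entropy, averaged over heads: shape (patches,)
        ent = -(attn * np.log(attn + eps)).sum(axis=-1).mean(axis=0)
        hists.append(np.histogram(ent, bins=bins))
    return hists

# toy example: random attention for a 2-layer, 4-head, 16-patch model
rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 4, 16, 16))
attn = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
hists = entropy_histograms(list(attn), bins=10)
```

a bimodal last-layer distribution would show up here as two separated clusters of counts in `hists[-1]`.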

[figure: per-layer histograms of the attention entropy distribution]

things to investigate (in no particular order):

  • analyzing results using a larger pretrained model
  • is the bimodal distribution consistent in the last layer?
  • can we learn anything by applying this to language instead?
  • what if we analyze the attention entropy distribution of individual heads within layers?
