# Less is More: Pay Less Attention in Vision Transformers

Training and evaluation code for LIT-S, LIT-M and LIT-B.

## Training

First, activate your Python environment:

```bash
conda activate lit
```

Make sure you have set the correct ImageNet `DATA_PATH` in `configs/*.yaml`.
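The exact layout of the config files is not reproduced here, but the entry should simply point at your local ImageNet root; as a rough sketch (the placement of the key within the file is assumed), it might look like:

```yaml
# DATA_PATH should point at your local ImageNet root directory
# (the surrounding structure of the config file is assumed here)
DATA_PATH: /path/to/imagenet
```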

To train LIT-S:

```bash
bash scripts/lit-s.sh [GPUs]
```

To train LIT-M:

```bash
bash scripts/lit-m.sh [GPUs]
```

To train LIT-B:

```bash
bash scripts/lit-b.sh [GPUs]
```
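For example, to train LIT-S with 8 GPUs, you would run something like:

```bash
# [GPUs] is the number of GPUs to train on, as in the evaluation example below
bash scripts/lit-s.sh 8
```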

Note: We use a total batch size of 1024 for all experiments on ImageNet. Depending on your hardware, you may need a different per-GPU batch size, which you can set by editing `BATCH_SIZE` in `configs/*.yaml`. For example, by setting `BATCH_SIZE` to 64 and training with 8 GPUs, your total batch size is 512.
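As a sketch of what that edit might look like (only the `BATCH_SIZE` key is taken from the note above; any surrounding keys are assumed):

```yaml
# Per-GPU batch size; with 8 GPUs this gives a total batch size of 8 * 64 = 512
BATCH_SIZE: 64
```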

## Evaluation

We provide scripts to evaluate LIT-S, LIT-M and LIT-B. To evaluate a model, you can run:

```bash
bash scripts/lit-b-eval.sh [GPUs] [path/to/checkpoint]
```

For example, to evaluate LIT-B with 1 GPU, you can run:

```bash
bash scripts/lit-b-eval.sh 1 checkpoint/lit_b.pth
```

This should give:

```
* Acc@1 83.366 Acc@5 96.254
Accuracy of the network on the 50000 test images: 83.4%
```

Results could be slightly different depending on your environment.
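LIT-S and LIT-M can be evaluated in the same way. Assuming their evaluation scripts and checkpoint files follow the same naming pattern as LIT-B (the names below are not confirmed by this README), evaluating LIT-S on a single GPU would look like:

```bash
# Hypothetical names, by analogy with scripts/lit-b-eval.sh and checkpoint/lit_b.pth
bash scripts/lit-s-eval.sh 1 checkpoint/lit_s.pth
```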

## Results

| Name  | Params (M) | FLOPs (G) | Top-1 Acc. (%) | Model               | Log |
|-------|------------|-----------|----------------|---------------------|-----|
| LIT-S | 27         | 4.1       | 81.5           | google drive/github | log |
| LIT-M | 48         | 8.6       | 83.0           | google drive/github | log |
| LIT-B | 86         | 15.0      | 83.4           | google drive/github | log |