Zhengzhuo Xu, Ruikang Liu, Shuo Yang, Zenghao Chai and Chun Yuan
This repository is the official PyTorch implementation of the CVPR 2023 paper *Learning Imbalanced Data with Vision Transformers* (LiVT).
```
python == 3.7
pytorch >= 1.7.0
torchvision >= 0.8.1
timm == 0.3.2
tensorboardX >= 2.1
```

- We recommend installing PyTorch 1.7.0+, torchvision 0.8.1+, and pytorch-image-models 0.3.2.
- If your PyTorch is 1.8.1+, a fix is needed to work with timm (see the sketch after this list).
- See `requirements.txt` for the detailed requirements. You don't have to match it exactly; it is only a reference.
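For reference, the patch commonly applied for this (e.g., the one circulated with the MAE codebase) replaces a removed `torch._six` import in `timm/models/layers/helpers.py`; a minimal sketch, assuming timm 0.3.2 installed from PyPI:

```python
# timm/models/layers/helpers.py -- commonly used patch for timm 0.3.2,
# since torch._six.container_abcs was removed in newer PyTorch releases.
import torch

TORCH_MAJOR = int(torch.__version__.split('.')[0])
TORCH_MINOR = int(torch.__version__.split('.')[1])

if TORCH_MAJOR == 1 and TORCH_MINOR < 8:
    from torch._six import container_abcs
else:
    import collections.abc as container_abcs
```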
We adopt `torchvision.datasets.ImageFolder` to build our dataloaders. Hence, we organize all datasets (ImageNet-LT, iNat18, Places-LT, CIFAR) as follows:
```
/path/to/ImageNet-LT/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
```

You can follow `prepare.py` to construct your dataset.
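As a quick illustration (not part of this repository), such a layout can be consumed directly by `torchvision`; the transform and loader parameters below are placeholders:

```python
import torchvision.datasets as datasets
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

# Placeholder transform; the actual training augmentations are configured in the repo.
transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.ToTensor(),
])

# ImageFolder maps each class subdirectory (class1/, class2/, ...) to an integer label.
train_set = datasets.ImageFolder('/path/to/ImageNet-LT/train', transform=transform)
train_loader = DataLoader(train_set, batch_size=256, shuffle=True, num_workers=8)
```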
The detailed information of these datasets is as follows:
- Please set `DATA_PATH` and `WORK_PATH` in `util/trainer.py` (Lines 6-7); see the sketch after this list.
- Typically, make sure 4 or 8 GPUs with more than 12 GB of memory each are available.
- Keep the settings consistent with the following.
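A minimal sketch of those two lines; the values are placeholders to adjust for your environment:

```python
# util/trainer.py, Lines 6-7 -- placeholder paths, adjust to your setup.
DATA_PATH = '/path/to/datasets'  # root folder that holds ImageNet-LT/, iNat18/, etc.
WORK_PATH = '/path/to/LiVT'      # working directory for this repository
```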
You can see all arguments in the `Trainer` class in `util/trainer.py`. Specifically, the commands for each stage are:
```
# MGP stage
python script/pretrain.py
# BFT stage
python script/finetune.py
# evaluate stage
python script/evaluate.py
```

Balanced Finetuned (BFT) models and Masked Generative Pretrained (MGP) models:
| Dataset | Resolution | Many (%) | Med. (%) | Few (%) | Acc (%) | args | log | ckpt | MGP ckpt |
|---|---|---|---|---|---|---|---|---|---|
| ImageNet-LT | 224×224 | 73.6 | 56.4 | 41.0 | 60.9 | download | download | download | Res_224 |
| ImageNet-LT | 384×384 | 76.4 | 59.7 | 42.7 | 63.8 | download | download | download | |
| iNat18 | 224×224 | 78.9 | 76.5 | 74.8 | 76.1 | download | download | download | Res_128 |
| iNat18 | 384×384 | 83.2 | 81.5 | 79.7 | 81.0 | download | download | download | |
If you find our idea or code inspiring, please cite our paper:
```
@inproceedings{LiVT,
  title={Learning Imbalanced Data with Vision Transformers},
  author={Xu, Zhengzhuo and Liu, Ruikang and Yang, Shuo and Chai, Zenghao and Yuan, Chun},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2023}
}
```

This code is partially based on Prior-LT; if you use our code, please also cite:
```
@inproceedings{PriorLT,
  title={Towards Calibrated Model for Long-Tailed Visual Recognition from Prior Perspective},
  author={Xu, Zhengzhuo and Chai, Zenghao and Yuan, Chun},
  booktitle={Thirty-Fifth Conference on Neural Information Processing Systems},
  year={2021}
}
```

This project builds heavily on DeiT and MAE.
The CIFAR code is based on LDAM and Prior-LT.
The loss implementations are based on CB, LDAM, LADE, PriorLT and MiSLAS.
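These losses share a common ingredient: re-weighting or adjusting logits by the class prior. For intuition only (this is not the repository's exact implementation), a minimal Balanced-Softmax-style cross-entropy might look like:

```python
import torch
import torch.nn.functional as F

def prior_adjusted_ce(logits, targets, class_counts):
    """Cross-entropy with Balanced-Softmax-style prior adjustment.

    A minimal sketch, not the repo's exact loss: each logit is shifted by the
    log frequency of its class, so head classes must score higher to win.
    """
    log_prior = torch.log(class_counts.float() / class_counts.sum())
    return F.cross_entropy(logits + log_prior, targets)

# Toy usage with hypothetical numbers: 3 classes with long-tailed counts.
logits = torch.randn(4, 3)
targets = torch.tensor([0, 1, 2, 0])
class_counts = torch.tensor([1000, 100, 10])
loss = prior_adjusted_ce(logits, targets, class_counts)
```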



