This repo contains the code and configuration files needed to reproduce the object detection results of LIT. It is based on mmdetection.
Make sure you have created your environment with our provided scripts. We recommend creating a new environment for the object detection experiments.
# Suppose you already have an env for training LIT on ImageNet.
conda create -n lit-det --clone lit
Next, please refer to get_started.md for mmdetection installation.
Prepare the COCO dataset.
# Within this directory, do
ln -s [path/to/coco] data/
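
Before launching a long job, it can help to confirm that the linked dataset is laid out as expected. The sketch below is not part of this repo: `check_coco_layout` is a hypothetical helper, and the sub-directory names (`annotations`, `train2017`, `val2017`) are assumed from mmdetection's default COCO layout rather than stated in this README.

```shell
# Sketch: verify that the linked COCO root exposes the sub-directories
# that mmdetection's stock COCO configs usually read from.
# Assumption: the default layout (annotations/, train2017/, val2017/).

check_coco_layout() {
    local root="${1:-data/coco}"   # dataset root, normally the symlink created above
    local missing=0
    local sub
    for sub in annotations train2017 val2017; do
        # [ -d ] follows symlinks, so a linked dataset passes the check.
        if [ ! -d "$root/$sub" ]; then
            echo "missing: $root/$sub" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For instance, `check_coco_layout data/coco && echo "COCO layout looks fine"` prints the confirmation only when all three directories are reachable.
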
Download our ImageNet-pretrained weights and move them under pretrained/.
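
One possible way to stage a downloaded checkpoint is sketched below. `stage_weights` is a hypothetical helper, not something this repo ships, and any file name passed to it (for example `lit_ti.pth`) is an assumption for illustration only.

```shell
# Sketch: move a downloaded ImageNet checkpoint into pretrained/.
# stage_weights is a hypothetical helper; the checkpoint name is up to you.

stage_weights() {
    local src="$1"                      # path to the downloaded checkpoint
    local dest_dir="${2:-pretrained}"   # directory this README expects
    if [ ! -f "$src" ]; then
        echo "checkpoint not found: $src" >&2
        return 1
    fi
    mkdir -p "$dest_dir" || return 1
    mv "$src" "$dest_dir/" || return 1
    # Print the final path so it can be passed as model.pretrained=<PRETRAIN_MODEL>.
    echo "$dest_dir/$(basename "$src")"
}
```

Calling `stage_weights ~/Downloads/lit_ti.pth` (hypothetical path) would print `pretrained/lit_ti.pth` on success and return a non-zero status if the source file does not exist.
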
# single-gpu testing
python tools/test.py <CONFIG_FILE> <DET_CHECKPOINT_FILE> --eval bbox segm
# multi-gpu testing
tools/dist_test.sh <CONFIG_FILE> <DET_CHECKPOINT_FILE> <GPU_NUM> --eval bbox segm

For example, to evaluate a Mask R-CNN model with a LIT-Ti backbone on one GPU, run:

tools/dist_test.sh configs/lit/mask_rcnn_lit_ti_fpn_1x_coco.py [path/to/checkpoint] 1 --eval bbox segm

To train a detector with pre-trained models, run:
# single-gpu training
python tools/train.py <CONFIG_FILE> --cfg-options model.pretrained=<PRETRAIN_MODEL> [model.backbone.use_checkpoint=True] [other optional arguments]
# multi-gpu training
tools/dist_train.sh <CONFIG_FILE> <GPU_NUM> --cfg-options model.pretrained=<PRETRAIN_MODEL> [model.backbone.use_checkpoint=True] [other optional arguments]

For example, to train a Mask R-CNN model with a LIT-Ti backbone on 8 GPUs, run:

tools/dist_train.sh configs/lit/mask_rcnn_lit_ti_fpn_1x_coco.py 8 --cfg-options model.pretrained=<PRETRAIN_MODEL>

Note: use_checkpoint is used for RetinaNet with LIT-S to save GPU memory. Please refer to this page for more details.
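
To make the relationship between the single-GPU and multi-GPU training commands concrete, here is a minimal sketch that only prints the command it would run. `train_cmd` is a hypothetical helper; the script names and the `--cfg-options` keys are the ones shown above, and everything else is an assumption for illustration.

```shell
# Sketch: print (not run) the training command for a given GPU count.
# train_cmd is a hypothetical helper, not part of this repo.

train_cmd() {
    local config="$1"             # e.g. configs/lit/mask_rcnn_lit_ti_fpn_1x_coco.py
    local pretrained="$2"         # checkpoint passed as model.pretrained
    local gpus="${3:-1}"          # number of GPUs to train on
    local use_ckpt="${4:-false}"  # "true" adds model.backbone.use_checkpoint=True
    local opts="model.pretrained=$pretrained"
    if [ "$use_ckpt" = "true" ]; then
        opts="$opts model.backbone.use_checkpoint=True"
    fi
    if [ "$gpus" -gt 1 ]; then
        # Multi-GPU: the launcher script takes the GPU count after the config.
        echo "tools/dist_train.sh $config $gpus --cfg-options $opts"
    else
        # Single GPU: call the training script directly.
        echo "python tools/train.py $config --cfg-options $opts"
    fi
}
```

Printing the command first makes it easy to inspect the overrides before starting a long run; `eval "$(train_cmd <CONFIG_FILE> <PRETRAIN_MODEL> 8)"` would then launch it.
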

Results with RetinaNet:

| Backbone | Params (M) | Lr schd | box mAP | Config | Model | Log |
|---|---|---|---|---|---|---|
| LIT-Ti | 30 | 1x | 41.6 | config | github | log |
| LIT-S | 39 | 1x | 41.6 | config | github | log |

Results with Mask R-CNN:

| Backbone | Params (M) | Lr schd | box mAP | mask mAP | Config | Model | Log |
|---|---|---|---|---|---|---|---|
| LIT-Ti | 40 | 1x | 42.0 | 39.1 | config | github | log |
| LIT-S | 48 | 1x | 42.9 | 39.6 | config | github | log |

If you use this code for a paper, please cite:

@article{pan2021less,
title={Less is More: Pay Less Attention in Vision Transformers},
author={Pan, Zizheng and Zhuang, Bohan and He, Haoyu and Liu, Jing and Cai, Jianfei},
journal={arXiv preprint arXiv:2105.14217},
year={2021}
}