This is the official implementation of High-Resolution Transformer (HRT) for pose estimation. We present a High-Resolution Transformer (HRT) that learns high-resolution representations for dense prediction tasks, in contrast to the original Vision Transformer that produces low-resolution representations and has high memory and computational cost. We take advantage of the multi-resolution parallel design introduced in high-resolution convolutional networks (HRNet), along with local-window self-attention that performs self-attention over small non-overlapping image windows, for improving the memory and computation efficiency. In addition, we introduce a convolution into the FFN to exchange information across the disconnected image windows. We demonstrate the effectiveness of the High-Resolution Transformer on human pose estimation and semantic segmentation tasks.
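
To make the efficiency argument concrete, here is a minimal sketch, not the repository's implementation, of how non-overlapping windows restrict self-attention. It counts query-key pairs only and ignores channels, heads, and the multi-resolution branches. The function names and the feature-map and window sizes are assumptions chosen for illustration and are not taken from the HRT configs.

```python
def window_partition(height, width, window):
    """Group the (row, col) positions of a height x width map into
    non-overlapping window x window blocks, in row-major block order."""
    if height % window or width % window:
        raise ValueError("height and width must be divisible by window")
    windows = []
    for top in range(0, height, window):
        for left in range(0, width, window):
            windows.append([(r, c)
                            for r in range(top, top + window)
                            for c in range(left, left + window)])
    return windows


def attention_pairs_global(height, width):
    """Query-key pairs when every position attends to every position."""
    n = height * width
    return n * n


def attention_pairs_windowed(height, width, window):
    """Query-key pairs when attention is restricted to each window."""
    return sum(len(w) ** 2 for w in window_partition(height, width, window))


if __name__ == "__main__":
    # Illustrative sizes only (assumed, not from the HRT configs).
    h, w, k = 64, 48, 8
    print("global pairs:  ", attention_pairs_global(h, w))
    print("windowed pairs:", attention_pairs_windowed(h, w, k))
```

With these assumed sizes the windowed variant evaluates `H*W / K**2` times fewer pairs than global attention. No pair ever spans two windows, which is the gap the convolution in the FFN is described as filling.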
| Backbone | Input Size | AP | AP50 | AP75 | ARM | ARL | AR | ckpt | log | script |
|---|---|---|---|---|---|---|---|---|---|---|
| HRT-S | 256x192 | 74.0% | 90.2% | 81.2% | 70.4% | 80.7% | 79.4% | ckpt | log | script |
| HRT-S | 384x288 | 75.6% | 90.3% | 82.2% | 71.6% | 82.5% | 80.7% | ckpt | log | script |
| HRT-B | 256x192 | 75.6% | 90.8% | 82.8% | 71.7% | 82.6% | 80.8% | ckpt | log | script |
| HRT-B | 384x288 | 77.2% | 91.0% | 83.6% | 73.2% | 84.2% | 82.0% | ckpt | log | script |

| Backbone | Input Size | AP | AP50 | AP75 | ARM | ARL | AR | ckpt | log | script |
|---|---|---|---|---|---|---|---|---|---|---|
| HRT-S | 384x288 | 74.5% | 92.3% | 82.1% | 70.7% | 80.6% | 79.8% | ckpt | log | script |
| HRT-B | 384x288 | 76.2% | 92.7% | 83.8% | 72.5% | 82.3% | 81.2% | ckpt | log | script |
The models are first pre-trained on the ImageNet-1K dataset and then fine-tuned on the COCO val2017 dataset.
If you find this project useful in your research, please consider citing:
@article{YuanFHZCW21,
title={HRT: High-Resolution Transformer for Dense Prediction},
author={Yuhui Yuan and Rao Fu and Lang Huang and Chao Zhang and Xilin Chen and Jingdong Wang},
journal={arXiv},
year={2021}
}
