PolaFormer++: Polarity-aware Linear Attention with Channel-wise Spikiness

**Weikang Meng, Yadan Luo, Liangyu Huo, Yingjian Li, Yaowei Wang, Zheng Zhang^*, Heng Tao Shen

🚀 Welcome to the repo of PolaFormer++!

This repo contains the official PyTorch code for PolaFormer++**, the next version of PolaFormer[1].

Introduction

Motivation

Although Transformers have achieved remarkable success in vision tasks, the quadratic complexity of their self-attention mechanism limits scalability, motivating the development of linear attention approaches that reformulate attention computation through kernel-based feature mappings with linear complexity. However, existing linear attention models often suffer from significant performance degradation due to overly simplistic feature maps, which restrict their practical applicability. In particular, current feature map designs face two fundamental limitations: the non-negativity constraint that discards negative correlations, and the use of overly smooth kernels that increase the entropy of attention weights, thereby weakening discriminability. To address these challenges, we propose Polarity-Aware Linear Attention with Channel-wise Spikiness (PolaFormer++), which is a unified framework that captures complete query–key interactions while preserving the spiky property as in Softmax attention with a flexible channel-wise feature map.

Method

In this paper, we present a novel polarity-aware linear attention mechanism, which accurately captures all query-key interactions in inner-product. Our method incorporates a dual learnable spiky function, that dynamically rescales the magnitudes of same and opposite-signed components, effectively reducing entropy of the attention weight in linear attention.we propose the polarity-aware linear attention mechanism that explicitly models both same-signed and opposite-signed query-key interactions, ensuring comprehensive coverage of relational information. Furthermore, to restore the spiky properties of attention maps, we prove that the existence of a class of element-wise functions (with positive first and second derivatives) can reduce entropy in the attention distribution. Finally, we employ a learnable power function for rescaling, allowing strong and weak attention signals to be effectively separated.

PolaFormer: Polarity-aware Linear Attention

Polarity-aware attention mechanism separates query-key pairs based on their polarity, computing their interactions independently. The attention weights are calculated as follows: To address this information loss, our polarity-aware attention mechanism separates query-key pairs based on their polarity, computing their interactions independently. The attention weights are calculated as follows:

To avoid the pitfalls of subtractive operations, we instead introduce a Polarity Gating Matrix $\mathbf{G}$, a learnable polarity-aware gating matrix to weight the contributions of same-signed and opposite-signed query-key similarities and approximate the subtractive operation in Equation. As shown in Figure, there is a clear negative correlation and value discrepancy between the weights learned in $\mathbf{G}^{s}$ and $\mathbf{G}^{o}$

Polarity-aware Feature Map with Channel-wise Spikiness

Considering that different channels contribute unequally to feature representation, we introduce a channel-wise temperature vector $\mathbf{p} \in \mathbb{R}^d$ into the exponential mapping function. This enables each channel to adjust its own degree of sharpness, transforming the uniform mapping across channels into adaptive, channel-dependent mappings. Furthermore, the contribution of each channel varies not only within a single polarity flow but also between the same-polar and opposite-polar flows. To capture this difference, we assign a dedicated temperature vector to each flow, resulting in the final polarity-aware feature map with channel-wise spikiness (PaCS).

Dependencies

Python 3.9
PyTorch == 1.11.0
torchvision == 0.12.0
numpy
timm == 0.4.12
einops
yacs

Data preparation

The ImageNet dataset should be prepared as follows:

$ tree data
imagenet
├── train
│   ├── class1
│   │   ├── img1.jpeg
│   │   ├── img2.jpeg
│   │   └── ...
│   ├── class2
│   │   ├── img3.jpeg
│   │   └── ...
│   └── ...
└── val
    ├── class1
    │   ├── img4.jpeg
    │   ├── img5.jpeg
    │   └── ...
    ├── class2
    │   ├── img6.jpeg
    │   └── ...
    └── ...

Train Models from Scratch

To train our model on ImageNet from scratch, see pretrain.sh and run:

bash pretrain.sh

Results

Model	Params	Flops	Acc
b0	7M	0.7G	76.9%
b1	9M	1.0G	79.1%
b2	16M	2.7G	82.8%
b3	27M	5.1G	84.3%
b4	50M	11G	85.4%
b5	97M	18G	85.6%

Reference

[1] Meng, Weikang, Yadan Luo, Xin Li, Dongmei Jiang, and Zheng Zhang. "PolaFormer: Polarity-aware Linear Attention for Vision Transformers." In The Thirteenth International Conference on Learning Representations.

Acknowledgements

This code is developed on the top of PolaFormer.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
figures		figures
polaformer++		polaformer++
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PolaFormer++: Polarity-aware Linear Attention with Channel-wise Spikiness

**Weikang Meng, Yadan Luo, Liangyu Huo, Yingjian Li, Yaowei Wang, Zheng Zhang^*, Heng Tao Shen

🚀 Welcome to the repo of PolaFormer++!

This repo contains the official PyTorch code for PolaFormer++**, the next version of PolaFormer[1].

Introduction

Motivation

Method

Dependencies

Data preparation

Train Models from Scratch

Results

Reference

Acknowledgements

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

PolaFormer++: Polarity-aware Linear Attention with Channel-wise Spikiness

Weikang Meng, Yadan Luo, Liangyu Huo, Yingjian Li, Yaowei Wang, Zheng Zhang*, Heng Tao Shen 🚀 Welcome to the repo of PolaFormer++! This repo contains the official PyTorch code for PolaFormer++, the next version of PolaFormer[1].

Introduction

Motivation

Method

Dependencies

Data preparation

Train Models from Scratch

Results

Reference

Acknowledgements

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

**Weikang Meng, Yadan Luo, Liangyu Huo, Yingjian Li, Yaowei Wang, Zheng Zhang^*, Heng Tao Shen

🚀 Welcome to the repo of PolaFormer++!

This repo contains the official PyTorch code for PolaFormer++**, the next version of PolaFormer[1].

Packages