We know softmax attention, which costs quadratic time, and its linear-time variants, such as linear attention and State Space Models. But what lies in between?
Introducing Log-Linear Attention with:
- Log-linear time training
- Logarithmic-cost inference per decoding step (in both time and memory)
- Hardware-efficient Triton kernels
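
For intuition about how per-step state can stay logarithmic, here is a minimal sketch, assuming a Fenwick-tree-style split of the prefix into power-of-two segments. The function name `prefix_segments` is hypothetical and the scheme is an illustrative assumption, not necessarily the exact method behind these kernels. The idea: if each segment is summarized by one fixed-size state, a query at position t reads at most about log2(t) states rather than t.

```python
def prefix_segments(t: int) -> list[tuple[int, int]]:
    """Split the prefix [0, t) into power-of-two-sized segments.

    Segments are returned most-recent first as half-open (start, end) pairs.
    Recent tokens fall into small segments, distant ones into large segments.
    """
    segments = []
    end = t
    while end > 0:
        size = end & -end  # lowest set bit of `end` is the segment size
        segments.append((end - size, end))
        end -= size
    return segments


if __name__ == "__main__":
    # 13 = 0b1101, so the prefix splits into sizes 1, 4, and 8.
    print(prefix_segments(13))  # [(12, 13), (8, 12), (0, 8)]
```

The number of segments equals the number of set bits in t, so it never exceeds `t.bit_length()`. That bound is what makes both the memory and the per-step work grow logarithmically with sequence length in this sketch.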