Pinned
I noticed that @OpenAI added learnable bias to attention logits before softmax. After softmax, they deleted the bias. This is similar to what I have done in my ICLR2025 paper: openreview.net/forum?id=78Nn4….
I used learnable key bias and set corresponding value bias zero. In this way,






