Pinned
🧵 Most modern LLMs like Qwen, DeepSeek & gpt-oss use YaRN to extend context from 4K→128K tokens.
But what led to YaRN?
Today I'm proud and excited to share a comprehensive resource into the evolution of positional embeddings such as APE, RoPE, YaRN & variants👇
1/n







