Aman Arora (@amaarora) / X

Aman Arora

4,226 posts

@amaarora

Building | Learning | Sharing | Previously Lead AI Engineer at RelevanceAI, AI Engineer @ Weights&Biases

Sydney, New South Wales

amaarora.github.io

Born September 26

Joined June 2014

Pinned
Aman Arora
@amaarora
Sep 22, 2025
🧵 Most modern LLMs like Qwen, DeepSeek & gpt-oss use YaRN to extend context from 4K→128K tokens. But what led to YaRN? Today I'm proud and excited to share a comprehensive resource into the evolution of positional embeddings such as APE, RoPE, YaRN & variants👇 1/n
Aman Arora
@amaarora
Jul 10, 2024
I have primarily switched to Claude 3.5 Sonnet and hardly use GPT-4. Anybody else?
Aman Arora
@amaarora
Feb 19, 2020
1/ After weeks of learning, I am proud to share - "The Annotated GPT-2" ladies and gentlemen! In this post, I re-implement OpenAI's GPT-2 in PyTorch using @huggingface source code and try to explain all the magic that goes on inside the model. amaarora.github.io/2020/02/18/ann…
Aman Arora
@amaarora
Aug 2, 2021
I've been working on Object Detection for the past few weeks - and I am proud to announce "The Annotated DETR" !! amaarora.github.io/2021/07/26/ann… In this post, I try to explain all the magic that goes on inside the architecture. 1/n
Aman Arora
@amaarora
Jan 13, 2021
After days and hours of learning, I am very excited to share my latest blog post "The EfficientDet Architecture in PyTorch"! amaarora.github.io/2021/01/13/eff… In this post, I reference @wightmanr's source code and try to explain all the magic that goes on inside the network. 1/n
Aman Arora
@amaarora
May 2, 2024
What is currently the best way to extract JSON from unstructured text using open source models by passing in a Pydantic schema? So far I have been looking into: 1. Guidance (github.com/guidance-ai/gu…) 2. Instructor (github.com/jxnl/instructor) 3. DSPy (github.com/stanfordnlp/ds…)
Aman Arora
@amaarora
Nov 2, 2019
Investing time in @fastdotai is one of the best investments I have ever made. To continue to learn, I am starting a new series #CodeFirst where I will be digging deep into the source code. This builds on top of @jeremyphoward code walkthrus. medium.com/@aman.arora021…
Aman Arora
@amaarora
Mar 15, 2021
Very excited to share my latest blog post on Optimizers called `"Adam" and friends`! amaarora.github.io/2021/03/13/opt… In this blog post we are going to re-implement SGD, Momentum, RMSprop & Adam from scratch and also compare performance with PyTorch's implementation. 1/
Aman Arora
@amaarora
Apr 12, 2021
I’m excited to share that I’ve joined @wandb! This means - more paper summaries, more research, more community events, more paper reading groups, more @fastdotai study groups, more open source contributions, more fun. :)
Aman Arora
@amaarora
Oct 30, 2019
1/ Not only is @fastdotai great for building deep learning models, it is also an excellent place to learn! By reading 21 pages of cs231n.github.io/convolutional-… resource mentioned in the pets lesson of V2 bit.ly/34dUNtS, I had several AHA moments! Such as,
Aman Arora
@amaarora
Jul 8, 2024
Excited to share a new blog post on Gemma 2 that goes into the details of: Grouped Query Attention, Sliding Window Attention, Rotary Position Embeddings (RoPE), Logit soft-capping & model-merging. **All with easy to follow PyTorch implementations!** 1/N
Aman Arora
@amaarora
May 4, 2021
Super excited to present my latest blog post on ResNet-RS - "Revisiting ResNets: Improved Training and Scaling Strategies". bit.ly/2QT3yIU I also share code implementation in PyTorch using TIMM & more! 1/3
Aman Arora
@amaarora
Jul 2, 2024
Trust me when I tell you that the below code implements Grouped Query Attention (GQA), Multi Head Attention (MHA) & Multi Query Attention (MQA). There is no magic to it. Paper (GQA): arxiv.org/abs/2305.13245 Implementation adapted from: github.com/meta-llama/lla…
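The code this post refers to is not included above. As a rough illustration only (not the author's code, and not the linked implementation), here is a minimal plain-Python sketch of how one function can cover all three variants. It assumes no masking, no batching, and inputs already split into heads; the names `attend` and `grouped_query_attention` are hypothetical. The only thing that changes between MHA, GQA and MQA is how many key/value heads are passed in.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q, k, v):
    """Scaled dot-product attention for a single head.

    q, k, v are nested lists of shape [seq_len][head_dim]. No mask is applied.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for q_vec in q:
        scores = [scale * sum(a * b for a, b in zip(q_vec, k_vec)) for k_vec in k]
        weights = softmax(scores)
        out.append([
            sum(w * v_vec[d] for w, v_vec in zip(weights, v))
            for d in range(len(v[0]))
        ])
    return out


def grouped_query_attention(q_heads, k_heads, v_heads):
    """q_heads: [n_heads][seq][dim]; k_heads and v_heads: [n_kv_heads][seq][dim].

    n_kv_heads == n_heads      -> Multi Head Attention (MHA)
    1 < n_kv_heads < n_heads   -> Grouped Query Attention (GQA)
    n_kv_heads == 1            -> Multi Query Attention (MQA)
    """
    n_heads, n_kv_heads = len(q_heads), len(k_heads)
    if n_heads % n_kv_heads != 0:
        raise ValueError("n_heads must be divisible by n_kv_heads")
    group_size = n_heads // n_kv_heads  # query heads that share one K/V head
    return [
        attend(q, k_heads[h // group_size], v_heads[h // group_size])
        for h, q in enumerate(q_heads)
    ]
```

Mapping query head `h` to K/V head `h // group_size` has the same effect as repeating each K/V head `group_size` times before a standard multi-head attention, which is a common way to write this in tensor libraries.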
Aman Arora
@amaarora
Dec 25, 2019
I am not sure if I should be scared or happy - with Uber's latest Plug & Play Language Model (arxiv.org/abs/1912.02164) it is now possible to drive LM's activations (such as GPT-2) and generate text with a specific sentiment on a specific topic. Is this dangerous? Time will tell.