Aman Arora
@amaarora
Building | Learning | Sharing | Previously Lead AI Engineer at RelevanceAI, AI Engineer @ Weights&Biases
Sydney, New South Wales
Born September 26
Joined June 2014
Posts
  • Pinned
    🧵 Most modern LLMs like Qwen, DeepSeek & gpt-oss use YaRN to extend context from 4K→128K tokens. But what led to YaRN? Today I'm proud and excited to share a comprehensive resource on the evolution of positional embeddings such as APE, RoPE, YaRN & variants👇 1/n
  • I have primarily switched to Claude 3.5 Sonnet and hardly use GPT-4. Anybody else?
  • 1/ After weeks of learning, I am proud to share "The Annotated GPT-2", ladies and gentlemen! In this post, I re-implement OpenAI's GPT-2 in PyTorch using @huggingface source code and try to explain all the magic that goes on inside the model. amaarora.github.io/2020/02/18/ann…
  • I've been working on object detection for the past few weeks, and I am proud to announce "The Annotated DETR"!! amaarora.github.io/2021/07/26/ann… In this post, I try to explain all the magic that goes on inside the architecture. 1/n
  • After days and hours of learning, I am very excited to share my latest blog post "The EfficientDet Architecture in PyTorch"! amaarora.github.io/2021/01/13/eff… In this post, I reference @wightmanr's source code and try to explain all the magic that goes on inside the network. 1/n
  • What is currently the best way to extract JSON from unstructured text using open-source models by passing in a Pydantic schema? So far I have been looking into: 1. Guidance (github.com/guidance-ai/gu…) 2. Instructor (github.com/jxnl/instructor) 3. DSPy (github.com/stanfordnlp/ds…)
  • Investing time in @fastdotai is one of the best investments I have ever made. To keep learning, I am starting a new series, #CodeFirst, where I will be digging deep into the source code. This builds on top of @jeremyphoward's code walkthrus. medium.com/@aman.arora021…
  • Very excited to share my latest blog post on optimizers, called `"Adam" and friends`! amaarora.github.io/2021/03/13/opt… In this blog post, we are going to re-implement SGD, Momentum, RMSprop & Adam from scratch and also compare their performance with PyTorch's implementations. 1/
  • I’m excited to share that I’ve joined @wandb! This means - more paper summaries, more research, more community events, more paper reading groups, more @fastdotai study groups, more open source contributions, more fun. :)
  • 1/ Not only is @fastdotai great for building deep learning models, it is also an excellent place to learn! By reading the 21 pages of the cs231n.github.io/convolutional-… resource mentioned in the pets lesson of V2 (bit.ly/34dUNtS), I had several AHA moments, such as:
  • Excited to share a new blog post on Gemma 2 that goes into the details of: Grouped Query Attention, Sliding Window Attention, Rotary Position Embeddings (RoPE), Logit soft-capping & model-merging. **All with easy-to-follow PyTorch implementations!** 1/N
  • Super excited to present my latest blog post on ResNet-RS - "Revisiting ResNets: Improved Training and Scaling Strategies". bit.ly/2QT3yIU I also share a code implementation in PyTorch using TIMM & more! 1/3
  • Trust me when I tell you that the below code implements Grouped Query Attention (GQA), Multi Head Attention (MHA) & Multi Query Attention (MQA). There is no magic to it. Paper (GQA): arxiv.org/abs/2305.13245 Implementation adapted from: github.com/meta-llama/lla…
  • I am not sure if I should be scared or happy: with Uber's latest Plug & Play Language Model (arxiv.org/abs/1912.02164), it is now possible to drive the activations of LMs (such as GPT-2) and generate text with a specific sentiment on a specific topic. Is this dangerous? Time will tell.