Math Behind Gradient Descent
In this blog, we will learn about the math behind gradient descent with a step-by-step numeric example.
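As a rough sketch of the idea the post walks through, here is one-variable gradient descent on f(x) = x², whose gradient is f'(x) = 2x. The learning rate (0.1), starting point (x = 1.0), and step count are illustrative choices for this sketch, not values taken from the post.

```python
# Minimal gradient descent sketch on f(x) = x^2 (assumed toy setup,
# not the post's own example).

def gradient_descent(x0, learning_rate=0.1, steps=5):
    """Repeatedly move x against the gradient of f(x) = x^2."""
    x = x0
    history = [x]
    for _ in range(steps):
        grad = 2 * x                   # gradient of x^2 at the current x
        x = x - learning_rate * grad   # update rule: x <- x - lr * grad
        history.append(x)
    return history

# Each step shrinks x toward the minimum at x = 0.
print(gradient_descent(1.0))
```

Each iteration multiplies x by (1 - 2·lr) = 0.8 here, so the iterates decay geometrically toward the minimizer at 0.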
In this blog, we will learn about the Vision Transformer (ViT) by decoding how it splits an image into patches, turns those patches into tokens, and processes them with a transformer to classify the image.
In this blog, we will learn about Feed-Forward Networks in LLMs - understanding what they are, how they work inside the Transformer architecture, why every Transformer layer needs one, and what role they play in making Large Language Models so powerful.
In this blog, we will learn about Flash Attention by decoding it piece by piece - understanding why standard attention is slow, what makes Flash Attention fast, how it uses GPU memory cleverly, and why it is used in almost every modern Large Language Model (LLM).
In this blog, we will learn about the Mixture of Experts (MoE) architecture - understanding what experts are, how the router picks them, why MoE makes large models faster and cheaper, and why it powers many of today's most powerful Large Language Models (LLMs).
In this blog, we will learn about the Transformer architecture by decoding it piece by piece - understanding what each component does, how they work together, and why this architecture powers every modern Large Language Model (LLM).
In this blog, we will learn about the math behind backpropagation in neural networks.
In this blog, we will learn about why we scale the dot product attention by √dₖ in the Transformer architecture with a step-by-step numeric example.
In this blog, we will learn about the math behind Attention - Query(Q), Key(K), and Value(V) with a step-by-step numeric example.
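The attention blurbs above mention Query (Q), Key (K), Value (V) and the √dₖ scaling; a minimal single-query numeric sketch of scaled dot-product attention (all vectors below are toy values invented for this sketch, not numbers from any of the posts):

```python
import math

def softmax(xs):
    """Numerically plain softmax over a list of scores."""
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, keys, values, d_k):
    # Scaled dot-product scores: (q . k) / sqrt(d_k)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
              for k in keys]
    weights = softmax(scores)
    # Output is the attention-weighted average of the value vectors
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(dim)]

# One query attending over two key/value pairs with d_k = 2 (toy numbers).
out = attention(q=[1.0, 0.0],
                keys=[[1.0, 0.0], [0.0, 1.0]],
                values=[[1.0, 2.0], [3.0, 4.0]],
                d_k=2)
print(out)
```

Dividing the scores by √dₖ keeps their variance roughly constant as dₖ grows, so the softmax does not saturate into near-one-hot weights.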