<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>About on George Grigorev Blog</title><link>https://ggrigorev.me/</link><description>Recent content in About on George Grigorev Blog</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Fri, 31 Oct 2025 22:00:00 +0000</lastBuildDate><atom:link href="https://ggrigorev.me/index.xml" rel="self" type="application/rss+xml"/><item><title>Introduction to parallelism in PyTorch</title><link>https://ggrigorev.me/posts/introduction-to-parallelism/</link><pubDate>Fri, 31 Oct 2025 22:00:00 +0000</pubDate><guid>https://ggrigorev.me/posts/introduction-to-parallelism/</guid><description>&lt;p&gt;Training large models inevitably requires a solid understanding of parallelism techniques. In this post, I&amp;rsquo;ll give a practical, in-depth overview of the most common approaches — DDP, FSDP, and TP — and how they&amp;rsquo;re actually used in real PyTorch training setups.&lt;/p&gt;
&lt;p&gt;This article was inspired by the excellent “How to Scale Your Model” &lt;a href="https://jax-ml.github.io/scaling-book/index"&gt;blog series&lt;/a&gt;. While that series is clear and insightful, I felt it was missing some hands-on perspective and real-world lessons from someone who has trained models in the wild.&lt;/p&gt;</description></item><item><title>Tokenization from first principles</title><link>https://ggrigorev.me/posts/tokenizer-superbpe/</link><pubDate>Tue, 07 Oct 2025 00:00:00 +0000</pubDate><guid>https://ggrigorev.me/posts/tokenizer-superbpe/</guid><description>Byte-level BPE from first principles: what matters for speed and quality, how to implement it cleanly, and why a SuperBPE variant can lift sample efficiency.</description></item></channel></rss>