Bryan Catanzaro (@ctnzr) / X

Bryan Catanzaro

@ctnzr

VP, Applied Deep Learning Research @ NVIDIA

Joined February 2011

Bryan Catanzaro
@ctnzr
Dec 20, 2023
I worked at Intel on Larrabee applications in 2007. Then I went to NVIDIA to work on ML in 2008. So I was there at both places at that time and I can say: NVIDIA's dominance didn't come from luck. It came from vision and execution. Which Intel lacked.
HPC Guru
@HPC_Guru
Dec 20, 2023
Intel CEO laments @nvidia's 'extraordinarily lucky' #AI dominance, claims it coulda-woulda-shoulda have been @intel. Things would be completely different if only Intel hadn't cancelled the #Larrabee #GPU. pcgamer.com/intel-ceo-lame…
Bryan Catanzaro
@ctnzr
Aug 18, 2025
Today we're releasing NVIDIA Nemotron Nano v2 - a 9B hybrid SSM that is 6X faster than similarly sized models, while also being more accurate. Along with this model, we are also releasing most of the data we used to create it, including the pretraining corpus. Links to the
Bryan Catanzaro
@ctnzr
Apr 3, 2025
A long time ago, back before DLSS was in many games (and when my hair was shorter and less gray), I went to Nintendo HQ to show them an early prototype of DLSS 2, in the hopes that a future Switch console would use DLSS. I'm so proud that the Switch 2 will be DLSS powered!
Bryan Catanzaro
@ctnzr
Jan 30, 2020
One of the best decisions we ever made @nvidia Applied Deep Learning Research was to standardize on @PyTorch for all our research. It has made us more productive and made our work more fun. Glad to see @OpenAI agrees!
OpenAI standardizes on PyTorch
From openai.com
Bryan Catanzaro
@ctnzr
Nov 27, 2023
Jensen Huang is an intensely driven visionary. Working at NVIDIA is exciting and fast paced because he sets the tone. I think his story should be more widely known - in my mind he is just as much a tech titan as Steve Jobs, Bill Gates, or Mark Zuckerberg.
How Jensen Huang’s Nvidia Is Powering the A.I. Revolution
From newyorker.com
Bryan Catanzaro
@ctnzr
Aug 13, 2019
Here’s how we trained an 8.3B parameter GPT-2. We alternate row- and column- partitioning in the Transformer in order to remove synchronization and use hybrid model/data parallelism. 15 PFlops sustained on 512 GPUs. Details and code: nv-adlr.github.io/MegatronLM
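The partitioning described in this post can be illustrated with a small pure-Python sketch. It is one plausible reading of "alternate row- and column-partitioning" applied to a Transformer MLP block, not the Megatron-LM code itself: the first weight matrix is split by columns, the second by rows, so each worker can apply the elementwise nonlinearity to its own slice and only one sum across workers (an all-reduce on real hardware) is needed at the end. The workers are simulated with a loop, and every name here (`mlp_tensor_parallel`, `split_columns`, and so on) is hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def gelu(m):
    """Elementwise GeLU (erf form)."""
    return [[0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0))) for v in row] for row in m]

def split_columns(m, parts):
    """Split a matrix into `parts` equal-width column blocks."""
    width = len(m[0]) // parts
    return [[row[i * width:(i + 1) * width] for row in m] for i in range(parts)]

def split_rows(m, parts):
    """Split a matrix into `parts` equal-height row blocks."""
    height = len(m) // parts
    return [m[i * height:(i + 1) * height] for i in range(parts)]

def mlp_reference(x, w1, w2):
    """Unpartitioned MLP: gelu(x @ w1) @ w2."""
    return matmul(gelu(matmul(x, w1)), w2)

def mlp_tensor_parallel(x, w1, w2, workers):
    """Same MLP with w1 split by columns and w2 split by rows.

    Each simulated worker computes a full-shaped partial output from its
    own shards; no communication is needed between the two matmuls.
    """
    partials = []
    for w1_k, w2_k in zip(split_columns(w1, workers), split_rows(w2, workers)):
        hidden_k = gelu(matmul(x, w1_k))         # local: GeLU is elementwise
        partials.append(matmul(hidden_k, w2_k))  # local partial sum
    # Stand-in for the single all-reduce: sum the partial outputs elementwise.
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*partials)]

# Tiny illustrative inputs (arbitrary values, not from the post).
x = [[1.0, -2.0, 0.5], [0.0, 3.0, -1.0]]
w1 = [[0.5, -1.0, 2.0, 0.0], [1.5, 0.25, -0.5, 1.0], [-1.0, 0.0, 0.75, 2.0]]
w2 = [[1.0, 0.0], [-0.5, 2.0], [0.25, 1.0], [1.5, -1.0]]

reference = mlp_reference(x, w1, w2)
parallel = mlp_tensor_parallel(x, w1, w2, workers=2)
```

Splitting the first matrix by rows instead would require a sum across workers before the GeLU, because GeLU of a sum is not the sum of GeLUs; the column-then-row order avoids that extra synchronization point.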
Bryan Catanzaro
@ctnzr
Mar 21, 2025
Nemotron-H: A family of Hybrid Mamba-Transformer LLMs.
* Hybrid architecture means up to 3X faster at the same accuracy
* Trained in FP8
* Great for VLMs
* Weights and instruct versions to come soon.
research.nvidia.com/labs/adlr/nemo…
Bryan Catanzaro
@ctnzr
Sep 28, 2023
I didn't actually convince Jensen, instead I just explained deep learning to him. He instantly formed his own conviction and pivoted NVIDIA to be an AI company. It was inspiring to watch and I still sometimes can't believe I got to be there.
How Bryan Catanzaro jump-started Nvidia's AI Big Bang
From fastcompany.com
Bryan Catanzaro
@ctnzr
Oct 11, 2021
Language models do just keep scaling! Today we’re announcing Megatron-Turing NLG: a 530B parameter language model. Joint work from @nvidia and @Microsoft. Trained using Megatron and DeepSpeed on DGX SuperPod.
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World’s Largest and Most...
From developer.nvidia.com
Bryan Catanzaro
@ctnzr
Sep 20, 2022
Neural rendering takes its next step with DLSS 3.0 on Ada! In addition to DL-powered superresolution, it uses optical flow, motion vectors, and DL to generate entire frames. 7 out of 8 pixels being rendered with DLSS3 come from Neural rendering. #GTC22
Bryan Catanzaro
@ctnzr
Dec 6, 2017
People have been asking for a fast open source CUDA matrix multiplication for ages. My colleagues just released one! In many cases, it’s just as fast as CUBLAS. Works on Volta’s tensor cores, too.
CUTLASS: Fast Linear Algebra in CUDA C++ | NVIDIA Technical Blog
From developer.nvidia.com
Bryan Catanzaro
@ctnzr
Jun 14, 2024
Nemotron-4-340B is released today!
* Base, Instruct, Reward models
* Permissive license
* Great for Synthetic Data Generation
* Designed to help others build their own models
* Sized for inference on 8 NVIDIA H100 GPUs
* Competitive across many tasks
Bryan Catanzaro
@ctnzr
Jun 13, 2024
An 8B-3.5T hybrid SSM model gets better accuracy than an 8B-3.5T transformer trained on the same dataset:
* 7% attention, the rest is Mamba2
* MMLU jumps from 50 to 53.6%
* Training efficiency is the same
* Inference cost is much less
arxiv.org/pdf/2406.07887
Bryan Catanzaro
@ctnzr
May 29, 2023
DGX GH200: NVLink *between nodes* creates a system with 256 Grace CPUs (each with 480GB of LPDDR5) and 256 Hopper GPUs (each with 96GB of HBM3). Each GPU can directly access the memory of any other GPU or CPU at 900 GB/s. Can't wait to train some models!
Announcing NVIDIA DGX GH200: The First 100 Terabyte GPU Memory System | NVIDIA Technical Blog
From developer.nvidia.com