I worked at Intel on Larrabee applications in 2007. Then I went to NVIDIA to work on ML in 2008. So I was there at both places at that time and I can say:
NVIDIA's dominance didn't come from luck. It came from vision and execution. Which Intel lacked.
Bryan Catanzaro
VP, Applied Deep Learning Research @ NVIDIA
Joined February 2011
- Today we're releasing NVIDIA Nemotron Nano v2 - a 9B hybrid SSM that is 6X faster than similarly sized models, while also being more accurate. Along with this model, we are also releasing most of the data we used to create it, including the pretraining corpus. Links to the
- A long time ago, back before DLSS was in many games (and when my hair was shorter and less gray), I went to Nintendo HQ to show them an early prototype of DLSS 2, in the hopes that a future Switch console would use DLSS. I'm so proud that the Switch 2 will be DLSS powered!
- Jensen Huang is an intensely driven visionary. Working at NVIDIA is exciting and fast paced because he sets the tone. I think his story should be more widely known - in my mind he is just as much a tech titan as Steve Jobs, Bill Gates, or Mark Zuckerberg.
- Here’s how we trained an 8.3B parameter GPT-2. We alternate row- and column- partitioning in the Transformer in order to remove synchronization and use hybrid model/data parallelism. 15 PFlops sustained on 512 GPUs. Details and code: nv-adlr.github.io/MegatronLM
- Nemotron-H: A family of Hybrid Mamba-Transformer LLMs. * Hybrid architecture means up to 3X faster at the same accuracy * Trained in FP8 * Great for VLMs * Weights and instruct versions to come soon. research.nvidia.com/labs/adlr/nemo…
- I didn't actually convince Jensen, instead I just explained deep learning to him. He instantly formed his own conviction and pivoted NVIDIA to be an AI company. It was inspiring to watch and I still sometimes can't believe I got to be there.
- Language models do just keep scaling! Today we’re announcing Megatron-Turing NLG: a 530B parameter language model. Joint work from @nvidia and @Microsoft. Trained using Megatron and DeepSpeed on DGX SuperPod.
- Neural rendering takes its next step with DLSS 3.0 on Ada! In addition to DL-powered superresolution, it uses optical flow, motion vectors, and DL to generate entire frames. 7 out of 8 pixels being rendered with DLSS3 come from Neural rendering. #GTC22
- People have been asking for a fast open source CUDA matrix multiplication for ages. My colleagues just released one! In many cases, it’s just as fast as CUBLAS. Works on Volta’s tensor cores, too.
- Nemotron-4-340B is released today! * Base, Instruct, Reward models * Permissive license * Great for Synthetic Data Generation * Designed to help others build their own models * Sized for inference on 8 NVIDIA H100 GPUs * Competitive across many tasks
- An 8B-3.5T hybrid SSM model gets better accuracy than an 8B-3.5T transformer trained on the same dataset: * 7% attention, the rest is Mamba2 * MMLU jumps from 50 to 53.6% * Training efficiency is the same * Inference cost is much less arxiv.org/pdf/2406.07887
- DGX GH200: NVLink *between nodes* creates a system with 256 Grace CPUs (each with 480GB of LPDDR5) and 256 Hopper GPUs (each with 96GB of HBM3). Each GPU can directly access the memory of any other GPU or CPU at 900 GB/s. Can't wait to train some models!
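
The 8.3B GPT-2 post above mentions alternating row- and column-partitioning in the Transformer to remove synchronization. Here is a minimal sketch of that idea for a single MLP block, assuming two weight matrices `a` and `b` with a GeLU in between. It is not the Megatron code: every function name is hypothetical, the "devices" are simulated as list entries, and plain Python lists stand in for GPU tensors. Splitting the first weight by columns and the second by rows lets each shard apply the nonlinearity locally, so only one sum (an all-reduce in a real setup) is needed at the end.

```python
import math

def matmul(a, b):
    # Plain nested-list matrix multiply: (n x k) @ (k x m) -> (n x m).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def gelu(m):
    # Exact GeLU, applied elementwise.
    return [[0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0))) for v in row] for row in m]

def split_columns(m, parts):
    # Column partitioning: each shard keeps a contiguous block of columns.
    width = len(m[0]) // parts
    return [[row[i * width:(i + 1) * width] for row in m] for i in range(parts)]

def split_rows(m, parts):
    # Row partitioning: each shard keeps a contiguous block of rows.
    height = len(m) // parts
    return [m[i * height:(i + 1) * height] for i in range(parts)]

def all_reduce_sum(partials):
    # Stand-in for an all-reduce: elementwise sum of every shard's partial output.
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*partials)]

def mlp_reference(x, a, b):
    # Unpartitioned MLP: GeLU(x @ a) @ b.
    return matmul(gelu(matmul(x, a)), b)

def mlp_partitioned(x, a, b, parts):
    # Each simulated device holds one column shard of a and the matching row shard of b.
    a_shards = split_columns(a, parts)
    b_shards = split_rows(b, parts)
    # No communication is needed between the two matmuls, because GeLU is
    # elementwise and each shard owns complete columns of x @ a.
    partials = [matmul(gelu(matmul(x, a_i)), b_i) for a_i, b_i in zip(a_shards, b_shards)]
    # The only synchronization point: sum the partial outputs.
    return all_reduce_sum(partials)

if __name__ == "__main__":
    x = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
    a = [[0.1, -0.2, 0.3, 0.4], [0.5, 0.6, -0.7, 0.8], [-0.9, 1.0, 1.1, -1.2]]
    b = [[0.2, -0.1], [0.4, 0.3], [-0.5, 0.6], [0.7, -0.8]]
    ref = mlp_reference(x, a, b)
    par = mlp_partitioned(x, a, b, parts=2)
    print(max(abs(r - p) for rr, pr in zip(ref, par) for r, p in zip(rr, pr)))
```

Partitioning the first matrix by rows instead would require a sum before the GeLU, since GeLU of a sum is not the sum of GeLUs. That extra synchronization is what the column-then-row ordering avoids.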








