Leandro von Werra
@lvwerra
Head of research @huggingface
Bern, Switzerland
Joined March 2019
  • Pinned
    The Ultra-Scale Playbook: Training LLMs on GPU Clusters. Learn how to train your own DeepSeek-V3 model using 5D parallelism, ZeRO, fast kernels, compute/comm overlap, and bottlenecks, with theory, interactive plots, 4000+ scaling experiments, and audio! huggingface.co/spaces/nanotro…
  • 73k GitHub stars for a PDF and a Readme
  • Jupyter Agents - LLMs running data analysis directly in a notebook! The agent can load data, execute code, plot results, and follow your guidance and ideas! A very natural way to collaborate with an LLM over data, and it's just scratching the surface of what will soon be possible!
  • Evaluation is one of the most important aspects of ML, but today’s evaluation landscape is scattered and undocumented, which makes evaluation unnecessarily hard. For that reason we are excited to release 🤗 Evaluate! github.com/huggingface/ev… Let’s take a tour:
  • Introducing: ⚡️OlympicCoder⚡️ Beats Claude 3.7 and is close to o1-mini/R1 on olympiad level coding with just 7B parameters! Let that sink in! Read more about its training dataset, the new IOI benchmark, and more in Open-R1 progress report #3.
  • Did you know that you can train all Llama-2 models on your own data in just a few lines? The script even works with the 70B model on a single A100 GPU thanks to the magic of 4bit and PEFT! Learn more: huggingface.co/docs/trl/main/… Full script: github.com/lvwerra/trl/bl…
  • Distributed training is notoriously hard to learn - knowledge is scattered across papers and complex codebases. Enter picotron: implementing all 4D parallelism concepts in separate, readable files totaling just 1988 LoC!
  • Our book "Natural Language Processing with Transformers: Building Language Applications with Hugging Face" can now be preordered! amazon.de/Natural-Langua… This thread gives an overview of what you can expect by summarizing the content of each chapter:
  • Excited to introduce: StackLlama🦙 An end-to-end tutorial for training Llama with RLHF on preference data such as the StackExchange questions! Blog: hf.co/blog/stackllama Demo: hf.co/spaces/trl-lib… Code: github.com/lvwerra/trl/tr… The resulting model is surprisingly fun!🧵
  • It finally arrived! 🎉 So I guess it is a real thing now. Thanks to everybody who ordered it. Because of all of you it is the #1 release on Amazon in NLP, #3 in ML&AI, and #4 in all of computer science! ❤️ transformersbook.com
  • Excited to release: Jupyter Agent 2 The agent can load data, execute code, and plot results inside Jupyter faster than you can scroll! 🤖 Powered by Qwen3-Coder ⚡️ Running on Cerebras ⚙️ Executed in E2B ↕️ Upload your files All videos are in *real time*! hf.co/spaces/lvwerra…
  • Can we create all the code for training GitHub CoPilot in a (looong) tweet thread? Yes, see how to train CodeParrot🦜, a large GPT-2 model for code, from scratch in this thread! Ready - go!
  • How do models like GPT-2 and BERT represent the position of tokens? When visualizing their positional encodings, I found an interesting pattern. A short thread:
  • solving problems using BERT that can be solved by a RegEx is another level of skill issue
    solving problems using LLMs that can be solved by fine-tuning BERT is a skill issue