Boris Dayma 🖍️
@borisdayma
🖍️ Founder of Craiyon 🥑 Author of dalle-mini
Joined February 2012
Posts
  • I'm now part of the Machine Learning Google Developer Experts Program 🎉
  • I've been comparing a lot of transformer variants on large models (400M params): Post/Pre-LN, DeepNet, NormFormers, Swin v2, GLU variants, RMSNorm, Sandwich LN, with GELU, Swish, SmeLU… More than 2,000h of total training time on TPU v3s 😯 Here are my findings 🤓
  • Time to talk about the biggest mistake I made while training DALLE-Mega 😥
  • When your model suddenly becomes conscious
  • DALL·E mini is now available 🥳🥑 Generate images from any text prompt! huggingface.co/spaces/flax-co…
  • 📉 "A Recipe for Training Large Models" 👉 Report: wandb.ai/craiyon/report… I've been working for a while on this guide, sharing practical recommendations with my simple recipe for training models 🧑‍🍳
  • We now have a screenshot button 🔥
  • I've been working on #huggingtweets, a fun project to generate tweets based on your favorite Twitter account using @huggingface. You can fine-tune a neural network and log the predictions automatically into @wandb. The demo runs in 2-3 min: colab.research.google.com/github/borisda…
  • Finally took the time to read DALLE 2 paper. Process:
    - text to CLIP image (no need to go through CLIP text)
    - CLIP image to pixels
    Here are my notes and how I could apply it to dalle-mini 👇
  • The vision transformer repo is very interesting! Love how the models are written, super clean ❤️
  • Small dalle-mini trained for 9 days 🥑
  • I think people underestimate how hard it is to train a large model like GPT-3 and up. Lots of challenges arise when reaching billions of parameters, let alone 10B+ params (data management, training stability, parallelism...). Only a few have succeeded so far and the recipe is not
  • Few comments on Grok-1 code release in JAX! github.com/xai-org/grok Looking quickly:
    - model nicely written
    - partition rules for sharding follow the old style of t5x
    - they used haiku but it wouldn't be too hard to update to flax
    - they use shard_map on the MoE layers for
  • Just finished a complete tutorial on optimizing @huggingface models with @wandb. No extra line of code required 🥳 wandb.me/hf