Boris Dayma 🖍️ (@borisdayma) / X

2,312 posts

@borisdayma

🖍️ Founder of Craiyon 🥑 Author of dalle-mini

Joined February 2012

Boris Dayma 🖍️
@borisdayma
Jun 18, 2022
I'm now part of the Machine Learning Google Developer Experts Program 🎉
Boris Dayma 🖍️
@borisdayma
Apr 21, 2022
I've been comparing a lot of transformer variants on large models (400M params): Post-LN/Pre-LN, DeepNet, NormFormer, Swin v2, GLU variants, RMSNorm, Sandwich-LN, with GELU, Swish, SmeLU… More than 2,000 h of total training time on TPU v3s 😯 Here are my findings 🤓
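Some of the variants named in this thread differ only in a few lines of code. As a rough illustration (a minimal NumPy sketch, not the code used for these experiments), here are RMSNorm, the SmeLU activation, and where the normalization sits in Post-LN, Pre-LN and Sandwich-LN residual blocks. The sublayer `f`, the random weights, and the `eps`/`beta` defaults are hypothetical choices made only for this example.

```python
import numpy as np

def rms_norm(x, scale=1.0, eps=1e-6):
    # RMSNorm: divide by the root mean square over the last axis (no mean subtraction, no bias)
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return x / rms * scale

def layer_norm(x, eps=1e-6):
    # LayerNorm without learned parameters, for comparison with RMSNorm
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def smelu(x, beta=1.0):
    # SmeLU: 0 below -beta, (x + beta)^2 / (4 * beta) on [-beta, beta], identity above beta
    return np.where(x <= -beta, 0.0,
                    np.where(x >= beta, x, (x + beta) ** 2 / (4 * beta)))

def post_ln(x, f, norm):
    # Post-LN: normalize after the residual addition
    return norm(x + f(x))

def pre_ln(x, f, norm):
    # Pre-LN: normalize the sublayer input; the residual path stays unnormalized
    return x + f(norm(x))

def sandwich_ln(x, f, norm):
    # Sandwich-LN: normalize both the sublayer input and its output before the residual add
    return x + norm(f(norm(x)))

# Hypothetical sublayer: one random linear map followed by SmeLU
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8))
w = rng.normal(size=(8, 8)) / np.sqrt(8)
f = lambda h: smelu(h @ w)

for block in (post_ln, pre_ln, sandwich_ln):
    print(block.__name__, block(x, f, rms_norm).shape)
```

Note that Pre-LN and Sandwich-LN keep an unnormalized identity path from input to output, which is commonly cited as one reason they tend to be easier to train than Post-LN at depth.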
Boris Dayma 🖍️
@borisdayma
Jun 1, 2022
Time to talk about the biggest mistake I made while training DALLE-Mega 😥
Boris Dayma 🖍️
@borisdayma
May 30, 2023
When your model suddenly becomes conscious
Boris Dayma 🖍️
@borisdayma
Jul 30, 2021
DALL·E mini is now available 🥳🥑 Generate images from any text prompt! huggingface.co/spaces/flax-co…
Boris Dayma 🖍️
@borisdayma
Apr 7, 2023
📉 "A Recipe for Training Large Models" 👉 Report: wandb.ai/craiyon/report… I've been working for a while on this guide, sharing practical recommendations with my simple recipe for training models 🧑‍🍳
Boris Dayma 🖍️
@borisdayma
Jun 14, 2022
We now have a screenshot button 🔥
Boris Dayma 🖍️
@borisdayma
May 27, 2020
I've been working on #huggingtweets, a fun project to generate tweets based on your favorite Twitter account using @huggingface. You can fine-tune a neural network and log the predictions automatically into @wandb. The demo runs in 2-3 min: colab.research.google.com/github/borisda…
Boris Dayma 🖍️
@borisdayma
Apr 19, 2022
Finally took the time to read the DALL·E 2 paper. Process:
- text to CLIP image (no need to go through CLIP text)
- CLIP image to pixels
Here are my notes and how I could apply it to dalle-mini 👇
Boris Dayma 🖍️
@borisdayma
Jul 26, 2022
The vision transformer repo is very interesting! Love how the models are written, super clean ❤️
GitHub: google-research/vision_transformer
Boris Dayma 🖍️
@borisdayma
Apr 11, 2022
Small dalle-mini trained for 9 days 🥑
Boris Dayma 🖍️
@borisdayma
Nov 5, 2023
I think people underestimate how hard it is to train a large model like GPT-3 or bigger. Lots of challenges arise when reaching billions of parameters, let alone 10B+ params (data management, training stability, parallelism...). Only a few have succeeded so far and the recipe is not
Boris Dayma 🖍️
@borisdayma
Mar 17, 2024
A few comments on the Grok-1 code release in JAX! github.com/xai-org/grok
Looking quickly:
- model nicely written
- partition rules for sharding follow the old style of t5x
- they used Haiku but it wouldn't be too hard to update to Flax
- they use shard_map on the MoE layers for
GitHub: xai-org/grok-1 (Grok open release)
Boris Dayma 🖍️
@borisdayma
Feb 4, 2021
Just finished a complete tutorial on optimizing @huggingface models with @wandb. No extra line of code required 🥳 wandb.me/hf