Yi Tay
@YiTayML
research scientist @googledeepmind ✨♊, model co-lead/captain of gemini deepthink imo gold medal 🥇, opinions are my own.
mixture-of-locations
Joined October 2016
Posts
  • Pinned
    Happy to share that the @GoogleDeepMind Gemini team is starting a new research team in Singapore! This new team will be focused on advanced reasoning, LLM/RL and improving bleeding edge SOTA models such as Gemini, Gemini Deep Think and beyond. 🔥 This team will be led by yours
  • Long overdue but here's a new blogpost on training LLMs in the wilderness from the ground up 😄🧠 In this blog post, I discuss: 1. Experiences in procuring compute & variance in different compute providers. Our biggest finding/surprise is that variance is super high and it's
  • Gemini 3! This is our most intelligent model that brings any idea to life. 😻 This is the best model in the world, by a crazy wide margin! Aside from a huge increase across the absolutely everything, look at its coding capabilities and quality of aesthetics and fidelity.
  • New open source Flan-UL2 20B checkpoints :) - Truly open source 😎 No forms! 🤭 Apache license 🔥 - Best OS model on MMLU/Big-Bench hard 🤩 - Better than Flan-T5 XXL & competitive to Flan-PaLM 62B. - Size ceiling of Flan family just got higher! Blog:
  • Our IMO gold model is not just an "experimental reasoning" model. It is way more general purpose than anyone would have expected. This general deep think model is going to be shipped so stay tuned! 🔥
    So happy to see this incredible achievement. Huge congrats to @lmthang, @quocleix, @YiTayML and the IMO team on the result. This was a great collaboration across teams to build a general Gemini DeepThink model that can also get gold at IMO.
  • Personal / life update: I have returned to @GoogleDeepMind to work on AI & LLM research. It was an exciting 1.5 years at @RekaAILabs and I truly learned a lot from this pretty novel experience. I wrote a short note about my experiences and transition on my personal blog here
  • It’s been a short 6 months since I left Google Brain and it has been a uniquely challenging yet interesting experience to build everything from the ground up in an entirely new environment (e.g., the wilderness) Today, we’re excited to announce the first version of the
    We are excited to announce the 1st version of our multimodal assistant, Yasa-1, a language assistant with visual and auditory sensors that can take actions via code execution 🪄. Yasa-1 can understand text, images, videos, sounds & more! 🚀 Check out more details below👇
  • Hot take 🔥: Lots of buzz these days about new foundation open-source models but what if I told you there have been no real advance since 2019's T5 models 😀 Take a look at this table from this new InstructEval paper: arxiv.org/abs/2306.04757. Some thoughts/observations: 1.
  • "Scaling laws vs Model Architectures" from @GoogleAI. Lessons: - Not all arch scale the same way. - Vanilla Transformer does pretty well 😀 - Touching the attention too much is "dangerous". 😔 - Perf at base may not translate to large+ scale. pdf: arxiv.org/abs/2207.10551
  • Over the past 3.3 years at Google, I have been blessed with so many wonderful friendships and experiences. I have grown so much. However, it’s time to move on to a new adventure! I wrote a blogpost about my wonderful experience here:
  • It's been a wild ride. Just 20 of us, burning through thousands of H100s over the past months, we're glad to finally share this with the world! 💪 One of the goals we’ve had when starting Reka was to build cool innovative models at the frontier. Reaching GPT-4/Opus level was a
    Meet Reka Core, our best and most capable multimodal language model yet. 🔮 It’s been a busy few months training this model and we are glad to finally ship it! 💪 Core has a lot of capabilities, and one of them is understanding video --- let’s see what Core thinks of the 3 body
  • We’re coming out of stealth with $58M in funding to build generative models and advance AI research at @RekaAILabs 🔥🚀 Language models and their multimodal counterparts are already ubiquitous and massively impactful everywhere. That said, we are still at the beginning of this
    Reka funding announcement to build generative models
  • Inspired by the dizzying number of efficient Transformers ("x-formers") models that are coming out lately, we wrote a survey paper to organize all this information. Check it out at arxiv.org/abs/2009.06732. Joint work with @m__dehghani @dara_bahri and @metzlerd. @GoogleAI 😀😃
  • Excited to share our latest work at @GoogleAI on "Transformer Memory as a Differentiable Search Index"! TL;DR? We parameterize a search system with only a single Transformer model 😎. Everything in the corpus is encoded in the model! 🙌 Paper: arxiv.org/abs/2202.06991