MBZ
@babaeizadeh
Senior Staff Research Scientist at @GoogleDeepMind. Gemini Omni, Veo3, Veo2, Veo, Phenaki
Mountain View, CA
Joined June 2009
Posts
  • Pinned
    How many edits is enough? 😅 #Gemini #Omni Flash. See thread for editing steps.
    Made with AI
  • #Veo3 further blurs the lines between reality and imagination with audio, stronger text adherence, and richer visual details.
  • How good is #Veo2? Let's look at some samples. "a sitcom tv show about potatoes" #Veo
  • #Veo2 excels at generating videos that feel remarkably "real". Dynamic living backgrounds + fluid motion + finely rendered details in faces, hands, and bodies create truly natural-looking videos. Here are some videos with "mundane life" as the prompt. No cherry-picking. #veo
  • Is predicting future rewards sufficient for achieving success in visual model-based reinforcement learning? We experimentally demonstrate that this is usually *not* the case in the online setting, and that the key is to predict future images too. 1/5
  • Replying to @babaeizadeh
    Many people asked for anime, but this is a potato thread, so here we go: "anime style footage of two potatoes having a sword fight. cinematic, fastpaced with a lot of shotcuts"
  • It's hard to promote your work when your alma mater is under siege, but here we go. Introducing Phenaki, a model that can generate minutes of videos given a story. Hopefully, it will be used for some good somewhere.
    1/ Today we are excited to introduce Phenaki: phenaki.github.io, short-link-to-paper, a model for generating videos from text, with prompts that can change over time, and that is able to generate videos that can be as long as multiple minutes!
  • Replying to @babaeizadeh
    "Cinematic fast paced shot of a muscle sport car drifting around a corner. It's evening. The headlights of the car is cutting through the heavy fog. The license plate is "Veo". The car is moving so fast the the pedestrians are blurry. The driver is a potato."
  • Replying to @babaeizadeh
    Since chippings are viral now! 😅
  • Blog post on our latest experiments on visual model-based reinforcement learning: arxiv.org/abs/2012.04603. One of the most stable and flexible libraries that I have ever worked on: github.com/google-researc… With @msaffar3 @danijarh @harinidkannan @chelseabfinn @svlevine @doomie
    Introducing the World Models Library, an open-source, platform-agnostic suite of tasks and tools for examination of world model design and performance in visual model-based reinforcement learning. Learn more and grab the code at goo.gle/36EXY1t
  • Replying to @babaeizadeh
    "a high energy music video. the singer is a potato and the dancers are other vegetables."
  • Replying to @babaeizadeh
    "a documentary about the famous potato shaped casino being built in Las Vegas blvd"
  • Introducing FitVid, a variational video prediction model that is capable of severe overfitting on the common video prediction benchmarks while having a parameter count similar to the current SOTA models. With @msaffar3 @SurajNair_1 @svlevine @chelseabfinn @doomie
  • Replying to @babaeizadeh
    "a training montage of a potato training hard for Olympics."