Andy Zeng
@andyzengineer
Building robot foundation models @GeneralistAI. Prev @GoogleDeepMind, PhD @Princeton. One experiment away from magic. ✗DMs → email
Joined September 2017
Posts
  • Pinned
    GEN-1 is now available today to our Early Access Partners. Let’s build.
    Introducing GEN-1. Our latest milestone in scaling robot learning. We believe it to be the first general-purpose AI model to master simple physical tasks. 99% success rates, 3x faster speeds, adapts in real time to unexpected scenarios, w/ only 1 hour of robot data. More🧵👇
  • With multiple foundation models “talking to each other”, we can combine commonsense across domains, to do multimodal tasks like zero-shot video Q&A or image captioning, no finetuning needed. Socratic Models: website + code: socraticmodels.github.io paper: arxiv.org/abs/2204.00598
  • Join us next week at the CVPR Tutorial on Vision-Based Robot Learning! We’ll distribute Colabs that show you how to run Socratic Models for language-driven robot pick & place right in your browser (in person, or online!) sites.google.com/view/cvpr2022-…
  • Can robots learn to pick up objects and accurately toss them into bins outside their natural range? Check out our latest work, TossingBot! tossingbot.cs.princeton.edu w/ @SongShuran, Johnny Lee, Alberto Rodriguez, Thomas Funkhouser #robotics #AI #research
  • Code-writing LLMs are surprisingly good at 📝 writing reward functions for 🦾 MPC low-level control – providing a chat-like interface to teach robots to do things like "stand up and moon-walk" 🐶. Read more about it here 👇 and @xf1280's 🧵 language-to-reward.github.io
    🤖Excited to share our project where we propose to use rewards represented in code as a flexible interface between LLMs and an optimization-based motion controller. website: language-to-reward.github.io Want to learn more about how we make a robot dog do moonwalk MJ style?🕺🕺
  • This is one-shot assembly: you show examples of what to build, and the robot just does it. (see original post: generalistai.com/blog) To share more on how this works, the robot is controlled in real time by a neural network that takes in video pixels and outputs 100Hz actions.
  • Can robots 🤖 navigate to sounds 🔊 they've heard? w/ audio-language 🔊✏️ foundation models, excited that we can now ask our helper robots to "go to where you heard coughing". Audio-Visual-Language Maps w/ @huang_chenguang @oier_mees @wolfram_burgard: avlmaps.github.io
  • We built PaLM-E 🌴🤖 one of the largest multimodal language models to date, trained end-to-end on robot data. Images, text, state inputs, neural scene embeddings – you name it. And it's fantastic on robots. Check out Danny's thread 👇
    What happens when we train the largest vision-language model and add in robot experiences? The result is PaLM-E 🌴🤖, a 562-billion parameter, general-purpose, embodied visual-language generalist - across robotics, vision, and language. Website: palm-e.github.io
  • Still crazy to me that we can prompt LLMs (GPT-3 or PaLM) with a bunch of numbers 📝 to discover and improve closed-loop policies that stabilize CartPole – entirely in-context w/o model finetuning. Read more in @suvir_m's post 👇 and try out the code
    In a new preprint, we assess LLMs’ in-context learning abilities for *abstract* non-linguistic patterns—& explore how this might be useful for robotics. Examples: -extrapolating symbolic patterns -extending periodic motions -discovering simple policies (e.g. for CartPole) (1/8)
  • Excited to share "Visual Language Maps"! VLMaps fuse visual language model features into a dense 3D map for robot navigation from natural language instructions. Website: vlmaps.github.io Led by the amazing @huang_chenguang w/ @oier_mees, @wolfram_burgard
  • Tried ImageNet pre-training for your robot learning models only to find out it didn't help? Turns out which dataset you use & which weights you transfer matter a lot. Check out our blog post! yenchenlin.me/vision2action w/ @yen_chen_lin @SongShuran @phillip_isola Tsung-Yi Lin
    Check out new research into applying transfer learning to robotic manipulation. By leveraging pre-trained weights from computer vision models, it’s possible to greatly improve the training efficiency for robotic manipulation tasks. Learn all about it at bit.ly/39dmLZ7
  • Released @PyTorch code for “Learning Synergies between Pushing and Grasping with Self-supervised Deep Reinforcement Learning” (works for robots in both sim & real). Happy hacking :) Code: github.com/andyzeng/visua… Paper: arxiv.org/pdf/1803.09956… Project: vpg.cs.princeton.edu
  • To see emergent behaviors from low-level policies was a first for many of us on the team. They don't happen often enough yet, but it certainly feels like we're headed in the right direction. Reach out if you're interested in working together.
    Today we're excited to share a glimpse of what we're building at Generalist. As a first step towards our mission of making general-purpose robots a reality, we're pushing the frontiers of what end-to-end AI models can achieve in the real world. Here's a preview of our early
  • Replying to @andyzengineer
    From recalling events, to contextual and temporal reasoning – prompting foundation models to engage in guided Socratic discussions enables a variety of new open-ended video Q&A capabilities.