Review paper on Foundation Models for Decision Making: arxiv.org/abs/2303.04129
Foundation models can characterize various components of decision making, such as states (S), behaviors (A), dynamics (T), task specifiers (R), through generative modeling or representation learning.
Video generation models can serve as world models and embodied planning tools, but they must be grounded in the physical world. Check out:
VideoAgent for self-improving video generation using feedback from VLMs and action executions.
Paper: arxiv.org/abs/2410.10076
Code:
Check out UniMat -- a unified representation of materials that enables scaling of diffusion models to millions of stable crystal structures.
Website: unified-materials.github.io
Paper: arxiv.org/abs/2311.09235
Consider joining our team at Google DeepMind to work on foundation models for decision making, e.g., foundation model alignment, reasoning, planning, simulation, and optimization with foundation models.
Our team (w/Dale, @daibond_alpha, @mengjiao_yang + others) at Google DeepMind is looking to hire. If you are interested in foundation models+decision making, and making real-world impact through Gemini and cloud solutions, please consider applying through
boards.greenhouse.io/deepmind/jobs/…
Video generation will revolutionize decision making in the physical world, just as language models have changed the digital world.
Interested in the implications of video generation models like UniSim and Sora? Check out our position paper:
Source for this figure: arxiv.org/abs/2205.10816.
Procedure Cloning is a simple but powerful idea: Teach the model not just what action to take but also the procedure for how to find this action.
Original Tweet: x.com/mengjiao_yang/…
My Bet: Strawberry is algorithm distillation/procedural cloning. Everyone right now is coming up with ways to distill System 2 into System 1, but that will always be limited. We need to train the model to run the algorithms, not just outputs (and post-train with RL of course).
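To make the contrast between cloning outputs and cloning procedures concrete, here is a minimal sketch. It assumes a toy expert that plans with breadth-first search on a small grid; the function names, the ("expand", cell) and ("act", move) token format, and the example grid are all hypothetical choices for illustration, not the paper's implementation. Plain behavior cloning would supervise only the first action, while the procedure-style target also records the intermediate search steps that led to it.

```python
from collections import deque

# Hypothetical 4-connected moves; the names are illustrative only.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def bfs_with_trace(grid, start, goal):
    """Run BFS on a grid (0 = free, 1 = wall).

    Returns (trace, path): trace lists cells in the order BFS expanded
    them, path lists move names from start to goal (None if unreachable).
    """
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}  # cell -> (previous cell, move name)
    queue = deque([start])
    trace = []
    while queue:
        cell = queue.popleft()
        trace.append(cell)
        if cell == goal:
            break
        for name, (dr, dc) in MOVES.items():
            nxt = (cell[0] + dr, cell[1] + dc)
            in_bounds = 0 <= nxt[0] < rows and 0 <= nxt[1] < cols
            if in_bounds and grid[nxt[0]][nxt[1]] == 0 and nxt not in parent:
                parent[nxt] = (cell, name)
                queue.append(nxt)
    if goal not in parent:
        return trace, None
    # Walk back from the goal to the start to recover the move sequence.
    path = []
    cell = goal
    while parent[cell] is not None:
        cell, name = parent[cell]
        path.append(name)
    path.reverse()
    return trace, path


def behavior_cloning_target(grid, start, goal):
    """Supervision for plain behavior cloning: only the expert's next action."""
    _, path = bfs_with_trace(grid, start, goal)
    return path[0] if path else None


def procedure_cloning_target(grid, start, goal):
    """Supervision in the procedure-cloning style (assumed token format):
    the expert's intermediate search steps, followed by its next action."""
    trace, path = bfs_with_trace(grid, start, goal)
    if not path:
        return None
    return [("expand", cell) for cell in trace] + [("act", path[0])]


# Toy maze: the wall in the middle row forces a detour around the right side.
example_grid = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
bc_target = behavior_cloning_target(example_grid, (0, 0), (2, 0))
pc_target = procedure_cloning_target(example_grid, (0, 0), (2, 0))
```

Here `bc_target` is the single label "right", while `pc_target` is a sequence that a sequence model could be trained to emit step by step, ending in the same action.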
As video foundation models reach billions of parameters, how to adapt them to task-specific settings (e.g., animation, robotics) without access to the model weights becomes a pressing issue.
We introduce Video Adapter:
Paper: arxiv.org/abs/2306.01872
Website: video-adapter.github.io