What can unsupervised representation learning learn from deep RL? As it turns out, we can learn representations without supervision by making plans!
Check out our new work at
Sora from @OpenAI is super impressive, but how consistent are the geometries? We ran this through our fast 3DGS pipeline, and here are some of the early results. This is a reconstruction 👉 1/n
Finally came out 🔥 Humanoid code name "Sonnie" (from @DavidSFWilson) @RoboticsSciSys
- @Tesla is probably going to copy their design with Optimus
- Lots of good design decisions
- An agile controller trained w/ RL in #isaacgym by @nvidia (not shown)
The company was founded in
MIT researchers taught a robot dog to perceive a 3D world using Neural Volumetric Memory (NVM).
This technique lets the bot climb stairs, step over gaps & run autonomously w/ a single neural network.
Paper: bit.ly/3LNYWfy
Video: bit.ly/3LSciaA
w/@UCSanDiego
We "bake" SE(3) equivariance into the robot's world representation by training. Seeing how well this worked in real life, with just an onboard camera and compute, is just SURREAL.
A lot more to be done, in closing the gap between vision + control, on dynamic and contact-rich
The robot climbs stairs🏯, steps over stones 🪨, and runs in the wild🏞️, all in one policy, without any remote control!
Our #CVPR2023 Highlight paper achieves this by using RL + a 3D Neural Volumetric Memory (NVM) trained with view synthesis!
rchalyang.github.io/NVM/
In the other videos, the scenes sure look nice, but the photogrammetry pipeline fails because there isn't enough overlap and baseline separation between the camera poses. Does this mean Sora failed to model geometry? 3/n
Just got back to my desk. Here is what it looks like when you move around. We noticed that the 3D reconstruction using our photogrammetry pipeline only works well when it has a circular camera path like in this Big Sur video. In the other videos 2/n