Introducing MilliVid, our new method for long-context video generation! MilliVid creates videos that are consistent over long time spans, without using retrieval heuristics or 3D maps! (1/n)
davidcharatan.com/millivid/#
In personal news, I’m thrilled to announce that I’ll be joining @MIT as tenure-track assistant professor in July 2022! My lab will investigate neural scene representations, inverse graphics, neural rendering, and their applications in vision, graphics, robotics, and AI! (1/n)
Excited to share our work on "Implicit Neural Representations with Periodic Activations"
vsitzmann.github.io/siren
We show how to fit complex signals, such as room-scale SDFs, video, & audio, and supervise implicit reps via their gradients to solve boundary value problems! (1/n)
We released the code for SIREN!
vsitzmann.github.io/siren
We also wrote a comprehensive Colab notebook with a no-frills implementation that reproduces image, audio, and poisson experiments, and explores initialization- and shift-invariance properties!
Introducing “FlowMap”, the first self-supervised, differentiable structure-from-motion method that is competitive with conventional SfM like Colmap!
cameronosmith.github.io/flowmap/
IMO this solves a major missing piece for internet-scale training of 3D Deep Learning methods.
1/n
Introducing “Neural Descriptor Fields: SE(3)-Equivariant Object Representations for Manipulation”!
yilundu.github.io/ndf/ (w/ video!)
NDFs are an object representation for robotic manipulation enabling imitation of pick-and-place tasks with pose generalization guarantees (1/n)
Implicit neural representations have recently gotten a lot of attention. I have compiled a reading list that I give students to get started in this area, inspired by the awesome-computer-vision list with extra commentary & notes. Check it out!
Introducing "Light Field Networks: Neural Scene Representations with Single-Evaluation Rendering"!
vsitzmann.github.io/lfns (w/ video!)
LFNs are the first fully implicit neural scene representation with real-time rendering, without post-processing / hybrid data-structures! (1/n)
I am hiring graduate students for my new lab at MIT, where I will start as faculty in July 2022! If you want to push what's possible with neural scene representations & inverse graphics please apply under:
gradapply.mit.edu/eecs/apply/log…
Deadline is Dec 15th!
Introducing Diffusion Forcing, a new way of training sequence generative models that unifies next-token prediction (think LLM) and full-sequence diffusion (think video diffusion models)! I’m super excited about this - it has a number of unique skills! (1/n)
Introducing Diffusion Forcing, which unifies next-token prediction (eg LLMs) and full-seq. diffusion (eg SORA)! It offers improved performance & new sampling strategies in vision and robotics, such as stable, infinite video generation, better diffusion planning, and more! (1/8)
Introducing Neural Jacobian Fields, robot 3D kinematic models learned only from vision! They can model & control robots from just a single RGB camera, even those w/ intractable kinematics & no embedded sensors such as soft, 3D-printed pneumatic hands!
sizhe-li.github.io/publication/ne…
1/n
Introducing “FlowCam: Training Generalizable 3D Radiance Fields w/o Camera Poses via Pixel-Aligned Scene Flow”! We train a generalizable 3D scene representation self-supervised on datasets of raw videos, without any pre-computed camera poses or SFM!
cameronosmith.github.io/flowcam
1/n
NeRFs will transform computer graphics. But we need to be able to edit them! In “Decomposing NeRF for Editing via Feature Field Distillation” we use Image and Image/Language foundation models for easy, query-based editing via language- and patch queries!
pfnet-research.github.io/distilled-feat…