Vincent Sitzmann
@vincesitzmann
Building AI that learns by interacting with the world. Associate Professor @ MIT, leading the Scene Representation Group (scenerepresentations.org).
Cambridge, Massachusetts
Joined February 2016
Posts
  • Pinned
    Introducing MilliVid, our new method for long-context video generation! MilliVid creates videos that are consistent over long time spans, without using retrieval heuristics or 3D maps! (1/n) davidcharatan.com/millivid/#
  • In personal news, I’m thrilled to announce that I’ll be joining @MIT as a tenure-track assistant professor in July 2022! My lab will investigate neural scene representations, inverse graphics, neural rendering, and their applications in vision, graphics, robotics, and AI! (1/n)
  • Excited to share our work on "Implicit Neural Representations with Periodic Activations" vsitzmann.github.io/siren We show how to fit complex signals, such as room-scale SDFs, video, & audio, and supervise implicit reps via their gradients to solve boundary value problems! (1/n)
  • We released the code for SIREN! vsitzmann.github.io/siren We also wrote a comprehensive Colab notebook with a no-frills implementation that reproduces image, audio, and Poisson experiments, and explores initialization- and shift-invariance properties!
  • Introducing “FlowMap”, the first self-supervised, differentiable structure-from-motion method that is competitive with conventional SfM like COLMAP! cameronosmith.github.io/flowmap/ IMO this solves a major missing piece for internet-scale training of 3D Deep Learning methods. 1/n
  • Introducing “Neural Descriptor Fields: SE(3)-Equivariant Object Representations for Manipulation”! yilundu.github.io/ndf/ (w/ video!) NDFs are an object representation for robotic manipulation enabling imitation of pick-and-place tasks with pose generalization guarantees (1/n)
  • Implicit neural representations have recently gotten a lot of attention. I have compiled a reading list that I give students to get started in this area, inspired by the awesome-computer-vision list with extra commentary & notes. Check it out!
  • Introducing "Light Field Networks: Neural Scene Representations with Single-Evaluation Rendering"! vsitzmann.github.io/lfns (w/ video!) LFNs are the first fully implicit neural scene representation with real-time rendering, without post-processing / hybrid data-structures! (1/n)
  • I am hiring graduate students for my new lab at MIT, where I will start as faculty in July 2022! If you want to push what's possible with neural scene representations & inverse graphics please apply under: gradapply.mit.edu/eecs/apply/log… Deadline is Dec 15th!
  • Introducing Diffusion Forcing, a new way of training sequence generative models that unifies next-token prediction (think LLM) and full-sequence diffusion (think video diffusion models)! I’m super excited about this - it has a number of unique skills! (1/n)
    Introducing Diffusion Forcing, which unifies next-token prediction (eg LLMs) and full-seq. diffusion (eg SORA)! It offers improved performance & new sampling strategies in vision and robotics, such as stable, infinite video generation, better diffusion planning, and more! (1/8)
  • Introducing Neural Jacobian Fields, robot 3D kinematic models learned only from vision! They can model & control robots from just a single RGB camera, even those w/ intractable kinematics & no embedded sensors such as soft, 3D-printed pneumatic hands! sizhe-li.github.io/publication/ne… 1/n
  • Introducing “FlowCam: Training Generalizable 3D Radiance Fields w/o Camera Poses via Pixel-Aligned Scene Flow”! We train a generalizable 3D scene representation self-supervised on datasets of raw videos, without any pre-computed camera poses or SfM! cameronosmith.github.io/flowcam 1/n
  • Introducing “Diffusion with Forward Models”, a model that can generate diverse, real 3D scenes from a single image, trained with images w/o any 3D data! …ffusion-with-forward-models.github.io 1/n
  • NeRFs will transform computer graphics. But we need to be able to edit them! In “Decomposing NeRF for Editing via Feature Field Distillation” we use Image and Image/Language foundation models for easy, query-based editing via language- and patch queries! pfnet-research.github.io/distilled-feat…