Sitan Chen
@sitanch
assistant professor of computer science @hseas, learning theorist, 🎹
Joined April 2020
Posts
  • Pinned
    Excited about this new work where we dig into the role of token order in masked diffusions! MDMs train on some horribly hard tasks, but careful planning at inference can sidestep the hardest ones, dramatically improving over vanilla MDM sampling (e.g. 7%->90% acc on Sudoku) 1/
  • What are noisy intermediate-scale quantum devices good for? In a new paper arxiv.org/abs/2210.07234 joint with @JordanCotler, @RobertHuangHY, and @jerryzli, we define and study a new complexity class, NISQ, that captures the computational power of these devices 🧵 (1/n)
  • Excited to share something I've been working on over the last year, joint with @jerryzli, Yuanzhi Li, and Anru Zhang! arxiv.org/abs/2204.04209 We give provably efficient algorithms for learning a rich family of "pushforward distributions" inspired by generative models. 1/n
    Learning Polynomial Transformations
  • New paper up, joint w/ Sinho Chewi, @jerryzli, Yuanzhi Li, @AdilSlm, and Anru Zhang arxiv.org/abs/2209.11215 We prove diffusion models can efficiently sample from practically any distribution, even highly non-log-concave ones, given reasonably accurate score estimation (1/n)
  • Proving optimization guarantees for transformers is hard, even if just training on seq2seq pairs for which we know some small transformer achieves zero test loss. In practice gradient descent just works. In theory, it's open to prove *any* efficient algorithm succeeds 🥲 1/
  • Guidance is one of the key ingredients behind diffusion models' impressive generation capabilities. But what does it actually do? In new work led by @mle_muthu + Khashayar and joint w/ @oldheneel + Jianfeng, we rigorously pin down its behavior in a simple but rich setting 🧵1/
  • Nice thread on one of my favorite classical physics concepts, the diffraction limit! While Rayleigh's criterion is widely viewed as just a rule of thumb (even by Rayleigh 🧐), Ankur Moitra and I proved it can be regarded as a phase transition for mixture model learning 1/
    What is resolution in an image? It is not the number of pixels. Here’s the classical Rayleigh’s criterion taught in basic physics: 1/5
  • Excited to announce @JordanCotler, @RobertHuangHY, @jerryzli, and I are organizing a workshop at FOCS on quantum learning ⚛️! There have been a ton of exciting works in this rapidly growing area over the last few years, many coming from fruitful interactions between physics+TCS 1/
  • Excited to announce new work, joint with Kerem Dayi, on training dynamics of LoRA beyond the kernel regime! tl;dr fine-tuning naturally interpolates between NTK and feature learning, and we prove it can behave genuinely differently from either 1/
  • Jerry Li kicking off our quantum learning workshop at FOCS ’24 with a TCS-friendly crash course! See the webpage for details on how to tune in remotely: jerryzli.github.io/focs24-worksho…
  • To appear at ICML ’23 arxiv.org/abs/2303.03384 We obtain non-asymptotic convergence bounds for *deterministic* diffusion model samplers, as well as a new operational interpretation for the probability flow ODE 🏖 1/7
  • Given copies of unknown quantum state ρ, can we quantify how far it is from being classically simulable ⚛️🎲? Better yet, can we learn the closest approximation by such a state? In new work, we give the first polynomial time algorithm for this problem 1/
  • Turns out the complexity of predicting unknown quantum evolutions is tied to a cute puzzle: given a spectrally bounded linear combo of Paulis, how big can the L_p norm of the coefficients be? 🧐 Check out Robert’s awesome thread on our recent work with @preskill on this problem!
    🤖 Can a machine *efficiently* learn and predict quantum dynamics with arbitrarily high complexity (e.g., exponentially high)? In our new paper arxiv.org/abs/2210.14894 with @sitanch and @preskill, we give an ML algorithm and prove that it accomplishes this wild task (📜1/7)
  • Check out this awesome thread by Marvin on our recent work giving theory to understand "critical windows" in diffusion models, a phase transition whereby key features are determined in a narrow window during sampling! Fun blend of math and Stable Diffusion experiments 1/
    In diffusion models, the features of generated images emerge in narrow time intervals of the reverse process 😮—can we provably characterize these “critical windows” in which features are decided? W/ @sitanch we describe critical windows for a rich family of distributions. (1/n)
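The pinned post contrasts vanilla MDM sampling with planning the unmasking order at inference. The loop below is a minimal sketch of where such planning plugs in. It assumes a hypothetical `predict` function that returns one probability vector per position. The "most confident first" rule is an illustrative heuristic chosen here and is not necessarily the rule used in the paper. The toy predictor also ignores context entirely, so it shows only the mechanics, not any accuracy gain.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

MASK = None  # a masked position is represented by None

# Hypothetical model interface: given the partially masked sequence, return one
# probability vector over the vocabulary for every position.
Predictor = Callable[[List[Optional[int]]], List[Sequence[float]]]


def sample_mdm(length: int, vocab_size: int, predict: Predictor,
               plan: str = "random", seed: int = 0) -> Tuple[List[int], List[int]]:
    """Reveal one masked position per step; `plan` decides WHICH position."""
    rng = random.Random(seed)
    seq: List[Optional[int]] = [MASK] * length
    order: List[int] = []
    while any(tok is MASK for tok in seq):
        probs = predict(seq)
        masked = [i for i, tok in enumerate(seq) if tok is MASK]
        if plan == "random":
            # Vanilla MDM sampling: the next position is uniformly random.
            i = rng.choice(masked)
        elif plan == "confidence":
            # Assumed planning heuristic: reveal the position whose predicted
            # distribution puts the most mass on a single token.
            i = max(masked, key=lambda j: max(probs[j]))
        else:
            raise ValueError(f"unknown plan: {plan}")
        seq[i] = rng.choices(range(vocab_size), weights=probs[i])[0]
        order.append(i)
    return seq, order


def toy_predict(seq):
    """Toy stand-in for a trained MDM: position j prefers token 0 with
    probability 0.55 + 0.1 * j, independent of context."""
    return [[0.55 + 0.1 * j, 0.45 - 0.1 * j] for j in range(len(seq))]


tokens, order = sample_mdm(4, 2, toy_predict, plan="confidence")
print(order)  # [3, 2, 1, 0]
```

With a real model, the predicted distributions change as positions are revealed, and that is what makes the choice of order matter.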
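For the guidance post, one common form of guidance is classifier-free guidance. It replaces the conditional score with (1 + w) ∇log p(x | c) − w ∇log p(x) for a guidance strength w. The sketch below evaluates that combination in a hypothetical toy setting: two unit-variance Gaussians at ±μ, one per class, where both scores have closed forms. It illustrates the formula only and does not claim to reproduce the paper's analysis.

```python
import math


def cond_score(x, c, mu):
    """Score of p(x | c) = N(c * mu, 1) for class label c in {+1, -1}."""
    return -(x - c * mu)


def uncond_score(x, mu):
    """Score of the equal-weight mixture p(x) = 0.5 N(mu, 1) + 0.5 N(-mu, 1):
    log p(x) = -x^2 / 2 + log cosh(mu * x) + const."""
    return -x + mu * math.tanh(mu * x)


def guided_score(x, c, mu, w):
    """Classifier-free-guidance-style combination with guidance strength w."""
    return (1 + w) * cond_score(x, c, mu) - w * uncond_score(x, mu)


mu = 1.0
for w in (0.0, 1.0, 4.0):
    # At the class mean the conditional score is 0; a positive guided score
    # there means guidance pushes class +1 samples further from class -1.
    print(w, round(guided_score(mu, +1, mu, w), 4))
```

For c = +1 the guided score simplifies to −(x − μ) + w μ (1 − tanh(μx)), so any w > 0 adds a push away from the other component.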
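The diffraction-limit post refers to Rayleigh's criterion. For a circular aperture of diameter D and light of wavelength λ, the textbook form says two point sources are just resolved when their angular separation is about 1.22 λ/D, which is the angle of the first dark ring of the Airy pattern. Here is a small sketch of that arithmetic. The function names and example numbers are illustrative, and the sketch does not touch the mixture-model phase transition from the paper.

```python
def rayleigh_limit(wavelength_m: float, aperture_m: float) -> float:
    """Smallest resolvable angular separation (radians) for a circular
    aperture under Rayleigh's criterion: theta = 1.22 * lambda / D."""
    if wavelength_m <= 0 or aperture_m <= 0:
        raise ValueError("wavelength and aperture must be positive")
    return 1.22 * wavelength_m / aperture_m


def resolved(separation_rad: float, wavelength_m: float, aperture_m: float) -> bool:
    """True if two point sources at this angular separation count as resolved."""
    return separation_rad >= rayleigh_limit(wavelength_m, aperture_m)


# Green light (500 nm) through a 10 cm aperture.
theta = rayleigh_limit(500e-9, 0.10)
print(f"{theta:.3e} rad")  # 6.100e-06 rad
```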
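LoRA, from the fine-tuning post, freezes a pretrained weight matrix W and learns a low-rank update. The adapted layer computes W x + (α/r) B A x, with B initialized to zero. The forward-pass sketch below is in plain Python and follows the standard LoRA parameterization rather than anything specific to the paper. The shapes, the scaling α/r, and the class name are assumptions, and the NTK-versus-feature-learning analysis is not reproduced here.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus trainable low-rank factors
    B (d_out x r) and A (r x d_in); output = W x + (alpha / r) * B (A x)."""

    def __init__(self, W, r, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W
        self.scale = alpha / r
        # A starts small and random, B starts at zero, so the adapted layer
        # initially computes exactly the pretrained map W x.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]

    def __call__(self, x):
        base = matvec(self.W, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]


W = [[1.0, 2.0, 0.0], [0.0, 1.0, -1.0]]
layer = LoRALinear(W, r=1)
print(layer([1.0, 1.0, 1.0]))  # [3.0, 0.0]
```

Only A and B would be trained, so the effective weight W + (α/r) B A never moves outside a rank-r neighborhood of W.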
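The ICML ’23 post concerns deterministic samplers built on the probability flow ODE. For the Ornstein-Uhlenbeck forward process dX = −X dt + √2 dW, this ODE reads dx/dt = −x − ∇log p_t(x) and is run backward from t = T to t = 0. The toy sketch below assumes a 1-D Gaussian target N(0, s²) so that the score is exact. In that case p_t = N(0, v_t) with v_t = s² e^(−2t) + 1 − e^(−2t), and the flow should map x_T to x_T·√(v_0/v_T). The Euler discretization and the parameter values are choices made here; the paper's convergence bounds are not reproduced.

```python
import math


def variance(t, s2):
    """Variance of p_t when X_0 ~ N(0, s2) evolves under dX = -X dt + sqrt(2) dW."""
    return s2 * math.exp(-2 * t) + 1 - math.exp(-2 * t)


def score(x, t, s2):
    """Exact score of the Gaussian p_t: d/dx log p_t(x) = -x / v_t."""
    return -x / variance(t, s2)


def probability_flow_sample(x_T, T, s2, steps=10_000):
    """Integrate dx/dt = -x - score(x, t) backward from t = T to t = 0 (Euler)."""
    dt = T / steps
    x = x_T
    for i in range(steps):
        t = T - i * dt
        drift = -x - score(x, t, s2)
        x -= drift * dt  # one Euler step backward in time
    return x


s2, T = 4.0, 5.0
x0 = probability_flow_sample(1.0, T, s2)
exact = 1.0 * math.sqrt(s2 / variance(T, s2))
print(x0, exact)
```

Because the map is deterministic, randomness enters only through the draw of x_T from (approximately) N(0, 1).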
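The quantum-dynamics post asks how large the L_p norm of the coefficients of a spectrally bounded linear combination of Paulis can be. One piece is easy. Write O = Σ_P α_P P over n-qubit Paulis, so that α_P = Tr(P O)/2^n. Parseval then gives Σ_P |α_P|² = Tr(O†O)/2^n ≤ ‖O‖², so the L2 norm never exceeds the spectral norm, while the L1 norm can. The single-qubit check below uses O = (X + Z)/√2, which has eigenvalues ±1. It is a toy illustration with hypothetical helper names, not the bound proved in the paper.

```python
import math

# Single-qubit Pauli matrices as 2x2 nested lists (complex entries for Y).
PAULIS = {
    "I": [[1, 0], [0, 1]],
    "X": [[0, 1], [1, 0]],
    "Y": [[0, -1j], [1j, 0]],
    "Z": [[1, 0], [0, -1]],
}


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def trace(m):
    return m[0][0] + m[1][1]


def pauli_coefficients(o):
    """alpha_P = Tr(P O) / 2, so that O = sum_P alpha_P P for one qubit."""
    return {name: trace(matmul(p, o)) / 2 for name, p in PAULIS.items()}


def lp_norm(coeffs, p):
    """L_p norm of the Pauli coefficient vector."""
    return sum(abs(c) ** p for c in coeffs.values()) ** (1 / p)


# O = (X + Z)/sqrt(2) has eigenvalues +1 and -1, so its spectral norm is 1.
s = 1 / math.sqrt(2)
O = [[s, s], [s, -s]]
alpha = pauli_coefficients(O)
print(round(lp_norm(alpha, 2), 6), round(lp_norm(alpha, 1), 6))  # 1.0 1.414214
```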