Jiaxin Shi
@thjashin
Research @Meta MSL TBD | past @GoogleDeepMind @Stanford @MSRNE @VectorInst @RIKEN_AIP_EN @Tsinghua_Uni. Building probabilistic & algorithmic models for learning
New York, NY
Joined February 2016
  • Pinned
    We have released code for our paper "Simplified and Generalized Masked Diffusion for Discrete Data" — SOTA discrete diffusion results — beating prior diffusion language models & exceeding AR likelihood on pixel-level image modeling. Try it out:
  • Let me introduce Neural Eigenmap, a structured deep representation where features are ordered by importance. A neural eigenmap is the output of a neural approximation to eigenfunctions. We show when the eigenfunctions are derived from positive relations in a self-supervised setup, (1/6)
  • How to design a next-gen convolutional sequence model? Use wavelet theory! Meet #MultiresConv: strong performance, yet extremely simple to implement: 15 lines of code with standard conv/linear operations, NO specialized init, complex numbers, or FFT! github.com/thjashin/multi…
  • We are hiring a student researcher at Google DeepMind to work on fundamental problems in discrete generative modeling! Examples of our recent work: masked diffusion: arxiv.org/abs/2406.04329 learning-order AR: arxiv.org/abs/2503.05979 If you find this interesting, please send an
  • Autoregressive models are too restrictive by forcing a fixed generation order, while masked diffusion is wasteful as it fits all possible orders. Can our model dynamically decide the next position to generate based on context? Learn more in our ICML paper arxiv.org/abs/2503.05979
  • Thrilled to share that our new gradient estimator for discrete distributions won a #NeurIPS2022 Outstanding Paper Award! Our estimator requires no extra evaluation of the target function, adapts itself online & achieves substantially lower variance/better train objs than SOTA estimators
  • Proud to announce SOLVE-GP, new work on scalable variational Gaussian processes: arxiv.org/abs/1910.10596. Use more inducing points at a lower computational cost! More than 80% accuracy on CIFAR-10 using mini-batched deep conv GPs (RBF kernels), without any neural net components.
  • Last year we (@ssydasheng @RogerGrosse) had this crazy idea: Taking a neural network after training, we can view it as a posterior approximation to a GP without even doing the Bayesian inference! We published the idea in this AABI symposium paper: openreview.net/pdf?id=NgqYp7s…
  • Discrete diffusion models made simple & competitive on both language and pixel-level image modeling! arxiv.org/abs/2406.04329 ✅New variational objective (integrate cross-entropy!) ✅Beating prior diffusion language models & matching best AR on pixel-level image modeling ...(1/n)
  • Mirror descent generalizes gradient descent to handle constrained domains and non-Euclidean geometry. Check out our ICLR spotlight (poster at 9:30pm ET) showing how to do this for **sampling**: we develop a multi-particle mirror descent using Stein operators!
  • Interested in estimating the score (\nabla \log p(x)) from samples? This unsupervised learning problem can be solved with nonparametric regression. We provide a unifying view of existing works and propose faster/more accurate estimators in high dimensions. arxiv.org/abs/2005.10099
  • Starting this week I'll be a postdoctoral researcher at @MSFTResearch New England, currently working remotely due to things out of my control. I'm grateful for all the support during my job search and looking forward to the new journey :)
  • One thing we didn't expect, but that people seem to appreciate a lot about our MultiresConv paper arxiv.org/abs/2305.01638: in developing our model, we provide a mathematical justification for WaveNet-style dilated convolutions through wavelet theory. So Wavelet<->WaveNet.
  • Check out this survey on Stein's method in stats/ML: arxiv.org/abs/2105.03481 I've recently been learning a lot about the inspiration behind this wave of Stein's method from the amazing Lester Mackey. Definitely an exciting time to do research on this topic!