Jiaxin Shi

@thjashin

Research @Meta MSL TBD | past @GoogleDeepMind @Stanford @MSRNE @VectorInst @RIKEN_AIP_EN @Tsinghua_Uni. Building probabilistic & algorithmic models for learning

New York, NY

Joined February 2016

Pinned
Jiaxin Shi
@thjashin
Dec 11, 2024
We have released code for our paper "Simplified and Generalized Masked Diffusion for Discrete Data" — SOTA discrete diffusion results — beating prior diffusion language models & exceeding AR likelihood on pixel-level image modeling. Try it out:
GitHub - google-deepmind/md4: Official Jax Implementation of MD4 Masked Diffusion Models
Jiaxin Shi
@thjashin
Oct 31, 2022
Let me introduce Neural Eigenmap, a structured deep representation where features are ordered by importance. A Neural Eigenmap is the output of a neural approximation to eigenfunctions. We show when the eigenfunctions are derived from positive relations in a self-supervised setup, (1/6)
Jiaxin Shi
@thjashin
Jun 6, 2023
How to design a next-gen convolutional sequence model? Use wavelet theory! Meet #MultiresConv: strong performance, yet extremely simple to implement--15 lines of code with standard conv/linear operations, NO specialized init, complex numbers, or FFT! github.com/thjashin/multi…
Jiaxin Shi
@thjashin
Apr 14, 2025
We are hiring a student researcher at Google DeepMind to work on fundamental problems in discrete generative modeling! Examples of our recent work: masked diffusion: arxiv.org/abs/2406.04329 learning-order AR: arxiv.org/abs/2503.05979 If you find this interesting, please send an
Jiaxin Shi
@thjashin
Jul 15, 2025
Autoregressive models are too restrictive by forcing a fixed generation order, while masked diffusion is wasteful as it fits all possible orders. Can our model dynamically decide the next position to generate based on context? Learn more in our ICML paper arxiv.org/abs/2503.05979
Jiaxin Shi
@thjashin
Nov 23, 2022
Thrilled to share that our new gradient estimator for discrete distributions won a #NeurIPS2022 Outstanding Paper Award! Our estimator requires no extra evaluation of the target function, adapts itself online & achieves substantially lower variance/better training objectives than SOTA estimators
Jiaxin Shi
@thjashin
Mar 3, 2020
Proud to announce SOLVE-GP, new work on scalable variational Gaussian processes: arxiv.org/abs/1910.10596. Use more inducing points at a lower computational cost! More than 80% accuracy on CIFAR-10 using mini-batched deep conv GPs (rbf kernels), without any neural net components.
Jiaxin Shi
@thjashin
May 11, 2021
Last year we (@ssydasheng @RogerGrosse) had this crazy idea: Taking a neural network after training, we can view it as a posterior approximation to a GP without even doing the Bayesian inference! We published the idea in this AABI symposium paper: openreview.net/pdf?id=NgqYp7s…
Jiaxin Shi
@thjashin
Jun 12, 2024
Discrete diffusion models made simple & competitive on both language and pixel-level image modeling! arxiv.org/abs/2406.04329 ✅New variational objective (integrate cross-entropy!) ✅Beating prior diffusion language models & matching best AR on pixel-level image modeling ...(1/n)
Jiaxin Shi
@thjashin
Apr 26, 2022
Mirror descent generalizes gradient descent to deal with constrained domains and non-Euclidean geometry. Check out our ICLR spotlight (poster at 9:30pm ET) showing how to do this for **sampling**: we develop a multi-particle mirror descent using Stein operators!
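For readers unfamiliar with the first sentence of this post, here is a minimal sketch of classical mirror descent for optimization on the probability simplex (the entropic or "exponentiated gradient" variant). It is not the multi-particle Stein sampler from the paper; the function name `mirror_descent_simplex`, the step size, and the linear cost `c` are all hypothetical choices made for illustration.

```python
import math

def mirror_descent_simplex(grad, x0, step=0.5, iters=200):
    """Entropic mirror descent on the probability simplex (illustrative only)."""
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        # Take the gradient step in the dual (log) coordinates, then map back.
        w = [xi * math.exp(-step * gi) for xi, gi in zip(x, g)]
        z = sum(w)
        # Renormalizing is the Bregman (KL) projection back onto the simplex.
        x = [wi / z for wi in w]
    return x

# Hypothetical objective: a linear cost <c, x>, minimized at the vertex
# of the simplex with the smallest cost (index 1 here).
c = [0.9, 0.2, 0.5]
x_star = mirror_descent_simplex(lambda x: c, [1 / 3, 1 / 3, 1 / 3])
```

Unlike projected gradient descent, the iterates stay strictly inside the simplex without an explicit Euclidean projection, which is the sense in which mirror descent handles constrained domains through a change of geometry.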
Jiaxin Shi
@thjashin
Jun 27, 2020
Interested in estimating the score (\nabla log p(x)) from samples? This unsupervised learning problem can be solved with nonparametric regression. We provide a unifying view of existing works and propose faster/more accurate estimators in high dimensions. arxiv.org/abs/2005.10099
Jiaxin Shi
@thjashin
Aug 26, 2020
Starting this week I'll be a postdoctoral researcher at @MSFTResearch New England, currently working remotely due to things out of my control. I'm grateful for all the support during my job search and looking forward to the new journey :)
Jiaxin Shi
@thjashin
Jul 25, 2023
One thing we didn’t expect, but people seem to appreciate a lot about our MultiresConv paper arxiv.org/abs/2305.01638: while developing our model, we provide a mathematical justification for WaveNet-style dilated convolutions through wavelet theory. So Wavelet<->WaveNet.
Jiaxin Shi
@thjashin
May 11, 2021
Check out this survey on Stein's method in stats/ML: arxiv.org/abs/2105.03481 I've recently been learning a lot about the inspiration behind this wave of Stein's method from the amazing Lester Mackey. Definitely an exciting time to do research on this topic!