We built 200k-GPU clusters;
We scaled up & curated higher-quality data;
We scaled compute by 100x;
We developed training & test-time recipes;
We made everything RL native;
We stabilized infrastructure and sped it up;
That's how you turn RL into the pre-training scale.
Yet I am
Joined December 2017
- Proud to announce our paper "Functional Variational BNNs" arxiv.org/abs/1903.05779. Here we introduce functional variational inference, which enables us to specify structured priors and perform inference in function space. Gif shows BNN predictions under a Periodic prior.
- Excited to share our HVGP @ICML2021: 1) discrete Fourier transform meets kernels 2) decomposing a kernel into an orthogonal sum of kernels 3) a scalable SVGP to use more inducing points 4) being applicable to common kernels: RBF, Matern, poly, ... arxiv.org/abs/2106.05992
- Replying to @trunghlt: Hard to say how many new things came up every day to make everything work.
- I don't think that the scaling law is over, or no more high-quality data, or diminishing returns. The ultimate intelligence exists right here in all the data we have. What intelligence you get depends on how you learn from data: * Build a library of books * Create a search
- Replying to @algobaker: You take the model trained with maximal compute and add more compute at test time.
- Our ICML paper "Differentiable Compositional Kernel Learning for Gaussian Processes" is now open-sourced at github.com/ssydasheng/Neu…, along with GPflow-Slim github.com/ssydasheng/GPf…, our customized GPflow with TensorFlow-style usage.
- If you are releasing sth called 3.5 and then change to 4, there is a reason.
- I am proud to share our latest paper: Reward-aware Preference Optimization (RPO): A Unified Mathematical Framework for Model Alignment! 🚀📄 [Link: arxiv.org/pdf/2502.00203] The rapid evolution of alignment algorithms—each with different objectives, training setups, and response
- Stay tuned for Grok4.
- Are kernel methods gonna work in high-dimensional problems?
- Exciting work with @Guodzh, Chaoqi Wang, @zengwenyuan1995, Jiaman Li and @RogerGrosse. Check it out at ICML 2018 @icmlconf. Neural Kernel Networks: a differentiable framework for compositional kernel learning. Like the Automatic Statistician, but trainable with gradient-based optimization rather than discrete search. By @ssydasheng et al. arxiv.org/abs/1806.04326






