Shengyang Sun (@ssydasheng) / X

Shengyang Sun

@ssydasheng

Build AGI @xAI | Prev. @NVIDIA (Leading Nemotron-340B) & @AMAZON | PhD @UofT; B.E. @Tsinghua

Joined December 2017

Shengyang Sun
@ssydasheng
Jul 10, 2025
We built 200k-GPU clusters; We scaled up & curated higher-quality data; We scaled compute by 100x; We developed training & test-time recipes; We made everything RL native; We stabilized the infrastructure and sped it up; That's how you bring RL to pre-training scale. Yet I am
Shengyang Sun
@ssydasheng
Mar 15, 2019
Proud to announce our paper "Functional Variational BNNs" arxiv.org/abs/1903.05779. Here we introduce functional variational inference, which enables us to specify structured priors and perform inference in function space. Gif shows BNN predictions under a Periodic prior.
Shengyang Sun
@ssydasheng
Jul 20, 2025
Replying to @nsaphra
Unbelievably disrespectful speculation.
Shengyang Sun
@ssydasheng
Jun 14, 2021
Excited to share our HVGP @ICML2021: 1) discrete Fourier transform meets kernels; 2) decomposing a kernel into an orthogonal sum of kernels; 3) a scalable SVGP to use more inducing points; 4) being applicable to common kernels: RBF, Matern, poly, ... arxiv.org/abs/2106.05992
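To make point 2 of the post above concrete, here is a minimal NumPy sketch of the smallest possible case: a length-2 discrete Fourier transform over the group {identity, reflection}, which splits an RBF kernel into an even part and an odd part that add back up to the original. The helper names (`rbf`, `reflection_decomposition`) and the choice of reflection as the transformation are assumptions made for illustration; this is not the HVGP implementation from the paper.

```python
import numpy as np

def rbf(x, y, lengthscale=1.0):
    """RBF kernel matrix between 1-D input arrays x and y."""
    d = x[:, None] - y[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def reflection_decomposition(kernel, x, y):
    """Split a kernel into even and odd parts under the reflection y -> -y.

    Hypothetical helper: a length-2 DFT over {identity, reflection}.
    """
    k_same = kernel(x, y)    # k(x, y)
    k_flip = kernel(x, -y)   # k(x, -y)
    k_even = 0.5 * (k_same + k_flip)
    k_odd = 0.5 * (k_same - k_flip)
    return k_even, k_odd

x = np.linspace(-2.0, 2.0, 5)
k_even, k_odd = reflection_decomposition(rbf, x, x)

# The two components sum back to the original kernel matrix.
assert np.allclose(k_even + k_odd, rbf(x, x))
```

Because the RBF kernel is unchanged when both arguments are negated, each part is itself a valid kernel, so each can in principle be handled as a separate, smaller problem.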
Shengyang Sun
@ssydasheng
Jul 10, 2025
Replying to @trunghlt
Hard to say how many new things came up every day to make everything work.
Shengyang Sun
@ssydasheng
Aug 8, 2025
I don't think that the scaling law is over, that there is no more high-quality data, or that returns are diminishing. The ultimate intelligence exists right here in all the data we have. What intelligence you get depends on how you learn from data: * Build a library of books * Create a search
Shengyang Sun
@ssydasheng
Jul 10, 2025
Replying to @algobaker
You take the model trained with maximal compute and add more compute at test time.
Shengyang Sun
@ssydasheng
Jul 3, 2018
Our ICML paper "Differentiable Compositional Kernel Learning for Gaussian Processes" is now open-sourced at github.com/ssydasheng/Neu…, along with GPflow-Slim github.com/ssydasheng/GPf…, our customized GPflow with TensorFlow-style usage.
GitHub - ssydasheng/Neural-Kernel-Network: Code for "Differentiable Compositional Kernel Learning...
Shengyang Sun
@ssydasheng
Jul 10, 2025
Replying to @ekellbuch
Stay tuned.
Shengyang Sun
@ssydasheng
Jul 10, 2025
If you are releasing something called 3.5 and then change it to 4, there is a reason.
Shengyang Sun
@ssydasheng
Feb 5, 2025
I am proud to share our latest paper: Reward-aware Preference Optimization (RPO): A Unified Mathematical Framework for Model Alignment! 🚀📄 [Link: arxiv.org/pdf/2502.00203] The rapid evolution of alignment algorithms—each with different objectives, training setups, and response
Shengyang Sun
@ssydasheng
Jun 28, 2025
Stay tuned for Grok4.
Shengyang Sun
@ssydasheng
Jul 26, 2021
Are kernel methods gonna work in high-dimensional problems?
Shengyang Sun
@ssydasheng
Jun 13, 2018
Exciting work with @Guodzh, Chaoqi Wang, @zengwenyuan1995, Jiaman Li and @RogerGrosse. Check it out at ICML 2018 @icmlconf
Roger Grosse
@RogerGrosse
Jun 13, 2018
Neural Kernel Networks: a differentiable framework for compositional kernel learning. Like the Automatic Statistician, but trainable with gradient-based optimization rather than discrete search. By @ssydasheng et al. arxiv.org/abs/1806.04326
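
As a rough illustration of what "compositional kernel learning" with continuous parameters can look like, the sketch below mixes two primitive kernels with non-negative weights and then multiplies the mixtures. Sums with non-negative weights and elementwise products both preserve positive semi-definiteness, so the result is a valid kernel for any real-valued `raw_weights`. The two-primitive layout, the softplus parameterization, and all function names are assumptions for illustration, not the architecture or code from the paper; gradient-based training would also need an autodiff framework, which this NumPy forward pass leaves out.

```python
import numpy as np

def rbf(x, y, lengthscale=1.0):
    """RBF kernel matrix between 1-D input arrays x and y."""
    d = x[:, None] - y[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def periodic(x, y, period=1.0, lengthscale=1.0):
    """Periodic kernel matrix between 1-D input arrays x and y."""
    d = np.abs(x[:, None] - y[None, :])
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / lengthscale ** 2)

def softplus(w):
    """Map unconstrained parameters to strictly positive weights."""
    return np.log1p(np.exp(w))

def composed_kernel(x, y, raw_weights):
    """Hypothetical two-layer composition: weighted sums, then a product.

    raw_weights has shape (2, 2): one row of mixing weights per hidden unit.
    """
    primitives = np.stack([rbf(x, y), periodic(x, y)])      # (2, n, m)
    weights = softplus(raw_weights)                         # (2, 2), positive
    mixed = np.einsum("hp,pnm->hnm", weights, primitives)   # (2, n, m)
    return mixed[0] * mixed[1]                              # elementwise product

rng = np.random.default_rng(0)
raw_weights = rng.normal(size=(2, 2))
x = np.linspace(0.0, 3.0, 6)
K = composed_kernel(x, x, raw_weights)
```

Every step is a smooth function of `raw_weights`, which is what makes the structure trainable by gradient descent rather than by discrete search over kernel expressions.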