Dinghuai Zhang 张鼎怀
xAI
967 posts
@zdhnarsil
coding RL bigrun @xAI. Prev: @MSFTResearch / @Apple MLR / FAIR Labs @MetaAI, PhD at @Mila_Quebec, math undergraduate at @PKU1898.
zdhnarsil.github.io
Joined May 2014
1,811 Following
5,303 Followers
  • Dinghuai Zhang 张鼎怀
    xAI
    @zdhnarsil
    Feb 26, 2023
    I don't understand why "RLHF" even needs RL? The reward function is a learned neural network and thus white-box. This means we could simply use a straight-through estimator (or the Gumbel trick) to obtain a much better gradient. (Context: my understanding is from the InstructGPT paper.)
    299K
  • Dinghuai Zhang 张鼎怀
    xAI
    @zdhnarsil
    Jul 11, 2023
    Life update: ecstatic to announce that I have been rejected from all internship applications and will spend the summer wherever I am 🥲😂.
    68K
  • Dinghuai Zhang 张鼎怀
    xAI
    @zdhnarsil
    Sep 10, 2025
    Thinking Machines' first blog post mentions and compares against our proposed truncated importance sampling (fengyao.notion.site/off-policy-rl), which aims to achieve on-policy RL and mitigate the mismatch introduced by the inference engine 👀
    Horace He
    Thinking Machines
    @cHHillee
    Sep 10, 2025
    Apologies that I haven't written anything since joining Thinking Machines but I hope this blog post on a topic very near and dear to my heart (reproducible floating point numerics in LLM inference) will make up for it!
    53K
  • Dinghuai Zhang 张鼎怀
    xAI
    @zdhnarsil
    Apr 8, 2022
    Is this against ICML code?? How should we deal with it... #ICML2022
  • Dinghuai Zhang 张鼎怀
    xAI
    @zdhnarsil
    Mar 4, 2023
    Hey there
    43K
  • Dinghuai Zhang 张鼎怀
    xAI
    @zdhnarsil
    Dec 20, 2022
    I am now officially a PhD candidate 🔥! A huge thanks to all my collaborators 🥳.
    23K
  • Dinghuai Zhang 张鼎怀
    xAI
    @zdhnarsil
    Feb 18, 2023
    Relate a lot... during my application season, I got rejected by almost all schools (mostly US ones) despite having two first-author papers. Sadly, that's just what happens to unprivileged international students who have no connections 🤦🏻‍♂️.
    Pengpeng Xiao
    @pengpeng_xiao
    Feb 13, 2023
    My jaws keep dropping as I go through 70 PhD applicant files. People w/ 2 coauthored papers & an interesting solo writing sample don’t even make it to the top 10 in my pile. The level of knowledge, research experience & passion these kids bring to the table is just remarkable! 🤩
    55K
  • Dinghuai Zhang 张鼎怀
    xAI
    @zdhnarsil
    Aug 10, 2025
    After discussion with @thjashin, the results from "Diffusion Beats Autoregressive in Data-Constrained Settings" look like an exploit of the AR model's overfitting. Without overfitting, there seems to be no hope for discrete diffusion to outperform AR; see the 10B token plot for example.
    Jinjie Ni
    @NiJinjie
    Aug 9, 2025
    Token crisis: solved. ✅ We pre-trained diffusion language models (DLMs) vs. autoregressive (AR) models from scratch — up to 8B params, 480B tokens, 480 epochs. Findings: > DLMs beat AR when tokens are limited, with >3× data potential. > A 1B DLM trained on just 1B tokens
    46K
  • Dinghuai Zhang 张鼎怀
    xAI
    @zdhnarsil
    Jul 23, 2024
    Literally 5 out of 10 generative modeling researchers I have met recently are working on discrete diffusion / matching 🕵️ What happened lol
    33K
  • Dinghuai Zhang 张鼎怀
    xAI
    @zdhnarsil
    Mar 17, 2023
    Our #ICML2023 workshop proposal "Structured Probabilistic Inference & Generative Modeling" has been accepted 🎉. We can't wait to engage in insightful discussions with experts in probabilistic ML and other areas in beautiful Hawaii 🌴🏖️. Check🔍: spigmworkshop.github.io
    42K
  • Dinghuai Zhang 张鼎怀
    xAI
    @zdhnarsil
    Jan 8, 2025
    What is the difference between the "process reward model" and the value function (in RL / control)?
    28K
  • Dinghuai Zhang 张鼎怀
    xAI
    @zdhnarsil
    Jan 21, 2023
    Lucky to have four works accepted at #ICLR2023. Here is a list of them: 1. Latent State Marginalization as a Low-cost Approach for Improving Exploration arxiv.org/abs/2210.00999 My first RL work! An attempt to bring (more) structured probabilistic inference into control.
    arxiv.org
    Latent State Marginalization as a Low-cost Approach for Improving...
    While the maximum entropy (MaxEnt) reinforcement learning (RL) framework -- often touted for its exploration and robustness capabilities -- is usually motivated from a probabilistic perspective,...
    23K
  • Dinghuai Zhang 张鼎怀
    xAI
    @zdhnarsil
    Sep 19, 2025
    Finally, we scale GFlowNets to 32B param LLMs on reasoning tasks. Kudos to @zhu_xuekai 👏 My two cents: (a) On-policy is important for RL performance (this relates to our previous FlashRL effort) (b) Length normalization in logp calculation is critical for numerical stability
    AK
    @_akhaliq
    Sep 19, 2025
    FlowRL: Matching Reward Distributions for LLM Reasoning
    12K
  • Dinghuai Zhang 张鼎怀
    xAI
    @zdhnarsil
    Oct 20, 2022
    I made an (awesome!) GitHub list about GFlowNets, including tutorials, papers and other useful resources. I hope it can be a starting point for people who want to learn about this topic. If you like it or find it useful, feel free to star🌟 / share it!
    GitHub - zdhNarsil/Awesome-GFlowNets: A curated list of resources about generative flow networks...
    From github.com