Nimit Kalra (@qw3rtman) / X

Nimit Kalra

@qw3rtman

Incoming PhD student. Visiting researcher with @MicahGoldblum (self-play, RL, reasoning, world models). Prev: @HaizeLabs @Citadel @UTAustin

off-policy

Joined October 2011

Nimit Kalra
@qw3rtman
May 18, 2025
Verdict at @NousResearch RL hackathon! Your calibrated and low-variance LLM-as-a-judge is a reward model 🙈
28K
Nimit Kalra
@qw3rtman
Mar 9, 2025
we're looking for a rockstar research eng @haizelabs! if you're interested in training tons of models and thinking about adversarial robustness for real-world deployed AI systems, DM me or apply below :)
16K
Nimit Kalra
@qw3rtman
Jun 26, 2025
qwen RL has felt icky recently, but these authors get llama RL to match
Zengzhi Wang
@SinclairWang1
Jun 26, 2025
What Makes a Base Language Model Suitable for RL? Rumors in the community say RL (i.e., RLVR) on LLMs is full of “mysteries”: (1) Is the magic only happening on Qwen + Math? (2) Does the "aha moment" only spark during math reasoning? (3) Is evaluation hiding some tricky traps?
9.2K
Nimit Kalra
@qw3rtman
Mar 22, 2025
The more RL I do, the less I believe in evolution
9.7K
Nimit Kalra
@qw3rtman
May 28, 2025
Excited to discuss "SFT Memorizes, RL Generalizes" tomorrow at @haizelabs's NYC AI Reading Group with @leonardtang_ and @willccbb! We'll also explore a broader theme — "what does RL actually learn?", guided by some related works from the past week.
7K
Nimit Kalra
@qw3rtman
May 27, 2025
We modified DeepSeek's recent Self-Principled Critique Tuning paper and bootstrapped a family of super tiny generalist reward models in < 1 day on a single A100 GPU. By proposing instance-specific rubrics at inference time, j1-micro (1.7B) and j1-nano (0.6B) punch well above
5.7K
Nimit Kalra
@qw3rtman
May 4, 2025
awful day to be an llm
Leonard Tang
@leonardtang_
May 4, 2025
EVALS EVALS EVALS Core Research @AutinMitra
5.3K
Nimit Kalra
@qw3rtman
Jun 26, 2025
Discussing "Mind the Gap" tonight at @haizelabs's NYC AI Reading Group with @leonardtang_ and @willccbb. Authors study self-improvement through the "Generation-Verification Gap" (model's verification ability over its own generations) and find that this capability log scales with
Nimit Kalra
@qw3rtman
Jun 7, 2025
Still noodling on this, but the generation-verification gap proposed by @yus167 @_hanlin_zhang_ @ShamKakade6 @udayaghai et al. in arxiv.org/abs/2412.02674 is a very nice framework that unifies a lot of thoughts around self-improvement/verification/bootstrapping reasoning
9.8K
Nimit Kalra
@qw3rtman
Apr 2, 2025
Replying to @Purring_Lynx
rate limits too low for any real prod use cases tho 🙄
16K
Nimit Kalra
@qw3rtman
Jun 29, 2025
think it was @jxmnop who said that science is about generating artifacts. inspired me to really focus on this this past week, starting with some internal eng tools and paper summaries... grinding out a couple more researchy things for the next couple weeks :) super excited to
3.4K
Nimit Kalra
@qw3rtman
Jul 16, 2025
Flying out to #ICML2025 tonight! Always down to chat about unverifiable domains, evals, red-teaming, safeguards, or just meet cool people. I’ll be a panelist at the Methods and Opportunities at Small Scale workshop, sharing our work on tiny generalist reward models
3.9K
Nimit Kalra
@qw3rtman
Mar 27, 2021
Replying to @rakyll
Picked GPL for one of my first open-source projects and really learned this lesson the hard way
Nimit Kalra
@qw3rtman
Mar 14, 2025
What tools are people using these days to search for relevant citations, e.g., papers that actually benchmark against a particular work? Google Scholar first page is usually surveys/prior work sections, which are somewhat useless for tracing the lineage of an approach
4.7K
Nimit Kalra
@qw3rtman
May 30, 2025
Great discussion tonight at @haizelabs HQ about the many many different definitions of generalization / “out of distribution” and which ones we actually care about in practice. + a special shoutout to @marklxu1 for the Joe’s pizza 🤤
mark xu
@marklxu1
May 30, 2025
thursday night pizza + papers in nyc! thanks to those who came out!! @leonardtang_ @qw3rtman @willccbb
1.6K