Rohith Kuditipudi (@rckpudi) / X

Rohith Kuditipudi

@rckpudi

PhD student @StanfordAILab advised by John Duchi and @percyliang

web.stanford.edu/~rohithk/

Joined March 2020

Rohith Kuditipudi
@rckpudi
Jul 4, 2022
Interpolation (train to zero loss) often does well in high dim, yet may still be undesirable (e.g., security/privacy concerns). So is interpolation necessary for optimal generalization? In our COLT paper, we surprisingly find the answer is yes! arxiv.org/abs/2202.09889 (1/n)
arxiv.org
Memorize to Generalize: on the Necessity of Interpolation in High...
We examine the necessity of interpolation in overparameterized models, that is, when achieving optimal predictive risk in machine learning problems requires (nearly) interpolating the training...
Rohith Kuditipudi
@rckpudi
Jul 31, 2023
Watermarking enables detecting AI-generated content, but existing strategies distort model output or aren't robust to edits. We offer a strategy for LMs that’s distortion-free (up to a max budget) *and* robust. arxiv.org/abs/2307.15593 w/ @jwthickstun @tatsu_hashimoto @percyliang
Rohith Kuditipudi
@rckpudi
Jul 31, 2023
Replying to @rckpudi
Along with the paper, we've released a blog post featuring a public demo of our watermark (with code), using some examples of watermarked text we generated from LLaMA-7B. Try to break our watermark by editing the text! crfm.stanford.edu/2023/07/30/wat…
Rohith Kuditipudi
@rckpudi
Jul 31, 2023
Replying to @rckpudi
We validate our strategies with OPT-1.3B, LLaMA-7B, and Alpaca-7B. Our best watermark uses exp-min sampling (like Aaronson) to generate text from the key sequence and is detectable (p < 0.01) from 35 tokens even when 50% of the sequence is corrupted with random insertions!
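A minimal sketch of exp-min sampling as commonly described: given a vector xi of independent uniform values (one per vocabulary entry) and next-token probabilities p, pick the token minimizing -log(xi_i) / p_i. Averaged over random xi, the chosen token follows p exactly, which is what makes this style of sampling distortion-free. The function names and the toy distribution are illustrative assumptions, not code from the paper.

```python
import math
import random

def exp_min_sample(probs, xi):
    """Return argmin_i -log(xi[i]) / probs[i], skipping zero-probability tokens."""
    best, best_score = None, math.inf
    for i, (p, u) in enumerate(zip(probs, xi)):
        if p <= 0.0:
            continue  # a token with zero probability can never be chosen
        score = -math.log(u) / p
        if score < best_score:
            best, best_score = i, score
    return best

def empirical_freqs(probs, trials, seed=0):
    """Estimate how often each token is chosen when xi is drawn fresh each trial."""
    rng = random.Random(seed)
    counts = [0] * len(probs)
    for _ in range(trials):
        # 1 - random() lies in (0, 1], so log() never sees zero
        xi = [1.0 - rng.random() for _ in probs]
        counts[exp_min_sample(probs, xi)] += 1
    return [c / trials for c in counts]
```

Calling `empirical_freqs([0.5, 0.3, 0.2], 20000)` should return frequencies close to the input probabilities, while a fixed xi always yields the same token.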
Rohith Kuditipudi
@rckpudi
Jul 24, 2024
what a wonderful, slick result! has me wondering what the all-time record is for shortest phd thesis...
Aaron Roth
@Aaroth
Jul 7, 2024
A quick thread on a short (3 page) paper, giving a simple algorithm that makes predictions guaranteeing 2*Sqrt{T} "Distance to calibration" against an adversary. The algorithm and proof are so simple I can describe it in thread. Joint with Eshwar, @natalie_collina, and Mirah:
Rohith Kuditipudi
@rckpudi
Jul 4, 2022
Replying to @rckpudi
We take inspiration from a wonderful line of work originated by @vitalyFM and others, who construct certain combinatorial, heavy-tailed settings in which various notions of memorization (i.e., stability-based and information-theoretic) are provably necessary to learn. (6/n)
Rohith Kuditipudi
@rckpudi
Sep 20, 2022
Replying to @StanfordHAI @_jasonwei and @RishiBommasani
link seems down
Rohith Kuditipudi
@rckpudi
Jul 31, 2023
Replying to @rckpudi
We generate text from a fixed "watermark key sequence". The detector, who knows the full sequence, can align it to a text to verify the watermark. Until we reuse part of the sequence, the text is indistinguishable from regular text (i.e., distortion-free).
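A toy sketch of the generate-then-align idea, under loud assumptions: the "language model" is a fixed distribution over 8 tokens, the alignment score is the mean of -log(1 - xi[token]) (an Aaronson-style statistic), alignment only tries cyclic offsets into the key, and the p-value comes from comparing against freshly sampled keys. It does not implement the edit-robust alignment that the thread describes, and every name here is hypothetical.

```python
import math
import random

VOCAB = 8  # toy vocabulary size (assumption)

def make_key(length, seed):
    """Key sequence: `length` vectors of uniform values in (0, 1), one per vocab entry."""
    rng = random.Random(seed)
    eps = 1e-12  # keep values strictly inside (0, 1) so the logs below stay finite
    return [[min(max(rng.random(), eps), 1 - eps) for _ in range(VOCAB)]
            for _ in range(length)]

def generate(key, n_tokens, offset, probs):
    """Decode token t from key element offset + t (wrapping around) via exp-min sampling."""
    out = []
    for t in range(n_tokens):
        xi = key[(offset + t) % len(key)]
        out.append(min(range(VOCAB), key=lambda i: -math.log(xi[i]) / probs[i]))
    return out

def alignment_score(tokens, key, offset):
    """Mean of -log(1 - xi[token]); about 1 for unrelated text, larger if the key produced it."""
    total = 0.0
    for t, tok in enumerate(tokens):
        total += -math.log(1.0 - key[(offset + t) % len(key)][tok])
    return total / len(tokens)

def best_alignment(tokens, key):
    """Try every cyclic offset, since the detector does not know where generation started."""
    return max(alignment_score(tokens, key, off) for off in range(len(key)))

def p_value(tokens, key, n_resamples=99, seed=1):
    """Fraction of random keys that align with the text at least as well as the real key."""
    observed = best_alignment(tokens, key)
    rng = random.Random(seed)
    hits = sum(
        best_alignment(tokens, make_key(len(key), rng.getrandbits(32))) >= observed
        for _ in range(n_resamples)
    )
    return (hits + 1) / (n_resamples + 1)
```

For example, with `key = make_key(64, seed=0)` and `text = generate(key, 30, offset=17, probs=[1 / VOCAB] * VOCAB)`, `p_value(text, key)` should come out small, while text drawn without the key should typically not.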
Rohith Kuditipudi
@rckpudi
Jul 31, 2023
Replying to @rckpudi
Other watermarking strategies (e.g., Kirchenbauer et al.; Aaronson) hash the previous k-1 tokens to determine the next token. Larger k makes the bias toward certain k-grams less noticeable but hurts robustness (replacing 1/k tokens breaks detection). We avoid this trade-off.
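To illustrate the robustness cost of context hashing, here is a small sketch (not the code of either cited scheme) in which the seed at each position is a hash of the previous k-1 tokens. The SHA-256 construction, the secret string, and the function names are all assumptions made for the example. Replacing a single token changes the seed at the k-1 positions that follow it, so larger k means each edit disturbs more of the watermark signal.

```python
import hashlib

def context_seed(prev_tokens, secret=b"hypothetical-key"):
    """Derive a pseudorandom seed from the preceding tokens and a secret."""
    data = secret + b"|" + ",".join(map(str, prev_tokens)).encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

def seeds_for(tokens, k):
    """Seed used at each position i >= k-1, computed from the k-1 tokens before it."""
    return [context_seed(tokens[i - (k - 1):i]) for i in range(k - 1, len(tokens))]

def positions_disturbed(tokens, edited, k):
    """Count positions whose seed differs between the original and edited sequences."""
    original_seeds = seeds_for(tokens, k)
    edited_seeds = seeds_for(edited, k)
    return sum(a != b for a, b in zip(original_seeds, edited_seeds))

tokens = list(range(20))
edited = tokens.copy()
edited[10] = 999  # replace one token in the middle of the sequence

for k in (2, 4, 8):
    print(k, positions_disturbed(tokens, edited, k))
```

The replaced token itself also stops matching whatever its own seed favored, so in total one edit touches k consecutive k-grams.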
Rohith Kuditipudi
@rckpudi
May 19, 2025
super excited for what's to come!
Percy Liang
@percyliang
May 19, 2025
What would truly open-source AI look like? Not just open weights, open code/data, but *open development*, where the entire research and development process is public *and* anyone can contribute. We built Marin, an open lab, to fulfill this vision:
Rohith Kuditipudi
@rckpudi
Aug 19, 2025
great work!
lily clifford
@lilyjclifford
Aug 19, 2025
🚀 Arcana v2 is here. Rime’s next-gen TTS makes voice AI sound truly human. More languages. More realism. More deployment options. 🧵👇
Rohith Kuditipudi
@rckpudi
Jul 4, 2022
Replying to @rckpudi
We quantify the cost of not exactly fitting the training data via an optimization problem over learners: min. test error s.t. train error > ε^2. Even for ε quadratically smaller than the label noise variance σ^2, any feasible learner must suffer increased test error. (4/n)
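One way to write out the optimization problem in the post above, as a sketch only: the symbols R (test error), \hat R_n (train error on n samples), and the choice of squared loss are notational assumptions and may not match the paper's exact setup.

```latex
\min_{\hat f} \; R(\hat f)
\quad \text{s.t.} \quad
\hat R_n(\hat f) = \frac{1}{n} \sum_{i=1}^{n} \bigl(\hat f(x_i) - y_i\bigr)^2 > \epsilon^2
```

Under this notation, interpolation corresponds to \hat R_n(\hat f) = 0, which the constraint rules out; the claim is that the constrained optimum is strictly worse than the unconstrained one even when \epsilon^2 is far below the noise variance \sigma^2.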
Rohith Kuditipudi
@rckpudi
Jul 4, 2022
Replying to @rckpudi
There are lots of exciting open questions. Even for adjacent settings such as kernel regression and linear classification, we think obtaining similar results will require new approaches. (9/n, n=9)
Rohith Kuditipudi
@rckpudi
Sep 18, 2022
Replying to @DimitrisPapail and @bneyshabur
@DimitrisPapail we have a fairly simple counterexample (existence of disconnected global minima) that applies to two-layer nets of any width, though unclear re: reachability via SGD (see Section 5 of arxiv.org/abs/1906.06247)