Interpolation (train to zero loss) often does well in high dim, yet may still be undesirable (e.g., security/privacy concerns). So is interpolation necessary for optimal generalization? In our COLT paper, we surprisingly find the answer is yes! arxiv.org/abs/2202.09889 (1/n)
Rohith Kuditipudi
76 posts
Joined March 2020
- Watermarking enables detecting AI-generated content, but existing strategies distort model output or aren't robust to edits. We offer a strategy for LMs that’s distortion-free (up to a max budget) *and* robust. arxiv.org/abs/2307.15593 w/ @jwthickstun @tatsu_hashimoto @percyliang
- Replying to @rckpudi: Along with the paper, we've released a blog post featuring a public demo of our watermark (with code), using some examples of watermarked text we generated from LLaMA-7B. Try to break our watermark by editing the text! crfm.stanford.edu/2023/07/30/wat…
- Replying to @rckpudi: We validate our strategies with OPT-1.3B, LLaMA-7B, and Alpaca-7B. Our best watermark uses exp-min sampling (like Aaronson) to generate text from the key sequence and is detectable (p < 0.01) from 35 tokens even when 50% of the sequence is corrupted with random insertions!
- what a wonderful, slick result! has me wondering what the all-time record is for shortest phd thesis... (quoting: "A quick thread on a short (3 page) paper, giving a simple algorithm that makes predictions guaranteeing 2*Sqrt{T} "Distance to calibration" against an adversary. The algorithm and proof are so simple I can describe it in thread. Joint with Eshwar, @natalie_collina, and Mirah:")
- Replying to @StanfordHAI @_jasonwei and @RishiBommasani: link seems down
- Replying to @rckpudi: We generate text from a fixed "watermark key sequence". The detector, who knows the full sequence, can align it to a text to verify the watermark. Until we reuse part of the sequence, the text is indistinguishable from regular text (i.e., distortion-free).
- Replying to @rckpudi: Other watermarking strategies (e.g., Kirchenbauer et al.; Aaronson) hash the previous k-1 tokens to determine the next token. Larger k makes the bias toward certain k-grams less noticeable but hurts robustness (replacing 1/k tokens breaks detection). We avoid this trade-off.
- super excited for what's to come! (quoting: "What would truly open-source AI look like? Not just open weights, open code/data, but *open development*, where the entire research and development process is public *and* anyone can contribute. We built Marin, an open lab, to fulfill this vision:")
- great work! (quoting: "🚀 Arcana v2 is here. Rime’s next-gen TTS makes voice AI sound truly human. More languages. More realism. More deployment options. 🧵👇")
- Replying to @rckpudi: We quantify the cost of not exactly fitting the training data via an optimization problem over learners: min. test error s.t. train error > ε^2. Even for ε quadratically smaller than the label noise variance σ^2, any feasible learner must suffer increased test error. (4/n)
- Replying to @rckpudi: There are lots of exciting open questions. Even for adjacent settings such as kernel regression and linear classification, we think obtaining similar results will require new approaches. (9/n, n=9)
- Replying to @DimitrisPapail and @bneyshabur: @DimitrisPapail we have a fairly simple counterexample (existence of disconnected global minima) that applies to two-layer nets of any width, though unclear re: reachability via SGD (see Section 5 of arxiv.org/abs/1906.06247)
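
Editor's sketch for the watermark thread above, which mentions exp-min sampling driven by a key sequence. This is not the released code. It is a minimal illustration, assuming each key entry is a vector of uniform random numbers (one per vocabulary item) and that the next token is the index minimizing -log(u_i) / p_i. The function name `exp_min_sample`, the toy distribution, and the key shape are all hypothetical. The demo checks the distortion-free idea empirically: averaged over fresh random keys, the chosen tokens should follow the model's distribution.

```python
import numpy as np


def exp_min_sample(probs, key_row):
    """Pick the index minimizing -log(u_i) / p_i (an exponential race).

    probs   -- next-token probabilities (assumed to sum to 1)
    key_row -- one entry of the key sequence: uniforms in [0, 1), same length
    """
    probs = np.asarray(probs, dtype=float)
    key_row = np.asarray(key_row, dtype=float)
    # Zero-probability tokens get an infinite score and are never chosen.
    with np.errstate(divide="ignore"):
        scores = -np.log(key_row) / probs
    return int(np.argmin(scores))


# Toy demo: the choice is deterministic given the key, yet over many
# fresh keys the empirical frequencies approximate `probs`.
rng = np.random.default_rng(0)
probs = np.array([0.5, 0.3, 0.2, 0.0])
key = rng.random((20000, probs.size))  # hypothetical key sequence
tokens = np.array([exp_min_sample(probs, row) for row in key])
freqs = np.bincount(tokens, minlength=probs.size) / len(tokens)
print(freqs)
```

Because the token is a deterministic function of the key entry and the model's probabilities, anyone holding the key can later check whether a text lines up with it. The detection test itself (aligning the key to possibly edited text) is not sketched here.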










