He He
154 posts
NLP researcher. Assistant Professor at NYU CS & CDS.
Joined December 2016
- Unbelievable. This quote is blatantly false and unnecessary for the argument. And she surely had expected the backlash with the patronizing NOTE. This is racism, not "cultural generalization". @NeurIPSConf
  - Quoted post: Mitigating racial bias from LLMs is a lot easier than removing it from humans! Can't believe this happened at the best AI conference @NeurIPSConf We have ethical reviews for authors, but missed it for invited speakers?
- @kchonyc and I are hiring a post-doc. Come help us figure out how an agent can learn by reading manuals and watching videos! Looking for expertise in multimodal reasoning, few-shot learning, QA/dialogue. Get in touch or apply at apply.interfolio.com/92494
- Reward hacking means the model is making less effort than expected: it finds the answer long before its fake CoT is finished. TRACE uses this idea to detect hacking when CoT monitoring fails. Work led by @XinpengWang_ @nitishjoshi23 and @rico_angell
  - Quoted post: ‼️ Your model may be secretly exploiting your imperfect reward function without telling you in the CoT! How to detect such 'implicit' reward hacking if the model is hiding it? 🧠 We introduce TRACE, a method based on a simple premise: hacking is easier than solving the actual
- Have LLMs mastered deductive reasoning? Check out PrOntoQA-OOD, a synthetic dataset using a complete set of deduction rules. arxiv.org/abs/2305.15269 Stop by the poster on Wed at 10:45-12:45 and ask Abu Saparov all about reasoning (w or w/o LLMs)! #NeurIPS2023
- Automating AI research is bottlenecked by verification speed (running experiments takes time). Our new paper explores whether LLMs can tell which ideas will work before executing them, and they appear to have better research intuition than human researchers.
  - Quoted post: Most promising-looking AI research ideas don't pan out, but testing them burns through compute and labor. Can LMs predict idea success without running any experiments? We show that they do it better than human experts!
- Thanks @_jasonwei for a fantastic and timely lecture! We had a full house and a half-hour discussion. Stay tuned for @hwchung27's lecture on RLHF in two weeks (nyu-cs2590.github.io/spring2023/cal…)!
  - Quoted post: I gave an invited lecture at New York University for @hhexiy's class! I covered three ideas driving the LLM revolution: scaling, emergence, and reasoning. I tried to frame them in a way that reveals why large LMs are special in the history of AI. Slides: docs.google.com/presentation/d…
- If you are interested in truthfulness/interpretability of LLMs, chat with @javirandor at #NeurIPS2023!
  - Quoted post: 🧵 New paper: "Personas as a Way to Model Truthfulness in Language Models" We introduce empirical evidence suggesting LLMs may use "personas" to model truthfulness and improve generalization. arxiv.org/abs/2310.18168
- Congratulations again @thtrieu_ ! Thanks for bringing me on this quest and can't wait to see the next rabbit you pull!
- @haizelabs is one of the few truly tackling the hard problem of LLM eval and oversight. Excited to support their mission!
  - Quoted post: We are thrilled to welcome Professor He He @hhexiy as an advisor to the Haize Labs team! Professor He leads a group at NYU focused on evaluation, scalable oversight, human-AI collaboration, and reasoning.
- It'd be great if the ARR @ReviewAcl meta review provided two scores, one on significance of ideas/results and one on revisions needed. The two are kind of conflated now; what should be the score of a perfectly-executed, low-impact paper?
- New work on OOD detection with @uditarora09 & @WillHuang93! OODs are notoriously hard to define. We try to construct realistic pairs of ID/OOD sets and find that they reveal distinct failure modes of different detection methods.
  - Quoted post: 1/5 New paper @emnlpmeeting! "Types of Out-of-distribution Texts and How to Detect Them" with @WillHuang93 and @hhexiy: arxiv.org/abs/2109.06827. TL;DR: Our results call for an explicit definition of OOD examples when evaluating different detection methods.
- Check out Jiaxin's work on how RLHFed models excel at impressing humans, not at the actual tasks!
  - Quoted post: RLHF is a popular method. It improves your human eval score and Elo rating. But really, your model might be "cheating" you! We show that LLMs can learn to mislead human evaluators via RLHF. 🧵 below