Yanda Chen

@yanda_chen_

Member of Technical Staff @AnthropicAI CodeRL/Alignment | PhD @ColumbiaCompSci | NLP & ML | Prev Intern @MSFTResearch, @AmazonScience

San Francisco, CA

Joined January 2019

Yanda Chen
@yanda_chen_
Apr 3, 2025
My first paper @AnthropicAI is out! We show that Chains-of-Thought often don’t reflect models’ true reasoning—posing challenges for safety monitoring. It’s been an incredible 6 months pushing the frontier toward safe AGI with brilliant colleagues. Huge thanks to the team! 🙏
Anthropic
@AnthropicAI
Apr 3, 2025
New Anthropic research: Do reasoning models accurately verbalize their reasoning? Our new paper shows they don't. This casts doubt on whether monitoring chains-of-thought (CoT) will be enough to reliably catch safety issues.
Yanda Chen
@yanda_chen_
Sep 2, 2024
Life update: I’m excited to share that I’m joining the Alignment Science team at @AnthropicAI as a Member of Technical Staff/Research Scientist. I’ll be focusing on AI safety. Looking forward to it!
Yanda Chen
@yanda_chen_
Sep 16, 2022
[1/6] Have you found that if you slightly change your prompts, your favorite language model produces different outputs? Curious why, and how to capitalize on such sensitivity? Check out our work “On the Relation between Sensitivity and Accuracy in In-context Learning”.
Yanda Chen
@yanda_chen_
Jul 18, 2023
[1/9] Large Language Models (LLMs) can mimic humans to explain human decisions. But can they explain THEMSELVES? How to evaluate explanations along this axis? Check out our work “Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations”!
Yanda Chen
@yanda_chen_
Feb 21, 2024
[1/7] Pre-trained LMs can do in-context learning, but this is unexpected given the distribution shift between pre-training data and ICL prompts. What structures of pre-training data yield ICL? Check out our work “Parallel Structures in Pre-training Data Yield In-Context Learning”
Yanda Chen
@yanda_chen_
Oct 15, 2021
[1/8] Large LMs (e.g., GPT-3) are good at few-shot learning. But prompted LMs exhibit artifacts like oversensitivity to example order/choice & instruction wording.☹️ Our work “Meta-learning via LM In-context Tuning” proposes a fix: meta-train LMs to do in-context learning!😃
Yanda Chen
@yanda_chen_
Aug 12, 2024
Come check out our ACL poster today at Session B (14:00-15:30). We study how in-context learning emerges, and find it emerges from parallel structures in the pre-training data.
Yanda Chen
@yanda_chen_
Jul 19, 2024
I'll be at ICML and presenting our spotlight paper, "Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations," on Wednesday, 7/24 11:30 AM - 1:00 PM CEST. Please check it out! Looking forward to discussing explainability, LLMs, alignment, etc.
Yanda Chen
@yanda_chen_
Feb 21, 2024
--Repost due to accidental deletion-- [1/n] Large Language Models can generate fluent explanations, but their explanations are often INCONSISTENT on different inputs. How to improve their consistency? Check out our work “Towards Consistent Natural-Language Explanations via
Yanda Chen
@yanda_chen_
Aug 3, 2021
Check out our #ACL2021NLP paper "Cross-language Sentence Selection via Data Augmentation and Rationale Training" in Session 11D at 12-1pm (EDT) Tuesday Aug 3, 2021! We present an effective approach for cross-language sentence retrieval in low-resource settings.
Yanda Chen
@yanda_chen_
Sep 16, 2022
Replying to @yanda_chen_
[5/6] Experiments show that sensitivity is a strong signal for selective prediction, as SenSel consistently outperforms the MaxProb baseline by up to 5.1 AUC pts. We argue that ICL sensitivity is not merely an isolated artifact, but reflects how confidently the LM learns a task.
Yanda Chen
@yanda_chen_
Feb 21, 2024
Replying to @yanda_chen_
[7/7] Paper: arxiv.org/abs/2402.12530 Coauthors: @henryzhao4321, @Zhou_Yu_AI, Kathleen McKeown, @hhexiy
arxiv.org
Parallel Structures in Pre-training Data Yield In-Context Learning
Pre-trained language models (LMs) are capable of in-context learning (ICL): they can adapt to a task with only a few examples given in the prompt without any parameter update. However, it is...
Yanda Chen
@yanda_chen_
Jul 18, 2023
Replying to @yanda_chen_
[3/9] We propose to evaluate the counterfactual simulatability of natural language explanations: whether an explanation can enable humans to precisely infer the model's outputs on diverse counterfactuals of the explained input.
Yanda Chen
@yanda_chen_
Oct 15, 2021
Replying to @yanda_chen_
[4/8] Unlike MAML and multi-task fine-tuning, which adapt to new tasks by gradient descent on task examples, in-context tuning keeps LM weights frozen during task adaptation. Our approach gets rid of fine-tuning during adaptation and of the nested optimization in meta-training.