I was laid off by Meta today. As a Research Scientist, my work was just cited by the legendary @johnschulman2 and Nicholas Carlini yesterday.
I’m actively looking for new opportunities — please reach out if you have any openings!
Xianjun Yang
- As a new grad and early-career researcher, I’m truly overwhelmed and grateful for the incredible support from the community. Within 24 hours, I’ve received hundreds of kind messages and job opportunities— a reminder of how warm and vibrant the AI community is. I’ll take time to
- 📢My New Paper: Diversity-driven Data Selection for Language Model Tuning through Sparse Autoencoder TLDR: We propose using features from SAEs as a measure of data diversity & complexity and demonstrate its effectiveness for data selection in LLM tuning. arxiv.org/pdf/2502.14050
- New paper 🚨🚨🚨: arxiv.org/abs/2310.02949 Exciting (yet concerning) discoveries on new vulnerabilities of current LLMs: utilizing only 100 adversarial examples within 1 GPU hour can subvert safely aligned models to adapt to harmful tasks without sacrificing model helpfulness.
- 📢[New paper📚] Detection of LLMs-Generated Content. We give the first comprehensive survey of the methods, datasets, attacks, challenges and outlooks for detection in the era of LLMs. w/ @PanLiangming, @xuandongzhao, @WilliamWangNLP, et al. Paper link:
- It's worth noting that most AI detectors can only return a probability of whether the text is AI-generated. But a high probability alone cannot serve as practical evidence. Luckily, our previous work published at ICLR 2024 can provide strong text-level EVIDENCE to support the
  Quoted post: ICLR authors, want to check if your reviews are likely AI generated? ICLR reviewers, want to check if your paper is likely AI generated? Here are AI detection results for every ICLR paper and review from @pangram! It seems that ~21% of reviews may be AI?
- Real LLM test: It's tax season again! I had 3 employers last year due to internships, and relocated 4 times across 2 states. Only ChatGPT-4o gave me correct answers. Claude-3.7 and Grok-3 both failed at calculating STCG. #Claude37Sonnet #Grok3
- 🚀 New Paper Alert! 🚀 Our latest paper introduces a novel weak-to-strong jailbreaking attack on Large Language Models. Paper: arxiv.org/abs/2401.17256 Joint with @xuandongzhao @TianyuPang1 @duchao0726 @lileics @yuxiangw_cs @WilliamWangNLP
  Quoted post: 🚀 New Research Alert! Our latest paper introduces a new method to test the robustness of LLMs against jailbreaking attacks. Discover the "Weak-to-Strong Jailbreaking on Large Language Models". Paper: arxiv.org/abs/2401.17256 Code: github.com/XuandongZhao/w…
- Replying to @WenhuChen: If I were a PhD student in China, I would prefer Mexico. Traveling to a new country is a better experience than staying in the same city lol
- 📢New Paper📢 Happy to introduce our new work on whether multimodal LLMs have achieved PhD-level intelligence across diverse scientific disciplines! It turns out that the most advanced MLLMs still lag behind a lot! #AI4Science
  Quoted post: 🚨 Introducing “MMSci: A Multimodal Multi-Discipline Dataset for PhD-Level Scientific Comprehension” arxiv.org/abs/2407.04903 🧐Have current multimodal LLMs achieved PhD-level intelligence across diverse scientific disciplines? Are they ready to become AI scientific assistants?
- Our previous work (Weak-to-Strong Jailbreaking: arxiv.org/abs/2401.17256) also found that "LLM safety alignment is only a few tokens deep." We focus on utilizing this vulnerability to attack a strong model. Happy to see another work that shares the same finding but focuses on defense.
  Quoted post: Our recent paper shows: 1. Current LLM safety alignment is only a few tokens deep. 2. Deepening the safety alignment can make it more robust against multiple jailbreak attacks. 3. Protecting initial token positions can make the alignment more robust against fine-tuning attacks.
  Linked paper (arxiv.org): Weak-to-Strong Jailbreaking on Large Language Models. Large language models (LLMs) are vulnerable to jailbreak attacks - resulting in harmful, unethical, or biased text generations. However, existing jailbreaking methods are computationally costly....
- I spent one hour reading this. As an AI safety researcher, here is my comment: good sci-fi, but not a serious scientific prediction.
  Quoted post: "How, exactly, could AI take over by 2027?" Introducing AI 2027: a deeply-researched scenario forecast I wrote alongside @slatestarcodex, @eli_lifland, and @thlarsen