Pinned
Jan Leike
790 posts
AI research @AnthropicAI. Previously OpenAI & DeepMind.
Optimizing for a post-AGI future where humanity flourishes.
Opinions aren't my employer's.
- I resigned
- I'm excited to join @AnthropicAI to continue the superalignment mission! My new team will work on scalable oversight, weak-to-strong generalization, and automated alignment research. If you're interested in joining, my dms are open.
- Replying to @janleikeTo all OpenAI employees, I want to say: Learn to feel the AGI. Act with the gravitas appropriate for what you're building. I believe you can "ship" the cultural change that's needed. I am counting on you. The world is counting on you. :openai-heart:
- We challenge you to break our new jailbreaking defense! There are 8 levels. Can you find a single jailbreak to beat them all? claude.ai/constitutional…
- Replying to @janleikeI joined because I thought OpenAI would be the best place in the world to do this research. However, I have been disagreeing with OpenAI leadership about the company's core priorities for quite some time, until we finally reached a breaking point.
- Replying to @janleikeBuilding smarter-than-human machines is an inherently dangerous endeavor. OpenAI is shouldering an enormous responsibility on behalf of all of humanity.
- I think the OpenAI board should resign
- Replying to @janleikeI believe much more of our bandwidth should be spent getting ready for the next generations of models, on security, monitoring, preparedness, safety, adversarial robustness, (super)alignment, confidentiality, societal impact, and related topics.
- Replying to @janleikeBut over the past years, safety culture and processes have taken a backseat to shiny products.
- Replying to @janleikeWe are long overdue in getting incredibly serious about the implications of AGI. We must prioritize preparing for them as best we can. Only then can we ensure AGI benefits all of humanity.
- With the InstructGPT paper we found that our models generalized to follow instructions in non-English even though we almost exclusively trained on English. We still don't know why. I wish someone would figure this out.

