Cas (Stephen Casper) (@StephenLCasper) / X

Cas (Stephen Casper)

2,543 posts

@StephenLCasper

Computer scientist working on AI safeguards and gov research. Incoming assistant professor @Kennedy_School @Harvard. stephencasper.com

Cambridge, Massachusetts, USA

Joined March 2016

Pinned
Cas (Stephen Casper)
@StephenLCasper
Mar 24
I'm extremely excited to be on the organizing committee this year for my favorite workshop ever! Submissions (up to 8 pages) are due April 24! Co-submission with ICML and NeurIPS is encouraged!
Technical AI Governance @ ICML 2026
@taig_icml
Mar 24
🚨📢Announcing the second Technical AI Governance Research (TAIGR) workshop @icmlconf. Accepting submissions (up to 8 pages) until April 24 on technical topics in AI governance! #icml2026
taigr-workshop.com
TAIGR @ ICML 2026 — Workshop on Technical AI Governance Research
Second Workshop on Technical AI Governance Research at ICML 2026. Bridging ML researchers and policymakers in Seoul, South Korea.
15K
Cas (Stephen Casper)
@StephenLCasper
May 13, 2023
Thread: [1/4] Some MIT/Harvard collaborators and I just finished a project to show that Stable Diffusion objectively succeeds at copying the styles of digital artists with copyrighted work. Why might you care about this if you care about AI safety?
GitHub - thestephencasper/sd_cycle_consistency
From github.com
325K
Cas (Stephen Casper)
@StephenLCasper
Jul 31, 2023
New paper: "Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback." We survey over 250 papers to review challenges with RLHF with a focus on large language models. Highlights in thread 🧵
200K
Cas (Stephen Casper)
@StephenLCasper
Aug 22, 2025
A personal update:
- I just finished my 6-month residency at @AISecurityInst.
- I'm going back to MIT for the final year of my PhD.
- I'm on the postdoc and faculty job markets this fall!
33K
Cas (Stephen Casper)
@StephenLCasper
Feb 10, 2025
Imagine if the 2015 Paris Climate Summit was renamed the "Energy Action Summit," invited leaders from across the fossil fuel industry, raised millions for fossil fuels, ignored IPCC reports, and produced an agreement that didn't even mention climate change. #AIActionSummit 🤦
76K
Cas (Stephen Casper)
@StephenLCasper
Mar 12, 2025
🚨New paper led by @aribak02. Lots of prior research has assumed that LLMs have stable preferences, align with coherent principles, or can be steered to represent specific worldviews. No ❌, no ❌, and definitely no ❌. We need to be careful not to anthropomorphize LLMs too much.
107K
Cas (Stephen Casper)
@StephenLCasper
Apr 24, 2025
Hi, I’m at ICLR — let’s talk about AI evals, safeguards, and technical governance.
19K
Cas (Stephen Casper)
@StephenLCasper
Aug 5, 2025
OpenAI just claimed to introduce "malicious fine-tuning"... In this thread, I'll give a list of academic works on tampering attacks from the past few years that I think they didn't credit or take into account.
47K
Cas (Stephen Casper)
@StephenLCasper
Jan 29, 2024
🚨New paper🚨 "Black-Box Access is Insufficient for Rigorous AI Audits." AI audits are increasingly seen as key for governing powerful AI systems. But to be effective, audits need to be high-quality, and to produce high-quality audits, auditors need access. 🧵 arxiv.org/abs/2401.14446
115K
Cas (Stephen Casper)
@StephenLCasper
May 13, 2023
Replying to @StephenLCasper
[5/5] These experiments are of limited direct relevance to the lawsuit, but they help establish that digital artists are objectively, successfully copied by diffusion models and strengthen the case of tangible harm caused by these models. stablediffusionlitigation.com
14K
Cas (Stephen Casper)
@StephenLCasper
May 5, 2024
Sometime in the next few months, @AnthropicAI is expected to release a research report/paper on sparse autoencoders. Before this happens, I want to make some predictions about what it will accomplish. Overall, I think that the Anthropic SAE paper, when it comes out, will
124K
Cas (Stephen Casper)
@StephenLCasper
Apr 5, 2024
I think that this is a really cool and unique paper. It introduces the idea that AI could significantly reduce memetic diversity in the world. arxiv.org/abs/2404.03502
27K
Cas (Stephen Casper)
@StephenLCasper
May 13, 2023
Replying to @StephenLCasper
[2/5] Right now, some large-scale AI training runs have been made easier by a lack of concrete protections against training on copyrighted work. But some companies who have released diffusion models are being class-action sued on behalf of artists for copyright violations.
12K
Cas (Stephen Casper)
@StephenLCasper
May 13, 2023
Replying to @StephenLCasper
[3/5] The success of this lawsuit could make huge training runs more difficult/expensive and raise the activation energy to develop, deploy, and capitalize on advanced AI. This is helpful from the perspective of slowing down AI.
The case for slowing down AI
From vox.com
11K