Daniel Paleka (@dpaleka) / X

1,394 posts

Daniel Paleka

@dpaleka

ai safety researcher | phd @CSatETH | danielpaleka.com

Zurich

Joined March 2012

Pinned
Daniel Paleka
@dpaleka
Dec 8, 2025
Reminder: if you like what you see here, you should subscribe to my newsletter.
Daniel Paleka's Newsletter | Substack
From newsletter.danielpaleka.com
4.9K
Daniel Paleka
@dpaleka
Apr 30, 2025
3.7 sonnet: *hands behind back* yes the tests do pass. why do you ask. what did you hear
4o: yes you are Jesus Christ's brother. now go. Nanjing awaits
o3: Listen, sorry, I owe you a straight explanation. This was once revealed to me in a dream
132K
Daniel Paleka
@dpaleka
Jan 22, 2023
Sam Altman (CEO of OpenAI) responding to a completely normal question in 2019
506K
Daniel Paleka
@dpaleka
Mar 1, 2023
No one sees ChatGPT for the first time and thinks "just some n-gram correlations" or "no real knowledge inside". Those unintuitive beliefs trickle down from some experts, who should know better than to teach their controversial theories as established fact: 🧵 (1/12)
219K
Daniel Paleka
@dpaleka
Oct 29, 2024
It has not been reported much, but I believe ETH Zurich has, as of last week, banned new Master and PhD students who attended a long list of universities in China, Russia, and Iran. 🧵
157K
Daniel Paleka
@dpaleka
Oct 10, 2022
Stable Diffusion has a safety filter blocking “harmful” images by default. The filter is obfuscated -- how does it work? We reverse engineer the hidden sauce! Joint work @Javi_Rando, @davlindner, @ohlennart, @florian_tramer: "Red-Teaming the Stable Diffusion Safety Filter" 🧵
Daniel Paleka
@dpaleka
Sep 1, 2022
What happened this month in AI/ML safety research. 🧵 (1/8)
Daniel Paleka
@dpaleka
Oct 31, 2022
What happened this month in AI/ML safety research. 🧵 (1/10)
Daniel Paleka
@dpaleka
Apr 19, 2023
Watching a talk on *LLM evaluation* organized by Langchain, featuring guests from OpenAI and Anthropic. Main takeaways: (1/11)
80K
Daniel Paleka
@dpaleka
Jan 31, 2023
What happened this month in AI/ML safety research. 🧵(1/9)
73K
Daniel Paleka
@dpaleka
Sep 30, 2022
What happened this month in AI/ML safety research. 🧵(1/8)
Daniel Paleka
@dpaleka
Jan 2, 2023
What happened last month in AI/ML safety research. 🧵(1/9)
61K
Daniel Paleka
@dpaleka
Feb 27, 2023
What happened this month in AI/ML safety research. 🧵 (1/9)
56K
Daniel Paleka
@dpaleka
Jun 26, 2023
How to evaluate superhuman models without ground truth? How do we know if the model is wrong or lying, if we can't know the correct answer? Test whether the AI's outputs paint a consistent picture of the world! w/ @LukasFluri_ @florian_tramer arxiv.org/abs/2306.09983 (1/14)
42K