Really excited to share our new preprint modelling the effectiveness and surveying the burden of 9 different non-pharmaceutical interventions against #COVID19 transmission, using data collected across 41 different countries.
Full Preprint: doi.org/10.1101/2020.0…
1/8
Do Bayesian Neural Networks need to be fully stochastic? In our ✨ AISTATS Oral 🌼, we answer with a resounding "no".
Partially stochastic networks are
- just as expressive
- just as principled
- and often better performing
than more costly fully stochastic networks
details👇🏽
we're setting up a new sangha in san francisco for those exploring the teachings of rob burbea, including samadhi, soulmaking, emptiness, and everything else rob offered 🔥
let me know if you'd like to come! share with friends! hope to see you there
i passed my phd viva today !!!
i give thanks to all of the (countless) beings that supported me and contributed to this <3
thank you @yeewhye @tom_rainforth @eric_nalisnick and so many others <3
i'll be running some MATS projects in the winter around adversarial robustness with @EthanJPerez
if you're interested in AI safety research, but looking for mentorship, i really strongly recommend MATS! feel free to DM me if you have questions :-)
come and help us improve adversarial robustness of frontier LLMs at @AnthropicAI
as LLMs become more capable, robustness issues will pose larger misuse risks, but as carlini says, the academic community has made "limited progress" so far
"Please learn from our mistakes. Don't do exactly the same things that we did, or you'll end up in ten years with having nothing to show for it." — Nicholas Carlini urging AI researchers to avoid the pitfalls of past adversarial ML research at the Vienna Alignment Workshop 2024.
New paper: we defend LLMs against universal jailbreaks across thousands of hours of red-teaming.
This work happened because of Anthropic's Responsible Scaling Policy. We sat down, set an ambitious robustness goal, pivoted to get there, and then executed.
Read more below:
New Anthropic research: Constitutional Classifiers to defend against universal jailbreaks.
We’re releasing a paper along with a demo where we challenge you to jailbreak the system.
Funded ML PhD positions @ Oxford!
if anyone has questions about the AIMS programme including course structure, admissions etc, or any questions about Oxford in general, i'd be more than happy to help, just send me a message :)
really excited to release our paper on understanding sycophancy in language models 🎉 check out the thread for a good summary ✨
this work provides empirical evidence we'll need to go beyond using unaided non-expert human feedback to build reliable ai
AI assistants are trained to give responses that humans like. Our new paper shows that these systems frequently produce ‘sycophantic’ responses that appeal to users but are inaccurate. Our analysis suggests human feedback contributes to this behavior.
our work on jailbreak rapid response is out!
it offers an extremely pragmatic alternative to "achieve perfect robustness" that could mitigate real-world misuse
if you're interested in doing research on robustness and misuse, my team is hiring! DM me :-)
New research: Jailbreak Rapid Response.
Ensuring perfect jailbreak robustness is hard. We propose an alternative: adaptive techniques that rapidly block new classes of jailbreak as they’re detected.
Read our paper with @MATSprogram: arxiv.org/abs/2411.07494
Thrilled to share our latest work: Understanding the effectiveness of government interventions against the resurgence of COVID-19 in Europe, out now in @NatureComms nature.com/articles/s4146… 1/