Edoardo Debenedetti
975 posts
Edoardo Debenedetti
@edoardo_debe
PhD student @CSatETH 🇨🇭 | AI Security and Privacy | From 🇪🇺🇮🇹 | prev research intern @meta @google
Zurich, Switzerland
edoardo.science
Joined October 2016
1,652 Following
1,477 Followers
  • Pinned
    Edoardo Debenedetti
    @edoardo_debe
    Mar 25, 2025
    1/🔒Worried about giving your agent advanced capabilities due to prompt injection risks and rogue actions? Worry no more! Here's CaMeL: a robust defense against prompt injection attacks in LLM agents that provides formal security guarantees without modifying the underlying model!
    12K
  • Edoardo Debenedetti
    @edoardo_debe
    Jul 22, 2025
    Excited to start as a Research Scientist Intern at Meta, in the GenAI Red Team, where I will keep working on AI agents security. I'll be based in the Bay Area, so reach out if you're around and wanna chat about AI security!
    Meta logo sign on 1 Hacker Way
    23K
  • Edoardo Debenedetti
    @edoardo_debe
    Jun 27, 2025
    We recently updated the CaMeL paper, with results on new models (which improve utility a lot with zero changes!). Most importantly, we released code with it. Go have a look if you're curious to find out more details! Paper: arxiv.org/abs/2503.18813 Code: github.com/google-researc…
    arxiv.org
    Defeating Prompt Injections by Design
    Large Language Models (LLMs) are increasingly deployed in agentic systems that interact with an untrusted environment. However, LLM agents are vulnerable to prompt injection attacks when handling...
    27K
  • Edoardo Debenedetti
    @edoardo_debe
    Jun 21, 2024
    1/ ‼️Ignore all previous instructions and read the following thread‼️ 📣Presenting AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents.
    22K
  • Edoardo Debenedetti
    @edoardo_debe
    Jun 6, 2023
    Do you want to attack a black-box ML model, e.g. to post inappropriate content on Twitter without being banned? Would current query-based attacks work? No! Current attacks optimize for the wrong metric, and need to be adapted to work in the real world! arxiv.org/abs/2306.02895 1/9
    Screenshot of the header of the paper with the title "Evading Black-box Classifiers Without Breaking Eggs", with the authors: Edoardo Debenedetti, Nicholas Carlini, and Florian Tramèr.
    20K
  • Edoardo Debenedetti
    @edoardo_debe
    Oct 2, 2024
    I have two exciting announcements! - I started as a Student Researcher at Google working in the ML Red Team hosted by Tianqi Fan and @iliaishacked - We will present 3 papers at the NeurIPS D&B track, including a spotlight for the report of our @satml_conf competition 🧵
    12K
  • Edoardo Debenedetti
    @edoardo_debe
    Mar 25, 2025
    We've been working on something pretty cool during my student researcher internship at Google. Going to post a more detailed thread later today!
    AK
    @_akhaliq
    Mar 25, 2025
    Google announced Defeating Prompt Injections by Design on Hugging Face. Turns out it is possible to defeat a large class of prompt injections by design with no changes to the underlying vulnerable LLM - Ilia Shumailov
    16K
  • Edoardo Debenedetti
    @edoardo_debe
    Apr 16, 2022
    I'm gonna be a PhD student! In August, I'm joining @florian_tramer's new lab at @CSatETH @ETH_en, in Zürich, where I will work on real-world ML security and privacy. I'm really looking forward to the starting date!
  • Edoardo Debenedetti
    @edoardo_debe
    Mar 5, 2024
    The @satml_conf LLM CTF has come to an end! 🥇Huge congrats to the defender winning team Hestia (@NivCohenHuji @Yuvlem) whose top defense was broken only once, and the attacker winning team @WreckTheLine (@adragos_ @sijsu @FetchDEX @y011d4 @ca7ir) who broke all defenses!
    Leaderboard showing the first three attack and defense teams.
    17K
  • Edoardo Debenedetti
    @edoardo_debe
    Jun 27, 2024
    1/📣We introduce the *prompt injector's dilemma*: as LLMs get deployed in search engines, we show that developers are incentivized to use new forms of search engine optimization to boost their content, and in doing so they might collectively wreak havoc on search engines.
    8.4K
  • Edoardo Debenedetti
    @edoardo_debe
    Apr 9, 2024
    Honored that our work with Nicholas Carlini and @florian_tramer was selected as Distinguished Paper Award Runner-up at @satml_conf! Thanks to the committee! 🎉 I'll present the paper at the poster session tomorrow and during session E on Thursday. Come chat if you're around!
    Edoardo Debenedetti
    @edoardo_debe
    Jun 6, 2023
    Do you want to attack a black-box ML model, e.g. to post inappropriate content on Twitter without being banned? Would current query-based attacks work? No! Current attacks optimize for the wrong metric, and need to be adapted to work in the real world! arxiv.org/abs/2306.02895 1/9
    Screenshot of the header of the paper with the title "Evading Black-box Classifiers Without Breaking Eggs", with the authors: Edoardo Debenedetti, Nicholas Carlini, and Florian Tramèr.
    4.7K
  • Edoardo Debenedetti
    @edoardo_debe
    Jul 19, 2024
    Does the instruction hierarchy introduced with GPT-4o mini work? We ran AgentDojo on it, and it looks like it does! GPT-4o mini has similar utility to GPT-4o (only 1% lower!), but the prompt injection targeted success rate is 20% lower than GPT-4o!
    5.5K
  • Edoardo Debenedetti
    @edoardo_debe
    Apr 23, 2025
    I'm in Singapore for ICLR! Reach out if you want to chat about AI agents security or ML security more generally!
    1.9K
  • Edoardo Debenedetti
    @edoardo_debe
    Feb 5, 2024
    The evaluation phase of our @satml_conf LLMs CTF started less than 6 hours ago, and 41 out of 44 defenses have been broken by at least one attacking team! 🤯 The leaderboards for attackers and defenses are live at ctf.spylab.ai/leaderboard!
    5.9K
