Ariel Herbert-Voss
@adversariel
Founder @RunSybil. likes: offsec, LLMs, and dumb memes. prev: research scientist @OpenAI / CS PhD @Harvard / @defcon AI Village
Joined September 2013
  • 4chan mathematicians solved an interesting problem but nobody knows how to cite them. Amazing.
    A curious situation. The best known lower bound for the minimal length of superpermutations was proved by an anonymous user of a wiki mainly devoted to anime. mathsci.wikia.com/wiki/The_Haruh…
  • Some news: I’m writing a book for @nostarch titled “The Machine Learning Red Team Manual”. My aim is to provide a practical guide for anyone interested in adversarial ML and red teaming as it relates to in-production ML systems. A short thread on why this project matters:
  • A lion in a hoodie hacking on a laptop - #dalle2
  • There’s a lot of fearmongering about LLMs being capable of finding 0day. There are three highly complex roadblocks that need to be overcome for this to be a real concern: statefulness, hallucination, and contamination.
    I just need to predict the next token I just need to predict the next token I just need to predict the next token
  • Oh good I can update my monitor stand
    It is here. General release is still scheduled for April 4.
  • Professional news - last month I joined @OpenAI where I am continuing my work on malicious uses of AI and red teaming AI systems :) I’m excited to be working on important problems with such talented people
  • My talk “Don't Red-Team AI Like a Chump” was accepted to @defcon so y’all get ready for some sweet knowledge about attacking ML systems at both the system and algorithm level
  • the idea that you can just break into a data center and steal the model has a lot of memetic sticking power, but is stupid if you actually know anything about this topic. here's a thread on how confidential computing works in the NVIDIA H100:
  • Replying to @_delta_zero
    I used to think it was because, as a quasiscientific community, we highly value peer review, but ML people are totally fine with citing papers on arXiv that haven't been accepted anywhere as long as you can replicate the results. I think perceived prestige nails it.
  • If you’re curious about how (potentially sensitive) training data can be extracted out of large public language models like GPT2 then pls give our paper a read
  • made a guide to put parameter size into perspective
  • Sometimes you don’t need fancy math to break ML 😉
  • Check out this meme preview of my @defcon talk :) Come hear me speak in track 1 at 11 AM on Friday the 9th for both spicy takes and fresh AI/ML security advice