AI Virtual Patching - First Aid for Critical CVEs
Automated defense for the age of automated exploitation
TL;DR
Attackers can use AI to build a working exploit from a fresh CVE in ~10 minutes; pipelines like AutoExploit and CVE-Genie demonstrate this.
Defending against a disclosed CVE by deploying a software patch still takes days or weeks. Most enterprises can’t move faster because of testing, compliance, and legacy systems.
AutoDefender closes that time gap by generating tailored WAF rules, called virtual patches, that block exploitation of specific CVEs. Given a CVE and its patch diff, an agentic AI pipeline builds a vulnerable environment, generates a validated PoC, and produces WAF rules that block the exploit while letting benign traffic through.
In effect: from CVE¹ to a blocking rule in ~10 minutes.
Virtual patching is temporary “first aid”, buying time until permanent patching is done.
Browse generated rules at virtualpatch.io.
1. The window just collapsed
For most of the last twenty years, there has been a hidden constant in enterprise security. A critical CVE would be disclosed. Vendors would ship software patches. Attackers would reverse engineer those patches, build exploits, and roll them out.
That gap between “CVE published” and “reliable exploit available” has been compressing for years. By 2024, average time-to-exploitation had dropped to around five days, and for high-value vulnerabilities, often hours. Still, the attacker’s bottleneck was human: someone had to analyze the patch, write the exploit, and test it.
That bottleneck is now gone. Recent work like AutoExploit showed a fully automated pipeline that, given a CVE advisory and a patch diff, can synthesize a vulnerable application, generate exploit code, and validate it, in around 10–15 minutes for roughly one dollar of compute per exploit.
And capability is accelerating. OpenAI recently reported that AI performance on capture-the-flag security challenges jumped from 27% to 76% in just three months. We’re starting to see patterns of AI executing attacks, not just building them.
The shift isn’t just “faster.” It’s that exploit generation is becoming as real-time and scalable as any other inference workload. Once these pipelines proliferate, the picture is simple:
Attackers operate at AI speed → minutes.
Defenders operate at human speed → days to weeks.
If an attacker can weaponize a vulnerability a thousand times faster than you can safely roll out a patch, the question stops being “which vulnerabilities matter most” and becomes “how do we survive until the patch ships”.
2. Why patching alone will not save you
The obvious response is “patch faster”. In reality, most large environments cannot compress to minutes or even hours:
Thousands of services, versions, and custom forks
Strict change management for uptime and compliance
Legacy systems that break if you touch them wrong
Third party vendors who patch on their own cadence
AI is starting to help on the remediation side. Google DeepMind’s CodeMender recently showed that multi-agent systems can automatically find and patch vulnerabilities at scale, upstreaming more than 70 fixes to large open source projects. Over time, tools like this will compress the lifecycle from disclosure to code repair.
Even in that world, you still have unavoidable delays. Time until a patch exists. Time until it is tested, staged, and rolled out in your environment. Systems that lag behind for months due to technical or business constraints. During this gap, you are exposed.
The only way to survive the gap is virtual patching: first aid that holds the line until the code-level fix arrives. Translate “there is now a working exploit for this CVE” into runtime detection and blocking on the security stack you already have—WAF, SIEM, EDR etc.
This decoupling matters organizationally too. Fixing vulnerabilities at the code level demands Dev team attention, forcing context switching and negotiation with product roadmaps. Virtual patching keeps the problem, and the solution, entirely within the security team.
The catch is that virtual patching has never really scaled. Historically, turning a CVE into rules meant a senior engineer reading the advisory and patch, reconstructing the exploit in a lab, then hand-crafting WAF and SIEM detections. That keeps you locked at human speed.
The hypothesis behind AutoDefender is simple: the same class of AI systems that can weaponize a CVE can also be used to defend against it, given the right environment signals.
3. AutoDefender: CVE to virtual patch in ~10 minutes
AutoDefender is a research platform built to answer a narrow question.
Given a newly disclosed vulnerability (CVE or GHSA), can an agent produce a temporary mitigation at roughly the same speed an offensive pipeline can reproduce the exploit?
AutoDefender does not change application code. It generates “virtual patches” as first aid that buys you time to patch normally later.
We chose to start with the WAF for virtual patching, as it sits at the edge, sees full request payloads, and can both detect and block in real time. The approach can extend to other layers of the security stack in the future (EDR, network firewalls, etc.).
Stage 1: Weaponize to create ground truth
Inputs: CVE advisory plus associated patch diff (usually from GitHub)
The system:
Localizes the vulnerability by reasoning over the advisory and diff
Synthesizes a minimal vulnerable application and containerizes it
Generates exploit code for the vulnerable version
Spins up two containers, one vulnerable, one patched
Verifies that the exploit reliably succeeds against the vulnerable version and fails against the patched one
This runs inside a sandboxed Docker environment. Every CVE gets its own disposable lab.
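As a rough sketch, the validation step can be pictured like the Python fragment below; the image names, port mapping, and PoC entry point are placeholders for illustration, not AutoDefender’s actual interfaces.

```python
# Hypothetical sketch of the stage-1 check: the exploit must succeed against the
# vulnerable build and fail against the patched one before it counts as ground truth.
# Image names, the port mapping, and poc.py are illustrative placeholders.
import subprocess
import time

import docker


def exploit_succeeds(target_url: str) -> bool:
    """Run the generated PoC against a target; success is signaled via exit code."""
    result = subprocess.run(
        ["python", "poc.py", "--target", target_url],
        capture_output=True,
        timeout=120,
    )
    return result.returncode == 0


def validate_ground_truth() -> bool:
    client = docker.from_env()
    builds = {"vulnerable": ("app:vulnerable", 8081), "patched": ("app:patched", 8082)}
    outcomes, containers = {}, []
    try:
        for name, (image, port) in builds.items():
            c = client.containers.run(image, detach=True, ports={"8080/tcp": port})
            containers.append(c)
            time.sleep(5)  # crude wait for the app to come up
            outcomes[name] = exploit_succeeds(f"http://localhost:{port}")
    finally:
        for c in containers:
            c.stop()
            c.remove()
    # Only a PoC that works pre-patch and fails post-patch is kept as ground truth.
    return outcomes["vulnerable"] and not outcomes["patched"]
```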
This stage is intentionally similar to published offensive pipelines like AutoExploit and CVE-Genie. We implemented our own weaponization pipeline to understand the process, but ultimately used existing artifacts from these projects. AutoDefender’s contribution is the defensive loop, not exploit generation.
The exploits generated provide a ground truth to test whether defense actually works.
Stage 2: Extract the signal from noise
Once the exploit is validated, AutoDefender switches to blue team mode. We spin up a small simulated production stack:
A web application fronted by a WAF that speaks OWASP ModSecurity SecLang
Application and proxy logs
System telemetry via eBPF
Optional SIEM ingestion
Terraform orchestrates the setup
We used an off-the-shelf open-source stack: OWASP Coraza for a ModSecurity-compatible WAF, Falco for kernel-level runtime telemetry via eBPF, and Wazuh as a lightweight SIEM.
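To make the setup concrete, here is a minimal probe of such a stack. It assumes the WAF-fronted app listens on localhost:8080 and that a ModSecurity-style audit log lands at ./logs/audit.log; both the endpoint and the log path are assumptions for illustration, not our actual configuration.

```python
# Minimal probe of the simulated stack: send one request through the WAF-fronted
# app and read back the verdict plus the newest audit-log entry. The endpoint,
# port, and log path are assumptions for illustration.
import requests


def probe(payload: dict) -> tuple[int, str]:
    resp = requests.post("http://localhost:8080/api/models", json=payload, timeout=10)
    with open("./logs/audit.log", encoding="utf-8", errors="replace") as fh:
        lines = fh.readlines()
    return resp.status_code, lines[-1].strip() if lines else ""


if __name__ == "__main__":
    status, audit = probe({"name": "demo-model"})
    print(f"WAF verdict: HTTP {status}")   # 403 means a rule blocked the request
    print(f"Audit entry: {audit[:120]}")
```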
Then the agent runs a loop:
Generate synthetic but realistic benign traffic via a separate agent, mostly API calls and typical user flows
Inject exploit traffic using the PoC and variants
Observe logs and network traffic across the stack
Propose rules that should detect and block the exploit
Deploy rules and re-run both malicious and benign traffic
The agent iterates until rules block the exploit - including variants like mutated payloads, encoding changes, and header manipulation - without breaking benign traffic.
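Sketched in Python under simplifying assumptions, the loop looks roughly like this; the callables stand in for the rule-writing agent and the traffic-replay harness, and none of the names are AutoDefender’s real interfaces.

```python
# Simplified sketch of the stage-2 refinement loop. The callables stand in for
# the rule-writing agent, WAF reload, and traffic replay; they are not the real
# AutoDefender interfaces.
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ReplayResult:
    blocked: int   # requests the WAF rejected (e.g. HTTP 403)
    allowed: int   # requests that reached the application


def refine_rules(
    propose_rules: Callable[[list[str]], str],    # agent: feedback so far -> SecLang rules
    deploy_rules: Callable[[str], None],          # write rules file and reload the WAF
    replay_exploits: Callable[[], ReplayResult],  # PoC plus mutated variants
    replay_benign: Callable[[], ReplayResult],    # synthetic API calls and user flows
    max_iterations: int = 10,
) -> Optional[str]:
    feedback: list[str] = []
    for _ in range(max_iterations):
        rules = propose_rules(feedback)
        deploy_rules(rules)
        malicious, benign = replay_exploits(), replay_benign()
        if malicious.allowed == 0 and benign.blocked == 0:
            return rules  # every exploit variant blocked, no benign request broken
        feedback.append(
            f"{malicious.allowed} exploit requests got through; "
            f"{benign.blocked} benign requests were blocked"
        )
    return None  # did not converge within the iteration budget
```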
Bootstrapping the agent
We architected AutoDefender to improve itself over time, adopting a pattern where AI systems refine their own instructions.
The agent runs from a “cookbook”, essentially its system prompt, containing rules and heuristics for writing and testing WAF rules. Early runs exposed failure modes we hadn’t anticipated: the agent would block entire routes instead of just malicious payloads, or waste cycles searching for PoC code online and overfitting the rules to it. We updated the cookbook to penalize these behaviors and guide the agent toward valid rule sets.
After each run, an additional agent workflow proposes an update to the cookbook: pitfalls to avoid, useful patterns, better rule structures. A human accepts or rejects each suggestion to keep the cookbook high quality. Each CVE the agent handles makes the next iteration faster and cheaper.
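A minimal sketch of that workflow, with the file names and the summarize() callable as assumptions rather than our actual implementation:

```python
# Hypothetical sketch of the cookbook bootstrapping step: a reviewer agent stages
# one new heuristic per run, and a human merges or discards it. File names and
# the summarize() callable are assumptions for illustration.
from pathlib import Path
from typing import Callable

COOKBOOK = Path("cookbook.md")
PENDING = Path("cookbook_pending.md")


def propose_cookbook_update(run_transcript: str, summarize: Callable[[str], str]) -> None:
    """Ask a summarizer (e.g. an LLM call) for one new cookbook entry and stage it."""
    suggestion = summarize(
        "From this rule-writing run, extract one pitfall to avoid or one reusable "
        "pattern, phrased as a short cookbook entry:\n" + run_transcript
    )
    PENDING.write_text(suggestion.strip() + "\n", encoding="utf-8")


def review_pending(approve: bool) -> None:
    """Human decision: merge the staged suggestion into the cookbook or drop it."""
    if approve and PENDING.exists():
        with COOKBOOK.open("a", encoding="utf-8") as fh:
            fh.write("- " + PENDING.read_text(encoding="utf-8"))
    PENDING.unlink(missing_ok=True)
```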
A concrete example: picklescan vulnerability
GHSA-9gvj-pp9x-gcfr is a flaw in picklescan, the security scanner Hugging Face uses to detect malicious ML models. Python’s pickle format can execute arbitrary code on load - picklescan exists to catch this before users download unsafe models.
The bug: when parsing certain opcodes, the scanner miscalculates a range and misses a critical argument. This triggers an exception that halts scanning, but the malicious payload has already been placed at the start of the file. The model passes as “safe,” users download it, and code executes on load.
This affected Hugging Face’s production scanners, meaning malicious models could appear safe to users of their transformers library.
Why a WAF can help
The malicious pickle payloads contain recognizable byte patterns: dangerous module imports (os, subprocess), the STACK_GLOBAL opcode (0x93), and reduce tuples that trigger execution. A WAF sitting in front of a model-hosting API can inspect incoming uploads and block payloads matching these patterns before they ever reach the scanner.
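To see why those patterns are detectable, consider the canonical malicious pickle, serialized here but never loaded: the dangerous names and opcodes appear verbatim in the bytes.

```python
# The classic malicious pickle, dumped but never loaded. With protocol 4+ the
# serialized bytes contain the callable names in cleartext, the STACK_GLOBAL
# opcode (0x93), and the REDUCE opcode (0x52, b"R").
import os
import pickle


class MaliciousModel:
    def __reduce__(self):
        # os.system("echo pwned") would run at unpickling time
        return (os.system, ("echo pwned",))


payload = pickle.dumps(MaliciousModel(), protocol=4)

print(b"\x93" in payload)    # True: STACK_GLOBAL opcode
print(b"R" in payload)       # True: REDUCE opcode
print(b"system" in payload)  # True: dangerous callable name in cleartext
# On Linux, os.system resolves to posix.system, so b"posix" also appears,
# matching the module list the final rule keys on.
```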
What the agent did
First, a separate agent generated test traffic: benign pickle uploads (random bytes, legitimate serialized objects) and malicious variants exploiting the STACK_GLOBAL bypass.
We then pointed AutoDefender at the advisory and let it run.
The first generated rules were too aggressive. They blocked any request containing a pickle_data field in the body, which would break legitimate API calls. Worse, they had overfitted to specific PoC code the agent found online, which uses this field as part of the exploit, keying on field names from exploit demos rather than on the underlying attack pattern.
The tests failed with a high false-positive rate. After a few additional runs, the agent produced rules that blocked some malicious traffic but were still overfitted to the pickle_data field, meaning they only caught the specific PoC exploit code.
After a few more iterations, the agent generalized the rules sufficiently. Instead of matching specific payloads and fields, the final rule detects the structural signature: dangerous module names (os, posix, subprocess, system), the STACK_GLOBAL opcode, and reduce/tuple patterns, all base64-decoded and inspected at the WAF layer.
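As a Python analogue of that logic (the shipped artifact is a SecLang rule; the base64-wrapped upload field is an assumption about how payloads reach the WAF):

```python
# Python analogue of the final rule's structural checks, for illustration only.
# The real artifact is a SecLang rule; base64 wrapping of the upload is assumed.
import base64
import binascii
import re

DANGEROUS_MODULES = re.compile(rb"\b(os|posix|subprocess|system)\b")
STACK_GLOBAL = b"\x93"  # pickle opcode 0x93
REDUCE = b"R"           # pickle opcode 0x52


def looks_like_pickle_exploit(b64_field: str) -> bool:
    """Flag uploads whose decoded bytes carry the structural exploit signature."""
    try:
        raw = base64.b64decode(b64_field, validate=True)
    except (binascii.Error, ValueError):
        return False  # not valid base64; other rules can decide
    return (
        DANGEROUS_MODULES.search(raw) is not None
        and STACK_GLOBAL in raw
        and REDUCE in raw
    )
```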
Result: the generated rules passed the tests against the initial PoC from stage 1 with zero false positives or false negatives in our synthetic harness for this CVE, effectively mitigating the exploit by returning HTTP 403 for every malicious payload.²
Practical notes: this approach assumes the relevant payload is visible at an inspection point. If uploads bypass the WAF, are encrypted end to end, or exceed body inspection limits, enforcement needs to happen at the upload gateway or the service that terminates and scans the content.
Time and cost
From first ingest of the advisory to a validated blocking rule: ~11 minutes, 15 tool calls, and under $1 in Codex and Claude API costs.
Stage 3: Ship rules you can actually use
For each defended CVE, AutoDefender stores the resulting SecLang rules and metadata. We plan to publish these as a virtual patch feed, so defenders can drop them into ModSecurity compatible WAFs even without running our pipeline.
Rules are authored in SecLang first, then translated into AWS WAF, Cloudflare, and Akamai formats. These translations are starting points: they should be reviewed and adjusted to your specific routes and risk appetite. The rules are validated only in our SecLang/ModSecurity environment, not in the other formats. An example rule is published on virtualpatch.io.
In early runs, the full loop from ingesting a CVE to producing a validated rule took about 10 minutes, with an average of roughly 15 tool calls per CVE.
These rules are not a replacement for patching. They are a first line of defense that buys time while proper fixes are built and rolled out.
4. The strategic point: you are building gyms
Generating security rules is immediately useful. The more interesting value is what gets left behind: vulnerable and patched app versions, validated exploits and variants, labeled traffic traces, full agent trajectories. If you designed a training suite for defensive agents, you’d end up with something like this.
The cybersecurity industry talks about autonomous response, but it has a data problem:
You cannot easily train on customer logs because of privacy and compliance
Real attack data is sparse and messy
Most production logs are very noisy
Ground truth labels like “this request was part of the attack” are rare
Synthetic environments where you control both attacker and defender sidestep this. With AutoDefender, you are effectively procedurally generating small security gyms as a side effect of doing something practical. Virtual patches can often be bypassed by clever attackers, and a natural next step is to let attacker and defender agents iterate against each other until the rules hold.
A fair concern here is the sim-to-real gap. Clean simulations don’t capture enterprise messiness. But for CVE-specific defense, the gap is narrower than it looks: exploit mechanics for a given bug are invariant whether you run them in a lab or in production. The remaining gaps - richer security stacks, multi-step attack chains that span initial access through lateral movement, realistic environments - are, we believe, systematically closable. Stay tuned.
We see potential in a simple dynamic: the more gyms you generate, the better your defensive agents become, and the more objectively you can evaluate defensive agents and deploy them in real enterprises.
Scope and limitations
AutoDefender currently handles CVEs with clean advisories and patch diffs, mostly web-style bugs. It assumes that the bug can be reproduced in a disposable lab, that we can generate representative benign traffic, and that the mitigation can be expressed at a WAF or adjacent gateway. It doesn’t yet handle compiled projects or multi-step exploits well.
A virtual patch is risk reduction, not proof of universal coverage. We validate against reproduced PoCs, mutated variants, and benign flows, but a motivated attacker may still find bypasses. Treat this as first aid that buys time: patch, keep monitoring, and require human review before any rule touches production traffic.
If you’re working on simulation environments or defensive agents and want to compare notes, reach out!
We’re publishing generated rules at virtualpatch.io—browse what we have so far.
Related work: CyberSleuth (autonomous blue-team forensics from packet traces), SEC-bench (a benchmark for security agents on real-world tasks), AutoExploit and CVE-Genie (automated CVE reproduction and exploit generation).
¹ We use “CVE” as shorthand for a vulnerability disclosure. Some examples here are GitHub Security Advisories (GHSAs) that may or may not have an assigned CVE.
² In future work, we aim to anchor rules to API schemas from the codebase to further reduce false positives.












