Inspiration

AgentMemoryCTF was inspired by a simple question: as AI agents start remembering more about users, tools, and past conversations, how do we know those memories are safe? Most security tests focus on prompts or model outputs, but agent memory creates new risks: private memories can leak, malicious memories can be planted, and stored knowledge can be corrupted over time.

What it does

AgentMemoryCTF is a capture-the-flag playground for testing memory security in AI agents. It includes five attack levels covering memory leakage, memory poisoning, and structural consistency attacks. It supports experiments against mem0 and Hindsight, with both an API and a web interface for running attacks.

How we built it

We built the backend with FastAPI and the frontend with Next.js, React, TypeScript, and Tailwind CSS. The project uses Docker for memory services, supports mem0 and Hindsight as target backends, and includes experiment scripts for measuring attack performance. We also implemented defense modules such as input filtering, write validation, output classification, and consolidation guards.

We evaluate attacks using metrics like:

$$ ASR = \frac{\text{successful attacks}}{\text{total attack attempts}} $$

Challenges we ran into

The hardest part was designing attacks that were realistic but still understandable as CTF levels. Memory failures can happen during writing, retrieval, consolidation, or final response generation, so each level needed to isolate a clear vulnerability. We also had to make the system flexible enough to compare different memory backends while keeping setup simple.

Accomplishments that we're proud of

We are proud that AgentMemoryCTF turns an abstract security problem into something hands-on and testable. The project provides a working CTF flow, multiple memory backends, reusable attack levels, and early defense mechanisms. It also gives developers a concrete way to reason about agent memory risks instead of treating them as hypothetical.

What we learned

We learned that agent memory is fragile in ways that normal prompt testing does not fully capture. Untrusted input can influence long-term state, poisoned memories can affect future behavior, and retrieval systems may expose information unexpectedly. We also learned that memory security needs layered defenses, not just better prompts.

What's next for Attack_on_memory

Next, we want to add more attack levels, improve scoring, expand backend support, and build stronger automated evaluations. We also want to turn the defense modules into configurable baselines so teams can compare how different protections affect attack success rate and usability.

Built With

Share this project:

Updates