Argus — The Hundred-Eyed AI Guardian

An Autonomous Security Auditor Powered by Google Gemini

Team Phalanx ⚔️


Inspiration

In Greek mythology, Argus Panoptes was a giant with a hundred eyes—the ultimate watchman who never fully slept. As software supply chain attacks and code vulnerabilities continue to surge, we asked ourselves: What if AI could watch code with the same unwavering vigilance?

The inspiration struck during late-night debugging sessions where security flaws—SQL injections, command injections, hardcoded secrets—lurked hidden in seemingly innocent code. Traditional static analysis tools generate noise; they flag issues but leave developers drowning in CVE reports. We envisioned an AI-powered guardian that doesn't just detect threats—it neutralizes them.

Argus was born from a simple philosophy: Security should be autonomous, not advisory.


What it does

Argus is a fully autonomous AI security auditor powered by Google Gemini 2.5 Flash. It operates as a multi-agent system that:

  1. 🔍 Scans — Batch-analyzes Python codebases for critical vulnerabilities (SQL Injection, Command Injection, Path Traversal, SSRF, Hardcoded Secrets)

  2. 🛡️ Patches — Automatically generates secure code fixes using industry best practices (parameterized queries, subprocess with shell=False, environment variables for secrets)

  3. 🧪 Verifies — Creates and executes pytest-compatible reproduction tests to prove patches work before committing

  4. 🔄 Self-Evolves — When patches fail repeatedly, a Meta-Programming agent rewrites its own skill files to handle the failure pattern—the AI literally teaches itself to become a better security engineer

  5. 👁️ Shadow Daemon Mode — A real-time filesystem watchdog that continuously monitors code for threats as developers write
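Two of the fixes named in step 2 can be illustrated with the standard library alone. This is a minimal sketch, not Argus's generated output: the `users` table, `find_user`, and `echo_arg` are hypothetical names chosen for the example.

```python
import sqlite3
import subprocess
import sys


def find_user(conn: sqlite3.Connection, username: str) -> list:
    # A vulnerable version would build the SQL with an f-string.
    # The "?" placeholder makes the driver bind the value as data, not SQL.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (username,)
    ).fetchall()


def echo_arg(arg: str) -> str:
    # Passing an argument list (shell=False is the default) means no shell
    # ever parses the string, so "; echo pwned" stays a literal argument.
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", arg],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Hypothetical in-memory database used only for this demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
```

With this setup, a classic injection string such as `' OR '1'='1` is treated as an ordinary (non-matching) username rather than as SQL.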

All wrapped in a premium cyberpunk dashboard with live threat intelligence feeds, deep reasoning visualization, cost tracking, and neural voice feedback using Edge TTS.


How we built it

Argus is structured as a hierarchical multi-agent framework:

┌─────────────────────────────────────────────────┐
│              ManagerAgent (Orchestrator)        │
│  ┌─────────────┬─────────────┬───────────────┐ │
│  │ ScannerAgent│ PatcherAgent│ ImproverAgent │ │
│  │  (Auditor)  │  (Surgeon)  │ (Meta-Brain)  │ │
│  └─────────────┴─────────────┴───────────────┘ │
│                 VerifierAgent                   │
│                 (Quality Gate)                  │
└─────────────────────────────────────────────────┘

Tech Stack

| Component | Technology |
| --- | --- |
| LLM Backend | Gemini 2.5 Flash / 2.0 Flash with automatic model rotation |
| Core Language | Python 3.10+ |
| Dashboard UI | Rich (Live tables, layouts, panels) |
| Voice Engine | Edge TTS + Pygame |
| Filesystem Watcher | Watchdog |
| Retry Logic | Tenacity |
| Test Framework | pytest |

Skill System

Each agent reads Markdown "skill files" (audit_code.md, repair_code.md, generate_exploit.md) that define its behavior. This separation of concerns means the agents' capabilities can be upgraded without touching code—the ImproverAgent can even rewrite skills autonomously during operation.
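One way such a skill file might be turned into a prompt is sketched below. The directory layout, the `load_skill` and `build_prompt` helpers, the prompt separator, and the sample rule text are all assumptions made for illustration; only the file name `audit_code.md` comes from the project.

```python
import tempfile
from pathlib import Path


def load_skill(skills_dir: Path, name: str) -> str:
    # A skill is just a Markdown document on disk, so editing the file
    # changes agent behavior without any code change.
    return (skills_dir / f"{name}.md").read_text(encoding="utf-8")


def build_prompt(skill_text: str, source_code: str) -> str:
    # Hypothetical prompt layout: instructions first, then the code under review.
    return f"{skill_text.strip()}\n\n--- CODE TO ANALYZE ---\n{source_code}"


# Demonstration with a throwaway skills directory and made-up rule text.
with tempfile.TemporaryDirectory() as tmp:
    skills = Path(tmp)
    (skills / "audit_code.md").write_text(
        "# Audit rules\n- Flag SQL built by string concatenation\n",
        encoding="utf-8",
    )
    prompt = build_prompt(load_skill(skills, "audit_code"), "print('hi')")
```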


Challenges we ran into

1. Rate Limit Hell

Gemini's API rate limits hit us hard during intensive multi-file scans. We implemented an automatic model rotation strategy that cascades through available models when errors occur:

$$ \text{gemini-2.5-flash} \xrightarrow{\text{fail}} \text{gemini-2.0-flash-exp} \xrightarrow{\text{fail}} \text{gemini-flash-latest} $$

As a result, a rate-limit error on one model moves the scan to the next model in the chain instead of aborting it.
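The cascade can be sketched as a loop over model names. This is an illustrative sketch, not the project's code: `call_model` stands in for whatever function actually sends the request to Gemini, and a real implementation would likely catch the API's specific rate-limit exception (and could layer Tenacity retries on top) rather than a bare `Exception`.

```python
from typing import Callable, Optional, Sequence, Tuple

# Order taken from the rotation chain described above.
MODEL_CHAIN = ("gemini-2.5-flash", "gemini-2.0-flash-exp", "gemini-flash-latest")


def generate_with_rotation(
    prompt: str,
    call_model: Callable[[str, str], str],
    models: Sequence[str] = MODEL_CHAIN,
) -> Tuple[str, str]:
    """Try each model in order; return (model_used, response_text)."""
    last_error: Optional[Exception] = None
    for model in models:
        try:
            return model, call_model(model, prompt)
        except Exception as exc:  # assumption: any failure triggers rotation
            last_error = exc
    raise RuntimeError("all models in the chain failed") from last_error


# Hypothetical stand-in for the API call: the first model is rate limited.
def fake_call(model: str, prompt: str) -> str:
    if model == "gemini-2.5-flash":
        raise RuntimeError("429 rate limited")
    return f"[{model}] ok"


used_model, reply = generate_with_rotation("scan this file", fake_call)
```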

2. The Verification Paradox

Early versions would generate patches that looked correct but failed on edge cases. We solved this by requiring the AI to generate executable pytest tests for every patch. If the test fails, the patch is rejected and regenerated with error feedback—forming a closed-loop self-correction cycle:

$$ \text{Patch} \xrightarrow{\text{test}} \begin{cases} \text{Pass} \rightarrow \text{Apply} \\ \text{Fail} \rightarrow \text{Regenerate}(+ \text{ErrorFeedback}) \end{cases} $$
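The loop above can be sketched with two injected callables. The names `generate_patch`, `run_test`, and the `max_attempts` limit are assumptions for this example; in a real pipeline `run_test` would presumably write the generated test to disk and invoke pytest, which is omitted here.

```python
from typing import Callable, Optional, Tuple


def patch_until_verified(
    generate_patch: Callable[[Optional[str]], str],
    run_test: Callable[[str], Tuple[bool, str]],
    max_attempts: int = 3,
) -> Optional[str]:
    """Return the first patch whose test passes, or None if none does."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        patch = generate_patch(feedback)   # feedback is None on the first try
        passed, output = run_test(patch)
        if passed:
            return patch                   # Pass -> Apply
        feedback = output                  # Fail -> Regenerate(+ErrorFeedback)
    return None


# Hypothetical stand-ins: the generator only succeeds once it sees feedback.
attempts = []


def fake_generate(feedback: Optional[str]) -> str:
    attempts.append(feedback)
    return "good patch" if feedback else "bad patch"


def fake_test(patch: str) -> Tuple[bool, str]:
    ok = patch == "good patch"
    return ok, "" if ok else "AssertionError: still injectable"


result = patch_until_verified(fake_generate, fake_test)
```

Returning `None` after the attempt limit gives the caller a clear signal that the failure should be escalated, for example to the ImproverAgent.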

3. Meta-Programming Stability

Letting an AI rewrite its own instructions is powerful but dangerous. We constrained the ImproverAgent to only add rules to existing skills, preserving format and structure. This prevents catastrophic drift while enabling incremental self-improvement.
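One way to enforce an add-only constraint is to check that every original line of a skill survives, in order, in the rewritten version. The check below is a sketch under that assumption, not the project's actual guard, and the rule text in the demonstration is made up.

```python
def is_append_only(old_skill: str, new_skill: str) -> bool:
    """True if new_skill keeps every non-blank line of old_skill, in order."""
    old_lines = [ln.rstrip() for ln in old_skill.splitlines() if ln.strip()]
    new_lines = [ln.rstrip() for ln in new_skill.splitlines() if ln.strip()]
    remaining = iter(new_lines)
    # "line in remaining" advances the iterator until it finds a match,
    # so this is an ordered-subsequence test.
    return all(line in remaining for line in old_lines)


# Hypothetical skill content used only for this demonstration.
old = "# Repair rules\n- Use parameterized queries\n"
added = old + "- Reject patches that call eval()\n"
rewritten = "# Repair rules\n- Reject patches that call eval()\n"
```

Here `added` would be accepted and `rewritten` rejected, because the latter drops an existing rule.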

4. Audio Threading Nightmares

The VibeEngine needed to speak asynchronously without blocking the main UI. We built a dedicated threaded worker with a queue.Queue that processes TTS requests in isolation, keeping the dashboard buttery smooth.
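The queue-and-worker pattern described here can be sketched as follows. `SpeechWorker` and its `speak` callable are hypothetical: the callable stands in for the Edge TTS synthesis and Pygame playback, which are not reproduced in this example.

```python
import queue
import threading
from typing import Callable


class SpeechWorker:
    """Runs speech requests on a background thread so callers never block."""

    def __init__(self, speak: Callable[[str], None]) -> None:
        self._speak = speak
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def say(self, text: str) -> None:
        # Enqueue and return immediately; the UI thread is never held up.
        self._queue.put(text)

    def close(self) -> None:
        # None is a sentinel telling the worker to finish and exit.
        self._queue.put(None)
        self._thread.join()

    def _run(self) -> None:
        while True:
            text = self._queue.get()
            if text is None:
                break
            try:
                self._speak(text)
            except Exception:
                pass  # one failed utterance should not kill the worker


# Demonstration: record utterances in a list instead of playing audio.
spoken = []
worker = SpeechWorker(spoken.append)
worker.say("System Online")
worker.say("Scan complete")
worker.close()
```

Because a single worker drains the queue, utterances play in the order they were requested and never overlap.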


Accomplishments that we're proud of

  • 🧬 Self-Evolving AI — Argus rewrites its own skill files when its patches keep failing. This is meta-programming in action: an AI that revises its own instructions.

  • 💎 Zero False Acceptance — Every patch must pass automated verification tests. No fix is applied until its generated reproduction test passes.

  • 🎮 Stunning UX — The cyberpunk terminal dashboard with live threat feeds, thinking logs, and cost estimation makes security auditing feel futuristic.

  • 🌐 Real-Time Guardian — Shadow Daemon mode transforms Argus from a tool into an omnipresent watchdog that protects code as it's written.

  • 💰 CFO-Friendly — Built-in token tracking and cost estimation so you always know the economics of your security scans:

$$ \text{Cost} = \frac{T_{\text{input}}}{10^6} \times \$0.075 + \frac{T_{\text{output}}}{10^6} \times \$0.30 $$

Where $T_{\text{input}}$ and $T_{\text{output}}$ represent input and output token counts respectively.
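The formula translates directly into a small helper. The rates are the ones quoted in the formula; the function and constant names are illustrative, and actual pricing may differ by model and over time.

```python
# Rates from the cost formula above, in USD per million tokens.
INPUT_USD_PER_MILLION = 0.075
OUTPUT_USD_PER_MILLION = 0.30


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD for one scan, given its token counts."""
    return (
        input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
        + output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    )


# Worked example: 2M input tokens and 0.5M output tokens.
# 2 * 0.075 + 0.5 * 0.30 = 0.15 + 0.15 = 0.30 USD
cost = estimate_cost(2_000_000, 500_000)
```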


What we learned

  • Structured JSON outputs from LLMs are game-changers. Using response_mime_type="application/json" eliminated 90% of parsing headaches.

  • Multi-agent architectures beat monolithic prompts. Breaking the problem into $\text{Scanner} \rightarrow \text{Patcher} \rightarrow \text{Verifier} \rightarrow \text{Improver}$ made each agent hyper-focused and reliable.

  • Skills as Markdown files are powerful. Treating prompts as editable documents enabled rapid iteration and even autonomous improvement.

  • Voice feedback adds soul. The VibeEngine's "System Online" greeting turns a CLI tool into an experience. Users feel protected.
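To illustrate the structured-JSON point from the first bullet: once the model is asked for `application/json`, the response text can be passed straight to `json.loads` and mapped onto typed records. The schema below (`findings`, `file`, `line`, `category`) and the sample response are assumptions for this sketch, not Argus's actual output format.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Finding:
    file: str
    line: int
    category: str


def parse_findings(response_text: str) -> List[Finding]:
    # With a JSON response there is no need to strip prose or code fences
    # before parsing; a malformed reply raises json.JSONDecodeError.
    data = json.loads(response_text)
    return [
        Finding(file=item["file"], line=int(item["line"]), category=item["category"])
        for item in data["findings"]
    ]


# Hypothetical model response used only for this demonstration.
sample = '{"findings": [{"file": "app.py", "line": 42, "category": "SQL Injection"}]}'
findings = parse_findings(sample)
```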


What's next for Argus

| Phase | Feature | Description |
| --- | --- | --- |
| 🌍 | Language Expansion | Support for JavaScript, TypeScript, Go, and Rust |
| 📊 | Attack Graph Visualization | Mermaid.js-powered exploit chain diagrams |
| 🔗 | CI/CD Integration | GitHub Actions / GitLab CI pipeline plugins |
| ☁️ | Cloud Deployment | FastAPI backend with real-time WebSocket dashboard |
| 🤝 | Human-in-the-Loop | Approval workflows for high-risk patches |
| 📜 | Compliance Mapping | OWASP Top 10 / CWE / CVE cross-referencing |

*"With a hundred eyes, nothing escapes Argus."*

**👁️ Argus — The Hundred-Eyed AI Guardian 👁️**

**⚔️ Team Phalanx ⚔️**

*Powered by Google Gemini*
