🛂 Agent Identity Passport

Inspiration

AI agents are increasingly managing production infrastructure — restarting services, rolling back deployments, scaling pods. But when something goes wrong at 3am, nobody knows which agent did what, whether it was authorized, or how to audit the damage.

We asked: What if every AI agent had to show a passport before touching production?

What We Built

A verifiable identity and trust framework for AI agents, inspired by real-world standards such as the IETF Agent Authorization Protocol (AAP) and the zero-trust security principles used at Okta and Auth0.

Every agent gets a cryptographically signed JWT passport with:

  • A trust level (HIGH / MEDIUM / LOW)
  • A policy defining exactly what it can and cannot do
  • An expiry time
  • A reputation score that degrades with failures

How It Works

Chaos Event Fires
  ↓
Agent Requests Passport (name + type + trust level)
  ↓
Policy Engine checks rules → ALLOW or DENY
  ↓
Agent presents Passport to perform action
  ↓
Every action logged to immutable audit trail
  ↓
Incident marked RESOLVED with resolution time
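The flow above can be sketched as a default-deny policy check followed by an append-only audit trail. This is an illustration under assumptions, not the project's code. The JSON rule schema (`min_trust`, `allow`, `deny`), the agent types and action names, and the helpers `evaluate`, `AuditTrail`, and `handle_chaos_event` are all hypothetical, and the passport step is left out for brevity. To approximate an "immutable" log, each entry stores the SHA-256 hash of the previous entry (a hash chain). This makes after-the-fact edits detectable rather than impossible.

```python
import hashlib
import json
import time

# Hypothetical JSON rules per agent type; the project's real schema is not published.
POLICY_JSON = """
{
  "sre_agent":   {"min_trust": "MEDIUM", "allow": ["restart_service", "rollback_deploy", "scale_pods"], "deny": ["delete_database"]},
  "cache_agent": {"min_trust": "LOW", "allow": ["flush_cache"], "deny": []}
}
"""
POLICIES = json.loads(POLICY_JSON)
TRUST_RANK = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}
GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def evaluate(agent_type: str, trust_level: str, action: str) -> str:
    """Default-deny: ALLOW only when a rule explicitly permits the action."""
    rule = POLICIES.get(agent_type)
    if rule is None or action in rule["deny"]:
        return "DENY"
    if TRUST_RANK[trust_level] < TRUST_RANK[rule["min_trust"]]:
        return "DENY"
    return "ALLOW" if action in rule["allow"] else "DENY"


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class AuditTrail:
    """Append-only log; each entry hashes its predecessor, so tampering is detectable."""

    def __init__(self):
        self.entries = []

    def append(self, **event) -> dict:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"ts": time.time(), "prev": prev_hash, **event}
        body["hash"] = _digest(body)
        self.entries.append(body)
        return body

    def verify(self) -> bool:
        prev_hash = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev"] != prev_hash or _digest(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True


def handle_chaos_event(audit, incident, agent_type, trust_level, action) -> str:
    started = time.monotonic()
    audit.append(incident=incident, stage="DETECTED")
    decision = evaluate(agent_type, trust_level, action)
    audit.append(incident=incident, stage="DISPATCHED", agent_type=agent_type,
                 action=action, decision=decision)
    if decision == "ALLOW":
        # A real agent would present its passport and perform the action here.
        elapsed_ms = (time.monotonic() - started) * 1000
        audit.append(incident=incident, stage="RESOLVED", resolution_ms=round(elapsed_ms, 3))
    return decision


if __name__ == "__main__":
    trail = AuditTrail()
    print(handle_chaos_event(trail, "inc-1", "sre_agent", "HIGH", "restart_service"))  # ALLOW
    print(trail.verify())  # True
```

Denying by default means a missing or misspelled rule fails closed, which is the usual zero-trust choice.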

Key Features

  • 🔥 Chaos Simulator — Fire service_down, bad_deploy, memory_leak, cache_miss events
  • Auto-Healing Pipeline — Chaos fires → agent dispatched → resolved in milliseconds, fully automated
  • 🔐 Policy Engine — JSON rules defining what each agent type can/cannot do
  • 🌐 Service Monitor — Real URL health monitoring with auto-recovery
  • 📊 Incident Timeline — Visual DETECTED → DISPATCHED → RESOLVED flow
  • 🏆 Reputation Scoring — Track agent reliability, auto-block bad agents
  • 🌍 Multi-tenant — Isolated environments per organization
  • 📋 Audit Trail — Immutable log of every agent action
  • 📈 Prometheus Metrics — Full observability at /metrics
  • 🔌 WebSocket — Real-time push updates to dashboard
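Reputation scoring with auto-blocking could look something like the following. The project does not describe its scoring formula, so the linear penalty and reward, the constants, and the `AgentReputation` class are hypothetical. The sketch only reproduces the behavior described above: a score that degrades with failures and blocks the agent once it drops below a threshold.

```python
from dataclasses import dataclass

# Hypothetical tuning constants; the real scoring formula is not published.
START_SCORE = 100.0
FAILURE_PENALTY = 15.0
SUCCESS_REWARD = 2.0
BLOCK_THRESHOLD = 40.0


@dataclass
class AgentReputation:
    agent_id: str
    score: float = START_SCORE
    blocked: bool = False

    def record(self, success: bool) -> None:
        """Update the score after an action and auto-block below the threshold."""
        if self.blocked:
            return  # in this sketch a blocked agent stays blocked; nothing unblocks it
        if success:
            self.score = min(START_SCORE, self.score + SUCCESS_REWARD)
        else:
            self.score = max(0.0, self.score - FAILURE_PENALTY)
        if self.score < BLOCK_THRESHOLD:
            self.blocked = True

    def can_act(self) -> bool:
        return not self.blocked


if __name__ == "__main__":
    rep = AgentReputation("healer-01")
    for _ in range(5):
        rep.record(success=False)
    print(rep.score, rep.can_act())  # 25.0 False
```

With these constants a fresh agent is blocked on its fifth consecutive failure (100 → 85 → 70 → 55 → 40 → 25), since a score of exactly 40 is still allowed.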

Load Test Results

  • 100 concurrent users
  • 4,553 total requests
  • 0% failure rate
  • 76 req/sec average
  • 8ms average latency

Challenges We Faced

  • Migrating from SQLite to PostgreSQL mid-hackathon while keeping all features working
  • Getting WebSocket (Flask-SocketIO + Eventlet) to work correctly with the template stack
  • Deploying to Railway with the correct DATABASE_URL format and environment variable references
  • Designing a trust model that is both secure and flexible enough for real production use cases

What We Learned

  • How real SRE teams implement zero-trust for automated systems
  • The importance of audit trails in production — not just for debugging but for compliance
  • How chaos engineering reveals hidden failure modes
  • How to deploy to Railway with PostgreSQL and WebSocket support

What's Next

  • Discord/Slack webhook alerts when critical chaos fires
  • Agent certificate rotation
  • Integration with real Kubernetes operators
  • IETF AAP spec compliance
