arnavpant/Ouroboros

Ouroboros AI Resilience Platform

Autonomous AI Resilience System for Enterprise Agent Infrastructure


🎯 Overview

Ouroboros is an autonomous AI resilience platform that detects and remediates catastrophic failure modes in enterprise AI agent systems. It acts as an immune system for AI infrastructure: it automatically detects infinite loops, semantic drift, and runaway costs, then heals them without human intervention.

The Problem: Multi-agent AI systems fail in unpredictable ways. A single trapped agent can burn $3,000+ per hour in API costs while grinding operations to a halt.

The Solution: Ouroboros combines deep observability (Datadog), generative AI (Google Vertex AI), and event-driven architecture (Confluent Kafka) to detect pathological behavior within 30 seconds and execute autonomous remediation.


✨ Key Features

  • 🔍 Autonomous Loop Detection: Detects infinite reasoning loops using semantic similarity analysis (95% threshold, 5 consecutive turns)
  • 💉 The Antidote: Automatically injects system instruction overrides to break loops
  • ⚡ Circuit Breaker: Suspends agents exceeding cost thresholds ($100 limit)
  • 📊 Real-Time Observability: Full trace capture of agent reasoning with Datadog LLM Observability
  • 🎨 Neon Dashboard: Cyberpunk-themed Next.js dashboard with live remediation feed
  • 🔄 Event Streaming: Kafka-based audit trail for forensic analysis and replay
  • 💰 Cost Prevention: Prevents runaway API costs with token velocity monitoring
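
The loop-detection rule in the feature list (95% similarity, 5 consecutive turns) could be sketched roughly as follows. This is a minimal illustration, not the project's `semantic_analyzer.py`: it assumes each agent turn has already been converted to an embedding vector by some model, and it reads "5 consecutive turns" as "each of the last 5 turns is at least 95% similar to the turn before it". The function names are hypothetical.

```python
import math
from typing import Sequence

SIMILARITY_THRESHOLD = 0.95  # "95% threshold" from the feature list
CONSECUTIVE_TURNS = 5        # "5 consecutive turns" from the feature list


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_looping(
    turn_embeddings: Sequence[Sequence[float]],
    threshold: float = SIMILARITY_THRESHOLD,
    window: int = CONSECUTIVE_TURNS,
) -> bool:
    """Return True if every adjacent pair in the last `window` turns is near-identical."""
    if len(turn_embeddings) < window:
        return False  # not enough history to judge
    recent = turn_embeddings[-window:]
    return all(
        cosine_similarity(prev, cur) >= threshold
        for prev, cur in zip(recent, recent[1:])
    )
```

Comparing each turn only to its predecessor keeps the check cheap, but it would miss loops that alternate between two or more distinct states; a real detector might compare against several earlier turns.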

🏗️ Architecture

┌─────────────────────────────────────────────────────────────┐
│                    OUROBOROS ARCHITECTURE                   │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌─────────────┐      ┌──────────────┐      ┌────────────┐  │
│  │   Vertex AI │      │   Datadog    │      │  Confluent │  │
│  │ Agent Engine│─────▶│ Observability│─────▶│   Kafka    │  │
│  │  (Brain)    │      │  (Nervous    │      │ (Memory)   │  │
│  └─────────────┘      │   System)    │      └────────────┘  │
│         │             └──────────────┘             │        │
│         │                     │                    │        │
│         │                     ▼                    │        │
│         │             ┌──────────────┐             │        │
│         │             │   Webhook    │             │        │
│         │             │   Triggers   │             │        │
│         │             └──────────────┘             │        │
│         │                     │                    │        │
│         ▼                     ▼                    ▼        │
│  ┌───────────────────────────────────────────────────────┐  │
│  │         Google Cloud Functions (Effector Arms)        │  │
│  │  ┌───────────────┐              ┌─────────────────┐   │  │
│  │  │inject-antidote│              │circuit-breaker  │   │  │
│  │  └───────────────┘              └─────────────────┘   │  │
│  └───────────────────────────────────────────────────────┘  │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Design Philosophy: Tripartite organism pattern

  • Brain (Vertex AI): Agent reasoning and execution
  • Nervous System (Datadog): Observability and alerting
  • Memory (Kafka): Durable event log and audit trail
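
To make the flow in the diagram concrete, here is a rough sketch of how a webhook alert might be routed to one of the two remediation functions and recorded as an audit event. It is an illustration under assumptions: the alert type strings, the payload fields (`agent_id`, `alert_type`), and the `RemediationEvent` shape are hypothetical and are not taken from the project's actual Datadog monitors or Avro schemas. Only the function names `inject-antidote` and `circuit-breaker` come from the diagram.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

# Hypothetical alert types; the real monitor payloads are not shown in this README.
LOOP_DETECTED = "loop_detected"
COST_EXCEEDED = "cost_exceeded"


@dataclass
class RemediationEvent:
    """Hypothetical audit record that could be published to the event log."""
    agent_id: str
    alert_type: str
    action: str
    timestamp: str


def choose_action(alert_type: str) -> str:
    """Map an alert type to the Cloud Function named in the diagram."""
    actions = {
        LOOP_DETECTED: "inject-antidote",
        COST_EXCEEDED: "circuit-breaker",
    }
    if alert_type not in actions:
        raise ValueError(f"unknown alert type: {alert_type}")
    return actions[alert_type]


def handle_alert(payload: dict) -> str:
    """Pick a remediation for the alert and return the audit event as JSON."""
    event = RemediationEvent(
        agent_id=payload["agent_id"],
        alert_type=payload["alert_type"],
        action=choose_action(payload["alert_type"]),
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
    return json.dumps(asdict(event))
```

For example, `handle_alert({"agent_id": "finbot-1", "alert_type": "loop_detected"})` would return a JSON string whose `action` field is `inject-antidote`. Actually invoking the function and publishing to Kafka are left out of the sketch.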

🚀 Quick Start

Prerequisites

  • Google Cloud Platform account with billing
  • Datadog account (14-day trial acceptable)
  • Confluent Cloud account (free tier)
  • Python 3.11+, Node.js 18+, gcloud CLI

Installation

# Change into your local clone of the repository
# (the path below is the author's; adjust it for your machine)
cd /home/ugrads/majors/arnavpant27/oroboros

# Step 1: Create GCP project and enable billing
cd infrastructure/gcp
chmod +x create-project.sh
./create-project.sh

# Step 2: Load environment variables
source ../../config/gcp-project.env

# Step 3: Verify setup
gcloud projects describe "$GCP_PROJECT_ID"

Next Steps: See docs/SETUP.md for the complete installation guide.


📋 Project Structure

oroboros/
├── infrastructure/          # GCP and Terraform setup
│   ├── gcp/
│   │   ├── create-project.sh        # Task 1.1 ✅
│   │   ├── enable-apis.sh           # Task 1.2 (next)
│   │   └── service-accounts.sh      # Task 1.3 (next)
│   └── terraform/                   # IaC configuration
├── agents/                  # FinBot test agent
│   └── finbot/
│       ├── agent_config.py          # Vertex AI config
│       ├── tools.py                 # Custom tools
│       └── poison_prompts.py        # Test prompts
├── observability/           # Datadog integration
│   ├── datadog_tracer.py            # LLM tracing
│   ├── semantic_analyzer.py         # Loop detection
│   └── monitors/                    # Alert configs
├── functions/               # Cloud Functions (remediation)
│   ├── inject-antidote/             # The Antidote
│   └── circuit-breaker/             # Agent suspension
├── kafka/                   # Event streaming
│   ├── schemas/                     # Avro schemas
│   ├── producers/                   # Event publishers
│   └── consumers/                   # Audit processors
├── dashboard/               # Next.js frontend (neon theme)
│   ├── app/                         # App Router pages
│   ├── src/components/              # React components
│   └── tailwind.config.ts           # Neon theme config
├── api/                     # FastAPI backend
│   ├── routers/                     # API endpoints
│   └── services/                    # Business logic
├── tests/                   # Test suite
│   ├── unit/                        # Unit tests
│   ├── integration/                 # E2E tests
│   └── load/                        # Load testing
├── config/                  # Configuration files
│   ├── gcp-project.env              # GCP settings ✅
│   └── .env.example                 # Template
├── docs/                    # Documentation
│   └── SETUP.md                     # Setup guide ✅
└── tasks/                   # Project management
    ├── prd-ouroboros-ai-resilience.md
    └── tasks-prd-ouroboros-ai-resilience.md

🎯 Success Metrics (Demo Day)

  • ✅ Detection Speed: <30 seconds from loop onset to detection
  • ✅ Remediation Success: 3/3 auto-heals during live demo
  • ✅ Cost Savings: Dashboard shows "$127 saved by auto-remediation"
  • ✅ Zero Human Intervention: Fully autonomous healing

🛠️ Technology Stack

| Component       | Technology                    | Purpose                    |
|-----------------|-------------------------------|----------------------------|
| AI Runtime      | Google Vertex AI Agent Engine | Multi-agent orchestration  |
| Observability   | Datadog LLM Observability     | Trace capture & alerting   |
| Event Streaming | Confluent Kafka               | Durable audit log          |
| Remediation     | Google Cloud Functions        | Serverless auto-healing    |
| Frontend        | Next.js 14 + React 18         | Neon cyberpunk dashboard   |
| Backend API     | FastAPI                       | Metrics & agent data       |
| Secrets         | Google Secret Manager         | Secure credential storage  |

📊 Current Status

Phase: 1 - Infrastructure Foundation (Hours 0-48)
Progress: Task 1.1 Complete ✅

| Phase                      | Status         | Tasks Complete |
|----------------------------|----------------|----------------|
| Phase 1: Infrastructure    | 🟡 In Progress | 1/12           |
| Phase 2: Agent Development | ⚪ Not Started | 0/14           |
| Phase 3: Remediation       | ⚪ Not Started | 0/14           |
| Phase 4: Kafka Streaming   | ⚪ Not Started | 0/11           |
| Phase 5: Dashboard & Demo  | ⚪ Not Started | 0/29           |

📖 Documentation

  • Setup Guide - Step-by-step installation
  • PRD - Product requirements
  • Task List - Implementation roadmap
  • Architecture Guide (coming in Phase 2)
  • API Documentation (coming in Phase 3)
  • Frontend Guide (coming in Phase 5)

🤝 Contributing

This is a hackathon project for the AI Partner Catalyst event.

Development Workflow:

  1. Follow the task list in tasks/tasks-prd-ouroboros-ai-resilience.md
  2. One sub-task at a time (per process guidelines)
  3. Commit after each completed parent task
  4. Run tests before committing

🔐 Security

  • All secrets stored in Google Secret Manager
  • Service accounts use least-privilege IAM roles
  • No API keys committed to Git
  • Audit logs enabled for all API calls

💰 Cost Estimate

7-Day Hackathon Budget: $25-60

| Service                    | Cost   |
|----------------------------|--------|
| Vertex AI (Gemini 1.5 Pro) | $20-50 |
| Cloud Functions            | $5-10  |
| Datadog Trial              | $0     |
| Confluent Kafka Free Tier  | $0     |

Cost Control: $100 circuit breaker prevents runaway costs
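
A cost circuit breaker of this kind might look roughly like the sketch below. It is a minimal illustration that assumes spend is tracked per agent by summing token usage at a known price; the `AgentBudget` class and the per-1k-token price used in the example are hypothetical, and only the $100 limit comes from this README.

```python
from dataclasses import dataclass

COST_LIMIT_USD = 100.0  # the circuit-breaker limit stated above


@dataclass
class AgentBudget:
    """Tracks cumulative spend for one agent and trips once the limit is exceeded."""
    limit_usd: float = COST_LIMIT_USD
    spent_usd: float = 0.0
    suspended: bool = False

    def record_usage(self, tokens: int, usd_per_1k_tokens: float) -> bool:
        """Add the cost of `tokens`; return True if the agent is suspended."""
        if tokens < 0 or usd_per_1k_tokens < 0:
            raise ValueError("tokens and price must be non-negative")
        self.spent_usd += tokens / 1000 * usd_per_1k_tokens
        if self.spent_usd > self.limit_usd:
            self.suspended = True  # stays tripped until reset by an operator
        return self.suspended


# Example with a made-up price of $0.06 per 1k tokens:
budget = AgentBudget()
budget.record_usage(1_000_000, 0.06)  # about $60 spent, still running
budget.record_usage(1_000_000, 0.06)  # about $120 spent, now suspended
```

The breaker latches: once tripped it stays suspended even if no further usage arrives, which matches the idea of suspending an agent rather than throttling it.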


📅 Timeline

Total: 168 hours (7 days)

  • Phase 1 (Hours 0-48): Infrastructure setup
  • Phase 2 (Hours 49-96): Agent development & observability
  • Phase 3 (Hours 97-120): Autonomous remediation
  • Phase 4 (Hours 121-144): Kafka event streaming
  • Phase 5 (Hours 145-168): Dashboard & demo prep

📧 Contact

Project: Ouroboros AI Resilience Platform
Event: AI Partner Catalyst Hackathon
Date: December 22, 2025


📄 License

This is a hackathon POC project. Not licensed for production use.


Built with ❤️ for the AI Partner Catalyst Hackathon

"The snake that eats its own tail, regenerating infinitely."
