Skip to content

Suhas-13/holistichack

Repository files navigation

🎯 Evolution Red Team - LLM Security Testing Platform

World-class automated red-teaming system with unprecedented Glass Box visibility into AI agent behavior

Built for Holistic AI x UCL Hackathon 2025 - Track C (Dear Grandma)


🌟 What Makes This Special

This isn't just another red-teaming tool. We've built a complete AI security intelligence platform that:

  1. Generates Attacks: Evolutionary algorithm spawns diverse jailbreak attempts
  2. Visualizes Evolution: Real-time attack tree with beautiful glass morphism UI
  3. Analyzes Everything: Three-phase Glass Box system provides deep insights
  4. Profiles Agents: Creates complete psychological profiles of target agents
  5. Actionable Intel: Specific recommendations for improving AI safety

The Game-Changer: Target Agent Profiler ⭐

First-of-its-kind behavioral profiling system that creates a complete psychological analysis of the agent under test. Think of it as "profiling a suspect" - but for AI agents.

What You Get:

  • Tool usage patterns and effectiveness
  • Behavioral tendencies (refusal, compliance, evasiveness)
  • Vulnerability analysis with severity ratings
  • Defense mechanism evaluation
  • LLM-powered psychological insights
  • Actionable security recommendations

πŸŽ₯ Quick Demo

30-Second Workflow

1. Start Attack β†’ Enter target endpoint + goals
2. Watch Evolution β†’ Real-time attack tree visualization
3. View Results β†’ Attack success rate + what worked/failed
4. Open Agent Profile β†’ πŸ”¬ Complete behavioral analysis
5. Export Data β†’ Download JSON for team analysis

πŸ—οΈ System Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    FRONTEND (React)                       β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚  Evolution Canvas  β”‚  Results Panel  β”‚  Profile  β”‚  β”‚
β”‚  β”‚  (Attack Tree)     β”‚  (Metrics)      β”‚  (5 Tabs) β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                   β”‚ WebSocket + REST API
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                  BACKEND (Python/FastAPI)                 β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚  Attack Generation  β”‚  Execution  β”‚  Verification β”‚  β”‚
β”‚  β”‚  (25+ techniques)   β”‚  (Multi-   β”‚  (Llama Guard)β”‚  β”‚
β”‚  β”‚                     β”‚   turn)    β”‚               β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚        GLASS BOX ANALYSIS (3 Phases)              β”‚  β”‚
β”‚  β”‚  1. Batch Explanation   (Map-Reduce Clusters)     β”‚  β”‚
β”‚  β”‚  2. Meta-Analysis       (Global Patterns)         β”‚  β”‚
β”‚  β”‚  3. Target Profiler ⭐  (Behavioral Analysis)     β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸš€ Quick Start

Prerequisites

  • Backend: Python 3.9+, AWS Bedrock access, Together AI API key
  • Frontend: Node.js 18+, npm

Installation

# 1. Clone the repository
git clone https://github.com/Suhas-13/holistichack.git
cd holistichack

# 2. Set up backend
cd backend
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env
# Edit .env with your credentials

# 3. Start backend
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

# 4. Set up frontend (new terminal)
cd ../frontend
npm install
npm run dev

Access the App


🎯 Core Features

1. Attack Generation & Execution

βœ… 25+ Seed Attacks

  • Role-playing (DAN, Developer Mode, Evil Bot)
  • Context manipulation (Story, Academic, Translation)
  • Prompt injection (Override, Delimiter, JSON)
  • Multi-turn trust building
  • Encoding & obfuscation (Base64, ROT13, Unicode)
  • Authority exploitation

βœ… Agent Fingerprinting

  • Automatic framework detection (LangGraph, CrewAI, etc.)
  • Model identification (Claude, GPT, Llama)
  • Architecture analysis (ReAct, RAG, Simple Chat)

βœ… Multi-Turn Conversations

  • 1-3 turn attack sequences
  • Dynamic follow-ups based on responses
  • Llama Guard verification for every attack

2. Real-Time Visualization

βœ… Evolution Canvas

  • Interactive attack tree
  • Color-coded success/failure nodes
  • Cluster organization
  • Real-time WebSocket updates

βœ… Results Panel

  • Attack Success Rate (ASR) metrics
  • Successful attack traces
  • Cost and latency tracking
  • LLM-generated insights

3. Glass Box Analysis ⭐ (Our Innovation)

Phase 1: Batch Explanation

  • Map-reduce pattern for scalability
  • Cluster-level summaries via LLM
  • What worked, what failed, key insights

Phase 2: Meta-Analysis

  • Cross-cluster patterns
  • Global attack strategy insights
  • Comprehensive agent learnings

Phase 3: Target Agent Profiler

  • 1000+ lines of profiling logic
  • 6 analysis dimensions: Tools, Behaviors, Failures, Defenses, Responses, Insights
  • LLM-powered psychological profiling
  • Quantified metrics: Vulnerability scores, detection rates, etc.

4. Agent Profile Panel ⭐ (Our Showcase)

Beautiful glass morphism UI with 5 comprehensive tabs:

🧠 Overview

  • Psychological profile and personality
  • Overall security assessment
  • Strengths and weaknesses
  • Actionable recommendations

πŸ”§ Tools

  • Which tools the agent uses
  • Invocation frequency
  • Success rates and effectiveness
  • Most-used tools highlighted

πŸ“Š Behaviors

  • Detected behavioral patterns
  • Confidence and exploitability scores
  • Pattern implications
  • Example responses

πŸ”“ Weaknesses

  • Failure modes by type
  • Severity ratings (Critical/High/Medium/Low)
  • Common triggers
  • Mitigation suggestions

πŸ›‘οΈ Defenses

  • Defense mechanisms
  • Detection and bypass rates
  • Strength ratings
  • Known bypass techniques

πŸ“₯ Export

  • Download complete profile as JSON
  • Share with team
  • Offline analysis

πŸ“Š Key Metrics

Performance

  • ⚑ <5 seconds for complete Glass Box analysis
  • ⚑ 100-1000s of attacks analyzed per session
  • ⚑ Real-time WebSocket updates (60fps animations)

Depth

  • πŸ” 6 analysis dimensions (tools, behaviors, failures, defenses, responses, insights)
  • πŸ” 25+ seed attack techniques
  • πŸ” 1000+ lines of profiling logic

Quality

  • βœ… Zero TypeScript errors - Production-ready code
  • βœ… LLM-powered insights - Natural language explanations
  • βœ… Severity ratings - Quantified risk assessment
  • βœ… Actionable recommendations - Specific security improvements

🎬 Use Cases

Security Audits

"Is my AI agent safe to deploy?"

  • Run comprehensive attack suite
  • Identify all vulnerabilities
  • Prioritize fixes by severity
  • Export report for stakeholders

Model Comparison

"Which model version is safer?"

  • Profile multiple agents
  • Compare vulnerability scores
  • Track improvements over time
  • Make data-driven decisions

Red Team Operations

"How can we attack this agent?"

  • Identify high-exploitability behaviors
  • Focus on critical vulnerabilities
  • Use tool analysis to find weak points
  • Export attack traces for analysis

Defense Optimization

"How effective are our safety measures?"

  • Evaluate defense mechanisms
  • Identify bypass techniques
  • Implement recommended mitigations
  • Measure improvement

πŸ“ Project Structure

holistichack/
β”œβ”€β”€ backend/                    # Python/FastAPI backend
β”‚   β”œβ”€β”€ app/
β”‚   β”‚   β”œβ”€β”€ orchestrator.py           # Attack orchestration
β”‚   β”‚   β”œβ”€β”€ batch_explanation.py      # Map-reduce analysis
β”‚   β”‚   β”œβ”€β”€ meta_analysis.py          # Cross-cluster insights
β”‚   β”‚   └── target_agent_profiler.py  # Behavioral profiling ⭐
β”‚   └── README.md
β”œβ”€β”€ frontend/                   # React/TypeScript frontend
β”‚   β”œβ”€β”€ src/
β”‚   β”‚   β”œβ”€β”€ pages/
β”‚   β”‚   β”‚   └── Index.tsx             # Main page
β”‚   β”‚   β”œβ”€β”€ components/
β”‚   β”‚   β”‚   β”œβ”€β”€ EvolutionCanvas.tsx   # Attack tree viz
β”‚   β”‚   β”‚   β”œβ”€β”€ ResultsPanel.tsx      # Results display
β”‚   β”‚   β”‚   └── AgentProfilePanel.tsx # Profile UI ⭐ (650+ lines)
β”‚   β”‚   └── services/
β”‚   β”‚       └── api.ts                # Backend integration
β”‚   └── package.json
└── docs/                       # Comprehensive documentation
    β”œβ”€β”€ GLASS_BOX_COMPLETE_SYSTEM_OVERVIEW.md
    β”œβ”€β”€ TARGET_AGENT_PROFILER_SUMMARY.md
    β”œβ”€β”€ AGENT_PROFILE_FRONTEND_COMPLETE.md
    └── AGENT_PROFILE_ENHANCEMENTS.md

πŸ† Hackathon Tracks

Track C (Red Team): βœ… Primary Focus

  • Systematic attack methodology
  • ASR calculation and metrics
  • Llama Guard verification
  • ⭐ Target Agent Profiler

Track B (Glass Box): βœ… Exceptional Implementation

  • Three-phase analysis system
  • Complete transcript capture
  • LLM-powered explainability
  • ⭐ Beautiful Agent Profile UI

Track A (Iron Man): βœ… Performance Optimized

  • Concurrent execution (5 max)
  • Efficient WebSocket streaming
  • Cost and latency tracking
  • ⭐ Parallel Glass Box processing

πŸŽ“ Technical Highlights

Backend Engineering

# Map-reduce for scalability
async def analyze_clusters(clusters):
    tasks = [analyze_cluster(c) for c in clusters]
    return await asyncio.gather(*tasks)

# Behavioral pattern detection
def detect_patterns(responses):
    patterns = []
    for pattern_type in PATTERN_TYPES:
        confidence = calculate_confidence(responses, pattern_type)
        exploitability = calculate_exploitability(pattern_type)
        patterns.append(BehaviorPattern(...))
    return patterns

Frontend Engineering

// Real-time WebSocket updates
wsService.on("node_update", (data) => {
  updateAttackTree(data);
  showToastNotification(data);
});

// Export functionality
const exportProfile = () => {
  const json = JSON.stringify(profile, null, 2);
  downloadFile(json, `agent-profile-${attackId}.json`);
};

🎨 Design Philosophy

Glass Morphism UI - "Looking into the agent's mind"

  • Translucent backgrounds with frosted glass effect
  • Smooth animations and transitions
  • Color-coded severity indicators
  • Interactive hover states
  • Responsive and accessible

πŸ“š Documentation

Comprehensive documentation available:


🚧 Roadmap

Phase 1: Polish (Immediate)

  • Add loading skeleton states
  • Improve mobile responsiveness
  • Add keyboard shortcuts
  • Performance optimizations

Phase 2: Features (Short-term)

  • Profile comparison (A/B testing)
  • Historical tracking
  • PDF report generation
  • Custom metric thresholds

Phase 3: Intelligence (Medium-term)

  • ML-powered anomaly detection
  • Automated security scoring
  • Attack strategy recommendations
  • Multi-agent comparative analysis

Phase 4: Enterprise (Long-term)

  • CI/CD pipeline integration
  • Slack/Teams notifications
  • Role-based access control
  • Multi-tenancy support

🀝 Contributing

This is a hackathon project, but we welcome feedback and suggestions!

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Submit a pull request

πŸ“„ License

Built for Holistic AI x UCL Hackathon 2025


πŸ™ Acknowledgments

  • Holistic AI for organizing the hackathon
  • UCL for hosting
  • Together AI for Llama Guard API
  • AWS for Bedrock access
  • Anthropic for Claude (used in profiling)

πŸ“ž Contact

For questions, feedback, or demo requests, please open an issue on GitHub.


✨ Why This Wins

Innovation ⭐

  • First-of-its-kind target agent profiling
  • Three-phase Glass Box analysis
  • LLM-powered psychological insights

Quality πŸ†

  • Production-ready code (zero TypeScript errors)
  • Comprehensive documentation (2000+ lines)
  • Beautiful UI with glass morphism

Impact 🎯

  • Unprecedented visibility into AI behavior
  • Actionable security recommendations
  • Enterprise-ready architecture

Performance ⚑

  • <5 seconds for complete analysis
  • 100-1000s attacks analyzed
  • Real-time WebSocket updates

Built with ❀️ for AI Safety

Documentation β€’ Backend README β€’ Issues

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors