World-class automated red-teaming system with unprecedented Glass Box visibility into AI agent behavior
Built for Holistic AI x UCL Hackathon 2025 - Track C (Dear Grandma)
This isn't just another red-teaming tool. We've built a complete AI security intelligence platform that:
- Generates Attacks: Evolutionary algorithm spawns diverse jailbreak attempts
- Visualizes Evolution: Real-time attack tree with beautiful glass morphism UI
- Analyzes Everything: Three-phase Glass Box system provides deep insights
- Profiles Agents: Creates complete psychological profiles of target agents
- Actionable Intel: Specific recommendations for improving AI safety
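The evolutionary generation step can be pictured as repeated selection and mutation. The sketch below is a minimal illustration under stated assumptions, not the project's actual algorithm: `evolve`, `fitness`, and `mutate` are hypothetical names, and the fitness function stands in for whatever judge scores an attack.

```python
import random
from typing import Callable, List


def evolve(
    population: List[str],
    fitness: Callable[[str], float],
    mutate: Callable[[str, random.Random], str],
    generations: int = 3,
    survivors: int = 2,
    seed: int = 0,
) -> List[str]:
    """Keep the fittest prompts each generation and refill with mutated copies."""
    rng = random.Random(seed)
    size = len(population)
    for _ in range(generations):
        # Selection: rank by fitness and keep the top candidates unchanged.
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:survivors]
        # Variation: fill the remaining slots with mutated copies of parents.
        children = [mutate(rng.choice(parents), rng) for _ in range(size - len(parents))]
        population = parents + children
    return sorted(population, key=fitness, reverse=True)
```

Because the survivors are carried over unchanged, the best score in the population never decreases from one generation to the next.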
First-of-its-kind behavioral profiling system that creates a complete psychological analysis of the agent under test. Think of it as "profiling a suspect" - but for AI agents.
What You Get:
- Tool usage patterns and effectiveness
- Behavioral tendencies (refusal, compliance, evasiveness)
- Vulnerability analysis with severity ratings
- Defense mechanism evaluation
- LLM-powered psychological insights
- Actionable security recommendations
1. Start Attack → Enter target endpoint + goals
2. Watch Evolution → Real-time attack tree visualization
3. View Results → Attack success rate + what worked/failed
4. Open Agent Profile → Complete behavioral analysis
5. Export Data → Download JSON for team analysis
┌──────────────────────────────────────────────────────────┐
│                     FRONTEND (React)                     │
│  ┌────────────────────────────────────────────────────┐  │
│  │ Evolution Canvas  │ Results Panel │ Profile        │  │
│  │ (Attack Tree)     │ (Metrics)     │ (5 Tabs)       │  │
│  └────────────────────────────────────────────────────┘  │
└──────────────────┬───────────────────────────────────────┘
                   │ WebSocket + REST API
┌──────────────────▼───────────────────────────────────────┐
│                 BACKEND (Python/FastAPI)                 │
│  ┌────────────────────────────────────────────────────┐  │
│  │ Attack Generation │ Execution │ Verification       │  │
│  │ (25+ techniques)  │ (Multi-   │ (Llama Guard)      │  │
│  │                   │  turn)    │                    │  │
│  └────────────────────────────────────────────────────┘  │
│  ┌────────────────────────────────────────────────────┐  │
│  │ GLASS BOX ANALYSIS (3 Phases)                      │  │
│  │ 1. Batch Explanation (Map-Reduce Clusters)         │  │
│  │ 2. Meta-Analysis (Global Patterns)                 │  │
│  │ 3. Target Profiler (Behavioral Analysis)           │  │
│  └────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────┘
- Backend: Python 3.9+, AWS Bedrock access, Together AI API key
- Frontend: Node.js 18+, npm
# 1. Clone the repository
git clone https://github.com/Suhas-13/holistichack.git
cd holistichack
# 2. Set up backend
cd backend
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env
# Edit .env with your credentials
# 3. Start backend
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
# 4. Set up frontend (new terminal)
cd ../frontend
npm install
npm run dev
- Frontend: http://localhost:5173
- Backend API: http://localhost:8000
- API Docs: http://localhost:8000/docs
25+ Seed Attacks
- Role-playing (DAN, Developer Mode, Evil Bot)
- Context manipulation (Story, Academic, Translation)
- Prompt injection (Override, Delimiter, JSON)
- Multi-turn trust building
- Encoding & obfuscation (Base64, ROT13, Unicode)
- Authority exploitation
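To illustrate the encoding and obfuscation family, here is a minimal sketch of how a seed prompt could be wrapped in Base64 or ROT13 before being sent to the target. The function names are hypothetical and are not taken from the project's seed attack code; only standard library calls are used.

```python
import base64
import codecs


def encode_base64(prompt: str) -> str:
    """Return the prompt as a Base64 string built from its UTF-8 bytes."""
    return base64.b64encode(prompt.encode("utf-8")).decode("ascii")


def encode_rot13(prompt: str) -> str:
    """Return the prompt with ASCII letters rotated by 13 places."""
    return codecs.encode(prompt, "rot_13")
```

ROT13 is its own inverse, so applying `encode_rot13` twice returns the original text, which makes it easy to check what the target actually received.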
Agent Fingerprinting
- Automatic framework detection (LangGraph, CrewAI, etc.)
- Model identification (Claude, GPT, Llama)
- Architecture analysis (ReAct, RAG, Simple Chat)
Multi-Turn Conversations
- 1-3 turn attack sequences
- Dynamic follow-ups based on responses
- Llama Guard verification for every attack
Evolution Canvas
- Interactive attack tree
- Color-coded success/failure nodes
- Cluster organization
- Real-time WebSocket updates
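One way to picture the data behind the attack tree is a list of nodes that each point at a parent. The sketch below assumes a hypothetical `AttackNode` shape and a green/red color scheme; neither is confirmed by the project's code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AttackNode:
    """One attack attempt in the evolution tree (hypothetical shape)."""
    node_id: str
    cluster: str
    success: bool
    parent_id: Optional[str] = None
    children: List["AttackNode"] = field(default_factory=list)


def build_tree(nodes: List[AttackNode]) -> List[AttackNode]:
    """Link children to their parents and return the root nodes."""
    by_id: Dict[str, AttackNode] = {n.node_id: n for n in nodes}
    roots = []
    for node in nodes:
        parent = by_id.get(node.parent_id) if node.parent_id else None
        if parent is None:
            # No parent, or an unknown parent id: treat the node as a root.
            roots.append(node)
        else:
            parent.children.append(node)
    return roots


def node_color(node: AttackNode) -> str:
    """Color-code a node: green for a successful attack, red otherwise (assumed scheme)."""
    return "green" if node.success else "red"
```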
Results Panel
- Attack Success Rate (ASR) metrics
- Successful attack traces
- Cost and latency tracking
- LLM-generated insights
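Attack Success Rate is the number of attacks judged successful divided by the number of attacks run. A minimal sketch, assuming each attack outcome has already been reduced to a boolean (the function name is hypothetical):

```python
from typing import Iterable


def attack_success_rate(outcomes: Iterable[bool]) -> float:
    """Fraction of attacks judged successful; 0.0 when no attacks were run."""
    results = list(outcomes)
    if not results:
        return 0.0
    return sum(results) / len(results)
```

For example, two successes out of four attacks give an ASR of 0.5.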
- Map-reduce pattern for scalability
- Cluster-level summaries via LLM
- What worked, what failed, key insights
- Cross-cluster patterns
- Global attack strategy insights
- Comprehensive agent learnings
- 1000+ lines of profiling logic
- 6 analysis dimensions: Tools, Behaviors, Failures, Defenses, Responses, Insights
- LLM-powered psychological profiling
- Quantified metrics: Vulnerability scores, detection rates, etc.
Beautiful glass morphism UI with 5 comprehensive tabs:
- Psychological profile and personality
- Overall security assessment
- Strengths and weaknesses
- Actionable recommendations
- Which tools the agent uses
- Invocation frequency
- Success rates and effectiveness
- Most-used tools highlighted
- Detected behavioral patterns
- Confidence and exploitability scores
- Pattern implications
- Example responses
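As a rough illustration of pattern detection with a confidence score, the sketch below flags a pattern when a response contains a marker phrase and reports the share of responses that matched. The marker phrases and this `BehaviorPattern` shape are stand-ins invented for the example; the actual profiler is described as LLM-powered and may work quite differently.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical marker phrases, chosen only for illustration.
PATTERN_MARKERS: Dict[str, Tuple[str, ...]] = {
    "refusal": ("i can't", "i cannot", "i won't"),
    "evasiveness": ("it depends", "i'm not sure"),
}


@dataclass
class BehaviorPattern:
    name: str
    confidence: float  # share of responses showing the pattern, 0.0 to 1.0
    examples: List[str]


def detect_patterns(responses: List[str]) -> List[BehaviorPattern]:
    """Score each pattern by the fraction of responses containing a marker."""
    patterns = []
    for name, markers in PATTERN_MARKERS.items():
        hits = [r for r in responses if any(m in r.lower() for m in markers)]
        if hits:
            # Keep at most three matching responses as examples.
            patterns.append(BehaviorPattern(name, len(hits) / len(responses), hits[:3]))
    return patterns
```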
- Failure modes by type
- Severity ratings (Critical/High/Medium/Low)
- Common triggers
- Mitigation suggestions
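The four severity labels can be derived from a numeric score. The sketch below assumes a 0 to 10 vulnerability score and borrows the cutoffs of the CVSS v3 qualitative scale; the project's real thresholds are not documented here, so treat both the scale and the function name as assumptions.

```python
def severity_rating(score: float) -> str:
    """Map a 0-10 vulnerability score to a severity label (assumed cutoffs)."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("score must be between 0 and 10")
    if score >= 9.0:
        return "Critical"
    if score >= 7.0:
        return "High"
    if score >= 4.0:
        return "Medium"
    return "Low"
```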
- Defense mechanisms
- Detection and bypass rates
- Strength ratings
- Known bypass techniques
- Download complete profile as JSON
- Share with team
- Offline analysis
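For offline analysis, an exported profile can be read back with the standard `json` module. This sketch mirrors the `agent-profile-<attackId>.json` naming used by the frontend export; the function names are hypothetical and the profile is treated as an opaque dictionary because its schema is not specified here.

```python
import json
from pathlib import Path
from typing import Any, Dict


def export_profile(profile: Dict[str, Any], attack_id: str, out_dir: str = ".") -> Path:
    """Write the profile as pretty-printed JSON and return the file path."""
    path = Path(out_dir) / f"agent-profile-{attack_id}.json"
    path.write_text(json.dumps(profile, indent=2), encoding="utf-8")
    return path


def load_profile(path: Path) -> Dict[str, Any]:
    """Read a previously exported profile back into a dictionary."""
    return json.loads(path.read_text(encoding="utf-8"))
```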
- <5 seconds for complete Glass Box analysis
- Hundreds to thousands of attacks analyzed per session
- Real-time WebSocket updates (60fps animations)
- 6 analysis dimensions (tools, behaviors, failures, defenses, responses, insights)
- 25+ seed attack techniques
- 1000+ lines of profiling logic
- Zero TypeScript errors - Production-ready code
- LLM-powered insights - Natural language explanations
- Severity ratings - Quantified risk assessment
- Actionable recommendations - Specific security improvements
"Is my AI agent safe to deploy?"
- Run comprehensive attack suite
- Identify all vulnerabilities
- Prioritize fixes by severity
- Export report for stakeholders
"Which model version is safer?"
- Profile multiple agents
- Compare vulnerability scores
- Track improvements over time
- Make data-driven decisions
"How can we attack this agent?"
- Identify high-exploitability behaviors
- Focus on critical vulnerabilities
- Use tool analysis to find weak points
- Export attack traces for analysis
"How effective are our safety measures?"
- Evaluate defense mechanisms
- Identify bypass techniques
- Implement recommended mitigations
- Measure improvement
holistichack/
├── backend/                          # Python/FastAPI backend
│   ├── app/
│   │   ├── orchestrator.py           # Attack orchestration
│   │   ├── batch_explanation.py      # Map-reduce analysis
│   │   ├── meta_analysis.py          # Cross-cluster insights
│   │   └── target_agent_profiler.py  # Behavioral profiling
│   └── README.md
├── frontend/                         # React/TypeScript frontend
│   ├── src/
│   │   ├── pages/
│   │   │   └── Index.tsx             # Main page
│   │   ├── components/
│   │   │   ├── EvolutionCanvas.tsx   # Attack tree viz
│   │   │   ├── ResultsPanel.tsx      # Results display
│   │   │   └── AgentProfilePanel.tsx # Profile UI (650+ lines)
│   │   └── services/
│   │       └── api.ts                # Backend integration
│   └── package.json
└── docs/                             # Comprehensive documentation
    ├── GLASS_BOX_COMPLETE_SYSTEM_OVERVIEW.md
    ├── TARGET_AGENT_PROFILER_SUMMARY.md
    ├── AGENT_PROFILE_FRONTEND_COMPLETE.md
    └── AGENT_PROFILE_ENHANCEMENTS.md
- Systematic attack methodology
- ASR calculation and metrics
- Llama Guard verification
- Target Agent Profiler
- Three-phase analysis system
- Complete transcript capture
- LLM-powered explainability
- Beautiful Agent Profile UI
- Concurrent execution (5 max)
- Efficient WebSocket streaming
- Cost and latency tracking
- Parallel Glass Box processing
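A cap of five concurrent executions can be enforced with an `asyncio.Semaphore`. The sketch below is one possible way to do it, not the orchestrator's confirmed implementation; `run_bounded` and the `execute` callback are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, List

MAX_CONCURRENT = 5  # mirrors the "5 max" limit listed above


async def run_bounded(
    attacks: List[str],
    execute: Callable[[str], Awaitable[str]],
    limit: int = MAX_CONCURRENT,
) -> List[str]:
    """Run `execute` on every attack with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(attack: str) -> str:
        # Each task waits here until one of the `limit` slots is free.
        async with semaphore:
            return await execute(attack)

    # gather preserves input order in its results.
    return await asyncio.gather(*(guarded(a) for a in attacks))
```

All tasks are scheduled up front, but the semaphore ensures that only `limit` of them are talking to the target at any moment, which keeps cost and rate limits predictable.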
# Map-reduce for scalability
import asyncio

async def analyze_clusters(clusters):
    # Map step: analyze every cluster concurrently
    tasks = [analyze_cluster(c) for c in clusters]
    return await asyncio.gather(*tasks)

# Behavioral pattern detection
def detect_patterns(responses):
    patterns = []
    for pattern_type in PATTERN_TYPES:
        confidence = calculate_confidence(responses, pattern_type)
        exploitability = calculate_exploitability(pattern_type)
        patterns.append(BehaviorPattern(...))
    return patterns

// Real-time WebSocket updates
wsService.on("node_update", (data) => {
  updateAttackTree(data);
  showToastNotification(data);
});

// Export functionality
const exportProfile = () => {
  const json = JSON.stringify(profile, null, 2);
  downloadFile(json, `agent-profile-${attackId}.json`);
};

Glass Morphism UI - "Looking into the agent's mind"
- Translucent backgrounds with frosted glass effect
- Smooth animations and transitions
- Color-coded severity indicators
- Interactive hover states
- Responsive and accessible
Comprehensive documentation available:
- System Overview - Complete architecture and data flow
- Target Profiler Backend - Profiling implementation details
- Agent Profile Frontend - UI implementation guide
- Latest Enhancements - Recent feature additions
- Backend README - API documentation and setup
- Add loading skeleton states
- Improve mobile responsiveness
- Add keyboard shortcuts
- Performance optimizations
- Profile comparison (A/B testing)
- Historical tracking
- PDF report generation
- Custom metric thresholds
- ML-powered anomaly detection
- Automated security scoring
- Attack strategy recommendations
- Multi-agent comparative analysis
- CI/CD pipeline integration
- Slack/Teams notifications
- Role-based access control
- Multi-tenancy support
This is a hackathon project, but we welcome feedback and suggestions!
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
Built for Holistic AI x UCL Hackathon 2025
- Holistic AI for organizing the hackathon
- UCL for hosting
- Together AI for Llama Guard API
- AWS for Bedrock access
- Anthropic for Claude (used in profiling)
For questions, feedback, or demo requests, please open an issue on GitHub.
- First-of-its-kind target agent profiling
- Three-phase Glass Box analysis
- LLM-powered psychological insights
- Production-ready code (zero TypeScript errors)
- Comprehensive documentation (2000+ lines)
- Beautiful UI with glass morphism
- Unprecedented visibility into AI behavior
- Actionable security recommendations
- Enterprise-ready architecture
- <5 seconds for complete analysis
- Hundreds to thousands of attacks analyzed
- Real-time WebSocket updates
Built with ❤️ for AI Safety
Documentation • Backend README • Issues