An autonomous agent that monitors API endpoints in real-time, detects anomalies, diagnoses root causes via LLM, and takes corrective action — then learns from each incident to respond faster next time.
Built in 5 hours for the SJSU Applied Data Science Hackathon 2026.
```
Mock APIs → Nexla (normalize) → Agent Loop (observe → diagnose → act) → Dashboard
                                        ↕
                           Case Memory ← Auto-Tune
```
Data layer — FastAPI chaos server with toggleable failure modes (latency spikes, error bursts, degraded responses). Nexla normalizes raw signals into a unified event schema; a fallback poller is included if Nexla setup is slow.
Agent core — Railtracks-orchestrated loop. The anomaly detector uses a z-score over a 60-second rolling window. When triggered, an LLM on DigitalOcean diagnoses root cause and recommends an action (reroute, alert, or wait). The executor carries it out.
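A minimal sketch of what the z-score check in `detector.py` might look like. This is an illustration, not the actual code: the class and parameter names (`ZScoreDetector`, `threshold_k`, the warm-up count) are assumptions based on this README.

```python
from collections import deque
from statistics import mean, stdev

class ZScoreDetector:
    """Flags a sample as anomalous when it deviates more than
    threshold_k standard deviations from the rolling baseline."""

    def __init__(self, window: int = 60, threshold_k: float = 3.0):
        self.samples = deque(maxlen=window)  # ~60s of samples at 1 Hz polling
        self.threshold_k = threshold_k

    def observe(self, latency_ms: float) -> bool:
        anomalous = False
        if len(self.samples) >= 10:  # need a baseline before judging
            mu, sigma = mean(self.samples), stdev(self.samples)
            if sigma > 0 and abs(latency_ms - mu) / sigma > self.threshold_k:
                anomalous = True
        if not anomalous:
            # Only healthy samples feed the baseline, so a sustained spike
            # does not drag the mean upward and mask itself.
            self.samples.append(latency_ms)
        return anomalous
```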
Self-improvement — Every resolved incident is logged as a structured case. The auto-tuner adjusts detection thresholds based on outcomes (true positive → tighten, false positive → loosen). Similar past cases are injected into the LLM diagnosis prompt, so the agent gets faster and more accurate over time.
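One way the "similar past cases" lookup in `case_store.py` could work, sketched here under assumptions: the symptom field names follow the case record shown later in this README, and the scoring function itself is illustrative (the real store may use embeddings instead).

```python
def similarity(a: dict, b: dict) -> float:
    """Crude symptom similarity: same endpoint, plus closeness of
    z-score and error rate. Higher means more similar."""
    score = 1.0 if a["endpoint"] == b["endpoint"] else 0.0
    score += 1.0 / (1.0 + abs(a["z_score"] - b["z_score"]))
    score += 1.0 / (1.0 + abs(a["error_rate"] - b["error_rate"]))
    return score

def top_cases(current: dict, cases: list[dict], k: int = 3) -> list[dict]:
    """Return the k past cases most similar to the current symptoms,
    ready to be injected into the diagnosis prompt."""
    return sorted(cases, key=lambda c: similarity(current, c["symptoms"]),
                  reverse=True)[:k]
```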
Dashboard — Lovable-generated frontend showing live metrics, an incident feed, and a response-time-delta chart that visualizes the agent learning.
```
agentforge/
├── server/                 # Mock API + chaos injection
│   ├── main.py             # FastAPI app: /health, /checkout, /chaos/*
│   ├── chaos.py            # Failure mode toggle (latency, errors, degraded)
│   └── schemas.py          # Pydantic models + NormalizedEvent contract
├── agent/                  # Core agent loop (Railtracks)
│   ├── loop.py             # observe → diagnose → act
│   ├── detector.py         # Z-score anomaly detector
│   ├── diagnoser.py        # LLM root-cause chain
│   ├── executor.py         # REROUTE / ALERT / WAIT
│   └── config.py           # Thresholds, endpoints, model config
├── memory/                 # Self-improvement layer
│   ├── case_store.py       # Incident logging + similarity retrieval
│   └── auto_tune.py        # threshold_k adjustment from outcomes
├── ingestion/              # Data pipeline (Nexla)
│   ├── webhook.py          # Receives Nexla-pushed normalized events
│   └── poller.py           # Fallback: direct HTTP polling
├── dashboard/              # Frontend (Lovable)
│   ├── src/
│   │   ├── App.tsx
│   │   ├── MetricsPanel.tsx
│   │   ├── IncidentFeed.tsx
│   │   └── ImprovementChart.tsx
│   └── package.json
├── requirements.txt
├── .env                    # API keys, model endpoint
├── docker-compose.yml      # server + agent + ingestion
└── README.md
```
```shell
pip install -r requirements.txt
uvicorn server.main:app --port 8000 --reload
python -m agent.loop
```

```shell
# Latency spikes
curl -X POST http://localhost:8000/chaos/enable \
  -H "Content-Type: application/json" \
  -d '{"mode": "latency", "latency_min": 2.0, "latency_max": 5.0}'

# Error bursts
curl -X POST http://localhost:8000/chaos/enable \
  -H "Content-Type: application/json" \
  -d '{"mode": "errors", "error_rate": 0.5}'

# Degraded responses
curl -X POST http://localhost:8000/chaos/enable \
  -H "Content-Type: application/json" \
  -d '{"mode": "degraded"}'

# Kill switch
curl -X POST http://localhost:8000/chaos/disable
```
| Field | Description |
|---|---|
| `event_id` | Unique hex UUID for correlating detection → diagnosis → action |
| `endpoint` | The API path polled (`/health` or `/checkout`) |
| `timestamp` | UTC ISO-8601 timestamp of the poll |
| `latency_ms` | Round-trip response time in milliseconds |
| `status_code` | HTTP status (200 on success, 500 on error, 0 on timeout/connection failure) |
| `error_rate_1m` | Rolling error rate across recent polls (0.0 = all healthy, 1.0 = all failing) |
| `is_degraded` | `true` when a 2xx response is missing required fields or has known-bad values |
| `error_detail` | Error message from 4xx/5xx responses; `null` on success |
| `source` | Always `"poller"` for this ingestion path (vs `"nexla"` or `"health_check"`) |
| `schema_version` | Schema version for forward compatibility |
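The fields above could be expressed as a `NormalizedEvent` type like the following. The repo's `schemas.py` uses Pydantic; a plain dataclass is shown here only to keep the sketch dependency-free, so field names mirror the table but the exact definition is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class NormalizedEvent:
    event_id: str              # hex UUID correlating detection → diagnosis → action
    endpoint: str              # "/health" or "/checkout"
    timestamp: str             # UTC ISO-8601
    latency_ms: float          # round-trip time in milliseconds
    status_code: int           # 200 / 500 / 0 on timeout or connection failure
    error_rate_1m: float       # 0.0 (all healthy) .. 1.0 (all failing)
    is_degraded: bool          # 2xx but missing fields or known-bad values
    error_detail: Optional[str]  # 4xx/5xx message; None on success
    source: str = "poller"     # or "nexla" / "health_check"
    schema_version: int = 1    # for forward compatibility
```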
- Baseline (30s) — Dashboard shows green metrics, agent is monitoring quietly.
- First incident — Toggle latency spike. Agent detects in ~5-10s, diagnoses via LLM, reroutes traffic, posts summary to dashboard.
- Recovery — Disable chaos. Agent logs the full incident as a case.
- Second incident — Toggle error bursts. Agent recognizes a similar pattern from case memory, responds faster, references the prior incident. The response-time delta on the dashboard is the proof point.
| Tool | Role |
|---|---|
| Railtracks | Agent orchestration loop |
| Nexla | Real-time data ingestion + normalization |
| DigitalOcean | LLM inference for diagnosis |
| Lovable | Dashboard frontend |
Each incident produces a case record:
```json
{
  "id": "inc_001",
  "timestamp": "2026-04-18T14:32:01Z",
  "symptoms": {"endpoint": "/checkout", "z_score": 3.4, "error_rate": 0.18},
  "diagnosis": "Upstream payment gateway timeout",
  "action_taken": "REROUTE",
  "outcome": "resolved",
  "resolution_time_ms": 28400
}
```

The auto-tuner adjusts `threshold_k` on the anomaly detector: true positives tighten it (`k -= 0.1`, floor 1.5), false positives loosen it (`k += 0.2`, cap 4.0). Similar past cases are appended to the LLM diagnosis prompt, so the agent builds context over time.
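The adjustment rule can be sketched in a few lines. The constants match the ones stated above; the function name and outcome labels are assumptions, not necessarily what `auto_tune.py` uses.

```python
def adjust_threshold(threshold_k: float, outcome: str) -> float:
    """Auto-tune the detector's z-score threshold from incident outcomes.
    True positives tighten detection; false positives loosen it."""
    if outcome == "true_positive":
        threshold_k = max(1.5, threshold_k - 0.1)   # tighten, floor at 1.5
    elif outcome == "false_positive":
        threshold_k = min(4.0, threshold_k + 0.2)   # loosen, cap at 4.0
    return threshold_k
```

The floor and cap keep the detector from drifting into either extreme: a run of true positives cannot make it hair-trigger, and a run of false positives cannot make it blind.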
```
DIGITALOCEAN_API_KEY=     # LLM inference endpoint
DIGITALOCEAN_MODEL_URL=   # Model serving URL
NEXLA_WEBHOOK_SECRET=     # Webhook auth (if using Nexla)
```