VoiceOps

Inspiration

Modern production systems fail in complex and unpredictable ways, yet incident response is still largely manual. Engineers jump between logs, dashboards, and codebases trying to understand what broke and how to fix it. We were inspired to build a system where AI doesn’t just assist — it actively participates in the full incident lifecycle.

VoiceOps was born from a simple question: What if your system could detect its own failures, explain them out loud, fix them safely, and learn from the outcome?

What it does

VoiceOps is a self-improving AI incident response agent that:

Detects production errors via Datadog logs
Traces the issue back to the responsible file in the local codebase
Uses Gemini (LLM) to generate a minimal, safe patch
Applies the patch locally (no Git required)
Verifies the fix automatically
Logs structured evaluation metrics to Braintrust
Generates a voice summary of the incident and resolution using ElevenLabs

This creates a closed loop:

Detect → Diagnose → Patch → Verify → Score → Learn → Speak

It transforms incident response from reactive debugging into an autonomous AI-assisted workflow.
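The loop above can be sketched as a plain function that threads one incident through each stage in order. This is only an illustration under stated assumptions: `IncidentRun`, `run_loop`, and the stage callables are hypothetical names, and the lambdas in the demo are stand-ins for the real Datadog, Gemini, Braintrust, and ElevenLabs calls, which are not shown here.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class IncidentRun:
    """Record of one pass through the loop (hypothetical structure)."""
    incident: str
    stages: List[str] = field(default_factory=list)
    target_file: Optional[str] = None
    verified: bool = False
    score: float = 0.0
    summary: str = ""


def run_loop(
    incident: str,
    diagnose: Callable[[str], str],
    patch: Callable[[str, str], str],
    verify: Callable[[str], bool],
    score: Callable[[bool, str], float],
    learn: Callable[[IncidentRun], None],
    speak: Callable[[IncidentRun], str],
) -> IncidentRun:
    """Run Detect -> Diagnose -> Patch -> Verify -> Score -> Learn -> Speak."""
    run = IncidentRun(incident=incident)
    run.stages.append("detect")  # the incident text is the detection result

    run.target_file = diagnose(incident)
    run.stages.append("diagnose")

    new_source = patch(incident, run.target_file)
    run.stages.append("patch")

    run.verified = verify(new_source)
    run.stages.append("verify")

    run.score = score(run.verified, new_source)
    run.stages.append("score")

    learn(run)  # e.g. log the run to an evaluation store
    run.stages.append("learn")

    run.summary = speak(run)  # text that would be sent to text-to-speech
    run.stages.append("speak")
    return run


# Demo with stand-in stages (no external services are called).
history: List[IncidentRun] = []
demo = run_loop(
    "ZeroDivisionError in app/calc.py",
    diagnose=lambda inc: "app/calc.py",
    patch=lambda inc, path: "def div(a, b):\n    return a / b if b else 0\n",
    verify=lambda src: "if b" in src,
    score=lambda ok, src: 1.0 if ok else 0.0,
    learn=history.append,
    speak=lambda r: f"Patched {r.target_file}, verified={r.verified}",
)
print(demo.stages)
print(demo.summary)
```

Passing each stage in as a callable keeps the loop itself free of vendor SDKs, so a simulated stage can replace a real one without changing the control flow.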

How we built it

We designed VoiceOps as a modular multi-agent system:

Datadog Integration: Pulls real-time logs or simulates incidents for demo purposes.
Trace Agent: Analyzes log fingerprints and locates the likely failing file in the repository.
Gemini Patch Agent: Sends the incident summary and file contents to Gemini and receives a minimal, safe patch.
Local Patch Engine: Applies full-file LLM-generated patches directly to the filesystem.
Evaluation Engine: Verifies correctness (e.g., division-by-zero guard), measures duplication, and tracks patch metrics.
Braintrust Integration: Logs structured experiment data, including:
- Inputs
- Outputs
- Verification results
- Quantitative scores
ElevenLabs Integration: Converts the incident and patch result into a natural voice summary.
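As an illustration of the Trace Agent step, here is a minimal sketch that assumes the log contains a standard Python traceback. It scans the `File "...", line N` frames from innermost to outermost and returns the first one that resolves to a real file inside the repository. The function name `locate_failing_file` and the sample log are hypothetical; the actual fingerprinting logic in VoiceOps may differ.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional, Tuple

# Matches traceback frames such as:  File "app/calc.py", line 2, in div
FRAME_RE = re.compile(r'File "([^"]+)", line (\d+)')


def locate_failing_file(log_text: str, repo_root: Path) -> Optional[Tuple[Path, int]]:
    """Return (path, line) of the innermost traceback frame inside repo_root."""
    root = repo_root.resolve()
    for raw_path, line_no in reversed(FRAME_RE.findall(log_text)):
        candidate = Path(raw_path)
        if not candidate.is_absolute():
            candidate = root / candidate
        candidate = candidate.resolve()
        # Skip library frames and paths that do not exist in this checkout.
        if candidate.is_file() and candidate.is_relative_to(root):
            return candidate, int(line_no)
    return None


# Demo against a throwaway repository layout.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / "app").mkdir()
    (repo / "app" / "calc.py").write_text("def div(a, b):\n    return a / b\n")
    sample_log = (
        "Traceback (most recent call last):\n"
        '  File "app/calc.py", line 2, in div\n'
        '  File "/opt/venv/lib/thirdparty/helper.py", line 10, in _div\n'
        "ZeroDivisionError: division by zero\n"
    )
    found = locate_failing_file(sample_log, repo)
    missing = locate_failing_file("no traceback here", repo)
    print(found)
```

Restricting matches to files under the repository root means a failure raised inside a dependency is still attributed to the nearest frame of project code.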

The UI was built in Streamlit for rapid prototyping and demo clarity.

Challenges we ran into

Braintrust SDK version inconsistencies: We encountered initialization and logging API changes and had to switch to experiment-based logging.
Circular import errors in agents: We refactored the module structure to eliminate dependency cycles.
Duplicate patch insertion: Early patch logic inserted multiple guards. We implemented evaluation metrics to detect this.
Ensuring safe local patching: We shifted from diff-based patching to full-file LLM regeneration to simplify and stabilize application.
Making the demo deterministic: We added a demo mode so judges always see a reproducible incident.
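To make the duplicate-guard check concrete, here is a small sketch of one way such a metric could be computed. It assumes the guard takes the form `if <name> == 0:` and counts those tests with Python's `ast` module. The function names, the default variable name `b`, and the score values are all hypothetical; they are not the metrics VoiceOps actually logs.

```python
import ast
from typing import Dict, Union


def _is_zero_check(test: ast.expr, name: str) -> bool:
    """True if the expression is exactly `<name> == 0`."""
    if not (isinstance(test, ast.Compare) and len(test.ops) == 1):
        return False
    left, op, right = test.left, test.ops[0], test.comparators[0]
    return (
        isinstance(left, ast.Name)
        and left.id == name
        and isinstance(op, ast.Eq)
        and isinstance(right, ast.Constant)
        and isinstance(right.value, (int, float))
        and not isinstance(right.value, bool)
        and right.value == 0
    )


def count_zero_guards(source: str, name: str) -> int:
    """Count `if <name> == 0:` statements anywhere in the source."""
    tree = ast.parse(source)
    return sum(
        1
        for node in ast.walk(tree)
        if isinstance(node, ast.If) and _is_zero_check(node.test, name)
    )


def evaluate_patch(source: str, name: str = "b") -> Dict[str, Union[bool, int, float]]:
    """Score a patched file: one guard is ideal, repeats are penalized."""
    guards = count_zero_guards(source, name)
    if guards == 1:
        score = 1.0
    elif guards > 1:
        score = 0.5  # illustrative penalty for duplicated guards
    else:
        score = 0.0
    return {
        "guard_present": guards >= 1,
        "duplicate_guards": max(0, guards - 1),
        "score": score,
    }


unpatched = "def div(a, b):\n    return a / b\n"
clean = "def div(a, b):\n    if b == 0:\n        return 0\n    return a / b\n"
duplicated = (
    "def div(a, b):\n"
    "    if b == 0:\n        return 0\n"
    "    if b == 0:\n        return 0\n"
    "    return a / b\n"
)
for label, src in [("unpatched", unpatched), ("clean", clean), ("duplicated", duplicated)]:
    print(label, evaluate_patch(src))
```

Parsing the patched file instead of searching for a substring avoids counting guards that appear only in comments or strings, and `ast.parse` fails loudly if the regenerated file is not valid Python.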

Accomplishments that we're proud of

Built a true closed-loop self-improving system in a hackathon timeframe.
Integrated four sponsor technologies meaningfully:
- Datadog for observability
- Gemini for intelligent patch generation
- Braintrust for evaluation and learning
- ElevenLabs for voice interface
Created measurable AI evaluation instead of just generating text.
Designed a system that feels like a production-ready autonomous DevOps assistant.

What we learned

LLMs are strongest when paired with structured verification.
Evaluation frameworks like Braintrust are essential for AI systems that modify code.
Observability + AI + automated evaluation is a powerful combination.
Building multi-agent systems requires careful state handling and modular design.
The real value is not just generating fixes, but scoring and learning from them.

What's next for VoiceOps

Expand beyond division-by-zero into generalized bug classification.
Add automatic rollback if verification fails.
Integrate real CI pipelines instead of local patching.
Add reinforcement-style learning where Braintrust scores influence future patch prompts.
Introduce real-time voice conversations for interactive incident handling.
Support multi-file patch generation.
Deploy as a GitHub App or DevOps Copilot.
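As a rough idea of how the planned rollback could sit on top of full-file patching, the sketch below backs up the target, writes the regenerated file, runs a verifier, and restores the original if verification fails or raises. Everything here is an assumption for illustration: `apply_with_rollback` and `division_is_guarded` are hypothetical names, and the verifier executes the patched file in-process, which is acceptable only for a local demo and not for untrusted code.

```python
import ast
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def apply_with_rollback(
    target: Path, new_source: str, verify: Callable[[Path], bool]
) -> bool:
    """Overwrite target with new_source; restore the original if verify fails."""
    try:
        ast.parse(new_source)  # reject invalid Python before touching disk
    except SyntaxError:
        return False

    backup = target.with_suffix(target.suffix + ".bak")
    shutil.copy2(target, backup)
    try:
        target.write_text(new_source, encoding="utf-8")
        if verify(target):
            return True
        shutil.copy2(backup, target)  # verification failed: roll back
        return False
    except Exception:
        shutil.copy2(backup, target)  # verifier crashed: roll back, then re-raise
        raise
    finally:
        backup.unlink(missing_ok=True)


def division_is_guarded(path: Path) -> bool:
    """Demo-only verifier: load the file and check that div(1, 0) does not raise."""
    namespace: dict = {}
    exec(compile(path.read_text(encoding="utf-8"), str(path), "exec"), namespace)
    try:
        namespace["div"](1, 0)
    except ZeroDivisionError:
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "calc.py"
    original = "def div(a, b):\n    return a / b\n"
    target.write_text(original, encoding="utf-8")

    bad_patch = "def div(a, b):\n    return a / (b * 1)\n"
    good_patch = "def div(a, b):\n    if b == 0:\n        return 0\n    return a / b\n"

    bad_applied = apply_with_rollback(target, bad_patch, division_is_guarded)
    restored = target.read_text(encoding="utf-8") == original

    good_applied = apply_with_rollback(target, good_patch, division_is_guarded)
    kept = target.read_text(encoding="utf-8") == good_patch

    print(bad_applied, restored, good_applied, kept)
```

A production version would more likely run the verifier in a subprocess or sandbox and hand the change to CI, in line with the other items on this list.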

VoiceOps is evolving from a hackathon prototype into an autonomous AI reliability engineer.

Built With

agents
ai
braintrust
datadog
elevenlabs
langchain
python
streamlit

Submitted to

Self Improving Agents Hack

Created by

I designed and built the core multi-agent architecture behind VoiceOps, integrating Datadog for incident detection, Gemini for intelligent patch generation, Braintrust for structured evaluation, and ElevenLabs for voice output. I implemented the full self-improving loop, including local patch application, automated verification, and metric-based scoring. I handled the SDK integration challenges, resolved logging and evaluation issues with Braintrust, and ensured the system produced measurable outcomes rather than just AI-generated text. I also structured the Streamlit interface to clearly demonstrate the end-to-end workflow for judges. Overall, I focused on making the system technically sound, modular, and demo-ready under hackathon constraints.

Deepesh Katudia
Software Engineer specializing in scalable full-stack systems, API-driven architecture, and cloud deployments.
I built "VoiceOps," an AI-powered Streamlit dashboard that automates incident response by using a multi-agent pipeline to analyze logs, find root causes, and apply code patches. It integrates Datadog, OpenAI, and ElevenLabs, and features robust error handling with local simulators to ensure the system runs flawlessly even if external APIs fail.

Aman Vlogs
Amey Borkar
jeet choksi

Updates

Amey Borkar started this project — Feb 21, 2026 04:12 PM EST
