Inspiration

We were inspired by a simple but persistent reality: small businesses and frontline solopreneurs do not struggle because they lack skill — they struggle because they are forced to do everything else.

An HVAC technician, electrician, or plumber is not just fixing systems. In between jobs — often while standing on a ladder, under a house, or in a mechanical room — they are answering customer calls, quoting jobs, scheduling visits, updating invoices, logging work into an ERP, and managing follow-ups.

Every missed call is not just a missed message. It is lost revenue.

When we looked at the tools available to them, the gap was clear. Existing solutions typically:

  • Require complex setup
  • Fail to understand the actual business behind the phone number
  • Stop at chatbots that can talk but cannot take action

We knew this was a problem we could fix.


What It Does

We built SpeakOps, a frictionless AI operations agent for SMBs and frontline solopreneurs.

In minutes, a business can:

  • Provide basic business context
  • Share their company website
  • Upload business documents (pricing, policies, FAQs)
  • Optionally connect their ERP or internal systems

From that, SpeakOps automatically:

  • Parses the business context
  • Trains a domain-specific AI agent
  • Deploys it as a dedicated phone number or customer interface
  • Turns it into a personal, always-on customer service representative

This is not just Q&A.

During a live phone call, the agent can:

  • Answer customer questions using real business knowledge
  • Check availability and schedule jobs
  • Generate or confirm quotes
  • Update records in connected ERP systems
  • Log completed actions automatically

The call becomes the source of truth.

Example

A customer calls to schedule an HVAC repair.

The agent:

  • Confirms availability
  • Books the job
  • Updates the ERP
  • Sends confirmation

All while the technician is still on the job.

No callbacks.
No lost leads.
No manual cleanup later.

How We Built It

We built SpeakOps as a full-stack AI voice platform in 48 hours, integrating six services into a single real-time pipeline that lets callers have natural, multi-turn conversations with AI agents over the phone.

Architecture at a Glance

Caller → Twilio (Voice) → Next.js Webhook → ElevenLabs STT → Gemini 2.0 Flash → ElevenLabs TTS → Caller hears response


Tech Stack

Frontend

  • React 19, Tailwind CSS, shadcn/ui
  • 3-step agent creation wizard
  • Multi-tab analytics dashboard built with Recharts
  • Agent settings panel with role presets, tone and language controls, business hours, and action management

Auth & Database

  • Firebase Authentication with Google OAuth
  • Google BigQuery
  • Document-based RAG pipeline using Vertex AI text-embedding-005

Voice Pipeline

  1. Twilio hits our webhook, which responds with TwiML that greets the caller and starts recording
  2. When the recording completes, we immediately return programmatically generated ambient hold music so the caller never hears silence
  3. In the background, we download the recording and transcribe it with ElevenLabs STT (Scribe v1)
  4. The full conversation history is sent to Gemini 2.0 Flash for response generation
  5. The response is synthesized with ElevenLabs TTS
  6. Using the Twilio REST API, we redirect the live call mid-hold-music to play the generated audio and begin recording the next turn
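
The six steps above can be sketched roughly as follows. This is a minimal TypeScript sketch under stated assumptions, not the actual SpeakOps code: the helper names (escapeXml, holdMusicTwiml, replyAndRecordTwiml, processTurn), the TurnDeps interface, and every URL are hypothetical. Only the TwiML verbs (Response, Play, Record) and the idea of redirecting a live call come from the description above; the STT, LLM, TTS, and redirect calls are injected functions rather than guesses at any vendor SDK.

```typescript
/** Escape characters that are unsafe inside XML text or attribute values. */
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

/** Step 2: TwiML that keeps the caller on looping hold music while we work. */
function holdMusicTwiml(musicUrl: string): string {
  // loop="0" asks Twilio to keep repeating the audio until the call changes state.
  return `<Response><Play loop="0">${escapeXml(musicUrl)}</Play></Response>`;
}

/** Step 6: TwiML that plays the synthesized reply, then records the next turn. */
function replyAndRecordTwiml(audioUrl: string, nextTurnUrl: string): string {
  return (
    `<Response>` +
    `<Play>${escapeXml(audioUrl)}</Play>` +
    `<Record action="${escapeXml(nextTurnUrl)}" method="POST" playBeep="false" />` +
    `</Response>`
  );
}

/** Hypothetical seams for the external services, so the flow can be tested offline. */
interface TurnDeps {
  transcribe(recordingUrl: string): Promise<string>; // step 3 (speech-to-text)
  generateReply(history: string[]): Promise<string>; // step 4 (LLM)
  synthesize(text: string): Promise<string>; // step 5, returns an audio URL
  redirectCall(callSid: string, twiml: string): Promise<void>; // step 6
}

/** Steps 3 to 6 for one caller turn; returns the updated conversation history. */
async function processTurn(
  deps: TurnDeps,
  callSid: string,
  recordingUrl: string,
  history: string[],
  nextTurnUrl: string,
): Promise<string[]> {
  const callerText = await deps.transcribe(recordingUrl);
  const withCaller = [...history, `caller: ${callerText}`];
  const reply = await deps.generateReply(withCaller);
  const audioUrl = await deps.synthesize(reply);
  await deps.redirectCall(callSid, replyAndRecordTwiml(audioUrl, nextTurnUrl));
  return [...withCaller, `agent: ${reply}`];
}
```

In this sketch the webhook would respond with holdMusicTwiml(...) right away and run processTurn(...) in the background, which is what keeps the caller from hearing silence.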

Multi-Tenancy

  • Each agent is automatically provisioned with a dedicated Twilio phone number
  • Webhook URLs are parameterized by agent ID
  • A single SpeakOps deployment can serve many agents, each with strictly isolated data and state
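
To make the agent-ID parameterization concrete, here is a small TypeScript sketch. The route shape (/api/voice/<agentId>/incoming), the function names, and the ConversationStore class are all assumptions made for illustration; the real SpeakOps paths and storage layer may look different.

```typescript
/** Build the webhook URL that a given agent's phone number would point at. */
function buildWebhookUrl(baseUrl: string, agentId: string): string {
  const trimmedBase = baseUrl.replace(/\/+$/, ""); // avoid a double slash
  return `${trimmedBase}/api/voice/${encodeURIComponent(agentId)}/incoming`;
}

/** Recover the agent ID from an incoming webhook path, or null if it does not match. */
function agentIdFromPath(pathname: string): string | null {
  const match = /^\/api\/voice\/([^/]+)\/incoming$/.exec(pathname);
  const encodedId = match?.[1];
  return encodedId === undefined ? null : decodeURIComponent(encodedId);
}

/** In-memory stand-in for per-agent state: every lookup is scoped by agent ID first. */
class ConversationStore {
  private readonly byAgent = new Map<string, Map<string, string[]>>();

  append(agentId: string, callSid: string, line: string): void {
    const calls = this.byAgent.get(agentId) ?? new Map<string, string[]>();
    const lines = calls.get(callSid) ?? [];
    lines.push(line);
    calls.set(callSid, lines);
    this.byAgent.set(agentId, calls);
  }

  /** Returns a copy so callers cannot mutate another request's history. */
  history(agentId: string, callSid: string): string[] {
    return [...(this.byAgent.get(agentId)?.get(callSid) ?? [])];
  }
}
```

Because every read and write goes through the agent ID taken from the URL, one agent's handler has no way to ask for another agent's conversation in this sketch.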

Challenges We Ran Into

  • Juggling six APIs under time pressure
    SpeakOps integrates Firebase, BigQuery, Twilio, ElevenLabs (STT and TTS), Gemini, and Vertex AI. Each service has its own authentication model, rate limits, and failure modes, making coordination under a 48-hour deadline especially challenging.

  • Real-time voice latency
    Phone calls have zero tolerance for silence. Our pipeline has five network hops per turn, so we return hold music immediately via TwiML and redirect the live call once audio generation completes.

  • Next.js 15 breaking changes
    Route handler params became asynchronous. Accessing params.agentId without await silently returned undefined, causing Twilio webhooks to fail with only a generic “System application error.”

  • ngrok webhook instability
    Free-tier ngrok URLs change on every restart, requiring updates to environment variables, server restarts, and Twilio webhook configs. Missing any step caused silent call failures.
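
The Next.js 15 params issue above is easy to reproduce in isolation. The sketch below is illustrative only: the VoiceRouteContext type, the resolveAgentId helper, and the TwiML text are hypothetical, and in a real route file the handler would be exported as POST rather than declared as a plain function.

```typescript
/** In Next.js 15 the second handler argument carries params as a Promise. */
type VoiceRouteContext = { params: Promise<{ agentId: string }> };

/** Await the params before reading them, and fail loudly if the ID is missing. */
async function resolveAgentId(context: VoiceRouteContext): Promise<string> {
  const { agentId } = await context.params;
  if (!agentId) {
    throw new Error("Missing agentId route param");
  }
  return agentId;
}

// In a real route file this would be `export async function POST(...)`.
async function POST(_request: Request, context: VoiceRouteContext): Promise<Response> {
  const xmlHeaders = { "Content-Type": "text/xml" };
  try {
    const agentId = await resolveAgentId(context);
    // Placeholder TwiML; a real handler would look up the agent and greet the caller.
    const twiml = `<Response><Say>Connected to agent ${agentId}.</Say></Response>`;
    return new Response(twiml, { headers: xmlHeaders });
  } catch (error) {
    // Log the real cause so failures are visible on our side of the webhook.
    console.error("Voice webhook failed:", error);
    const apology = "<Response><Say>Sorry, something went wrong.</Say></Response>";
    return new Response(apology, { headers: xmlHeaders });
  }
}
```

Reading context.params.agentId directly on a plain Promise yields undefined rather than an error, which is why the explicit check in resolveAgentId is worth having.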


Accomplishments We’re Proud Of

  • End-to-end voice AI in 48 hours
    Callers dial a real phone number and hold natural, multi-turn conversations with AI agents that understand their business. Every turn is logged with full transcripts.

  • Zero-silence caller experience
    Background processing with ambient hold music transforms unavoidable latency into a polished, human-like interaction.

  • True multi-tenancy
    Each agent has its own phone number, knowledge base, conversation history, and analytics, all served from a single deployment.

  • Document-powered RAG
    Business documents (PDFs and text files) are OCR’d, embedded via Vertex AI, and retrieved at query time using BigQuery VECTOR_SEARCH, allowing agents to answer with real business knowledge.

  • Full-featured dashboard
    Call history with expandable transcripts, interactive analytics charts, per-agent token usage, and a comprehensive settings panel.
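
To illustrate the retrieval step of the RAG pipeline, here is a small in-memory TypeScript sketch of nearest-neighbor search by cosine similarity. It is a stand-in for what a vector search does conceptually, not the BigQuery VECTOR_SEARCH query SpeakOps runs; the Chunk shape, the function names, and the tiny two-dimensional embeddings are assumptions for illustration.

```typescript
/** A document chunk with its precomputed embedding (hypothetical shape). */
interface Chunk {
  id: string;
  text: string;
  embedding: number[];
}

/** Cosine similarity between two equal-length vectors; 0 if either is all zeros. */
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Embedding dimensions differ");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

/** Return the k chunks most similar to the query embedding, best first. */
function topKChunks(query: number[], chunks: Chunk[], k: number): Chunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}
```

The retrieved chunk texts would then be placed into the model prompt alongside the conversation history, so answers stay grounded in the uploaded documents.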


What We Learned

  • Voice UX is fundamentally different from chat UX
    Two seconds of silence on a phone call feels broken. Designing for perceived latency matters more than raw speed.

  • BigQuery streaming inserts have quirks
    ISO timestamps, eventual consistency, and dropped nulls required careful schema and ingestion design.

  • Prompt engineering for voice is its own discipline
    Explicitly instructing Gemini that responses would be read aloud dramatically improved clarity and naturalness.

  • Multi-service systems demand defensive coding
    Any external dependency can fail. Graceful fallbacks and aggressive logging are essential to prevent broken calls.
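
As one example of the defensive pattern described in the last point, the TypeScript sketch below wraps an external call with a timeout, a logged failure, and a fallback value. The withFallback name, its parameters, and the example fallback sentence are hypothetical; this shows the general idea rather than the SpeakOps implementation.

```typescript
/**
 * Run an external call with a time limit. If it throws or times out,
 * log the cause and return the fallback instead of breaking the call.
 */
async function withFallback<T>(
  label: string,
  primary: () => Promise<T>,
  fallback: T,
  timeoutMs: number,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${timeoutMs} ms`)),
      timeoutMs,
    );
  });
  try {
    // Whichever settles first wins: the real call or the timeout rejection.
    return await Promise.race([primary(), timeout]);
  } catch (error) {
    console.error(`[${label}] failed, using fallback:`, error);
    return fallback;
  } finally {
    // Always clear the timer so it cannot keep the process alive.
    if (timer !== undefined) {
      clearTimeout(timer);
    }
  }
}
```

For a voice agent, the fallback for a failed response-generation step might be a short spoken apology such as "Sorry, could you repeat that?", so the caller hears something sensible instead of a dropped call.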


What’s Next for SpeakOps

  • ERP integrations with tools like QuickBooks, Jobber, ServiceTitan, and Google Calendar
  • Multi-channel expansion to SMS, WhatsApp, and web chat with shared conversation context
  • Agentic tool use for bookings, quotes, inventory checks, and payments during live calls
  • Fine-tuned voice cloning so agents can sound like the business owner
  • Analytics and coaching that surface sentiment trends and customer pain points, and automatically improve prompts over time

Team

  • Shaik Sufyaan
  • Varun Ahlawat
  • Shwejan Uppala
