SpeakOps

Landing page with Google OAuth
System architecture
Uploading your business documents
Add your website URL to retrieve context
Last step: Create your
Dashboard
Agent settings

Inspiration

We were inspired by a simple but persistent reality: small businesses and frontline solopreneurs do not struggle because they lack skill. They struggle because they are forced to do everything else as well.

An HVAC technician, electrician, or plumber is not just fixing systems. Between jobs, often while standing on a ladder, under a house, or in a mechanical room, they are answering customer calls, quoting jobs, scheduling visits, updating invoices, logging work into an ERP, and managing follow-ups.

Every missed call is not just a missed message. It is lost revenue.

When we looked at the tools available to them, the gap was clear. Existing solutions tend to:

Require complex setup
Fail to understand the actual business behind the phone number
Stop at chatbots that can talk but cannot take action

We knew this was a problem we could fix.

What It Does

We built a frictionless AI operations agent for SMBs and frontline solopreneurs.

In minutes, a business can:

Provide basic business context
Share their company website
Upload business documents (pricing, policies, FAQs)
Optionally connect their ERP or internal systems

From that, SpeakOps automatically:

Parses the business context
Trains a domain-specific AI agent
Deploys it as a dedicated phone number or customer interface
Turns it into a personal, always-on customer service representative

This is not just Q&A.

During a live phone call, the agent can:

Answer customer questions using real business knowledge
Check availability and schedule jobs
Generate or confirm quotes
Update records in connected ERP systems
Log completed actions automatically

The call becomes the source of truth.

Example

A customer calls to schedule an HVAC repair.

The agent:

Confirms availability
Books the job
Updates the ERP
Sends confirmation

All while the technician is still on the job.

No callbacks.
No lost leads.
No manual cleanup later.

How We Built It

We built SpeakOps as a full-stack AI voice platform in 48 hours, integrating six services into a single real-time pipeline that lets callers have natural, multi-turn conversations with AI agents over the phone.

Architecture at a Glance

Caller → Twilio (Voice) → Next.js Webhook → ElevenLabs STT → Gemini 2.0 Flash → ElevenLabs TTS → Caller hears response
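
As a rough illustration of the flow above, one conversational turn can be modeled as three awaited stages. This is a minimal sketch, not the project's actual code: the names `PipelineDeps`, `runTurn`, and `stubDeps` are assumptions, and the three stage functions stand in for the ElevenLabs STT, Gemini, and ElevenLabs TTS calls.

```typescript
// One turn: recorded audio -> transcript -> reply text -> reply audio.
// All interfaces here are hypothetical; real clients would wrap the
// speech-to-text, LLM, and text-to-speech services behind them.

type Role = "caller" | "agent";

interface Turn {
  role: Role;
  text: string;
}

interface PipelineDeps {
  transcribe: (audio: Uint8Array) => Promise<string>;
  generateReply: (history: Turn[]) => Promise<string>;
  synthesize: (text: string) => Promise<Uint8Array>;
}

interface TurnResult {
  history: Turn[];
  replyAudio: Uint8Array;
}

async function runTurn(
  deps: PipelineDeps,
  history: Turn[],
  callerAudio: Uint8Array
): Promise<TurnResult> {
  const callerText = await deps.transcribe(callerAudio);
  // The full history, including the new caller turn, goes to the model.
  const withCaller: Turn[] = [...history, { role: "caller", text: callerText }];
  const replyText = await deps.generateReply(withCaller);
  const replyAudio = await deps.synthesize(replyText);
  return {
    history: [...withCaller, { role: "agent", text: replyText }],
    replyAudio,
  };
}

// Stub services so the sketch runs without network access.
const stubDeps: PipelineDeps = {
  transcribe: async () => "Can you come fix my AC tomorrow?",
  generateReply: async (h) => `Heard ${h.length} turn(s). We have a slot at 10am.`,
  synthesize: async (text) => new Uint8Array(text.length),
};
```

Passing the services in as plain functions keeps each stage replaceable and lets the turn logic be exercised with stubs.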

Tech Stack

Frontend

React 19, Tailwind CSS, shadcn/ui
3-step agent creation wizard
Multi-tab analytics dashboard built with Recharts
Agent settings panel with role presets, tone and language controls, business hours, and action management
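
To make the settings panel concrete, the options listed above could be captured in a shape like the one below. Every field name here (`rolePreset`, `tone`, `businessHours`, `enabledActions`) and the `isOpen` helper are assumptions made for illustration, not the project's actual schema.

```typescript
// Hypothetical shape for per-agent settings; all names are assumptions.
type Tone = "friendly" | "formal" | "concise";

interface BusinessHours {
  // Minutes since local midnight, e.g. 8 * 60 for 08:00.
  openMinute: number;
  closeMinute: number;
  // 0 = Sunday ... 6 = Saturday, matching Date.prototype.getDay().
  days: number[];
}

interface AgentSettings {
  rolePreset: string;
  tone: Tone;
  language: string; // language tag such as "en-US"
  businessHours: BusinessHours;
  enabledActions: string[];
}

// True when the given day and minute fall inside the configured hours.
// The closing minute is exclusive.
function isOpen(hours: BusinessHours, day: number, minuteOfDay: number): boolean {
  return (
    hours.days.indexOf(day) !== -1 &&
    minuteOfDay >= hours.openMinute &&
    minuteOfDay < hours.closeMinute
  );
}

const exampleSettings: AgentSettings = {
  rolePreset: "receptionist",
  tone: "friendly",
  language: "en-US",
  businessHours: { openMinute: 8 * 60, closeMinute: 17 * 60, days: [1, 2, 3, 4, 5] },
  enabledActions: ["schedule_job"],
};
```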

Auth & Database

Firebase Authentication with Google OAuth
Google BigQuery
Document-based RAG pipeline using Vertex AI text-embedding-005
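
The retrieval step of a document-based RAG pipeline can be sketched in memory as follows. This is a stand-in, not the production path: in SpeakOps the embeddings come from Vertex AI and the search runs in BigQuery, whereas here the `Chunk` type, the `topK` and `buildPrompt` helpers, and the toy two-dimensional vectors are all assumptions.

```typescript
// In-memory illustration of embedding retrieval, scoped to one agent.
interface Chunk {
  agentId: string;
  text: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k chunks for this agent that are most similar to the query.
function topK(chunks: Chunk[], agentId: string, query: number[], k: number): Chunk[] {
  return chunks
    .filter((c) => c.agentId === agentId)
    .map((c) => ({ chunk: c, score: cosineSimilarity(c.embedding, query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((s) => s.chunk);
}

// Put the retrieved text in front of the caller's question.
function buildPrompt(question: string, context: Chunk[]): string {
  const contextBlock = context.map((c, i) => `[${i + 1}] ${c.text}`).join("\n");
  return `Answer using only this business context:\n${contextBlock}\n\nCaller question: ${question}`;
}

const sampleChunks: Chunk[] = [
  { agentId: "a1", text: "A tune-up costs $89.", embedding: [1, 0] },
  { agentId: "a1", text: "We are open 8am to 5pm.", embedding: [0, 1] },
  { agentId: "a2", text: "Another tenant's document.", embedding: [1, 0] },
];
```

Filtering by agent ID before scoring keeps one business's documents out of another agent's answers.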

Voice Pipeline

Twilio hits our webhook, greets the caller, and starts recording
We immediately return programmatically generated ambient hold music so the caller never hears silence
In the background, we download the recording and transcribe it with ElevenLabs STT (Scribe v1)
The full conversation history is sent to Gemini 2.0 Flash for response generation
The response is synthesized with ElevenLabs TTS
Using the Twilio REST API, we redirect the live call mid-hold-music to play the generated audio and begin recording the next turn
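
The TwiML returned at each stage of the steps above might look roughly like the output of these helpers. This is a sketch under assumptions: the function names, URLs, and attribute values (such as the 2-second silence timeout) are illustrative, and `RedirectCall` is a hypothetical wrapper around the Twilio REST call that updates a live call.

```typescript
// Escape text and attribute values before placing them in TwiML.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

function twiml(body: string[]): string {
  const lines = ['<?xml version="1.0" encoding="UTF-8"?>', "<Response>"];
  for (const line of body) lines.push(`  ${line}`);
  lines.push("</Response>");
  return lines.join("\n");
}

function recordVerb(callbackUrl: string): string {
  return `<Record action="${escapeXml(callbackUrl)}" method="POST" playBeep="false" timeout="2" maxLength="30" />`;
}

// Initial webhook response: greet the caller, then record their request.
function greetAndRecord(greeting: string, callbackUrl: string): string {
  return twiml([`<Say>${escapeXml(greeting)}</Say>`, recordVerb(callbackUrl)]);
}

// Returned immediately from the recording callback so the caller hears
// music rather than silence while the reply is being generated.
function holdMusic(musicUrl: string): string {
  return twiml([`<Play loop="0">${escapeXml(musicUrl)}</Play>`]);
}

// TwiML the live call is redirected to once the reply audio exists.
function playReplyAndRecord(replyAudioUrl: string, callbackUrl: string): string {
  return twiml([`<Play>${escapeXml(replyAudioUrl)}</Play>`, recordVerb(callbackUrl)]);
}

// Hypothetical wrapper around the REST call that updates a live call.
type RedirectCall = (callSid: string, newTwiml: string) => Promise<void>;

async function finishTurn(
  callSid: string,
  replyAudioUrl: string,
  callbackUrl: string,
  redirectCall: RedirectCall
): Promise<void> {
  await redirectCall(callSid, playReplyAndRecord(replyAudioUrl, callbackUrl));
}
```

Returning the hold music synchronously and redirecting later separates what the caller hears from how long the STT, LLM, and TTS calls take.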

Multi-Tenancy

Each agent is automatically provisioned with a dedicated Twilio phone number
Webhook URLs are parameterized by agent ID
A single SpeakOps deployment serves every agent, with data and state isolated per agent ID
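
One way to picture agent-parameterized webhooks and per-agent state is the sketch below. The route layout `/api/voice/<agentId>/<step>`, the helper names, and the in-memory `CallStore` are all assumptions for illustration; the real system keeps its data in BigQuery.

```typescript
type Step = "incoming" | "recording";

// Build the webhook URL that a given agent's phone number would point at.
function webhookUrl(baseUrl: string, agentId: string, step: Step): string {
  const trimmed = baseUrl.replace(/\/+$/, "");
  return `${trimmed}/api/voice/${encodeURIComponent(agentId)}/${step}`;
}

// Recover the agent ID from an incoming request path, or null if the
// path does not match the expected layout.
function agentIdFromPath(pathname: string): string | null {
  const match = /^\/api\/voice\/([^/]+)\/(incoming|recording)$/.exec(pathname);
  return match ? decodeURIComponent(match[1]) : null;
}

// State is keyed by agent ID first, then call SID, so a lookup for one
// tenant can never return another tenant's conversation.
class CallStore {
  private readonly byAgent = new Map<string, Map<string, string[]>>();

  append(agentId: string, callSid: string, line: string): void {
    let calls = this.byAgent.get(agentId);
    if (!calls) {
      calls = new Map<string, string[]>();
      this.byAgent.set(agentId, calls);
    }
    const history = calls.get(callSid) ?? [];
    history.push(line);
    calls.set(callSid, history);
  }

  history(agentId: string, callSid: string): string[] {
    const calls = this.byAgent.get(agentId);
    const lines = calls ? calls.get(callSid) : undefined;
    return lines ? lines.slice() : [];
  }
}
```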

Challenges We Ran Into

Juggling six APIs under time pressure
SpeakOps integrates Firebase, BigQuery, Twilio, ElevenLabs (STT and TTS), Gemini, and Vertex AI. Each service has its own authentication model, rate limits, and failure modes, making coordination under a 48-hour deadline especially challenging.
Real-time voice latency
Phone calls have zero tolerance for silence, yet every turn crosses five network hops in our pipeline. We solved this by returning hold music immediately via TwiML and redirecting the live call once audio generation completed.
Next.js 15 breaking changes
Route handler params became asynchronous. Accessing params.agentId without await silently returned undefined, causing Twilio webhooks to fail with only a generic “System application error.”
ngrok webhook instability
Free-tier ngrok URLs change on every restart, requiring updates to environment variables, server restarts, and Twilio webhook configs. Missing any step caused silent call failures.
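
The asynchronous params issue can be reproduced without Next.js by treating `params` as a plain Promise. In this sketch the `RouteContext` type and both helper names are assumptions; they mimic only the shape of the second argument a Next.js 15 route handler receives.

```typescript
// Stand-in for the context object passed to a route handler, where
// `params` is a Promise rather than a plain object.
interface RouteContext {
  params: Promise<{ agentId: string }>;
}

// Buggy pattern: reading a property off the Promise itself. A Promise
// has no `agentId` property, so this yields undefined. The cast mimics
// untyped or loosely typed code where the compiler cannot catch it.
function agentIdWithoutAwait(context: RouteContext): string | undefined {
  return (context.params as unknown as { agentId?: string }).agentId;
}

// Fixed pattern: await the params before destructuring, and fail loudly
// if the value is missing so the problem shows up in the logs.
async function agentIdWithAwait(context: RouteContext): Promise<string> {
  const { agentId } = await context.params;
  if (!agentId) throw new Error("Missing agentId route param");
  return agentId;
}

const exampleContext: RouteContext = {
  params: Promise.resolve({ agentId: "agent-123" }),
};
```

Typing `params` as a Promise, as above, turns the missing `await` into a compile-time error rather than a failed call.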

Accomplishments We’re Proud Of

End-to-end voice AI in 48 hours
Callers dial a real phone number and hold natural, multi-turn conversations with AI agents that understand their business. Every turn is logged with full transcripts.
Zero-silence caller experience
Background processing with ambient hold music transforms unavoidable latency into a polished, human-like interaction.
True multi-tenancy
Each agent has its own phone number, knowledge base, conversation history, and analytics, all served from a single deployment.
Document-powered RAG
Business documents (PDFs and text files) are OCR’d, embedded via Vertex AI, and retrieved at query time using BigQuery VECTOR_SEARCH, allowing agents to answer with real business knowledge.
Full-featured dashboard
Call history with expandable transcripts, interactive analytics charts, per-agent token usage, and a comprehensive settings panel.

What We Learned

Voice UX is fundamentally different from chat UX
Two seconds of silence on a phone call feels broken. Designing for perceived latency matters more than raw speed.
BigQuery streaming inserts have quirks
ISO timestamps, eventual consistency, and dropped nulls required careful schema and ingestion design.
Prompt engineering for voice is its own discipline
Explicitly instructing Gemini that responses would be read aloud dramatically improved clarity and naturalness.
Multi-service systems demand defensive coding
Any external dependency can fail. Graceful fallbacks and aggressive logging are essential to prevent broken calls.
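
A graceful fallback with logging, as described in the last point, could be sketched like this. The helper names `withTimeout` and `withFallback`, the timeout values, and the idea of a pre-made fallback value are assumptions for illustration rather than the project's actual code.

```typescript
// Reject if `work` has not settled within `ms` milliseconds. The
// underlying work is not cancelled; its eventual result is ignored.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      }
    );
  });
}

// Run an external call; on failure or timeout, log the reason and
// return a safe fallback so the call can continue.
async function withFallback<T>(
  label: string,
  attempt: () => Promise<T>,
  fallback: T,
  timeoutMs: number,
  log: (message: string) => void = console.error
): Promise<T> {
  try {
    return await withTimeout(attempt(), timeoutMs);
  } catch (error) {
    const reason = error instanceof Error ? error.message : String(error);
    log(`[${label}] failed, using fallback: ${reason}`);
    return fallback;
  }
}
```

For example, a failed speech synthesis call could fall back to a pre-recorded "please hold" clip instead of dropping the caller.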

What’s Next for SpeakOps

ERP integrations with tools like QuickBooks, Jobber, ServiceTitan, and Google Calendar
Multi-channel expansion to SMS, WhatsApp, and web chat with shared conversation context
Agentic tool use for bookings, quotes, inventory checks, and payments during live calls
Fine-tuned voice cloning so agents can sound like the business owner
Analytics and coaching that surface sentiment trends and customer pain points, and automatically improve prompts over time