Inspiration

Having worked in business analytics environments, our team knows first-hand how many tickets are raised and escalations caused when call-centre assistants can't resolve issues in time. This isn't the assistants' fault: what do you do when you have a plethora of unorganized information at your disposal, but it's inaccessible the moment you need it?

To streamline issue resolution and increase customer satisfaction, our team jumped at the idea of building a real-time voice transcription agent that uses LLMs and RAG to generate timely, context-relevant suggestions.


What it does

  1. Audio Transcription:
    Uses Amazon Transcribe to display live transcripts as you speak.
  2. Context Storage:
    Stores context in a RAG pipeline backed by ChromaDB; users can upload PDFs containing tone guidelines, common issues, and more.
  3. Real-time Advice Generation:
    Leverages Ollama to generate advice on the fly, grounded in both the live transcript and the uploaded PDFs.
  4. Customer Satisfaction Score:
    Runs TextBlob sentiment analysis on the transcript and maps the result to a 1–10 satisfaction score.
  5. Summarization CSV:
    Produces a concise summary (plus extracted customer name) and auto-downloads it as a CSV for later analysis.
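As a minimal sketch of the scoring step: TextBlob reports a polarity in [-1.0, 1.0], and a linear rescale turns that into the 1–10 score. The helper below takes the polarity directly so the mapping itself is visible; in the app it would come from `TextBlob(transcript).sentiment.polarity`.

```python
# Sketch: map TextBlob polarity (-1.0..1.0) onto a 1-10 satisfaction score.
# The polarity is passed in directly here; in the app it comes from
# TextBlob(transcript).sentiment.polarity.

def satisfaction_score(polarity: float) -> int:
    """Linearly rescale polarity in [-1, 1] to an integer score in [1, 10]."""
    clamped = max(-1.0, min(1.0, polarity))        # guard against out-of-range input
    return round((clamped + 1.0) / 2.0 * 9.0) + 1  # -1 -> 1, 0 -> 5, +1 -> 10
```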

How we built it

1) Frontend

  • Vite + React for a lightning-fast, hot-reload dev experience
  • WebSocket-based audio streaming using the browser’s MediaStream & ScriptProcessorNode
  • Custom 10 s polling loop to fetch advice without spamming the LLM
  • File-upload widget to ingest PDFs for on-demand vectorization
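In the browser, the `ScriptProcessorNode` callback hands us float samples that have to be repacked as 16-bit PCM before they go over the WebSocket (the format Amazon Transcribe streaming expects). That conversion happens in JavaScript in our frontend; the sketch below shows the same transform in Python for illustration.

```python
import struct

def float32_to_pcm16(samples: list[float]) -> bytes:
    """Clamp float audio samples to [-1, 1] and pack as little-endian 16-bit PCM."""
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))               # clamp to the valid range
        ints.append(int(s * 32767))              # scale to the int16 range
    return struct.pack(f"<{len(ints)}h", *ints)  # '<h' = little-endian int16
```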

2) Backend

  • FastAPI (async) for both WebSocket and REST endpoints
  • Amazon Transcribe Streaming client for low-latency STT
  • ChromaDB vector store for transcripts & PDF embeddings
  • Ollama for on-prem LLM inference (advice, summarization, name extraction)
  • TextBlob for sentiment analysis → 1–10 satisfaction score
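ChromaDB handles embedding storage and nearest-neighbour search for us, but conceptually the retrieval step just ranks stored chunks by cosine similarity to the query embedding. A stdlib-only sketch of that idea (the 2-D vectors here are toy stand-ins, not real embeddings):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query: list[float], store: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k stored chunks most similar to the query."""
    ranked = sorted(store, key=lambda cid: cosine(query, store[cid]), reverse=True)
    return ranked[:k]
```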

3) Deployment & Infra

  • Single EC2 instance (g4ad.xlarge) hosting FastAPI, ChromaDB, and Ollama
  • Dockerized services behind an Nginx reverse proxy for HTTPS & CORS
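The part of the Nginx config that tripped us up was keeping the transcription WebSocket alive through the proxy hop. A minimal sketch (domain and ports are placeholders, not our exact config):

```nginx
# Sketch: FastAPI on :8000 behind Nginx TLS; placeholder domain.
server {
    listen 443 ssl;
    server_name example.com;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        # Required so the audio WebSocket survives the proxy:
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```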

Challenges we ran into

  1. Real-time streaming:
    Coordinating audio buffers, WebSocket delivery, and Amazon Transcribe's partial vs. final transcripts.
  2. Prompt engineering:
    Constraining Ollama to “only return customer name” or a strict 5-line summary without extra chatter.
  3. Compute limits:
    GPU constraints on our EC2 tier meant longer inference times and fewer concurrent sessions.
  4. CORS & networking:
    Debugging cross-origin issues between React (port 5173) and FastAPI (port 8000).
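One pattern that helped with the "only return the customer name" problem was pairing a blunt prompt with defensive post-processing, since the model occasionally adds chatter anyway. The prompt wording and helper below are illustrative, not our exact shipped strings:

```python
# Illustrative prompt template and cleanup helper (not the exact shipped strings).
NAME_PROMPT = (
    "From the call transcript below, output ONLY the customer's full name. "
    "No punctuation, no explanation, no quotes. If no name is mentioned, "
    "output exactly: UNKNOWN\n\nTranscript:\n{transcript}"
)

def clean_name(raw: str) -> str:
    """Defensively strip the extra chatter an LLM sometimes adds anyway."""
    first_line = raw.strip().splitlines()[0] if raw.strip() else "UNKNOWN"
    return first_line.strip(" \"'.")  # drop stray quotes/periods at the edges
```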

Accomplishments & learnings

  1. Learning about AWS:
    Our first foray into EC2—launching instances, evaluating costs, configuring inbound/outbound rules, and setting up security groups.
  2. Frontend Development:
    Mastered end-to-end audio capture, buffering, WebSocket streaming at 16 kHz, and built a responsive React UI with live transcripts, 10 s advice updates, and robust cleanup/timer management.
  3. Backend Development:
    Stitched together FastAPI, ChromaDB, and Ollama for real-time PDF ingestion, embedding generation, and vector search. Learned to parse messy docs, craft low-hallucination prompts, and serve streaming advice plus downloadable CSVs efficiently.

What’s next for SOUNDAdvice

  1. Analytics dashboard:
    Visualize sentiment trends and historical issues per customer.
  2. Model upgrades:
    Integrate larger LLaMA 3 or Mistral models once GPU resources allow.
  3. Multi-language support:
    Extend beyond English to serve low-resource languages.
  4. CRM integration:
    Auto-log summaries & tickets into Salesforce, Zendesk, etc.
