Inspiration

No one enjoys spending 20+ minutes on a customer support call just to get a refund — especially after already waiting days for delivery. The typical process is slow, manual, repetitive, and filled with friction: verifying identity, retrieving order history, checking eligibility, and executing actions — all over a voice call. We wanted to automate and streamline this experience using AI.

What it does

CallSense is an AI-driven system that listens to live conversations between customers and agents, transcribes them in real time, understands intent, and autonomously executes support workflows — like refunds — by integrating with internal tools.

Transcribes ongoing conversations with speaker labels.

Uses LLMs to generate actionable tasks based on the dialog.

Verifies customer and order details by retrieving them from an internal RAG-powered database.

Validates refund eligibility based on company policies.

Executes actions through an AI agent using Playwright, such as issuing refunds or updating status.

Provides feedback and policy-verification responses in real time.
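To make the eligibility step concrete, here is a minimal sketch of a policy check. The 30-day window, the status names, and the `Order` fields are all assumptions for illustration; the actual company policy and data model are not described in this writeup.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical policy values, chosen only for this example.
REFUND_WINDOW_DAYS = 30
NON_REFUNDABLE_STATUSES = {"refunded", "cancelled"}


@dataclass
class Order:
    # Hypothetical order record, as it might come back from the order lookup.
    order_id: str
    customer_id: str
    delivered_on: date
    status: str


def check_refund_eligibility(order: Order, caller_id: str, today: date) -> tuple[bool, str]:
    """Return (eligible, reason) so the decision can be shown and explained."""
    if order.customer_id != caller_id:
        return False, "order does not belong to the verified caller"
    if order.status in NON_REFUNDABLE_STATUSES:
        return False, f"order is already {order.status}"
    if today - order.delivered_on > timedelta(days=REFUND_WINDOW_DAYS):
        return False, f"outside the {REFUND_WINDOW_DAYS}-day refund window"
    return True, "within policy"
```

Returning a reason alongside the boolean is one way to support the real-time policy-verification feedback mentioned above.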

How we built it

Speech-to-Text Layer: Converts conversation to a transcript.

LLM Planner: Parses the transcript and triggers interrupt events (e.g., refund intent) to generate a structured plan.

Orchestrator (LLM-based): Sequences task execution and interacts with the backend.

Central State Manager: Maintains conversation context and tracks task progress.

Retrieval-Augmented Generation (RAG): Fetches policy/order/customer info from a custom database.

AI Agent (Playwright): Executes actions on internal portals.

Frontend: Displays execution feedback and policy responses.
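The way the orchestrator and the central state manager fit together can be sketched roughly as follows. This is a simplified illustration, not our actual implementation: `Task`, `CallState`, `run_plan`, and the handler names are hypothetical, and the real orchestrator is LLM-based rather than a fixed loop.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Task:
    name: str
    status: str = "pending"  # pending -> done | failed
    result: str = ""


@dataclass
class CallState:
    """Central state: the transcript so far plus the progress of each planned task."""
    transcript: list[str] = field(default_factory=list)
    tasks: list[Task] = field(default_factory=list)


def run_plan(state: CallState, handlers: dict[str, Callable[[CallState], str]]) -> bool:
    """Run pending tasks in order and stop at the first failure."""
    for task in state.tasks:
        if task.status != "pending":
            continue  # already handled in an earlier pass
        handler = handlers.get(task.name)
        if handler is None:
            task.status, task.result = "failed", "no handler registered"
            return False
        try:
            task.result = handler(state)
            task.status = "done"
        except Exception as exc:  # record the error so a fallback can inspect it
            task.status, task.result = "failed", str(exc)
            return False
    return True


# Example: a three-step refund plan with stub handlers.
state = CallState(tasks=[Task("verify_customer"), Task("check_policy"), Task("issue_refund")])
handlers = {
    "verify_customer": lambda s: "customer verified",
    "check_policy": lambda s: "eligible",
    "issue_refund": lambda s: "refund submitted",
}
completed = run_plan(state, handlers)
```

Stopping at the first failure keeps later steps, such as issuing the refund, from running on unverified data. Because each task keeps its own status, the loop can be called again after a fix without repeating finished steps.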

Challenges we ran into

Achieving real-time diarized transcription with low latency.

Aligning free-form LLM outputs into structured tasks that the orchestrator can reliably execute.

Building a robust task orchestration loop that allows dynamic interaction and failure recovery.

Integrating Playwright automation with stateful logic from the orchestrator.

Ensuring responses align with company policy and compliance standards.
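As an illustration of the structured-output problem, one common approach is to extract a JSON task list from the model's reply and validate it before anything is executed. The sketch below assumes the planner is prompted to return a JSON array of steps. The action names and the `parse_plan` helper are hypothetical.

```python
import json

# Hypothetical whitelist of actions the orchestrator knows how to run.
ALLOWED_ACTIONS = {"verify_customer", "fetch_order", "check_policy", "issue_refund"}


def parse_plan(raw: str) -> list[dict]:
    """Extract and validate a JSON task list from free-form model output.

    Raises ValueError on any problem so the caller can re-prompt or fall back.
    """
    # The model may wrap the JSON in chatty text, so take the outermost brackets.
    start, end = raw.find("["), raw.rfind("]")
    if start == -1 or end <= start:
        raise ValueError("no JSON array found in model output")
    try:
        steps = json.loads(raw[start:end + 1])
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON: {exc}") from exc

    plan = []
    for i, step in enumerate(steps):
        action = step.get("action") if isinstance(step, dict) else None
        if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
            raise ValueError(f"step {i} has an unknown or missing action")
        args = step.get("args", {})
        if not isinstance(args, dict):
            raise ValueError(f"step {i} has non-object args")
        plan.append({"action": action, "args": args})
    return plan
```

Rejecting unknown actions up front means a hallucinated step never reaches the browser agent; the orchestrator only sees plans it can execute.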

Accomplishments that we're proud of

End-to-end working prototype that takes a live call and completes a refund request.

Seamless integration of LLM reasoning, RAG context, and browser-based action agents.

Achieved interrupt-driven task execution mid-call without needing rigid flows.

Designed a scalable architecture that can be extended to support other workflows like address updates, order cancellations, and complaints.

What we learned

LLMs are powerful at extracting intent and generating plans, but they need strong orchestration for enterprise tasks.

Speaker diarization and accurate real-time transcription are critical for turn-level understanding.

Automating human support workflows requires state tracking, fallback mechanisms, and explainable decisions to maintain trust.

Low-latency pipelines are achievable with careful module design.

What's next for CallSense

Add multi-language support for global use cases.

Fine-tune domain-specific LLMs for more accurate plan generation.

Extend to chat-based and email-based support.

Add agent assist mode, where humans get AI-generated recommendations instead of full automation.

Deploy in a controlled pilot environment with real users for feedback and iteration.

Built With

  • assemblyai
  • browser-user-simulation
  • custom-orchestrator-(python)
  • gradio
  • hugging-face-transformers
  • openai-api
  • playwright-(python)
  • pytorch
  • rag-pipeline