BotProbe

AI agent for evaluating Physical AI systems.

Describe the behavior you expect from your physical AI system in plain English. BotProbe speaks to it, listens to its response, and returns a PASS or FAIL verdict with observations — no manual testing required.


What it does

BotProbe is a behavioral diagnostic agent designed for physical AI robots. You give it a natural language specification, and it runs an autonomous observe-evaluate loop:

  1. Speaks a prompt to the bot via the laptop speaker
  2. Listens to the bot's response via the laptop microphone
  3. Analyzes the audio against your specification using Gemini
  4. Returns a PASS/FAIL verdict with a description of what it actually observed

The agent decides how many observations are needed before issuing a verdict — simple tests complete in one round; complex behavioral tests may require more.
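The four-step loop above can be sketched in a few lines of Python. This is a minimal illustration with stubbed tools, not BotProbe's actual implementation: `speak_to_bot`, `record_audio`, and `evaluate` are hypothetical names, and the real decision step is a Gemini call rather than a length check.

```python
# Minimal sketch of the observe-evaluate loop. All functions are
# illustrative stubs, not BotProbe's actual API.

def speak_to_bot(prompt: str) -> None:
    """Stub: in BotProbe this plays `prompt` through the laptop speaker."""
    print(f"[speaker] {prompt}")

def record_audio(seconds: int) -> bytes:
    """Stub: in BotProbe this records the laptop microphone."""
    return b"\x00" * seconds  # placeholder audio

def evaluate(spec: str, observations: list[bytes]) -> dict:
    """Stub for the model call: decide whether to observe again or issue
    a verdict. The real agent sends the recorded audio to Gemini."""
    if len(observations) < 2:
        return {"action": "observe"}
    return {"action": "verdict", "result": "PASS",
            "notes": f"{len(observations)} observations matched the spec"}

def run_diagnosis(spec: str, max_rounds: int = 5) -> dict:
    observations: list[bytes] = []
    for _ in range(max_rounds):
        decision = evaluate(spec, observations)
        if decision["action"] == "verdict":
            return decision
        speak_to_bot(spec)                    # 1. speak a prompt to the bot
        observations.append(record_audio(3))  # 2-3. listen and keep the audio
    return {"action": "verdict", "result": "FAIL", "notes": "no verdict reached"}

print(run_diagnosis("Bot should reply when greeted"))
```

The key design point the sketch preserves: the model, not the harness, decides when enough has been observed, so simple specs exit after one round and harder ones loop.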


Use cases

  • Responsiveness — Does the bot respond when spoken to?
  • Interruption handling — When a user interrupts mid-sentence, does the bot stop within 1 second?
  • Audio quality — Is the bot's speech clear and audible?
  • Command recognition — Does the bot correctly execute a given voice command?
  • Conversation flow — Does the bot handle multi-turn exchanges correctly?
  • Edge case robustness — How does the bot behave under background noise or simultaneous speech?

Architecture

Browser
  |
  | HTTP (UI assets)
  v
Next.js  :3000   (Frontend Server)
  |
  | HTTP proxy  /api/*
  v
FastAPI  :8000   (Core Backend)
  |
  | google.genai SDK
  v
Gemini 2.5 Flash

The Next.js server proxies all /api/* requests to the Python backend via next.config.ts rewrites. The browser only ever talks to one origin.

How the agent loop works

User submits spec
      |
      v
Gemini decides: call speakToBot, recordAudio, or issue verdict
      |
      +-- speakToBot  -->  browser plays text via Web Speech API  -->  bot hears it
      |
      +-- recordAudio -->  browser records mic for N seconds       -->  captures bot's response
      |                    (optional: plays interrupt stimulus mid-recording)
      |
      v
Audio returned to Gemini as inline data (base64 WebM)
      |
      v
Gemini analyzes audio against spec  -->  PASS / FAIL + observations

Client-side tools (speakToBot, recordAudio) run in the browser. The agent loop is driven by useChat with sendAutomaticallyWhen: lastAssistantMessageIsCompleteWithToolCalls — no polling, no manual retries.
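The inline-audio handoff in the middle of the diagram can be illustrated in isolation: the browser base64-encodes the recorded WebM blob, and the backend decodes it back into raw bytes before forwarding it to the model. This is a pure-Python sketch; the field names and dict shape are illustrative, and in BotProbe the encoding half actually runs in the browser.

```python
import base64

# Sketch of the inline-audio handoff: base64 WebM from the browser,
# raw bytes recovered on the server for the Gemini call.

def encode_recording(webm_bytes: bytes) -> dict:
    """Illustrative shape of what the recordAudio tool could return."""
    return {
        "mimeType": "audio/webm",
        "data": base64.b64encode(webm_bytes).decode("ascii"),
    }

def decode_recording(part: dict) -> bytes:
    """Server side: recover the raw audio bytes for the model call."""
    assert part["mimeType"] == "audio/webm"
    return base64.b64decode(part["data"])

audio = b"\x1aE\xdf\xa3 fake webm payload"  # placeholder, not real WebM
part = encode_recording(audio)
assert decode_recording(part) == audio      # the round trip is lossless
```

Base64 keeps the audio safe to embed in a JSON message at the cost of roughly 33% size overhead, which is why it is sent inline rather than as a separate upload.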


Tech stack

Layer               Technology
Frontend framework  Next.js 16 (App Router)
Frontend language   TypeScript
Styling             Tailwind CSS 4
Agent SDK           Vercel AI SDK v6 (ai, @ai-sdk/react)
Backend framework   FastAPI (Python)
LLM                 Gemini 2.5 Flash via google.genai
Audio               Web MediaRecorder API + Web Speech API

Setup

Frontend

npm install

Backend

cd server
pip install -r requirements.txt

Create .env in server/:

GOOGLE_API_KEY=your_key_here

Running

Start the backend:

cd server
uvicorn main:app --port 8000 --reload

Start the frontend:

npm run dev

Open http://localhost:3000.

Hardware requirement: A physical bot that listens via microphone and responds via speaker — the laptop's own mic and speaker are used for bidirectional audio.


Project structure

botprobe/
  app/                      # Next.js App Router (frontend only)
  components/               # React components
  tools/                    # Client-side tool implementations (browser audio)
  lib/                      # Shared frontend utilities
  server/
    main.py                 # FastAPI app — POST /api/diagnose
    message_converter.py    # UIMessage[] -> Gemini Content[] conversion
    tools.py                # Pydantic tool schemas + Gemini FunctionDeclaration
    stream_writer.py        # AI SDK UI Message Stream Protocol helpers
    system_prompt.md        # Agent system prompt
    requirements.txt
  next.config.ts            # Proxies /api/* to FastAPI
