AI agent for evaluating Physical AI systems.
Describe the behavior you expect from your physical AI system in plain English. BotProbe speaks to it, listens to its response, and returns a PASS or FAIL verdict with observations, with no manual testing required.
BotProbe is a behavioral diagnostic agent designed for physical AI robots. You give it a natural language specification, and it runs an autonomous observe-evaluate loop:
- Speaks a prompt to the bot via the laptop speaker
- Listens to the bot's response via the laptop microphone
- Analyzes the audio against your specification using Gemini
- Returns a PASS/FAIL verdict with a description of what it actually observed
The agent decides how many observations are needed before issuing a verdict: simple tests complete in one round, while complex behavioral tests may require more. Typical behavioral tests include:
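The observe-evaluate loop can be sketched as pure control flow. This is a hypothetical sketch: names like `runDiagnosis`, `observe`, and `evaluate` are illustrative, not the repo's actual API.

```typescript
// Hypothetical sketch of the observe-evaluate loop. Function and type names
// are illustrative, not the repo's actual API.
type Observation = { prompt: string; transcript: string };
type Verdict = { result: "PASS" | "FAIL"; observations: string[] };

// The agent keeps gathering observations until the evaluator is confident
// enough to rule, up to a round budget.
function runDiagnosis(
  spec: string,
  observe: () => Observation,
  evaluate: (spec: string, obs: Observation[]) => Verdict | null,
  maxRounds = 5,
): Verdict {
  const history: Observation[] = [];
  for (let round = 0; round < maxRounds; round++) {
    history.push(observe()); // one speak + record exchange
    const verdict = evaluate(spec, history); // null means "need more data"
    if (verdict !== null) return verdict;
  }
  return { result: "FAIL", observations: ["No verdict within round budget"] };
}
```

The `evaluate` callback returning `null` models the agent deciding it needs another observation before ruling; a simple responsiveness check would return a verdict on the first round.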
- Responsiveness — Does the bot respond when spoken to?
- Interruption handling — When a user interrupts mid-sentence, does the bot stop within 1 second?
- Audio quality — Is the bot's speech clear and audible?
- Command recognition — Does the bot correctly execute a given voice command?
- Conversation flow — Does the bot handle multi-turn exchanges correctly?
- Edge case robustness — How does the bot behave under background noise or simultaneous speech?
```
Browser
  |
  | HTTP (UI assets)
  v
Next.js :3000 (Frontend Server)
  |
  | HTTP proxy /api/*
  v
FastAPI :8000 (Core Backend)
  |
  | google.genai SDK
  v
Gemini 2.5 Flash
```
The Next.js server proxies all `/api/*` requests to the Python backend via `next.config.ts` rewrites. The browser only ever talks to one origin.
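A rewrite rule of roughly this shape forwards API traffic; the destination URL is an assumption here, so check the repo's `next.config.ts` for the actual value.

```typescript
// Sketch of the proxy rewrite (assumed destination; see the repo's next.config.ts).
import type { NextConfig } from "next";

const nextConfig: NextConfig = {
  async rewrites() {
    return [
      {
        source: "/api/:path*",
        destination: "http://localhost:8000/api/:path*", // FastAPI backend
      },
    ];
  },
};

export default nextConfig;
```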
```
User submits spec
  |
  v
Gemini decides: call speakToBot, recordAudio, or issue verdict
  |
  +-- speakToBot --> browser plays text via Web Speech API --> bot hears it
  |
  +-- recordAudio --> browser records mic for N seconds --> captures bot's response
  |       (optional: plays interrupt stimulus mid-recording)
  |
  v
Audio returned to Gemini as inline data (base64 WebM)
  |
  v
Gemini analyzes audio against spec --> PASS / FAIL + observations
```
Client-side tools (`speakToBot`, `recordAudio`) run in the browser. The agent loop is driven by `useChat` with `sendAutomaticallyWhen: lastAssistantMessageIsCompleteWithToolCalls`; no polling, no manual retries.
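The idea behind that predicate can be re-implemented in simplified form. The real `lastAssistantMessageIsCompleteWithToolCalls` ships with the AI SDK; the message shapes below are illustrative, not the SDK's actual types.

```typescript
// Simplified re-implementation of the auto-send predicate. The real one is
// exported by the AI SDK; these message shapes are illustrative only.
type ToolPart = { type: "tool-call"; toolName: string; output?: unknown };
type TextPart = { type: "text"; text: string };
type Message = { role: "user" | "assistant"; parts: (ToolPart | TextPart)[] };

// Auto-send fires only when the last message is an assistant turn that issued
// at least one tool call and every call has a result attached.
function shouldAutoSend(messages: Message[]): boolean {
  const last = messages[messages.length - 1];
  if (!last || last.role !== "assistant") return false;
  const calls = last.parts.filter((p): p is ToolPart => p.type === "tool-call");
  return calls.length > 0 && calls.every((c) => c.output !== undefined);
}
```

Once the browser finishes a `speakToBot` or `recordAudio` call and attaches its output, the predicate flips to true and `useChat` sends the updated conversation back to the agent automatically.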
| Layer | Technology |
|---|---|
| Frontend framework | Next.js 16 (App Router) |
| Frontend language | TypeScript |
| Styling | Tailwind CSS 4 |
| Agent SDK | Vercel AI SDK v6 (ai, @ai-sdk/react) |
| Backend framework | FastAPI (Python) |
| LLM | Gemini 2.5 Flash via google.genai |
| Audio | Web MediaRecorder API + Web Speech API |
Install frontend dependencies:

```
npm install
```

Install backend dependencies:

```
cd server
pip install -r requirements.txt
```

Create `.env` in `server/`:

```
GOOGLE_API_KEY=your_key_here
```

Start the backend:

```
cd server
uvicorn main:app --port 8000 --reload
```

Start the frontend:

```
npm run dev
```

Open http://localhost:3000.
Hardware requirement: A physical bot that listens via microphone and responds via speaker — the laptop's own mic and speaker are used for bidirectional audio.
```
botprobe/
  app/                   # Next.js App Router (frontend only)
    components/          # React components
    tools/               # Client-side tool implementations (browser audio)
    lib/                 # Shared frontend utilities
  server/
    main.py              # FastAPI app — POST /api/diagnose
    message_converter.py # UIMessage[] -> Gemini Content[] conversion
    tools.py             # Pydantic tool schemas + Gemini FunctionDeclaration
    stream_writer.py     # AI SDK UI Message Stream Protocol helpers
    system_prompt.md     # Agent system prompt
    requirements.txt
  next.config.ts         # Proxies /api/* to FastAPI
```
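The `UIMessage[] -> Content[]` conversion in `message_converter.py` has roughly this shape. The sketch below is TypeScript for consistency with the rest of this README (the repo implements it in Python), and the field names are assumptions rather than the actual schema.

```typescript
// Illustrative sketch of the UIMessage -> Gemini Content mapping.
// Field names are assumptions; see server/message_converter.py for the real one.
type UIMessage = {
  role: "user" | "assistant";
  parts: { type: "text"; text: string }[];
};
type Content = { role: "user" | "model"; parts: { text: string }[] };

function toGeminiContents(messages: UIMessage[]): Content[] {
  return messages.map((m) => ({
    // Gemini uses "model" where the AI SDK uses "assistant"
    role: m.role === "assistant" ? "model" : "user",
    parts: m.parts
      .filter((p) => p.type === "text")
      .map((p) => ({ text: p.text })),
  }));
}
```

The real converter also has to carry tool calls, tool results, and inline audio data across the boundary; this sketch shows only the role and text mapping.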