A voice-activated data analyst that converts natural language speech into SQL queries using AI. Ask questions about your data using your voice, and get instant answers spoken back to you.
- Implementing microservices with gRPC and Protocol Buffers
- Real-time bidirectional streaming with WebSockets
- Monorepo management with npm workspaces
- TypeScript across full-stack applications
- Speech-to-text and text-to-speech integration
- AI function calling and tool use patterns
- 🎙️ Voice Input: Speak naturally to query your database
- 🤖 AI-Powered: Uses Gemini 2.5 Flash Lite for natural language understanding
- 📊 Speech-to-SQL: Automatically converts questions into SQL queries
- 🔊 Voice Output: Hear the results spoken back to you
- 💬 Conversational: Maintains context across multiple questions
- ⚡ Real-time: Instant responses via WebSocket streaming
┌─────────────────┐
│ React Frontend  │  (Voice Input/Output, UI)
│ Port: 5174      │
└────────┬────────┘
         │ WebSocket
         ↓
┌─────────────────┐
│ API Gateway     │  (WebSocket ↔ gRPC Bridge)
│ Ports: 4000-1   │
└────────┬────────┘
         │ gRPC
         ↓
┌─────────────────┐
│ Voice AI Service│  (Gemini AI + SQLite)
│ Port: 50051     │
└─────────────────┘
Frontend:
- React + TypeScript + Vite
- Web Speech API (native browser TTS/STT)
- WebSocket for real-time communication
Backend:
- Node.js + TypeScript
- gRPC (Protocol Buffers) for microservice communication
- GraphQL (Apollo Server) for data queries
- Gemini 2.5 Flash Lite for AI/NLU
- SQLite for data storage
Architecture Patterns:
- Microservices (Gateway + Voice Service)
- Monorepo (npm workspaces)
- Function Calling (Gemini tools)
- Bidirectional streaming (gRPC)
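The gateway's WebSocket ↔ gRPC bridge boils down to translating browser messages into gRPC requests. A minimal sketch of that translation step is below; the message shapes and field names (`ClientMessage`, `AskRequest`, `sessionId`) are assumptions for illustration, not the project's actual protocol.

```typescript
// Hypothetical message shapes -- the real proto/WebSocket contract may differ.
interface ClientMessage {
  type: "transcript";
  text: string;
  sessionId: string;
}

interface AskRequest {
  question: string;
  sessionId: string;
}

// Translate an incoming WebSocket frame into a gRPC-style request object.
// In the real gateway this would then be passed to a @grpc/grpc-js client stub.
function toGrpcRequest(raw: string): AskRequest {
  const msg = JSON.parse(raw) as ClientMessage;
  if (msg.type !== "transcript") {
    throw new Error(`unsupported message type: ${msg.type}`);
  }
  return { question: msg.text, sessionId: msg.sessionId };
}
```

Keeping this translation pure (no I/O) makes the bridge easy to unit-test independently of both transports.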
fullstack/
├── apps/
│   └── web/           # React frontend
├── services/
│   ├── gateway/       # API Gateway (WebSocket + GraphQL)
│   └── voice-agent/   # Voice AI Service (Gemini + SQLite)
└── packages/
    └── protos/        # Shared gRPC definitions
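A monorepo with this layout is typically wired together by a root `package.json` along these lines; the workspace globs and script are inferred from the directory tree above, not copied from the repo.

```json
{
  "name": "fullstack",
  "private": true,
  "workspaces": [
    "apps/*",
    "services/*",
    "packages/*"
  ],
  "scripts": {
    "dev": "npm run dev --workspaces --if-present"
  }
}
```

Note that `npm run --workspaces` executes scripts sequentially, so a repo running several long-lived dev servers may instead use a tool such as `concurrently` for its `dev` script.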
- Node.js 18+
- npm 9+
- Gemini API key (a free tier is available)
- Clone the repository:
  git clone https://github.com/jwalith/Voice-to-SQL.git
  cd Voice-to-SQL
- Install dependencies:
  npm install
- Set up environment variables:
  cd services/voice-agent
  cp .env.example .env
  Edit .env and add your Gemini API key:
  GEMINI_API_KEY=your_actual_key_here
- Start the application:
  cd ../..   # back to the repo root
  npm run dev
- Open your browser:
  http://localhost:5174
- Click "🎤 Speak to Jarvis"
- Ask a question like:
- "How many laptops did we sell?"
- "What's the total revenue from electronics?"
- "Show me all sales from yesterday"
- Hear the AI respond with the answer!
The demo includes a mock sales database with:
- Laptops, Mice, Keyboards (electronics)
- Coffee (food)
Try asking:
- "How many laptops did we sell?"
- "What's the total amount from laptop sales?"
- "Show me all electronics"
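To make the sample questions concrete, here is a sketch of what the mock sales data might look like and how "How many laptops did we sell?" maps onto it. The schema, rows, and values are invented for illustration; the demo's actual SQLite contents may differ.

```typescript
// Hypothetical rows -- the demo's real schema and values may differ.
interface Sale {
  item: string;
  category: "electronics" | "food";
  quantity: number;
  amount: number; // total sale price
}

const sales: Sale[] = [
  { item: "Laptop", category: "electronics", quantity: 2, amount: 2400 },
  { item: "Mouse", category: "electronics", quantity: 5, amount: 125 },
  { item: "Keyboard", category: "electronics", quantity: 3, amount: 210 },
  { item: "Coffee", category: "food", quantity: 10, amount: 80 },
];

// "How many laptops did we sell?" is roughly
// SELECT SUM(quantity) FROM sales WHERE item LIKE '%Laptop%'
function unitsSold(item: string): number {
  return sales
    .filter((s) => s.item.toLowerCase().includes(item.toLowerCase()))
    .reduce((sum, s) => sum + s.quantity, 0);
}
```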
# Voice AI Service only
npm run dev:voice

# API Gateway only
npm run dev:gateway

# Frontend only
npm run dev:web

To stop all services, press Ctrl + C in the terminal running npm run dev.
- User speaks → Browser's Web Speech API transcribes
- Text sent → WebSocket to API Gateway
- Gateway forwards → gRPC to Voice AI Service
- Gemini processes → Generates SQL via function calling
- SQL executes → Against SQLite database
- Results summarized → Gemini creates natural language response
- Response spoken → Browser's TTS reads it aloud
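The steps above can be sketched as one async pipeline. Each stage below is a stub standing in for the real component (Web Speech API, Gemini, SQLite); the stage signatures and return values are assumptions made for the sketch.

```typescript
// Each stage stands in for the real component named in its comment.
type Stage = (input: string) => Promise<string>;

const transcribe: Stage = async (audio) => audio;            // Web Speech API (browser STT)
const toSql: Stage = async (q) =>                            // Gemini function calling
  `SELECT COUNT(*) FROM sales WHERE item LIKE '%${q}%'`;
const runSql: Stage = async (_sql) => "42";                  // SQLite execution (stubbed)
const summarize: Stage = async (rows) => `We sold ${rows}.`; // Gemini summarization

// Compose the stages in the order the flow describes.
async function pipeline(audio: string): Promise<string> {
  const stages: Stage[] = [transcribe, toSql, runSql, summarize];
  let value = audio;
  for (const stage of stages) {
    value = await stage(value);
  }
  return value; // the browser's TTS would speak this string
}
```

Treating each hop as a `string -> Promise<string>` stage keeps the real services swappable for stubs in tests.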
The AI uses Gemini's function calling feature to execute SQL:
// Tool definition tells Gemini how to query the database
{
  name: "query_database",
  description: "Execute SQL against sales database...",
  parameters: { sql: "string" }
}

// Gemini automatically generates a call like:
query_database({ sql: "SELECT COUNT(*) FROM sales WHERE item LIKE '%Laptop%'" })

- Persistent conversation history
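On the service side, executing a function call the model returns is essentially a name-to-handler dispatch. The sketch below shows that round trip with the model response and the SQLite call both stubbed; the type names and shapes are illustrative, not the real SDK's types.

```typescript
// Shape of a function call as the model might return it (illustrative).
interface FunctionCall {
  name: string;
  args: { sql: string };
}

// Tool implementations the service exposes to the model.
// query_database is stubbed; the real service would run the SQL in SQLite.
const tools: Record<string, (args: { sql: string }) => string> = {
  query_database: ({ sql }) => (sql.includes("COUNT") ? "3" : "(rows)"),
};

// Dispatch the call the model asked for and return its result,
// which would then be sent back to the model for summarization.
function execute(call: FunctionCall): string {
  const tool = tools[call.name];
  if (!tool) throw new Error(`unknown tool: ${call.name}`);
  return tool(call.args);
}
```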
- User authentication
- File-based SQLite database
- Visual data dashboards
- Export query results
- Multi-user support
- Custom database connections
License: MIT
Author: Jwalith
- GitHub: @jwalith
- Project: Voice-to-SQL
- Google Gemini AI for the LLM
- Web Speech API for browser-native voice capabilities
- gRPC and Protocol Buffers for efficient microservice communication