krishna-kudari/Text2SQL-Agent

DataChat - Production-Ready Text2SQL Agent

A self-learning Text-to-SQL agent with a natural language interface, built for production.

Features

  • Natural Language to SQL: Ask questions in plain English, get SQL queries and results
  • Self-Learning: Improves with every interaction through feedback loops
  • Multi-Database Support: PostgreSQL, MySQL, SQLite, and more
  • Human-in-the-Loop: Validation for complex queries before execution
  • Custom Instructions: Define project-specific rules and terminology
  • Multiple Chats: Maintain context across conversations
  • Plan & Confirm: See execution plans before running complex queries

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                     Frontend (Next.js + shadcn)                  │
├─────────────────────────────────────────────────────────────────┤
│  Chat UI │ Projects │ Data Sources │ Custom Instructions │ KB   │
└────────────────────────────────┬────────────────────────────────┘
                                 │
┌────────────────────────────────▼────────────────────────────────┐
│                   Backend API (FastAPI)                          │
├─────────────────────────────────────────────────────────────────┤
│  Orchestrator → Planner → SQL Agent → Validator → Learning      │
└────────────────────────────────┬────────────────────────────────┘
                                 │
┌────────────────────────────────▼────────────────────────────────┐
│                        Services Layer                            │
├────────────┬────────────┬────────────┬────────────┬─────────────┤
│ LLM (Groq) │ DataSource │ Schema     │ Knowledge  │ Embeddings  │
│            │ Connectors │ Indexer    │ Base       │ (Nomic)     │
└────────────┴────────────┴────────────┴────────────┴─────────────┘
                                 │
┌────────────────────────────────▼────────────────────────────────┐
│                        Data Layer                                │
├─────────────────┬─────────────────┬─────────────────────────────┤
│   SQLite/PG     │    ChromaDB     │   User Data Sources         │
│   (Metadata)    │    (Vectors)    │   (via Connectors)          │
└─────────────────┴─────────────────┴─────────────────────────────┘

Tech Stack

Layer      | Technology
-----------|------------------------------------------------
Frontend   | Next.js 14, shadcn/ui, Tailwind, Framer Motion
Backend    | Python 3.12, FastAPI, Pydantic
LLM        | Groq API (default), OpenAI, Ollama
Vector DB  | ChromaDB
Embeddings | nomic-embed-text (via Ollama)
Database   | SQLite (dev), PostgreSQL (prod)

Quick Start

Prerequisites

  • Python 3.12+
  • Node.js 20+
  • Ollama (for local embeddings)
  • Groq API Key (free at console.groq.com)

1. Clone and Setup

git clone https://github.com/krishna-kudari/Text2SQL-Agent.git
cd Text2SQL-Agent

# Backend
cd backend
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt

# Frontend
cd ../frontend
npm install

2. Configure Environment

Create .env in the backend directory:

# LLM
LLM_PROVIDER=groq
GROQ_API_KEY=your-groq-api-key

# Embeddings (Ollama)
OLLAMA_BASE_URL=http://localhost:11434

# Database
DATABASE_URL=sqlite:///./datachat.db

# Debug
DEBUG=true
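At startup the backend reads these variables from the environment. A minimal stdlib sketch of such a loader (the variable names come from the .env above; the function itself is an illustration, not the actual core/config.py):

```python
import os

def load_settings() -> dict:
    """Read the .env-style variables shown above from the environment,
    falling back to the documented defaults."""
    return {
        "llm_provider": os.getenv("LLM_PROVIDER", "groq"),
        "groq_api_key": os.getenv("GROQ_API_KEY", ""),
        "ollama_base_url": os.getenv("OLLAMA_BASE_URL", "http://localhost:11434"),
        "database_url": os.getenv("DATABASE_URL", "sqlite:///./datachat.db"),
        "debug": os.getenv("DEBUG", "false").lower() == "true",
    }
```

Anything not set in .env simply falls back to its default, so a bare-minimum file only needs GROQ_API_KEY.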

3. Start Ollama for Embeddings

# Install Ollama (macOS)
brew install ollama

# Start server
ollama serve

# Pull embedding model
ollama pull nomic-embed-text
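Once the model is pulled, you can sanity-check the server directly. A small stdlib sketch against Ollama's /api/embeddings HTTP endpoint (the agent's embedding service talks to the same endpoint):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"
EMBED_MODEL = "nomic-embed-text"

def build_embedding_request(text: str) -> dict:
    # Request body expected by Ollama's /api/embeddings endpoint.
    return {"model": EMBED_MODEL, "prompt": text}

def embed(text: str) -> list[float]:
    """POST one text to the local Ollama server and return its vector."""
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/embeddings",
        data=json.dumps(build_embedding_request(text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embedding"]
```

If embed("hello") returns a list of floats, the embedding side of the stack is ready.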

4. Run the Application

# Terminal 1: Backend (run from the repository root so the
# backend.main module path resolves)
source backend/venv/bin/activate
uvicorn backend.main:app --reload --port 8000

# Terminal 2: Frontend
cd frontend
npm run dev

Visit http://localhost:3000

Project Structure

Text2SQL-Agent/
├── backend/
│   ├── api/                # FastAPI routes
│   ├── agents/             # Agent implementations (SOLID)
│   │   ├── orchestrator.py
│   │   ├── planner.py
│   │   ├── sql_agent.py
│   │   ├── validator.py
│   │   └── learning.py
│   ├── services/           # Business logic
│   │   ├── llm/           # LLM providers
│   │   ├── datasources/   # DB connectors
│   │   ├── embeddings/    # Embedding service
│   │   ├── knowledge/     # Schema indexer
│   │   └── vector_store/  # ChromaDB
│   ├── models/            # Pydantic models
│   ├── db/                # SQLAlchemy
│   ├── core/              # Config & utils
│   └── main.py
│
├── frontend/
│   ├── app/               # Next.js pages
│   ├── components/        # React components
│   │   ├── ui/           # shadcn components
│   │   └── chat/         # Chat components
│   ├── lib/              # Utilities
│   └── hooks/            # React hooks
│
├── docker-compose.yaml
└── README.md

Usage

1. Create a Project

Projects organize your data sources and custom instructions.

2. Add Data Source

Connect to your database:

  • PostgreSQL, MySQL, SQLite
  • CSV/Parquet files
  • REST APIs

3. Start Chatting

Ask questions naturally:

  • "Show me top 10 products by revenue"
  • "What's the average order value this month?"
  • "List customers who churned last quarter"

4. Provide Feedback

Help the agent learn:

  • 👍 Correct - the query is added to the knowledge base
  • 👎 Incorrect - the agent learns from your correction

5. Custom Instructions

Add project-specific rules:

  • "status field uses values: active, pending, cancelled"
  • "Always filter by tenant_id"
  • "Revenue = quantity * unit_price"

API Endpoints

Endpoint                   | Method | Description
---------------------------|--------|---------------------
/api/projects              | GET    | List projects
/api/projects              | POST   | Create a project
/api/chat/message          | POST   | Send a chat message
/api/chat/sessions/{id}    | GET    | Get a chat session
/api/datasources           | POST   | Add a data source
/api/datasources/{id}/sync | POST   | Sync the schema
/api/feedback              | POST   | Submit feedback
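The endpoints above can also be called from scripts. A stdlib sketch of a tiny client (only the paths come from the table; the request field names in the example call are illustrative guesses, not the documented schema):

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"

def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request against the DataChat backend."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def api_post(path: str, payload: dict) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_request(path, payload)) as resp:
        return json.load(resp)

# e.g. api_post("/api/chat/message", {"message": "Show me top 10 products by revenue"})
# ("message" is a guessed field name -- check the FastAPI docs at /docs for the real schema)
```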

Configuration

LLM Providers

# Groq (default - fast & free)
LLM_PROVIDER=groq
GROQ_API_KEY=your-key

# OpenAI
LLM_PROVIDER=openai
OPENAI_API_KEY=your-key

# Ollama (local)
LLM_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
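Each provider needs its own credential, so it is worth failing fast when one is missing. A small sketch of a startup check (the provider-to-variable mapping mirrors the snippets above; the helper itself is illustrative):

```python
import os

# Which env vars each LLM_PROVIDER value requires (from the snippets above).
REQUIRED_VARS = {
    "groq": ("GROQ_API_KEY",),
    "openai": ("OPENAI_API_KEY",),
    "ollama": ("OLLAMA_BASE_URL",),
}

def missing_vars(provider: str, env=os.environ) -> list[str]:
    """Return the env vars the chosen provider still needs, if any."""
    if provider not in REQUIRED_VARS:
        raise ValueError(f"unknown LLM_PROVIDER: {provider}")
    return [v for v in REQUIRED_VARS[provider] if not env.get(v)]
```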

Available Models (Groq)

  • llama-3.3-70b-versatile (default)
  • llama-3.1-8b-instant
  • mixtral-8x7b-32768

Development

Run Tests

cd backend
pytest

Code Formatting

# Backend
black backend/
ruff check backend/

# Frontend
npm run lint

Deployment

Docker

# Build and run
docker-compose up -d

# View logs
docker-compose logs -f
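The repository ships its own docker-compose.yaml; an illustrative two-service layout using the ports from the Quick Start (a sketch only, not the actual file):

```yaml
# Illustrative sketch -- see the repository's docker-compose.yaml
# for the real service definitions.
services:
  backend:
    build: ./backend
    ports:
      - "8000:8000"
    env_file:
      - ./backend/.env
  frontend:
    build: ./frontend
    ports:
      - "3000:3000"
    depends_on:
      - backend
```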

Production Checklist

  • Set DEBUG=false
  • Use PostgreSQL for metadata
  • Configure proper CORS origins
  • Set up rate limiting
  • Enable SSL/TLS
  • Configure logging
  • Set up monitoring

License

MIT

About

Dynamic context retrieval and Self-Learning Data Query System
