An intelligent chatbot for analyzing Excel/CSV data using natural language queries. Runs 100% locally using Ollama LLMs and local embeddings - no external API calls.
- π Fully Local: Uses Ollama (local LLM) and sentence-transformers (local embeddings)
- π Any Excel/CSV File: Load and analyze any data file
- π¬ Multiple Conversations: Support for multiple simultaneous conversations with isolated data
- π§ Smart Routing: Automatically routes questions to the best analysis method
- π Column Explanation: Properly explains all columns and data structure
- π Trend Analysis: Analyze patterns over time
- π Hybrid Analysis: Combines numerical + text analysis for comprehensive insights
- π Automatic Retry: 3-attempt retry mechanism for robust error handling
- πΎ Local Storage: Conversations persist across browser sessions
- π‘οΈ Error Resilience: Comprehensive error handling with user-friendly messages
- β Real Analysis: Ensures genuine data-driven insights, not generic responses
- π API Ready: Clean API layer for web/app integration
| Query Type | Handler | Example Questions |
|---|---|---|
| Schema | SchemaHandler | "What columns are available?", "Explain all columns" |
| SQL | SQLHandler | "How many records?", "Average rating?", "Top 10 by score" |
| Cluster | ClusterHandler | "Most common complaints?", "Main issues?" |
| RAG | RAGHandler | "Why are customers unhappy?", "Explain the issues" |
| Trend | TrendAnalyzer | "Show the trend over time", "Monthly patterns" |
| Overview | HybridHandler | "Give me an overview", "Summarize the data" |
| Hybrid | HybridHandler | "Analyze discount trends", "Insights on ratings" |
excel_analytics_chatbot/
βββ core/ # Core modules
β βββ config.py # Configuration (local models)
β βββ llm.py # Ollama LLM interface
β βββ embeddings.py # Local sentence embeddings
β βββ database.py # DuckDB operations
βββ handlers/ # Question handlers
β βββ sql_handler.py # SQL queries
β βββ cluster_handler.py # Text clustering
β βββ rag_handler.py # Semantic search
β βββ schema_handler.py # Column explanations β
β βββ hybrid_handler.py # Combined analysis β
β βββ trend_analyzer.py # Time-based trends
β βββ data_profiler.py # Data profiling
βββ api/ # API layer β
β βββ __init__.py
β βββ chatbot_api.py # JSON API for web/app
βββ utils/
β βββ vector_store.py # FAISS vector store
βββ app.py # Main application
βββ question_router.py # Smart routing
βββ requirements.txt
pip install -r requirements.txt-
Ollama running locally:
# Install Ollama from https://ollama.ai ollama pull llama3:8b ollama serve -
Dependencies (auto-installed):
- sentence-transformers (local embeddings)
- duckdb (database)
- faiss-cpu (vector search)
- scikit-learn (clustering)
python app.pyCommands:
load <path>- Load a CSV or Excel filehelp- Show available commandsschema- Quick column overviewcolumns- Detailed column explanationstables- List loaded tablesquit- Exit
π§ You: What columns are available?
π€ Bot: [Lists all columns with types and descriptions]
π§ You: Explain all the columns
π€ Bot: [Detailed explanation of each column]
π§ You: How many records are there?
π€ Bot: [SQL-based count with insights]
π§ You: Give me an overview of the data
π€ Bot: [Comprehensive dataset overview]
π§ You: What are the most common complaints?
π€ Bot: [Clustering analysis of text patterns]
π§ You: Give me insights on the discount trends
π€ Bot: [Hybrid analysis combining numerical and text data]
from app import ExcelAnalyticsChatbot
# Initialize
chatbot = ExcelAnalyticsChatbot()
# Load any CSV or Excel file
chatbot.load_data("path/to/your/data.csv")
# Ask questions
print(chatbot.ask("What columns are available?")) # Schema
print(chatbot.ask("How many records are there?")) # SQL
print(chatbot.ask("What are the most common issues?")) # Cluster
print(chatbot.ask("Why are customers unhappy?")) # RAG
print(chatbot.ask("Show the trend over time")) # Trend
print(chatbot.ask("Give me an overview")) # Overview
# Close
chatbot.close()from api import ChatbotAPI, create_api
# Quick setup
api = create_api("path/to/data.csv")
# Or manual setup
api = ChatbotAPI()
result = api.load_file("path/to/data.csv")
# Ask questions (returns JSON-like response)
response = api.ask("What columns are available?")
print(response.data["answer"])
# Get schema
schema = api.get_schema(detailed=True)
# Get overview
overview = api.get_overview()
# Execute SQL directly
result = api.execute_sql("SELECT * FROM data LIMIT 10")
api.close()Environment variables:
OLLAMA_URL: Ollama API URL (default:http://localhost:11434/api/chat)LLM_MODEL: LLM model (default:llama3:8b)EMBEDDING_MODEL: Embedding model (default:all-MiniLM-L6-v2)DB_PATH: Database path (default:data/analytics.duckdb)
- Load Data: CSV/Excel β DuckDB table (isolated per conversation)
- Route Question: Smart router analyzes question type
- Select Handler: Routes to appropriate handler (SQL, Cluster, RAG, Schema, Trend, Hybrid)
- Process: Handler retrieves/analyzes data with automatic retry (up to 3 attempts)
- Validate: Ensures response contains real data, not generic statements
- Generate Response: LLM formats results into natural language
- Persist: Conversation saved to localStorage for future sessions
- Each conversation has its own isolated database instance
- Upload multiple Excel files across different conversations
- Switch between conversations without data conflicts
- All conversations persist in localStorage
- 3-Attempt Retry: Automatically retries failed operations up to 3 times
- User-Friendly Messages: No technical errors shown to users
- Graceful Degradation: Provides helpful alternatives when operations fail
- Real Analysis Validation: Detects and retries generic/fake responses
- Each conversation uses a separate database file:
data/sessions/session_{id}.duckdb - Uploaded files stored in session-specific folders:
uploads/{session_id}/ - No data leakage between conversations
- LLM: Ollama with llama3:8b (or any compatible model)
- Alternatives:
mistral:7b,llama2:7b,codellama:7b
- Alternatives:
- Embeddings: sentence-transformers
all-MiniLM-L6-v2- Alternatives:
all-mpnet-base-v2(higher quality)
- Alternatives:
"Cannot connect to Ollama"
# Make sure Ollama is running
ollama serve"Model not found"
# Pull the model
ollama pull llama3:8bSlow responses
- Use a smaller model:
mistral:7b - Reduce
RAG_TOP_Kin config.py