Skip to content

HeC-KriS/excel_analytics_chatbot

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

8 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Excel Analytics Chatbot

An intelligent chatbot for analyzing Excel/CSV data using natural language queries. Runs 100% locally using Ollama LLMs and local embeddings - no external API calls.

Features

  • πŸ”’ Fully Local: Uses Ollama (local LLM) and sentence-transformers (local embeddings)
  • πŸ“Š Any Excel/CSV File: Load and analyze any data file
  • πŸ’¬ Multiple Conversations: Support for multiple simultaneous conversations with isolated data
  • 🧠 Smart Routing: Automatically routes questions to the best analysis method
  • πŸ“ Column Explanation: Properly explains all columns and data structure
  • πŸ“ˆ Trend Analysis: Analyze patterns over time
  • πŸ” Hybrid Analysis: Combines numerical + text analysis for comprehensive insights
  • πŸ”„ Automatic Retry: 3-attempt retry mechanism for robust error handling
  • πŸ’Ύ Local Storage: Conversations persist across browser sessions
  • πŸ›‘οΈ Error Resilience: Comprehensive error handling with user-friendly messages
  • βœ… Real Analysis: Ensures genuine data-driven insights, not generic responses
  • 🌐 API Ready: Clean API layer for web/app integration

Query Types

Query Type Handler Example Questions
Schema SchemaHandler "What columns are available?", "Explain all columns"
SQL SQLHandler "How many records?", "Average rating?", "Top 10 by score"
Cluster ClusterHandler "Most common complaints?", "Main issues?"
RAG RAGHandler "Why are customers unhappy?", "Explain the issues"
Trend TrendAnalyzer "Show the trend over time", "Monthly patterns"
Overview HybridHandler "Give me an overview", "Summarize the data"
Hybrid HybridHandler "Analyze discount trends", "Insights on ratings"

Project Structure

excel_analytics_chatbot/
β”œβ”€β”€ core/                    # Core modules
β”‚   β”œβ”€β”€ config.py           # Configuration (local models)
β”‚   β”œβ”€β”€ llm.py              # Ollama LLM interface
β”‚   β”œβ”€β”€ embeddings.py       # Local sentence embeddings
β”‚   └── database.py         # DuckDB operations
β”œβ”€β”€ handlers/               # Question handlers
β”‚   β”œβ”€β”€ sql_handler.py      # SQL queries
β”‚   β”œβ”€β”€ cluster_handler.py  # Text clustering
β”‚   β”œβ”€β”€ rag_handler.py      # Semantic search
β”‚   β”œβ”€β”€ schema_handler.py   # Column explanations ⭐
β”‚   β”œβ”€β”€ hybrid_handler.py   # Combined analysis ⭐
β”‚   β”œβ”€β”€ trend_analyzer.py   # Time-based trends
β”‚   └── data_profiler.py    # Data profiling
β”œβ”€β”€ api/                    # API layer ⭐
β”‚   β”œβ”€β”€ __init__.py
β”‚   └── chatbot_api.py      # JSON API for web/app
β”œβ”€β”€ utils/
β”‚   └── vector_store.py     # FAISS vector store
β”œβ”€β”€ app.py                  # Main application
β”œβ”€β”€ question_router.py      # Smart routing
└── requirements.txt

Installation

pip install -r requirements.txt

Prerequisites

  1. Ollama running locally:

    # Install Ollama from https://ollama.ai
    ollama pull llama3:8b
    ollama serve
  2. Dependencies (auto-installed):

    • sentence-transformers (local embeddings)
    • duckdb (database)
    • faiss-cpu (vector search)
    • scikit-learn (clustering)

Usage

Interactive Mode

python app.py

Commands:

  • load <path> - Load a CSV or Excel file
  • help - Show available commands
  • schema - Quick column overview
  • columns - Detailed column explanations
  • tables - List loaded tables
  • quit - Exit

Example Questions

πŸ§‘ You: What columns are available?
πŸ€– Bot: [Lists all columns with types and descriptions]

πŸ§‘ You: Explain all the columns
πŸ€– Bot: [Detailed explanation of each column]

πŸ§‘ You: How many records are there?
πŸ€– Bot: [SQL-based count with insights]

πŸ§‘ You: Give me an overview of the data
πŸ€– Bot: [Comprehensive dataset overview]

πŸ§‘ You: What are the most common complaints?
πŸ€– Bot: [Clustering analysis of text patterns]

πŸ§‘ You: Give me insights on the discount trends
πŸ€– Bot: [Hybrid analysis combining numerical and text data]

Programmatic Usage

from app import ExcelAnalyticsChatbot

# Initialize
chatbot = ExcelAnalyticsChatbot()

# Load any CSV or Excel file
chatbot.load_data("path/to/your/data.csv")

# Ask questions
print(chatbot.ask("What columns are available?"))      # Schema
print(chatbot.ask("How many records are there?"))       # SQL
print(chatbot.ask("What are the most common issues?"))  # Cluster
print(chatbot.ask("Why are customers unhappy?"))        # RAG
print(chatbot.ask("Show the trend over time"))          # Trend
print(chatbot.ask("Give me an overview"))               # Overview

# Close
chatbot.close()

API Usage (for Web/App Integration)

from api import ChatbotAPI, create_api

# Quick setup
api = create_api("path/to/data.csv")

# Or manual setup
api = ChatbotAPI()
result = api.load_file("path/to/data.csv")

# Ask questions (returns JSON-like response)
response = api.ask("What columns are available?")
print(response.data["answer"])

# Get schema
schema = api.get_schema(detailed=True)

# Get overview
overview = api.get_overview()

# Execute SQL directly
result = api.execute_sql("SELECT * FROM data LIMIT 10")

api.close()

Configuration

Environment variables:

  • OLLAMA_URL: Ollama API URL (default: http://localhost:11434/api/chat)
  • LLM_MODEL: LLM model (default: llama3:8b)
  • EMBEDDING_MODEL: Embedding model (default: all-MiniLM-L6-v2)
  • DB_PATH: Database path (default: data/analytics.duckdb)

How It Works

  1. Load Data: CSV/Excel β†’ DuckDB table (isolated per conversation)
  2. Route Question: Smart router analyzes question type
  3. Select Handler: Routes to appropriate handler (SQL, Cluster, RAG, Schema, Trend, Hybrid)
  4. Process: Handler retrieves/analyzes data with automatic retry (up to 3 attempts)
  5. Validate: Ensures response contains real data, not generic statements
  6. Generate Response: LLM formats results into natural language
  7. Persist: Conversation saved to localStorage for future sessions

New Features

Multi-Conversation Support

  • Each conversation has its own isolated database instance
  • Upload multiple Excel files across different conversations
  • Switch between conversations without data conflicts
  • All conversations persist in localStorage

Robust Error Handling

  • 3-Attempt Retry: Automatically retries failed operations up to 3 times
  • User-Friendly Messages: No technical errors shown to users
  • Graceful Degradation: Provides helpful alternatives when operations fail
  • Real Analysis Validation: Detects and retries generic/fake responses

Data Isolation

  • Each conversation uses a separate database file: data/sessions/session_{id}.duckdb
  • Uploaded files stored in session-specific folders: uploads/{session_id}/
  • No data leakage between conversations

Local Model Requirements

  • LLM: Ollama with llama3:8b (or any compatible model)
    • Alternatives: mistral:7b, llama2:7b, codellama:7b
  • Embeddings: sentence-transformers all-MiniLM-L6-v2
    • Alternatives: all-mpnet-base-v2 (higher quality)

Troubleshooting

"Cannot connect to Ollama"

# Make sure Ollama is running
ollama serve

"Model not found"

# Pull the model
ollama pull llama3:8b

Slow responses

  • Use a smaller model: mistral:7b
  • Reduce RAG_TOP_K in config.py

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors