Inspiration

Many people have data but get stuck at “what do I do next?”—especially when tools feel either too technical (Python notebooks) or too limited (basic spreadsheet filters). We wanted a single app that makes analysis conversational and visual, without needing to leave your machine.

What it does

  1. Uploads CSV/Excel datasets and previews them in a table.
  2. Generates summaries and quick statistics.
  3. Suggests and renders charts.
  4. Detects anomalies/outliers.
  5. Cleans data (missing values, formats, etc.).
  6. Trains ML models for regression/classification and shows results.
  7. Lets you ask questions in natural language and answers using RAG grounded in your uploaded dataset.
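The anomaly/outlier step (item 4) can be sketched with a simple IQR rule in pandas; the column name, sample data, and threshold `k` below are illustrative, not DataSage's actual implementation.

```python
import pandas as pd

def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)

# Toy dataset: one obviously anomalous price.
df = pd.DataFrame({"price": [10, 12, 11, 13, 12, 500]})
print(df[iqr_outliers(df["price"])])  # flags only the 500 row
```

The same boolean mask also feeds naturally into the cleaning step (item 5): dropping or winsorizing the flagged rows.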

How we built it

  1. Frontend: React + Vite inside an Electron desktop shell.
  2. Backend: Python FastAPI server that handles file ingest, pandas-based analysis, chart/anomaly/cleaning services, and ML training.
  3. LLM layer: integrated a local LLM via Ollama plus the Gemini 3 API for chat.
  4. RAG: Builds embeddings from dataset chunks and retrieves the most relevant parts for each question to ground responses.
  5. Packaging: electron-builder generates a Windows NSIS installer and bundles backend resources.
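The RAG step above boils down to chunk → embed → rank by cosine similarity → keep top-k. A minimal sketch: the real app embeds via an LLM, so the bag-of-words `embed` below is only a stand-in to make the flow runnable anywhere, and the chunk strings are made up.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in embedding: token counts. The real app would call an
    # embedding model (e.g. via Ollama) here instead.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "row 1: region=EU revenue=120",
    "row 2: region=US revenue=95",
    "column summary: revenue mean=107.5 max=120",
]
print(retrieve("what is the mean revenue?", chunks, k=1))
```

The retrieved chunks are then prepended to the prompt so the model answers from the dataset rather than from general knowledge.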

Challenges we ran into

  1. Getting “chat with data” to be reliable: session state, grounding context, and keeping responses relevant.
  2. Aligning backend response shapes with frontend rendering for training results.
  3. Handling mixed data types (numeric + categorical) for ML training without crashes.
  4. Development workflow issues like backend port conflicts during Electron dev runs.
  5. Packaging constraints: shipping a desktop app that depends on Python/LLM components cleanly on Windows.
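Challenge 3 (mixed numeric + categorical input crashing training) is commonly solved with an imputation + one-hot preprocessing pipeline in front of the model. A sketch with scikit-learn, using made-up column names; this is one standard approach, not necessarily DataSage's exact pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy frame with a missing number and a missing category.
df = pd.DataFrame({
    "age": [25, np.nan, 40, 31],
    "city": ["NY", "LA", np.nan, "NY"],
    "churn": [0, 1, 0, 1],
})
numeric, categorical = ["age"], ["city"]

pre = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), numeric),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical),
])
model = Pipeline([("pre", pre), ("clf", LogisticRegression())])
model.fit(df[numeric + categorical], df["churn"])
print(model.predict(df[numeric + categorical]))
```

Because preprocessing lives inside the pipeline, the same transformations are applied at predict time automatically, which avoids train/serve skew.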

Accomplishments that we're proud of

  1. A complete end-to-end desktop experience: upload → explore → visualize → train → chat.
  2. True RAG grounding so answers come from the dataset instead of generic responses.
  3. A modular backend (controllers/services/models) that’s easy to extend.
  4. ML results surfaced in the UI with a clearer display: the raw score alongside an accuracy-style percentage.

What we learned

  1. RAG quality is mostly about good chunking + retrieval, not just calling an LLM.
  2. “Works on my machine” isn’t enough for desktop: process management, ports, and packaging matter a lot.
  3. ML pipelines need preprocessing (encoding/NA handling) to be usable on real-world datasets.
  4. Tight contracts between backend JSON and frontend UI prevent silent failures.
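Lesson 4 (tight backend/frontend contracts) can be illustrated with a Pydantic response model, which fits naturally since the backend is FastAPI: a missing or mistyped field fails loudly on the backend instead of rendering as a blank in the UI. The field names here are hypothetical, not DataSage's actual schema.

```python
from pydantic import BaseModel, ValidationError

class TrainingResult(BaseModel):
    """Illustrative contract for the /train response."""
    name: str     # estimator name shown in the UI
    task: str     # "regression" or "classification"
    score: float  # primary metric, e.g. R^2 or accuracy
    n_rows: int   # rows used for training

ok = TrainingResult(name="LogisticRegression", task="classification",
                    score=0.91, n_rows=1200)
print(ok.model_dump())

# A malformed payload is rejected at the boundary, not in the frontend.
try:
    TrainingResult(name="LinearRegression", task="regression",
                   score="n/a", n_rows=10)
except ValidationError as e:
    print("rejected:", e.errors()[0]["loc"])
```

Declaring the same model as FastAPI's `response_model` keeps the serialized JSON and the documented schema in lockstep.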

What's next for DataSage

  1. Make the installer fully self-contained by bundling a Python runtime + dependencies (no manual setup).
  2. Improve RAG with persistent vector stores (disk-backed) and dataset versioning.
  3. Add better evaluation and explainability for models (feature importance, confusion matrix).
  4. Add dataset provenance + “analysis report export” for sharing insights.
  5. Optional cloud/edge mode: stay local-first, but let users plug in cloud compute (including Gemini) for higher-quality reasoning when desired.
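Item 3 above (confusion matrix and feature importance) could look like the following with scikit-learn on synthetic data; this illustrates the planned output shape, not shipped code.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic binary-classification dataset stands in for an uploaded CSV.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

print(confusion_matrix(y_te, clf.predict(X_te)))  # rows: true, cols: predicted
print(clf.feature_importances_)                   # one weight per feature, sums to 1
```

Both artifacts are cheap to compute after training and map directly onto UI widgets: a 2×2 heatmap and a bar chart.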

Built With

React, Vite, Electron, Python, FastAPI, pandas, Ollama, Gemini API, electron-builder
