HackUTD-2024/HackUTD2024-FinancialAssistant


HackUTD Financial Analysis Chatbot

A conversational financial analysis application that intelligently interprets bank statement data and provides personalized financial insights through an AI-powered chatbot interface.

Features

  • Bank Statement Analysis: Upload and analyze CSV transaction data with automatic categorization
  • AI-Powered Chatbot: Ask natural language questions about your spending patterns, income, and financial health
  • Financial Metrics:
    • Category-based spending breakdown
    • Monthly spending trends
    • Income vs. expenses analysis
    • Credit score prediction
    • Transaction frequency analysis
    • Intelligent trend detection
  • Adaptive Responses: The chatbot classifies your questions into 18 different intent categories and provides contextual, data-backed answers

Tech Stack

  • Frontend: React 18, React Router, React Markdown (for LLM response formatting)
  • Backend: Flask (Python), Gradio (for LLM UI), Gradio Client (backend-to-LLM communication)
  • ML/AI: OpenAI API, Sambanova Gradio models (Llama 3.1)
  • Data Processing: Pandas, NumPy, scikit-learn

Quick Start

Prerequisites

  • Node.js and npm
  • Python 3.8+
  • Three terminal windows

Installation

# Install frontend dependencies
npm install

# Install backend dependencies
pip install -r requirements.txt

Running the Application

Terminal 1 - Start the Gradio LLM server:

cd backend
python app.py
# Server runs at http://127.0.0.1:7860/

Terminal 2 - Start the Flask API:

cd backend
python api.py
# API runs at http://127.0.0.1:5000

Terminal 3 - Start the React frontend:

npm start
# App opens at http://localhost:3000

All three services must be running for the chatbot to function.

How It Works

  1. CSV Upload: Provide a bank statement CSV with transaction data
  2. Question: Ask the chatbot about your finances in natural language
  3. Intelligent Classification: The system classifies your question into 18 intent categories
  4. Context Enrichment: Your transaction data and pre-computed statistics are added to the prompt
  5. AI Response: Llama 3.1 generates a contextual, fact-based answer with data citations
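The five steps above can be sketched as a single function. Everything here (the names classify_intent, build_context, and answer, and the keyword-based classifier stub) is an illustrative stand-in for the project's LLM-driven versions, not code from the repository:

```python
import pandas as pd

def classify_intent(question: str) -> str:
    """Stub for step 3; the real system asks the LLM to pick one of 18 intents."""
    return "GoalIncreaseSavings" if "save" in question.lower() else "ExplainMyData"

def build_context(df: pd.DataFrame) -> str:
    """Step 4: pre-computed statistics embedded into the prompt."""
    income = df.loc[df["Amount"] > 0, "Amount"].sum()
    expenses = -df.loc[df["Amount"] < 0, "Amount"].sum()
    return f"Total income: {income:.2f}. Total expenses: {expenses:.2f}."

def answer(question: str, df: pd.DataFrame, llm=lambda prompt: prompt) -> str:
    """Steps 3-5 chained; `llm` stands in for the Gradio-hosted Llama 3.1 call."""
    intent = classify_intent(question)
    prompt = (f"Intent: {intent}\n{build_context(df)}\n"
              f"Data:\n{df.to_csv(index=False)}\n"
              f"Question: {question}")
    return llm(prompt)
```

With the identity `llm` default, the function returns the fully assembled prompt, which makes the enrichment steps easy to inspect.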

CSV Format

Required columns in your bank statement CSV:

  • Transaction Date (format: YYYY-MM-DD or MM-DD-YYYY)
  • Post Date (format: MM/DD/YYYY)
  • Amount (numeric: positive for income, negative for expenses)
  • Category (string: e.g., "Shopping", "Food & Drink", "Travel")
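A minimal loader that enforces this schema might look like the following. The function name load_statement and the inline sample rows are hypothetical, not taken from process_file.py:

```python
import io
import pandas as pd

REQUIRED_COLUMNS = {"Transaction Date", "Post Date", "Amount", "Category"}

def load_statement(source) -> pd.DataFrame:
    """Load a bank statement CSV and check the schema described above."""
    df = pd.read_csv(source)
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"CSV is missing required columns: {sorted(missing)}")
    # Post Date is documented as MM/DD/YYYY.
    df["Post Date"] = pd.to_datetime(df["Post Date"], format="%m/%d/%Y")
    df["Amount"] = pd.to_numeric(df["Amount"])
    return df

sample = io.StringIO(
    "Transaction Date,Post Date,Amount,Category\n"
    "2024-01-05,01/06/2024,-42.10,Food & Drink\n"
    "2024-01-10,01/11/2024,1500.00,Income\n"
)
df = load_statement(sample)
```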

Project Structure

.
├── src/                    # React frontend
│   ├── pages/Home.js      # Main dashboard
│   └── components/        # React components
│       └── Chatbot.js     # Chat interface
├── backend/               # Flask & Python
│   ├── api.py            # Flask API endpoints
│   ├── app.py            # Gradio LLM server
│   ├── categories.py     # Financial analysis functions
│   ├── process_file.py   # CSV data processing
│   └── test bank statement data.csv
└── package.json          # Frontend dependencies

Developer Guide

Architecture & Data Flow

Core Components

  1. Frontend (src/):

    • App.js: Single-page app with React Router (currently one route: "/")
    • pages/Home.js: Main dashboard with hardcoded analysis data and card-based UI
    • components/Chatbot.js: Chat UI that fetches from /api/ask endpoint
  2. Backend (backend/):

    • api.py: Flask server (port 5000, CORS enabled for localhost:3000)
      • /api/analyze - Returns pre-calculated financial metrics
      • /api/ask - Routes user messages through LLM with prompt engineering
    • categories.py: Financial analysis functions (calculate spending by category, monthly trends, credit score prediction, etc.)
    • process_file.py: CSV data loading and statistical preprocessing
    • app.py: Gradio app loader for Llama 3.1 model (runs on port 7860)
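A stripped-down sketch of the Flask side, assuming the two routes described above. The handler bodies and the manual CORS header are placeholders, not the repository's actual api.py (which serves real analysis results and uses its own CORS setup):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.after_request
def allow_frontend(resp):
    # Stand-in for flask-cors: allow only the React dev server origin.
    resp.headers["Access-Control-Allow-Origin"] = "http://localhost:3000"
    return resp

@app.route("/api/analyze")
def analyze():
    # The real endpoint returns pre-calculated financial metrics.
    return jsonify({"total_income": 0.0, "total_expenses": 0.0})

@app.route("/api/ask", methods=["POST"])
def ask():
    message = request.get_json().get("message", "")
    # The real endpoint classifies the message and queries the Gradio LLM;
    # this placeholder just echoes it back.
    return jsonify({"response": f"You asked: {message}"})

if __name__ == "__main__":
    app.run(port=5000)
```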

Data Flow

User Question → Frontend → POST to api.py /api/ask → Classify prompt type (via Gradio) → Apply prompt engineering with context → Query Gradio LLM → Response → Frontend renders as markdown

CSV Data Flow: CSV file (hardcoded paths in api.py, categories.py) → Pandas DataFrame → Analysis functions → Returned as JSON or embedded in LLM context

Critical Hardcoded Paths

  • CSV path in api.py: /Users/rchittineni/Repos/Project/hackutd-project/backend/test bank statement data.csv
  • CSV path in categories.py: Same
  • CSV processing in api.py:/api/ask: Uses Windows path C:/Users/abhir/... (mismatch!)

⚠️ Future Fix: Parameterize file paths or use relative paths from environment variables.
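One way to implement that fix, as a sketch; the STATEMENT_CSV environment variable name and the helper function are hypothetical, not part of the current code:

```python
import os

# Fallback: path relative to the project root rather than a user home directory.
DEFAULT_CSV = os.path.join("backend", "test bank statement data.csv")

def statement_path() -> str:
    """Resolve the CSV location from the environment, else use the default."""
    return os.getenv("STATEMENT_CSV", DEFAULT_CSV)
```

Both api.py and categories.py could then import this helper instead of each hardcoding a path.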

Project-Specific Patterns & Conventions

Prompt Engineering Pattern

The /api/ask endpoint uses a sophisticated multi-stage prompt approach:

  1. Classify user intent into one of 18 categories (e.g., ExplainMyData, GoalIncreaseSavings, FindTrendsInData)
  2. Apply category-specific instructions that guide the LLM response style
  3. Embed statistical context (pre-calculated stats from process_file.py)
  4. Append full data as CSV string (entire transaction table)
  5. Add strict formatting instructions (2-3 sentence responses, markdown, no meta-instructions)

Example categories in api.py lines 60-90:

  • ExplainMyData: "Refer specifically to the statistics and data below. Only answer from them."
  • GoalIncreaseSavings: "Provide practical strategies...based on their data. Cite it too."
  • LookupSpecificInfoInMyData: "You are now a calculator. Exactly look over the data."
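The mapping from intent to instruction string can be sketched as a plain dictionary. Only three of the 18 categories are shown (wording taken from the examples above), and build_prompt is an illustrative reconstruction, not the code in api.py:

```python
INSTRUCTIONS = {
    "ExplainMyData":
        "Refer specifically to the statistics and data below. "
        "Only answer from them.",
    "GoalIncreaseSavings":
        "Provide practical strategies based on their data. Cite it too.",
    "LookupSpecificInfoInMyData":
        "You are now a calculator. Exactly look over the data.",
}

def build_prompt(intent: str, stats: str, csv_data: str, question: str) -> str:
    """Assemble the multi-stage prompt: instructions, stats, raw data, rules."""
    return "\n\n".join([
        INSTRUCTIONS.get(intent, "Answer helpfully from the data."),
        f"Statistics:\n{stats}",
        f"Transactions (CSV):\n{csv_data}",
        f"Question: {question}",
        "Answer in 2-3 sentences using markdown. "
        "Do not repeat these instructions.",
    ])
```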

Financial Analysis Functions

Located in categories.py, these compute:

  • Category-based spending: Sum transactions by merchant category
  • Monthly trends: Spending patterns by month and category
  • Income vs Expenses: Total income (positive amounts), expenses (negative), net balance
  • Credit score prediction: Simplified formula based on payment activity and expense ratio
  • Transaction frequency: Count transactions per category

Important: These functions assume CSV columns: Category, Amount, Post Date (MM/DD/YYYY format), Transaction Date
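Illustrative re-implementations of three of these metrics, assuming the column conventions above; these are sketches in the same spirit as categories.py, not its actual functions:

```python
import pandas as pd

def spending_by_category(df: pd.DataFrame) -> pd.Series:
    """Total spent per category (expenses are the negative Amounts)."""
    expenses = df[df["Amount"] < 0]
    return expenses.groupby("Category")["Amount"].sum().abs()

def income_vs_expenses(df: pd.DataFrame) -> dict:
    """Total income (positive amounts), total expenses, and net balance."""
    income = df.loc[df["Amount"] > 0, "Amount"].sum()
    expenses = -df.loc[df["Amount"] < 0, "Amount"].sum()
    return {"income": income, "expenses": expenses, "net": income - expenses}

def transaction_frequency(df: pd.DataFrame) -> pd.Series:
    """Number of transactions per category."""
    return df["Category"].value_counts()
```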

Frontend Data Structure

Home.js uses hardcoded analysisData object with pre-computed values. This is not connected to the backend /api/analyze endpoint — the data is static. When implementing dynamic features, fetch from /api/analyze and parse the response structure.

Integration Points & Dependencies

Gradio Client Integration (api.py)

  • Port: 7860 (must match app.py launch)
  • API Endpoint: /chat (called by gradio_client.predict())
  • Message Format: String prompt (can be very long with embedded data)
  • Used for:
    • Prompt classification (18-category system)
    • LLM response generation

React-Flask Integration

  • CORS: Enabled only for http://localhost:3000
  • Base URL: http://127.0.0.1:5000 (hardcoded in Chatbot.js line 14)
  • Response format: JSON with field response (LLM text)

External APIs

  • OpenAI: Listed in requirements.txt but not used in the current code (may be legacy or planned)
  • Sambanova: Used via Gradio registry for Llama 3.1 model

Testing & Debugging

Testing the Chat Endpoint

curl -X POST http://localhost:5000/api/ask \
  -H "Content-Type: application/json" \
  -d '{"message": "How am I spending money?"}'

Common Issues & Solutions

  • "Connection refused" on /api/ask: the Gradio server is not running. Start python app.py in backend/.
  • CORS error in browser console: Flask is not serving at http://127.0.0.1:5000. Check that Flask is running and verify the hardcoded URL in Chatbot.js.
  • CSV file not found: a hardcoded path does not match your OS or user. Update the paths in api.py:22, categories.py:5, and api.py:127.
  • Prompt classification fails silently: the Gradio /chat endpoint is not responding. Check that the model is loaded in the Gradio app.

Adding Features

  1. New financial metrics: Add function to categories.py, call from /api/analyze, update Frontend response parsing
  2. New chat categories: Add to classify_Prompt_Type() categories list, define corresponding instruction string
  3. CSV schema changes: Update column references in process_file.py, categories.py, and test data
  4. Frontend pages: Add route to App.js, create component in src/pages/, update navigation
  5. New backend endpoints: Add @app.route() to api.py, ensure CORS origin list updated if needed
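As a worked example of item 1, a hypothetical new metric (average_transaction_by_category is not in the repo) following the same pandas pattern as the existing analysis functions; it would be added to categories.py and its result included in the /api/analyze JSON:

```python
import pandas as pd

def average_transaction_by_category(df: pd.DataFrame) -> pd.Series:
    """Hypothetical new metric: mean expense size per category."""
    expenses = df[df["Amount"] < 0]
    return expenses.groupby("Category")["Amount"].mean().abs()
```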

Future Refactoring

  • Remove hardcoded paths: Use environment variables (os.getenv()) or a config file
  • Separate concerns: Move CSV loading logic to a dedicated module imported by both api.py and categories.py
  • Dynamic frontend data: Replace hardcoded analysisData in Home.js with state that fetches from /api/analyze
  • Error handling: Add try-catch for Gradio client timeouts; improve error messages
  • Testing: Add unit tests for categories.py functions and integration tests for Flask endpoints

Known Issues

  • CSV file paths are hardcoded in api.py and categories.py; update these paths to match your system
  • The hardcoded paths mix Windows and macOS user directories, so no single machine can run the code unmodified
  • Frontend uses hardcoded analysis data instead of fetching from backend endpoint
