HackUTD-2024/HackUTD2024-FinancialAssistant


HackUTD Financial Analysis Chatbot

A conversational financial analysis application that intelligently interprets bank statement data and provides personalized financial insights through an AI-powered chatbot interface.

Features

  • Bank Statement Analysis: Upload and analyze CSV transaction data with automatic categorization
  • AI-Powered Chatbot: Ask natural language questions about your spending patterns, income, and financial health
  • Financial Metrics:
    • Category-based spending breakdown
    • Monthly spending trends
    • Income vs. expenses analysis
    • Credit score prediction
    • Transaction frequency analysis
    • Intelligent trend detection
  • Adaptive Responses: The chatbot classifies your questions into 18 different intent categories and provides contextual, data-backed answers

Tech Stack

  • Frontend: React 18, React Router, React Markdown (for LLM response formatting)
  • Backend: Flask (Python), Gradio (for LLM UI), Gradio Client (backend-to-LLM communication)
  • ML/AI: OpenAI API, Sambanova Gradio models (Llama 3.1)
  • Data Processing: Pandas, NumPy, scikit-learn

Quick Start

Prerequisites

  • Node.js and npm
  • Python 3.8+
  • Three terminal windows

Installation

# Install frontend dependencies
npm install

# Install backend dependencies
pip install -r requirements.txt

Running the Application

Terminal 1 - Start the Gradio LLM server:

cd backend
python app.py
# Server runs at http://127.0.0.1:7860/

Terminal 2 - Start the Flask API:

cd backend
python api.py
# API runs at http://127.0.0.1:5000

Terminal 3 - Start the React frontend:

npm start
# App opens at http://localhost:3000

All three services must be running for the chatbot to function.

How It Works

  1. CSV Upload: Provide a bank statement CSV with transaction data
  2. Question: Ask the chatbot about your finances in natural language
  3. Intelligent Classification: The system classifies your question into 18 intent categories
  4. Context Enrichment: Your transaction data and pre-computed statistics are added to the prompt
  5. AI Response: Llama 3.1 generates a contextual, fact-based answer with data citations
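The five steps above can be sketched as a single function. Everything here (the names classify_intent, build_context, and answer, and the keyword-based classifier stub) is an illustrative stand-in for the project's LLM-driven versions, not code from the repository:

```python
import pandas as pd

def classify_intent(question: str) -> str:
    """Stub for step 3; the real system asks the LLM to pick one of 18 intents."""
    return "GoalIncreaseSavings" if "save" in question.lower() else "ExplainMyData"

def build_context(df: pd.DataFrame) -> str:
    """Step 4: pre-computed statistics embedded into the prompt."""
    income = df.loc[df["Amount"] > 0, "Amount"].sum()
    expenses = -df.loc[df["Amount"] < 0, "Amount"].sum()
    return f"Total income: {income:.2f}. Total expenses: {expenses:.2f}."

def answer(question: str, df: pd.DataFrame, llm=lambda prompt: prompt) -> str:
    """Steps 3-5 chained; `llm` stands in for the Gradio-hosted Llama 3.1 call."""
    intent = classify_intent(question)
    prompt = (f"Intent: {intent}\n{build_context(df)}\n"
              f"Data:\n{df.to_csv(index=False)}\n"
              f"Question: {question}")
    return llm(prompt)
```

With the identity `llm` default, the function returns the fully assembled prompt, which makes the enrichment steps easy to inspect.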

CSV Format

Required columns in your bank statement CSV:

  • Transaction Date (format: YYYY-MM-DD or MM-DD-YYYY)
  • Post Date (format: MM/DD/YYYY)
  • Amount (numeric: positive for income, negative for expenses)
  • Category (string: e.g., "Shopping", "Food & Drink", "Travel")
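A minimal loader that enforces this schema might look like the following. The function name load_statement and the inline sample rows are hypothetical, not taken from process_file.py:

```python
import io
import pandas as pd

REQUIRED_COLUMNS = {"Transaction Date", "Post Date", "Amount", "Category"}

def load_statement(source) -> pd.DataFrame:
    """Load a bank statement CSV and check the schema described above."""
    df = pd.read_csv(source)
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"CSV is missing required columns: {sorted(missing)}")
    # Post Date is documented as MM/DD/YYYY.
    df["Post Date"] = pd.to_datetime(df["Post Date"], format="%m/%d/%Y")
    df["Amount"] = pd.to_numeric(df["Amount"])
    return df

sample = io.StringIO(
    "Transaction Date,Post Date,Amount,Category\n"
    "2024-01-05,01/06/2024,-42.10,Food & Drink\n"
    "2024-01-10,01/11/2024,1500.00,Income\n"
)
df = load_statement(sample)
```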

Project Structure

.
├── src/                    # React frontend
│   ├── pages/Home.js      # Main dashboard
│   └── components/        # React components
│       └── Chatbot.js     # Chat interface
├── backend/               # Flask & Python
│   ├── api.py            # Flask API endpoints
│   ├── app.py            # Gradio LLM server
│   ├── categories.py     # Financial analysis functions
│   ├── process_file.py   # CSV data processing
│   └── test bank statement data.csv
└── package.json          # Frontend dependencies

Developer Guide

Architecture & Data Flow

Core Components

  1. Frontend (src/):

    • App.js: Single-page app with React Router (currently one route: "/")
    • pages/Home.js: Main dashboard with hardcoded analysis data and card-based UI
    • components/Chatbot.js: Chat UI that fetches from /api/ask endpoint
  2. Backend (backend/):

    • api.py: Flask server (port 5000, CORS enabled for localhost:3000)
      • /api/analyze - Returns pre-calculated financial metrics
      • /api/ask - Routes user messages through LLM with prompt engineering
    • categories.py: Financial analysis functions (calculate spending by category, monthly trends, credit score prediction, etc.)
    • process_file.py: CSV data loading and statistical preprocessing
    • app.py: Gradio app loader for Llama 3.1 model (runs on port 7860)
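A stripped-down sketch of the Flask side, assuming the two routes described above. The handler bodies and the manual CORS header are placeholders, not the repository's actual api.py (which serves real analysis results and uses its own CORS setup):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.after_request
def allow_frontend(resp):
    # Stand-in for flask-cors: allow only the React dev server origin.
    resp.headers["Access-Control-Allow-Origin"] = "http://localhost:3000"
    return resp

@app.route("/api/analyze")
def analyze():
    # The real endpoint returns pre-calculated financial metrics.
    return jsonify({"total_income": 0.0, "total_expenses": 0.0})

@app.route("/api/ask", methods=["POST"])
def ask():
    message = request.get_json().get("message", "")
    # The real endpoint classifies the message and queries the Gradio LLM;
    # this placeholder just echoes it back.
    return jsonify({"response": f"You asked: {message}"})

if __name__ == "__main__":
    app.run(port=5000)
```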

Data Flow

User Question → Frontend → POST to api.py /api/ask → Classify prompt type (via Gradio) → Apply prompt engineering with context → Query Gradio LLM → Response → Frontend renders as markdown

CSV Data Flow: CSV file (hardcoded paths in api.py, categories.py) → Pandas DataFrame → Analysis functions → Returned as JSON or embedded in LLM context

Critical Hardcoded Paths

  • CSV path in api.py: /Users/rchittineni/Repos/Project/hackutd-project/backend/test bank statement data.csv
  • CSV path in categories.py: Same
  • CSV processing in api.py:/api/ask: Uses Windows path C:/Users/abhir/... (mismatch!)

⚠️ Future Fix: Parameterize file paths or use relative paths from environment variables.
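One way to implement that fix, as a sketch; the STATEMENT_CSV environment variable name and the helper function are hypothetical, not part of the current code:

```python
import os

# Fallback: path relative to the project root rather than a user home directory.
DEFAULT_CSV = os.path.join("backend", "test bank statement data.csv")

def statement_path() -> str:
    """Resolve the CSV location from the environment, else use the default."""
    return os.getenv("STATEMENT_CSV", DEFAULT_CSV)
```

Both api.py and categories.py could then import this helper instead of each hardcoding a path.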

Project-Specific Patterns & Conventions

Prompt Engineering Pattern

The /api/ask endpoint uses a sophisticated multi-stage prompt approach:

  1. Classify user intent into one of 18 categories (e.g., ExplainMyData, GoalIncreaseSavings, FindTrendsInData)
  2. Apply category-specific instructions that guide the LLM response style
  3. Embed statistical context (pre-calculated stats from process_file.py)
  4. Append full data as CSV string (entire transaction table)
  5. Add strict formatting instructions (2-3 sentence responses, markdown, no meta-instructions)

Example categories in api.py lines 60-90:

  • ExplainMyData: "Refer specifically to the statistics and data below. Only answer from them."
  • GoalIncreaseSavings: "Provide practical strategies...based on their data. Cite it too."
  • LookupSpecificInfoInMyData: "You are now a calculator. Exactly look over the data."
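The mapping from intent to instruction string can be sketched as a plain dictionary. Only three of the 18 categories are shown (wording taken from the examples above), and build_prompt is an illustrative reconstruction, not the code in api.py:

```python
INSTRUCTIONS = {
    "ExplainMyData":
        "Refer specifically to the statistics and data below. "
        "Only answer from them.",
    "GoalIncreaseSavings":
        "Provide practical strategies based on their data. Cite it too.",
    "LookupSpecificInfoInMyData":
        "You are now a calculator. Exactly look over the data.",
}

def build_prompt(intent: str, stats: str, csv_data: str, question: str) -> str:
    """Assemble the multi-stage prompt: instructions, stats, raw data, rules."""
    return "\n\n".join([
        INSTRUCTIONS.get(intent, "Answer helpfully from the data."),
        f"Statistics:\n{stats}",
        f"Transactions (CSV):\n{csv_data}",
        f"Question: {question}",
        "Answer in 2-3 sentences using markdown. "
        "Do not repeat these instructions.",
    ])
```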

Financial Analysis Functions

Located in categories.py, these compute:

  • Category-based spending: Sum transactions by merchant category
  • Monthly trends: Spending patterns by month and category
  • Income vs Expenses: Total income (positive amounts), expenses (negative), net balance
  • Credit score prediction: Simplified formula based on payment activity and expense ratio
  • Transaction frequency: Count transactions per category

Important: These functions assume CSV columns: Category, Amount, Post Date (MM/DD/YYYY format), Transaction Date
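Illustrative re-implementations of three of these metrics, assuming the column conventions above; these are sketches in the same spirit as categories.py, not its actual functions:

```python
import pandas as pd

def spending_by_category(df: pd.DataFrame) -> pd.Series:
    """Total spent per category (expenses are the negative Amounts)."""
    expenses = df[df["Amount"] < 0]
    return expenses.groupby("Category")["Amount"].sum().abs()

def income_vs_expenses(df: pd.DataFrame) -> dict:
    """Total income (positive amounts), total expenses, and net balance."""
    income = df.loc[df["Amount"] > 0, "Amount"].sum()
    expenses = -df.loc[df["Amount"] < 0, "Amount"].sum()
    return {"income": income, "expenses": expenses, "net": income - expenses}

def transaction_frequency(df: pd.DataFrame) -> pd.Series:
    """Number of transactions per category."""
    return df["Category"].value_counts()
```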

Frontend Data Structure

Home.js uses hardcoded analysisData object with pre-computed values. This is not connected to the backend /api/analyze endpoint — the data is static. When implementing dynamic features, fetch from /api/analyze and parse the response structure.

Integration Points & Dependencies

Gradio Client Integration (api.py)

  • Port: 7860 (must match app.py launch)
  • API Endpoint: /chat (called by gradio_client.predict())
  • Message Format: String prompt (can be very long with embedded data)
  • Used for:
    • Prompt classification (18-category system)
    • LLM response generation

React-Flask Integration

  • CORS: Enabled only for http://localhost:3000
  • Base URL: http://127.0.0.1:5000 (hardcoded in Chatbot.js line 14)
  • Response format: JSON with field response (LLM text)

External APIs

  • OpenAI: Listed in requirements.txt but not used in the current code (may be legacy or planned)
  • Sambanova: Used via Gradio registry for Llama 3.1 model

Testing & Debugging

Testing the Chat Endpoint

curl -X POST http://localhost:5000/api/ask \
  -H "Content-Type: application/json" \
  -d '{"message": "How am I spending money?"}'

Common Issues & Solutions

  • "Connection refused" on /api/ask: the Gradio server is not running. Start python app.py in backend/.
  • CORS error in browser console: Flask is not serving at http://127.0.0.1:5000. Check that Flask is running and verify the hardcoded URL in Chatbot.js.
  • CSV file not found: a hardcoded path does not match your OS or user. Update the paths in api.py:22, categories.py:5, and api.py:127.
  • Prompt classification fails silently: the Gradio /chat endpoint is not responding. Check that the model is loaded in the Gradio app.

Adding Features

  1. New financial metrics: Add function to categories.py, call from /api/analyze, update Frontend response parsing
  2. New chat categories: Add to classify_Prompt_Type() categories list, define corresponding instruction string
  3. CSV schema changes: Update column references in process_file.py, categories.py, and test data
  4. Frontend pages: Add route to App.js, create component in src/pages/, update navigation
  5. New backend endpoints: Add @app.route() to api.py, ensure CORS origin list updated if needed
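As a worked example of item 1, a hypothetical new metric (average_transaction_by_category is not in the repo) following the same pandas pattern as the existing analysis functions; it would be added to categories.py and its result included in the /api/analyze JSON:

```python
import pandas as pd

def average_transaction_by_category(df: pd.DataFrame) -> pd.Series:
    """Hypothetical new metric: mean expense size per category."""
    expenses = df[df["Amount"] < 0]
    return expenses.groupby("Category")["Amount"].mean().abs()
```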

Future Refactoring

  • Remove hardcoded paths: Use environment variables (os.getenv()) or a config file
  • Separate concerns: Move CSV loading logic to a dedicated module imported by both api.py and categories.py
  • Dynamic frontend data: Replace hardcoded analysisData in Home.js with state that fetches from /api/analyze
  • Error handling: Add try-catch for Gradio client timeouts; improve error messages
  • Testing: Add unit tests for categories.py functions and integration tests for Flask endpoints

Known Issues

  • CSV file paths are hardcoded in api.py and categories.py; update these paths to match your system
  • The hardcoded paths mix Windows and macOS user directories, so no single machine can run the code unmodified
  • Frontend uses hardcoded analysis data instead of fetching from backend endpoint
