A conversational financial analysis application that intelligently interprets bank statement data and provides personalized financial insights through an AI-powered chatbot interface.
- Bank Statement Analysis: Upload and analyze CSV transaction data with automatic categorization
- AI-Powered Chatbot: Ask natural language questions about your spending patterns, income, and financial health
- Financial Metrics:
  - Category-based spending breakdown
  - Monthly spending trends
  - Income vs. expenses analysis
  - Credit score prediction
  - Transaction frequency analysis
  - Intelligent trend detection
- Adaptive Responses: The chatbot classifies your questions into 18 different intent categories and provides contextual, data-backed answers
- Frontend: React 18, React Router, React Markdown (for LLM response formatting)
- Backend: Flask (Python), Gradio (for LLM UI), Gradio Client (backend-to-LLM communication)
- ML/AI: OpenAI API, Sambanova Gradio models (Llama 3.1)
- Data Processing: Pandas, NumPy, scikit-learn
- Node.js and npm
- Python 3.8+
- Three terminal windows
# Install frontend dependencies
npm install
# Install backend dependencies
pip install -r requirements.txt

Terminal 1 - Start the Gradio LLM server:
cd backend
python app.py
# Server runs at http://127.0.0.1:7860/

Terminal 2 - Start the Flask API:
cd backend
python api.py
# API runs at http://127.0.0.1:5000

Terminal 3 - Start the React frontend:
npm start
# App opens at http://localhost:3000

All three services must be running for the chatbot to function.
- CSV Upload: Provide a bank statement CSV with transaction data
- Question: Ask the chatbot about your finances in natural language
- Intelligent Classification: The system classifies your question into 18 intent categories
- Context Enrichment: Your transaction data and pre-computed statistics are added to the prompt
- AI Response: Llama 3.1 generates a contextual, fact-based answer with data citations
Required columns in your bank statement CSV:
- Transaction Date (format: YYYY-MM-DD or MM-DD-YYYY)
- Post Date (format: MM/DD/YYYY)
- Amount (numeric: positive for income, negative for expenses)
- Category (string: e.g., "Shopping", "Food & Drink", "Travel")
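A quick header check can catch a malformed statement before it reaches the analysis code. The sketch below uses only the standard library rather than the project's Pandas pipeline; the function name `missing_columns` and the sample rows (including the "Income" category) are made up for illustration.

```python
import csv
import io

# Column names taken from the required-columns list in this README.
REQUIRED_COLUMNS = ("Transaction Date", "Post Date", "Amount", "Category")


def missing_columns(csv_text):
    """Return the required columns that are absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = [name.strip() for name in (reader.fieldnames or [])]
    return [col for col in REQUIRED_COLUMNS if col not in header]


# Made-up sample data, not taken from the project's test file.
sample = (
    "Transaction Date,Post Date,Amount,Category\n"
    "2024-01-03,01/04/2024,-42.50,Food & Drink\n"
    "2024-01-05,01/05/2024,1500.00,Income\n"
)
print(missing_columns(sample))  # []
```

An empty list means every required column is present; any names returned are the ones to add or rename in the CSV.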
.
├── src/ # React frontend
│ ├── pages/Home.js # Main dashboard
│ └── components/ # React components
│ └── Chatbot.js # Chat interface
├── backend/ # Flask & Python
│ ├── api.py # Flask API endpoints
│ ├── app.py # Gradio LLM server
│ ├── categories.py # Financial analysis functions
│ ├── process_file.py # CSV data processing
│ └── test bank statement data.csv
└── package.json # Frontend dependencies
- Frontend (src/):
  - App.js: Single-page app with React Router (currently one route: "/")
  - pages/Home.js: Main dashboard with hardcoded analysis data and card-based UI
  - components/Chatbot.js: Chat UI that fetches from the /api/ask endpoint
- Backend (backend/):
  - api.py: Flask server (port 5000, CORS enabled for localhost:3000)
    - /api/analyze - Returns pre-calculated financial metrics
    - /api/ask - Routes user messages through LLM with prompt engineering
  - categories.py: Financial analysis functions (calculate spending by category, monthly trends, credit score prediction, etc.)
  - process_file.py: CSV data loading and statistical preprocessing
  - app.py: Gradio app loader for Llama 3.1 model (runs on port 7860)
User Question → Frontend → POST to api.py:/api/ask → Classify prompt type (via Gradio) → Apply prompt engineering with context → Query Gradio LLM → Response → Frontend renders as markdown
CSV Data Flow: CSV file (hardcoded paths in api.py, categories.py) → Pandas DataFrame → Analysis functions → Returned as JSON or embedded in LLM context
- CSV path in api.py: /Users/rchittineni/Repos/Project/hackutd-project/backend/test bank statement data.csv
- CSV path in categories.py: Same
- CSV processing in api.py:/api/ask: Uses Windows path C:/Users/abhir/... (mismatch!)
The /api/ask endpoint builds its prompt in five stages:
- Classify user intent into one of 18 categories (e.g., ExplainMyData, GoalIncreaseSavings, FindTrendsInData)
- Apply category-specific instructions that guide the LLM response style
- Embed statistical context (pre-calculated stats from process_file.py)
- Append full data as CSV string (entire transaction table)
- Add strict formatting instructions (2-3 sentence responses, markdown, no meta-instructions)
Example categories in api.py lines 60-90:
- ExplainMyData: "Refer specifically to the statistics and data below. Only answer from them."
- GoalIncreaseSavings: "Provide practical strategies...based on their data. Cite it too."
- LookupSpecificInfoInMyData: "You are now a calculator. Exactly look over the data."
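The prompt assembly described above can be sketched roughly as follows. This is not the code in api.py: `build_prompt`, `CATEGORY_INSTRUCTIONS`, and the wording of `FORMAT_RULES` are hypothetical, the ordering of the parts is a guess, and the intent is passed in directly instead of being classified through Gradio. Only the two instruction strings are quoted from the examples in this README.

```python
# Instruction strings quoted from this README; the real table has 18 entries.
CATEGORY_INSTRUCTIONS = {
    "ExplainMyData": (
        "Refer specifically to the statistics and data below. "
        "Only answer from them."
    ),
    "LookupSpecificInfoInMyData": (
        "You are now a calculator. Exactly look over the data."
    ),
}

# Hypothetical wording for the formatting stage.
FORMAT_RULES = (
    "Answer in 2-3 sentences, use markdown, "
    "and do not mention these instructions."
)


def build_prompt(question, intent, stats_text, csv_text):
    """Join the category instruction, stats, raw CSV, and format rules."""
    parts = [
        CATEGORY_INSTRUCTIONS.get(intent, ""),  # empty for unknown intents
        "Statistics:\n" + stats_text,
        "Transactions (CSV):\n" + csv_text,
        FORMAT_RULES,
        "Question: " + question,
    ]
    # Skip empty parts so an unknown intent does not leave a blank section.
    return "\n\n".join(part for part in parts if part)


prompt = build_prompt(
    "How much did I spend on food?",
    "LookupSpecificInfoInMyData",
    "total_expenses=-42.5",
    "Category,Amount\nFood & Drink,-42.5",
)
```

Because the whole transaction table is appended, the prompt grows with the size of the CSV, which is worth keeping in mind for long statements.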
Located in categories.py, these compute:
- Category-based spending: Sum transactions by merchant category
- Monthly trends: Spending patterns by month and category
- Income vs Expenses: Total income (positive amounts), expenses (negative), net balance
- Credit score prediction: Simplified formula based on payment activity and expense ratio
- Transaction frequency: Count transactions per category
Important: These functions assume CSV columns: Category, Amount, Post Date (MM/DD/YYYY format), Transaction Date
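To make the metrics concrete, here is a standard-library stand-in for the kind of aggregation listed above. It is not the Pandas code in categories.py: the function name `summarize`, the returned keys, and the choice to count only negative amounts as spending are assumptions, and the sample rows are invented. Credit score prediction is left out because its formula is not described here.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def summarize(csv_text):
    """Aggregate spending, income, and transaction counts from CSV text."""
    category_totals = defaultdict(float)
    monthly_totals = defaultdict(float)
    counts = defaultdict(int)
    income = 0.0
    expenses = 0.0
    for row in csv.DictReader(io.StringIO(csv_text)):
        amount = float(row["Amount"])
        # Post Date is expected in MM/DD/YYYY format.
        posted = datetime.strptime(row["Post Date"], "%m/%d/%Y")
        counts[row["Category"]] += 1
        if amount >= 0:
            income += amount
        else:
            # Assumption: only negative amounts count as spending.
            expenses += amount
            category_totals[row["Category"]] += amount
            monthly_totals[posted.strftime("%Y-%m")] += amount
    return {
        "spending_by_category": dict(category_totals),
        "monthly_spending": dict(monthly_totals),
        "transaction_counts": dict(counts),
        "income": income,
        "expenses": expenses,
        "net_balance": income + expenses,
    }


# Invented sample rows for illustration only.
sample = (
    "Transaction Date,Post Date,Amount,Category\n"
    "2024-01-03,01/04/2024,-42.50,Food & Drink\n"
    "2024-01-20,01/21/2024,-20.00,Food & Drink\n"
    "2024-02-01,02/02/2024,-100.00,Travel\n"
    "2024-02-03,02/03/2024,1500.00,Income\n"
)
summary = summarize(sample)
print(summary["net_balance"])  # 1337.5
```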
Home.js uses a hardcoded analysisData object with pre-computed values. It is not connected to the backend /api/analyze endpoint, so the displayed data is static. When implementing dynamic features, fetch from /api/analyze and parse the response structure.
- Port: 7860 (must match app.py launch)
- API Endpoint: /chat (called by gradio_client.predict())
- Message Format: String prompt (can be very long with embedded data)
- Used for:
  - Prompt classification (18-category system)
  - LLM response generation

- CORS: Enabled only for http://localhost:3000
- Base URL: http://127.0.0.1:5000 (hardcoded in Chatbot.js line 14)
- Response format: JSON with field response (LLM text)
- OpenAI: Listed in requirements.txt but not explicitly used in current code (may be legacy or planned)
- Sambanova: Used via Gradio registry for Llama 3.1 model
curl -X POST http://localhost:5000/api/ask \
-H "Content-Type: application/json" \
-d '{"message": "How am I spending money?"}'

| Issue | Cause | Solution |
|---|---|---|
| "Connection refused" on /api/ask | Gradio server not running | Start python app.py in backend/ |
| CORS error in browser console | Flask server not at http://127.0.0.1:5000 | Check Flask is running, verify hardcoded URL in Chatbot.js |
| CSV file not found | Hardcoded path mismatches OS or user | Update paths in api.py:22, categories.py:5, api.py:127 |
| Prompt classification fails silently | Gradio /chat endpoint not responsive | Check model is loaded in Gradio app |
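For scripted checks, the same /api/ask request as the curl command can be built in Python with the standard library. The helper names `build_ask_request` and `ask` are hypothetical; the URL, the `message` payload key, and the `response` field come from this README. Calling `ask` needs all three services running, so the request builder is kept separate and can be inspected offline.

```python
import json
import urllib.request


def build_ask_request(message, base_url="http://127.0.0.1:5000"):
    """Build (but do not send) a POST request for the /api/ask endpoint."""
    payload = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/api/ask",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(message, timeout_s=60):
    """Send the question and return the text in the `response` field."""
    request = build_ask_request(message)
    with urllib.request.urlopen(request, timeout=timeout_s) as reply:
        return json.loads(reply.read().decode("utf-8"))["response"]


request = build_ask_request("How am I spending money?")
print(request.full_url)  # http://127.0.0.1:5000/api/ask
```

A "Connection refused" error from `ask` points at the Flask server itself not running, whereas an error in the returned body is more likely to come from the Gradio side.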
- New financial metrics: Add function to categories.py, call from /api/analyze, update Frontend response parsing
- New chat categories: Add to classify_Prompt_Type() categories list, define corresponding instruction string
- CSV schema changes: Update column references in process_file.py, categories.py, and test data
- Frontend pages: Add route to App.js, create component in src/pages/, update navigation
- New backend endpoints: Add @app.route() to api.py, ensure CORS origin list updated if needed
- Remove hardcoded paths: Use environment variables (os.getenv()) or a config file
- Separate concerns: Move CSV loading logic to a dedicated module imported by both api.py and categories.py
- Dynamic frontend data: Replace hardcoded analysisData in Home.js with state that fetches from /api/analyze
- Error handling: Add try/except handling for Gradio client timeouts; improve error messages
- Testing: Add unit tests for categories.py functions and integration tests for Flask endpoints
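One possible shape for the timeout handling suggested above is a small wrapper around whatever function performs the Gradio call. This is a sketch under assumptions: `ask_with_timeout`, the error payloads, and the 504/502 status codes are illustrative choices, and `predict` stands in for the real client call so the wrapper can be exercised without a running server.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def ask_with_timeout(predict, prompt, timeout_s=30.0):
    """Run predict(prompt) in a worker thread and cap how long we wait.

    Returns a (body, status_code) pair shaped like a Flask JSON response.
    """
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(predict, prompt)
    try:
        return {"response": future.result(timeout=timeout_s)}, 200
    except FutureTimeout:
        # The worker may still be running; we only stop waiting for it.
        return {"error": "LLM server timed out"}, 504
    except Exception as exc:
        return {"error": "LLM call failed: {}".format(exc)}, 502
    finally:
        # Do not block the request on a stuck worker thread.
        executor.shutdown(wait=False)


def slow_model(prompt):
    """Stand-in for a model call that takes too long."""
    time.sleep(0.3)
    return "late answer"


body, status = ask_with_timeout(slow_model, "hi", timeout_s=0.05)
print(status)  # 504
```

Returning an explicit error body lets the chat UI show a message instead of failing silently when the Gradio server is slow or down.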
- CSV file paths are hardcoded in api.py and categories.py; update these paths to match your system
- Windows and macOS paths differ; the code currently contains user-specific paths for both
- Frontend uses hardcoded analysis data instead of fetching from the backend endpoint
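Until the hardcoded paths are removed, a single helper could resolve the CSV location for every caller. The sketch below is one way to do it, assuming a hypothetical `BANK_CSV_PATH` environment variable and a hypothetical `resolve_csv_path` function; neither exists in the project today. The fallback file name matches the test file in backend/.

```python
import os
from pathlib import Path

# Name of the test file shipped in backend/.
DEFAULT_CSV_NAME = "test bank statement data.csv"


def resolve_csv_path(base_dir=".", env=os.environ):
    """Return the CSV path from BANK_CSV_PATH, else the bundled test file.

    BANK_CSV_PATH is a hypothetical variable name chosen for this sketch.
    """
    override = env.get("BANK_CSV_PATH")
    if override:
        return Path(override).expanduser()
    return Path(base_dir) / DEFAULT_CSV_NAME


# With no override set, fall back to the file inside backend/.
default_path = resolve_csv_path("backend", env={})
```

Using pathlib keeps the same code working on Windows and macOS, which sidesteps the path mismatch noted above.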