Finsults (Finance + Insults): Project Story

Bank statements, but Gen-Z-ified. We transform chaotic transactions into clear charts and concise one-liners, so students can finally see where the money actually goes.

💡 Inspiration

We kept seeing the same pattern on campus: students swipe, tap, and DoorDash their way through the month, then open their bank app and get… a wall of transactions. No narrative. No guidance. No vibes. We wanted something that feels fun, speaks in a Gen Z tone, and still delivers real insight, not just graphs and guilt.

That’s how Finsults was born: finance + insults. Get roasted. Get better.

🧪 What We Built (at a glance)

PDF → Insights: Upload bank statements; we parse PDFs, extract transactions, and auto-categorize.
Visualized Dashboard: Recharts-powered bar/pie/area charts, time filters, and translucent purple tooltips with white text.
AI Roasts: Google Gemini API turns spending patterns into punchy, actionable one-liners.

Privacy-first: PDFs aren’t stored; we keep only anonymized transaction rows.

🏗️ How We Built It

Stack

Frontend: Next.js (App Router), TypeScript, Tailwind, shadcn/ui, Recharts.
Backend: Flask + SQLite (local persistence).
PDF & LLM: PyMuPDF for text extraction; Google Gemini API for structured JSON; deterministic parsing guardrails.

PDF → JSON Pipeline

Extract text with PyMuPDF.
Prompt LLM with a strict schema (one or many statements), requesting clean JSON only.
Parse safely: strip ```json fences, validate schema, and surface clear errors on failure.
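A minimal sketch of the "parse safely" step, using only the standard library. The field names (`date`, `description`, `amount`), the flat-array shape, and the `StatementParseError` name are assumptions for illustration; the real schema may differ.

```python
import json
import re

# Hypothetical schema: these field names and types are assumptions.
REQUIRED_FIELDS = {"date": str, "description": str, "amount": (int, float)}

# Matches the first fenced block, with or without a language tag.
_FENCE_RE = re.compile(r"```[a-zA-Z]*\s*(.*?)\s*```", re.DOTALL)


class StatementParseError(ValueError):
    """Raised when the model reply cannot be turned into valid rows."""


def strip_fences(reply: str) -> str:
    """Return the fenced body if the reply contains one, else the reply itself."""
    match = _FENCE_RE.search(reply)
    return match.group(1) if match else reply.strip()


def parse_transactions(reply: str) -> list[dict]:
    """Parse and validate a model reply, raising a readable error on failure."""
    try:
        data = json.loads(strip_fences(reply))
    except json.JSONDecodeError as exc:
        raise StatementParseError(f"Reply is not valid JSON: {exc.msg}") from exc
    if not isinstance(data, list):
        raise StatementParseError("Expected a JSON array of transactions")
    for i, row in enumerate(data):
        if not isinstance(row, dict):
            raise StatementParseError(f"Row {i} is not an object")
        for field, expected in REQUIRED_FIELDS.items():
            value = row.get(field)
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, expected):
                raise StatementParseError(
                    f"Row {i}: field '{field}' is missing or has the wrong type"
                )
    return data
```

Searching for the fence instead of anchoring at the start means a reply with chatty text around the JSON still parses, and anything else fails loudly with a message the UI can show.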

Categorization

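Canonical categories (Food & Dining, Subscriptions, Housing & Utilities, etc.) plus a synonym map of merchant names and keywords (e.g., “DoorDash”, “groceries”, “Uber”) give deterministic labels: the same description always lands in the same category.
Anything unmatched falls back to Other, and every row is explicitly marked as money-in or money-out.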
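The labeling described above can be sketched as a first-match keyword lookup. The keywords, the Transportation category, and the sign convention (negative amount means money out) are assumptions made for this example, not the project's actual map.

```python
# Assumed synonym map: categories are checked in insertion order,
# and the first keyword hit wins, so results are deterministic.
SYNONYMS = {
    "Food & Dining": ("doordash", "uber eats", "groceries"),
    "Transportation": ("uber", "lyft"),  # hypothetical category
    "Subscriptions": ("spotify", "netflix"),
    "Housing & Utilities": ("electric", "water bill"),
}


def categorize(description: str) -> str:
    """Map a raw transaction description to a canonical category."""
    text = description.lower()
    for category, keywords in SYNONYMS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "Other"  # edge cases fall through here


def direction(amount: float) -> str:
    """Assumed convention: positive amounts are money in, the rest money out."""
    return "money_in" if amount > 0 else "money_out"
```

Ordering matters with substring matching: listing "uber eats" under Food & Dining before "uber" keeps food delivery out of the ride-share bucket.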

Analytics & Charts

Filters: Last 30d, 3m, 6m, 1y (buttons wired to memoized slices).
Distribution & totals: percent-of-spend, largest merchants, and recent activity.

AI Roasts (Gen-Z-ified)

Prompts include top categories, % of spend, largest merchant, and severity.
Output bundles: line, alt_lines, action_hint, severity.
Example trigger: Food & Dining ≥ 40% → “Bestie, meet a grocery list.”
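A rough sketch of how the roast prompt could be assembled from those inputs. Only the 40% trigger and the output keys come from the description above; the severity cutoffs, the top-three limit, and the prompt wording are assumptions, and the actual Gemini call is left out.

```python
import json


def severity_for(percent: float) -> str:
    """Assumed cutoffs; 40% mirrors the Food & Dining example trigger."""
    if percent >= 60:
        return "high"
    if percent >= 40:
        return "medium"
    return "low"


def build_roast_prompt(percents: dict[str, float], largest_merchant: str) -> str:
    """Bundle the spending summary into a prompt that asks for JSON only."""
    top = sorted(percents.items(), key=lambda kv: kv[1], reverse=True)[:3]
    payload = {
        "top_categories": [
            {"category": c, "percent_of_spend": p} for c, p in top
        ],
        "largest_merchant": largest_merchant,
        "severity": severity_for(top[0][1]) if top else "low",
    }
    return (
        "Write a short Gen Z style roast of this spending summary. "
        "Reply with JSON only, using the keys "
        "line, alt_lines, action_hint, severity.\n"
        + json.dumps(payload)
    )
```

Computing severity before the call keeps the tone decision deterministic; the model only has to write the line.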

Privacy & Security

PDFs are not persisted; parsed rows only.
No client-side secrets; environment-scoped keys.
Clear UX copy: we don’t store PDFs, ever.

📚 What We Learned

Prompt design > model size for structured extraction; strict schema wins.
Defensive parsing (stripping fences, validating the schema) catches most malformed LLM replies.
Micro-UX matters: gradients, tooltip contrast, and copy tone make data feel usable.
Batching LLM calls cuts cost/latency when done with careful input labeling.
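One way to do the input labeling for a batched call is to wrap each statement in explicit markers. The marker format and the function name are hypothetical; the point is only that every statement gets an unambiguous label the model can echo back.

```python
def build_batched_prompt(statements: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Combine several statement texts into one prompt.

    Returns the prompt plus a label -> filename map, so results can be
    matched back to files without sending filenames to the model.
    """
    parts = [
        "Extract transactions from each statement below. "
        "Reply with one JSON object keyed by statement label."
    ]
    labels: dict[str, str] = {}
    for i, (filename, text) in enumerate(statements.items(), start=1):
        label = f"STATEMENT_{i}"
        labels[label] = filename
        parts.append(f"=== {label} ===\n{text.strip()}\n=== END {label} ===")
    return "\n\n".join(parts), labels
```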

🧱 Challenges We Faced

LLM JSON drift (extra commentary) → strip code fences and stray text, then validate against the schema.
Inconsistent PDFs (scans/encodings) → text cleanup and merchant heuristics.
Date ambiguity → normalize every date to ISO 8601 (YYYY-MM-DD) at parse time.
SQLite vs Postgres quirks → standardized on SQLite placeholders and a thin DB helper.
Performance on long statements → batch inserts and single-call multi-file processing.
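Two of these fixes, date normalization and batch inserts, can be sketched together with the standard library. The accepted date formats, the month-first reading of ambiguous dates, and the table layout are assumptions for this example.

```python
import sqlite3
from datetime import datetime

# Assumed input formats, tried in order. An ambiguous date such as
# 03/04/2024 is read month-first here; that is a choice, not a given.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%m/%d/%y", "%b %d, %Y")


def to_iso(raw: str) -> str:
    """Normalize a statement date string to YYYY-MM-DD."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date: {raw!r}")


def insert_rows(conn: sqlite3.Connection, rows: list[dict]) -> None:
    """Insert all rows in one transaction using SQLite '?' placeholders."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transactions ("
        "date TEXT NOT NULL, description TEXT NOT NULL, amount REAL NOT NULL)"
    )
    params = [(to_iso(r["date"]), r["description"], r["amount"]) for r in rows]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO transactions (date, description, amount) "
            "VALUES (?, ?, ?)",
            params,
        )
```

Normalizing before the insert means a single bad date aborts the batch before anything is written, and ISO strings sort correctly as plain text.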

🏁 Results We’re Proud Of

Upload → charts + roasts in seconds.
A clear money-in vs money-out split and a top-category roast that actually nudges behavior.
A fun, non-shamey tone that users want to open again tomorrow.