JeffMonk888/SSDC_Datathon

SDSS Datathon - Personal Finance ML Pipeline

This repo contains an offline-capable ML workflow that mines patterns from:

  • Personal Finance Case.pdf
  • personal_finance_dataset.xlsx

The pipeline covers:

  • XLSX ingestion without openpyxl (parses workbook XML directly)
  • Feature engineering (debt, liquidity, home equity, ratios)
  • Supervised net-worth modeling (PWNETWPG) via ridge regression with CV
  • Unsupervised household segmentation via KMeans
  • Anomaly detection via Mahalanobis distance
  • Export of tables and a markdown summary report
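The first step above, reading an XLSX file without openpyxl, can be sketched with the standard library alone, since an .xlsx workbook is a ZIP archive of XML parts. This is a minimal illustration and not necessarily how analysis/run_finance_ml.py does it. The names `read_first_sheet` and `column_index` are hypothetical, and the sketch assumes the data sits in xl/worksheets/sheet1.xml (a full reader would resolve sheet names through xl/workbook.xml and its relationships file).

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used by workbook, sheet and shared-string parts.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}
T_TAG = "{%s}t" % NS["m"]


def column_index(cell_ref):
    """Convert a cell reference such as 'C7' to a zero-based column index (2)."""
    letters = re.match(r"[A-Z]+", cell_ref).group(0)
    index = 0
    for ch in letters:
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def read_first_sheet(xlsx_source, sheet_path="xl/worksheets/sheet1.xml"):
    """Return rows as lists of strings (None for empty cells).

    Assumption: sheet_path points at the worksheet of interest.
    Values are returned as raw text; numeric conversion is left to the caller.
    """
    with zipfile.ZipFile(xlsx_source) as archive:
        shared = []
        if "xl/sharedStrings.xml" in archive.namelist():
            root = ET.fromstring(archive.read("xl/sharedStrings.xml"))
            for item in root.findall("m:si", NS):
                # A shared string may be split into several rich-text runs.
                shared.append("".join(t.text or "" for t in item.iter(T_TAG)))
        sheet = ET.fromstring(archive.read(sheet_path))

    rows = []
    for row in sheet.findall("m:sheetData/m:row", NS):
        values = []
        for cell in row.findall("m:c", NS):
            ref = cell.get("r")
            # The 'r' attribute is optional; fall back to the next free column.
            col = column_index(ref) if ref else len(values)
            values.extend([None] * (col + 1 - len(values)))
            cell_type = cell.get("t")
            raw = cell.findtext("m:v", default=None, namespaces=NS)
            if cell_type == "s" and raw is not None:
                values[col] = shared[int(raw)]  # index into shared strings
            elif cell_type == "inlineStr":
                values[col] = "".join(t.text or "" for t in cell.iter(T_TAG))
            else:
                values[col] = raw  # numbers, booleans, formula results as text
        rows.append(values)
    return rows
```

Calling `read_first_sheet("personal_finance_dataset.xlsx")` would return the sheet as a list of rows, with the header row first if the sheet has one. Skipped cells are padded with None so that columns stay aligned across rows.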

Quickstart

  1. (Optional) Create and activate a virtual environment:
     python3 -m venv .venv
     source .venv/bin/activate
  2. Install dependencies:
     pip install -r requirements.txt

     or run:

     ./scripts/setup_env.sh
  3. Run the analysis:
     python3 analysis/run_finance_ml.py --xlsx personal_finance_dataset.xlsx --outdir outputs --seed 42

Outputs

The script writes these files to outputs/:

  • metrics.json
  • analysis_summary.md
  • factor_loadings.csv
  • cluster_profiles.csv
  • anomalies.csv
  • engineered_dataset_with_cluster.csv
  • dictionary_cleaned.csv
  • resilience_scores.csv
  • resilience_tier_summary.csv
  • resilience_cluster_summary.csv
  • resilience_scenario_summary.csv
  • resilience_transition_matrix.csv
  • resilience_bootstrap_stability.json
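After a run, it can be handy to confirm that every file in the list above was actually written. The following is a small sketch for that check, not part of the repo. The helper names `missing_outputs` and `load_metrics` are hypothetical, and nothing is assumed about the keys inside metrics.json.

```python
import json
from pathlib import Path

# File names taken from the Outputs list in this README.
EXPECTED_OUTPUTS = [
    "metrics.json",
    "analysis_summary.md",
    "factor_loadings.csv",
    "cluster_profiles.csv",
    "anomalies.csv",
    "engineered_dataset_with_cluster.csv",
    "dictionary_cleaned.csv",
    "resilience_scores.csv",
    "resilience_tier_summary.csv",
    "resilience_cluster_summary.csv",
    "resilience_scenario_summary.csv",
    "resilience_transition_matrix.csv",
    "resilience_bootstrap_stability.json",
]


def missing_outputs(outdir):
    """Return the expected file names that are absent from outdir."""
    outdir = Path(outdir)
    return [name for name in EXPECTED_OUTPUTS if not (outdir / name).is_file()]


def load_metrics(outdir):
    """Return the parsed contents of metrics.json (structure not assumed)."""
    with open(Path(outdir) / "metrics.json", encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    # "outputs" matches the --outdir value used in the Quickstart command.
    missing = missing_outputs("outputs")
    print("missing:", ", ".join(missing) if missing else "none")
```

If the script reports missing files, rerunning the analysis with the same --outdir is the first thing to try.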

Notes

  • In this environment, package downloads from the internet are blocked, so pip install may fail here.
  • The analysis code itself is written to run fully offline once required packages are available.
