This repo now contains an offline-capable ML workflow to mine patterns from:

- `Personal Finance Case.pdf`
- `personal_finance_dataset.xlsx`

The pipeline does:

- XLSX ingestion without `openpyxl` (parses the workbook XML directly)
- Feature engineering (debt, liquidity, home equity, ratios)
- Supervised net-worth modeling (`PWNETWPG`) via ridge regression with cross-validation (CV)
- Unsupervised household segmentation via KMeans
- Anomaly detection via Mahalanobis distance
- Export of tables and a markdown summary report
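Ridge regression has a closed form on standardized features `Z`: `w = inv(Z'Z + alpha*I) @ Z'(y - mean(y))`, and the penalty `alpha` can be chosen by K-fold cross-validation. The sketch below shows that pattern with NumPy (assumed to be among the installed requirements). It is an illustration, not the repo's model code: `X` stands for a hypothetical matrix of engineered features, `y` for the `PWNETWPG` column, and the alpha grid, `k=5`, and function names are assumptions. The default seed of 42 simply mirrors the `--seed 42` in the run command.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge on standardized features; the intercept is not penalized."""
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # guard against constant columns
    Z = (X - mu) / sd
    y_mean = y.mean()
    w = np.linalg.solve(Z.T @ Z + alpha * np.eye(Z.shape[1]), Z.T @ (y - y_mean))
    return mu, sd, y_mean, w


def ridge_predict(model, X):
    mu, sd, y_mean, w = model
    return y_mean + ((X - mu) / sd) @ w


def ridge_cv(X, y, alphas, k=5, seed=42):
    """Pick alpha by K-fold CV (mean squared error), then refit on all rows."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    cv_mse = {}
    for alpha in alphas:
        fold_mse = []
        for i, test_idx in enumerate(folds):
            train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
            # Standardization statistics come from the training fold only.
            model = ridge_fit(X[train_idx], y[train_idx], alpha)
            resid = y[test_idx] - ridge_predict(model, X[test_idx])
            fold_mse.append(np.mean(resid ** 2))
        cv_mse[alpha] = float(np.mean(fold_mse))
    best_alpha = min(cv_mse, key=cv_mse.get)
    return best_alpha, ridge_fit(X, y, best_alpha), cv_mse
```

Fitting the scaler inside each fold keeps test-fold statistics out of training, so the CV error is not optimistically biased.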
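KMeans segmentation alternates two steps: assign each household to its nearest center, then move each center to the mean of its members. The NumPy sketch below shows Lloyd's algorithm with k-means++ seeding; the seeding scheme, the iteration cap, and the function names are assumptions, and the repo may initialize or choose `k` differently. Features would normally be standardized first so that large-valued columns (for example, dollar amounts) do not dominate the distances.

```python
import numpy as np


def _sq_dists(X, centers):
    """Squared Euclidean distance from every row of X to every center, shape (n, k)."""
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)


def kmeans(X, k, n_iter=100, seed=42):
    """Lloyd's algorithm with k-means++ seeding; returns (labels, centers)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # k-means++: first center uniform, later ones sampled proportional to D(x)^2.
    centers = X[[rng.integers(len(X))]]
    while len(centers) < k:
        d2 = _sq_dists(X, centers).min(axis=1)
        p = d2 / d2.sum() if d2.sum() > 0 else None  # None -> uniform choice
        centers = np.vstack([centers, X[rng.choice(len(X), p=p)]])
    for _ in range(n_iter):
        labels = _sq_dists(X, centers).argmin(axis=1)
        # Move each center to its members' mean; keep it if the cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = _sq_dists(X, centers).argmin(axis=1)  # labels consistent with final centers
    return labels, centers
```

Per-cluster means of the original (unscaled) features are one natural way to build profile tables from the returned labels.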
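Mahalanobis distance, `d(x) = sqrt((x - mu)^T Sigma^-1 (x - mu))`, measures how far a household lies from the sample mean after accounting for feature scales and correlations. The NumPy sketch below computes it and flags the most extreme rows. The top-1% cutoff is a hypothetical rule (a chi-square quantile is a common alternative), and using a pseudo-inverse is an assumption meant to tolerate collinear engineered ratios; the function names are illustrative.

```python
import numpy as np


def mahalanobis_distances(X):
    """Distance of each row from the sample mean, scaled by the sample covariance."""
    X = np.asarray(X, dtype=float)
    diff = X - X.mean(axis=0)
    # Pseudo-inverse tolerates a singular covariance (e.g. collinear ratio features).
    inv_cov = np.linalg.pinv(np.cov(X, rowvar=False))
    q = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    return np.sqrt(np.maximum(q, 0.0))  # clip tiny negative round-off


def flag_anomalies(X, top_frac=0.01):
    """Flag rows whose distance falls in the top `top_frac` share (hypothetical rule)."""
    d = mahalanobis_distances(X)
    cutoff = np.quantile(d, 1.0 - top_frac)
    return d, d >= cutoff
```

Because the mean and covariance are estimated from all rows, a cluster of extreme households can partly mask itself; robust covariance estimates are one way to reduce that effect.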

To set up and run the analysis:

- (Optional) Create and activate a virtual environment: `python3 -m venv .venv`, then `source .venv/bin/activate`.
- Install dependencies with `pip install -r requirements.txt`, or run `./scripts/setup_env.sh`.
- Run the analysis: `python3 analysis/run_finance_ml.py --xlsx personal_finance_dataset.xlsx --outdir outputs --seed 42`

The script writes these files to `outputs/`:
- `metrics.json`
- `analysis_summary.md`
- `factor_loadings.csv`
- `cluster_profiles.csv`
- `anomalies.csv`
- `engineered_dataset_with_cluster.csv`
- `dictionary_cleaned.csv`
- `resilience_scores.csv`
- `resilience_tier_summary.csv`
- `resilience_cluster_summary.csv`
- `resilience_scenario_summary.csv`
- `resilience_transition_matrix.csv`
- `resilience_bootstrap_stability.json`
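After a run, a quick sanity check is to compare the output directory against the file list above. The helper below is a hypothetical convenience, not part of the repo; it uses only the standard library and assumes the file names are exactly as listed.

```python
from pathlib import Path

# File names as listed in this README for the outputs/ directory.
EXPECTED_OUTPUTS = (
    "metrics.json",
    "analysis_summary.md",
    "factor_loadings.csv",
    "cluster_profiles.csv",
    "anomalies.csv",
    "engineered_dataset_with_cluster.csv",
    "dictionary_cleaned.csv",
    "resilience_scores.csv",
    "resilience_tier_summary.csv",
    "resilience_cluster_summary.csv",
    "resilience_scenario_summary.csv",
    "resilience_transition_matrix.csv",
    "resilience_bootstrap_stability.json",
)


def missing_outputs(outdir="outputs"):
    """Return the expected output files that are absent from `outdir`."""
    out = Path(outdir)
    return [name for name in EXPECTED_OUTPUTS if not (out / name).is_file()]
```

With `--outdir outputs`, an empty list from `missing_outputs("outputs")` means every listed file was written.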

Notes:

- In this environment, internet package downloads are blocked, so `pip install` may fail here.
- The analysis code itself is designed to run fully offline once the required packages are installed.