A production-minded fraud-risk workflow for detecting suspicious transactions with cost-sensitive thresholding, validation, explainability, reason codes, dashboard review, and threshold policy artifacts.
Important: This project is a portfolio and research demo, not a production fraud detection system.
The data is synthetic. The model, thresholds, and reason codes are designed to demonstrate a professional fraud-risk workflow, not to make real financial decisions without expert validation, monitoring, compliance review, and security controls.
- Project Overview
- What This Project Does
- What This Project Does Not Do
- Key Features
- System Workflow
- Project Structure
- Installation
- Quick Start
- Synthetic Data Generator
- Training and Evaluation
- Threshold Policy Artifacts
- Batch Scoring
- Streamlit Dashboard
- Explainability and Reason Codes
- Evaluation Metrics
- Visual Reports
- Testing and CI
- Code Quality
- Limitations
- Responsible Use
- Future Improvements
- Tech Stack
- Author
- License
Fraud detection is not only a classification problem. Real fraud systems require careful handling of:
- class imbalance
- changing fraud patterns
- false-positive cost
- missed-fraud cost
- human review capacity
- model interpretability
- batch scoring and triage
- validation and monitoring
This project demonstrates an end-to-end fraud-risk workflow using synthetic transaction data. It includes data preparation, model training, cost-sensitive evaluation, threshold search, policy artifacts, scoring, dashboard review, SHAP explainability, and analyst-friendly reason codes.
The goal is to show how a fraud-risk model can be turned into a decision-support system, not just a metric on a notebook.
This project can:
- Generate a harder synthetic fraud dataset with overlap and label noise
- Prepare train/test transaction datasets
- Train a fraud-risk model using a scikit-learn pipeline
- Evaluate ROC-AUC, PR-AUC, Brier score, and classification metrics
- Compare against simple baselines
- Search thresholds using cost-sensitive metrics
- Generate threshold policy artifacts for analyst review
- Score new transaction CSV files
- Validate required input columns before training or scoring
- Add fraud probabilities and binary fraud flags
- Add analyst-friendly reason codes to scored transactions
- Sort scored outputs by fraud probability
- Provide a Streamlit dashboard for risk review
- Display SHAP explanations for selected transactions
- Run automated tests and CI smoke workflows
This project does not:
- Detect real fraud in production
- Use real banking or payment-network data
- Guarantee fraud decisions are fair, compliant, or deployable
- Replace human fraud analysts
- Replace compliance, legal, or model-risk review
- Provide real-time streaming detection
- Include drift monitoring or retraining automation
- Prove performance on real-world fraud distributions
A production fraud system would need stronger governance, live monitoring, adversarial testing, compliance controls, access control, audit logging, and human escalation workflows.
- Synthetic fraud data generator with overlap, class imbalance, and label noise
- Reusable scikit-learn pipeline with preprocessing and model training
- Data validation for training and scoring inputs
- Cost-sensitive threshold search
- PR-AUC and Brier score for imbalanced probability evaluation
- Baseline comparisons for majority, prior, and stratified-random strategies
- Threshold policy artifacts for cost, recall, precision, and review-capacity tradeoffs
- Batch scoring CLI for new transaction files
- Reason-code generation for analyst-friendly review
- SHAP explainability for selected transactions
- Streamlit dashboard for interactive review
- Unit tests and GitHub Actions CI
- Generated reports and figures for model evaluation
Synthetic or raw transactions
↓
Data preparation and validation
↓
Train/test split
↓
Preprocessing + fraud model pipeline
↓
Evaluation metrics and baselines
↓
Threshold search and policy artifacts
↓
Batch scoring
↓
Dashboard triage + SHAP + reason codes
Financial-Fraud-Risk-Engine/
│
├── .github/
│ └── workflows/
│ └── ci.yml
│
├── data/
│ ├── raw/
│ │ └── synthetic_fraud_dataset.csv
│ └── processed/
│ ├── transactions_train.csv
│ └── transactions_test.csv
│
├── models/
│ ├── fraud_pipeline.joblib
│ └── threshold.json
│
├── reports/
│ ├── figures/
│ │ ├── confusion_matrix.png
│ │ ├── roc_curve.png
│ │ ├── pr_curve.png
│ │ ├── calibration_curve.png
│ │ ├── threshold_cost_curve.png
│ │ └── threshold_tradeoffs.png
│ └── metrics/
│ ├── metrics.json
│ ├── evaluation_summary.json
│ ├── threshold_search.json
│ ├── threshold_search.csv
│ ├── threshold_policy.json
│ ├── threshold_policy.csv
│ └── threshold_policy.md
│
├── src/
│ ├── config.py
│ ├── data_prep.py
│ ├── dashboard_utils.py
│ ├── evaluate.py
│ ├── explain.py
│ ├── features.py
│ ├── generate_synthetic_data.py
│ ├── reason_codes.py
│ ├── score_new_transactions.py
│ ├── threshold_policy.py
│ ├── train_model.py
│ └── validation.py
│
├── tests/
│ ├── test_dashboard_helpers.py
│ ├── test_evaluation_improvements.py
│ ├── test_project_integrity.py
│ ├── test_reason_codes.py
│ ├── test_synthetic_data_generation.py
│ ├── test_threshold_policy.py
│ ├── test_validation_and_scoring.py
│ └── test_workflow_contracts.py
│
├── app.py
├── requirements.txt
├── README.md
└── LICENSE
git clone https://github.com/AmirhosseinHonardoust/Financial-Fraud-Risk-Engine.git
cd Financial-Fraud-Risk-EngineOn Windows CMD:
python -m venv .venv
.venv\Scripts\activateOn macOS/Linux:
python -m venv .venv
source .venv/bin/activatepip install -r requirements.txtRun the full local workflow:
python -m src.generate_synthetic_data --rows 3500 --fraud-rate 0.08 --label-noise 0.04 --seed 42 --output data/raw/synthetic_fraud_dataset.csv
python -m src.data_prep
python -m src.train_model
python -m src.evaluate
python -m src.score_new_transactions data/processed/transactions_test.csv --output_csv reports/metrics/test_scored.csvLaunch the dashboard:
streamlit run app.pyThe project includes a synthetic fraud-data generator:
python -m src.generate_synthetic_data \
--rows 3500 \
--fraud-rate 0.08 \
--label-noise 0.04 \
--seed 42 \
--output data/raw/synthetic_fraud_dataset.csvThe generator creates a harder dataset than a deterministic toy example by adding:
- overlapping legitimate and fraud patterns
- controlled label noise
- false-positive-looking legitimate transactions
- lower-risk-looking fraud transactions
- class imbalance
- stochastic fraud labels
This makes threshold selection and precision/recall tradeoffs more meaningful.
Prepare train/test data:
python -m src.data_prepTrain the fraud model:
python -m src.train_modelEvaluate the model:
python -m src.evaluateEvaluation outputs include:
reports/metrics/metrics.json
reports/metrics/evaluation_summary.json
reports/metrics/threshold_search.json
reports/metrics/threshold_search.csv
reports/figures/roc_curve.png
reports/figures/pr_curve.png
reports/figures/calibration_curve.png
reports/figures/threshold_cost_curve.png
reports/figures/threshold_tradeoffs.png
reports/figures/confusion_matrix.png
Fraud thresholds are business decisions. A low threshold catches more fraud but creates more false positives. A high threshold reduces review volume but can miss fraud.
This project generates policy artifacts to make those tradeoffs easier to inspect:
reports/metrics/threshold_policy.json
reports/metrics/threshold_policy.csv
reports/metrics/threshold_policy.md
Policy candidates include:
| Policy | Purpose |
|---|---|
cost_optimized |
Minimizes the configured false-positive / false-negative cost |
balanced_f1 |
Balances precision and recall using F1 score |
high_recall |
Prioritizes catching fraud cases |
high_precision |
Prioritizes reducing false positives |
review_capacity |
Keeps the flagged transaction rate within review capacity |
These policies are decision-support artifacts, not automatic approval or rejection rules.
Score a transaction file:
python -m src.score_new_transactions data/processed/transactions_test.csv --output_csv reports/metrics/scored_transactions.csvThe scored output includes:
| Column | Description |
|---|---|
fraud_probability |
Model-estimated fraud probability |
fraud_flag |
Binary flag based on the saved or provided threshold |
reason_codes |
Human-readable risk drivers for analyst review |
The output is sorted by descending fraud probability so the riskiest transactions appear first.
Launch the app:
streamlit run app.pyThe dashboard supports:
- uploading transaction CSV files
- validating uploaded columns
- adjusting fraud threshold
- viewing risk distribution
- reviewing top-risk transactions
- downloading scored CSV files
- inspecting selected transactions with SHAP
- viewing analyst-friendly reason codes
- checking model metadata and saved threshold
The dashboard includes a warning that the data and model are synthetic-demo artifacts.
The project includes two explanation layers.
SHAP is used to inspect how transformed model features contribute to an individual prediction.
Reason codes convert risk signals into short, reviewable explanations, such as:
High device risk score
High IP risk score
Transaction amount is high for this batch
Transaction occurred during unusual hours
Merchant category is higher risk in the demo data
Critical model risk score
Reason codes are not causal explanations. They are analyst-facing summaries to make triage easier.
The evaluation layer includes metrics designed for imbalanced fraud-risk workflows.
| Metric | Why it matters |
|---|---|
| ROC-AUC | Measures ranking quality across thresholds |
| Average precision / PR-AUC | More informative for imbalanced fraud datasets |
| Brier score | Measures probability quality and calibration |
| Precision | Measures how many flagged cases are actually fraud |
| Recall | Measures how many fraud cases are caught |
| False-positive rate | Measures legitimate-user friction |
| Flagged rate | Estimates review workload |
| Cost | Encodes false-positive and false-negative tradeoffs |
Example results from the included harder synthetic-data workflow:
| Metric | Example value |
|---|---|
| ROC-AUC | 0.976 |
| Average precision / PR-AUC | 0.843 |
| Brier score | 0.052 |
| Selected threshold | 0.30 |
| Precision at selected threshold | 0.514 |
| Recall at selected threshold | 0.961 |
| Flagged rate | 0.206 |
These values are from a synthetic demo dataset and should not be interpreted as real-world fraud detection performance.
Additional threshold tradeoff chart
Run unit tests locally:
python -m unittest discover -s tests -vCompile source files:
python -m compileall src app.py testsThe GitHub Actions workflow checks:
- dependency installation
- source compilation
- unit tests
- synthetic-data generation
- data preparation
- model training
- evaluation
- scoring
- scored CSV schema validation
- expected model/report artifacts
CI is defined in:
.github/workflows/ci.yml
The project separates major responsibilities across modules:
| Module | Purpose |
|---|---|
src/generate_synthetic_data.py |
Creates harder synthetic fraud data |
src/data_prep.py |
Prepares train/test datasets |
src/features.py |
Builds preprocessing and model pipeline |
src/train_model.py |
Trains and saves the fraud model |
src/evaluate.py |
Evaluates metrics, thresholds, and plots |
src/threshold_policy.py |
Generates threshold policy artifacts |
src/score_new_transactions.py |
Scores new transaction CSV files |
src/validation.py |
Validates training and scoring input schemas |
src/reason_codes.py |
Generates analyst-friendly risk explanations |
src/dashboard_utils.py |
Provides dashboard helper logic |
src/explain.py |
Provides SHAP explanation utilities |
This project has important limitations:
- The data is synthetic, not real financial data
- Results do not prove real-world fraud detection performance
- Reason codes are heuristic and not causal explanations
- The dashboard is a demo, not a secure fraud operations platform
- No live monitoring or drift detection is included
- No real-time streaming inference is included
- No compliance or fairness review is included
- No adversarial fraud adaptation loop is included
- Threshold policies are examples, not business-approved rules
The project is strongest as a portfolio demonstration of fraud-risk workflow design.
This repository is intended for:
- learning fraud-risk modeling workflows
- demonstrating cost-sensitive evaluation
- practicing model validation and scoring
- showing explainability and dashboard design
- portfolio demonstration
It should not be used as-is for:
- real fraud decisions
- payment blocking
- account closure
- law-enforcement reporting
- credit or lending decisions
- high-stakes automated decisions
Any real deployment would require expert review, monitoring, governance, security controls, and compliance validation.
Potential next improvements:
- Add drift simulation and drift monitoring
- Add calibration model comparison
- Add time-based train/test split
- Add reviewer feedback loop
- Add FastAPI scoring endpoint
- Add Docker support
- Add model card and data statement
- Add fairness and subgroup analysis
- Add model registry-style metadata
- Add alerting and monitoring examples
- Add richer transaction sequence features
- Python
- pandas
- NumPy
- scikit-learn
- SHAP
- Streamlit
- matplotlib
- joblib
- unittest
- GitHub Actions
Amir Honardoust
GitHub: @AmirhosseinHonardoust
This project is intended for educational and portfolio purposes.
If you use or modify this project, please keep the responsible-use notes and limitations clear.




