Real-time fraud detection platform using machine learning. Built with FastAPI, scikit-learn, XGBoost, and SHAP for model interpretability.
FraudSense AI is an enterprise-grade fraud detection system designed to identify fraudulent credit card transactions in real-time. The platform uses multiple machine learning models with advanced explainability features.
FraudSense AI/
├── backend/
│ ├── main.py # FastAPI application with all endpoints
│ ├── model.py # Model wrapper for predictions
│ ├── train.py # Model training script
│ ├── preprocessing.py # Data preprocessing utilities
│ ├── model_metadata.json # Model configuration and metrics
│ ├── global_feature_importance.json # SHAP feature importance
│ ├── RiskPlatform/ # Enterprise risk management modules
│ │ ├── decision_engine.py # Decision logic
│ │ ├── risk_scorer.py # Risk scoring
│ │ ├── explainability_engine.py # SHAP explanations
│ │ ├── model_health_monitor.py # Model health checks
│ │ ├── drift_monitor.py # Data drift detection
│ │ ├── audit_logger.py # Audit trail
│ │ ├── metrics_tracker.py # Real-time metrics
│ │ └── threshold_simulator.py # Threshold optimization
│ └── metrics/ # Generated visualizations
│ ├── confusion_matrix.png
│ ├── roc_curve.png
│ └── precision_recall_curve.png
├── frontend/
│ ├── dashboard.html # Main dashboard
│ ├── analysis.html # Analysis page
│ ├── history.html # Transaction history
│ ├── settings.html # Settings page
│ ├── script.js # Frontend JavaScript
│ └── styles.css # Dashboard styles
├── requirements.txt # Python dependencies
├── docker-compose.yml # Docker orchestration
└── .env.example # Environment configuration
The platform trains and compares three machine learning models:
- Type: Linear classifier with L2 regularization
- Purpose: Baseline model for interpretability
- Strengths: Highly interpretable, fast training, probability calibration
- Use Case: Quick baseline comparisons and regulatory explainability
- Type: Ensemble of decision trees
- Purpose: Primary production model
- Architecture:
- n_estimators: 100
- max_depth: 10
- min_samples_split: 5
- min_samples_leaf: 2
- class_weight: balanced
- Strengths: Robust to outliers, handles non-linear relationships, feature importance
- Use Case: Main fraud detection engine
- Type: Gradient boosting ensemble
- Purpose: High-performance alternative
- Architecture:
- n_estimators: 100
- max_depth: 6
- learning_rate: 0.1
- scale_pos_weight: balanced
- Strengths: Superior performance on structured data, built-in regularization
- Use Case: Performance optimization scenarios
| Model | ROC-AUC | Precision | Recall | F1-Score |
|---|---|---|---|---|
| Logistic Regression | ~0.92 | ~0.75 | ~0.78 | ~0.76 |
| Random Forest | 0.9542 | 0.8833 | 0.8214 | 0.85 |
| XGBoost | ~0.95 | ~0.87 | ~0.81 | ~0.84 |
- ROC-AUC: 95.42%
- Precision: 88.33%
- Recall: 82.14%
- F1-Score: 85%
Based on the test dataset (56,962 transactions), the Random Forest model achieves:
| Metric | Value | Description |
|---|---|---|
| True Positives (TP) | ~81 | Correctly identified fraudulent transactions |
| True Negatives (TN) | ~56,853 | Correctly identified legitimate transactions |
| False Positives (FP) | ~11 | Legitimate transactions flagged as fraud |
| False Negatives (FN) | ~17 | Fraudulent transactions missed |
Predicted
| Legit | Fraud |
Actual ----------+---------+---------+
| Legit | 56,853 | 11 |
| Fraud | 17 | 81 |
- Fraud Detection Rate (Recall): 82.14% - The model correctly identifies 82.14% of all fraudulent transactions
- Precision: 88.33% - Of all transactions flagged as fraud, 88.33% are actually fraudulent
- False Positive Rate: ~0.019% - Only 0.019% of legitimate transactions are incorrectly flagged
- True Negative Rate: 99.98% - Legitimate transactions are correctly approved 99.98% of the time
- Source: Credit Card Fraud Detection (Kaggle)
- Total Transactions: 284,807
- Training Samples: 227,845
- Test Samples: 56,962
- Features: 30 (V1-V28 + Amount + Time)
- Fraud Rate: ~0.17% (highly imbalanced)
- V14 - Primary fraud indicator
- V12 - Transaction pattern
- V4 - Customer behavior
- V3 - Transaction velocity
- V10 - Amount correlation
- V11 - Time-based pattern
- V17 - Transaction characteristics
- V16 - Risk factors
- V9 - Behavioral indicator
- V1 - Feature composite
- Python 3.11+
- Credit Card Fraud Detection dataset (from Kaggle)
- Docker (optional, for containerized deployment)
- Navigate to the project directory:
cd FraudSense AI- Create and activate virtual environment:
python -m venv venv
venv\Scripts\activate # Windows
# source venv/bin/activate # Linux/Mac- Install dependencies:
pip install -r requirements.txt- Download the dataset:
- Download from: https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud
- Place creditcard.csv in the project root
cd backend
python train.pyThis will:
- Load and preprocess the dataset
- Train all three models
- Evaluate using ROC-AUC, Precision, Recall, F1
- Select the best model based on ROC-AUC
- Generate visualizations
cd backend
python -m uvicorn main:app --host 0.0.0.0 --port 8001 --reloaddocker-compose up --buildThe model is already trained. Run:
cd backend
python -m uvicorn main:app --host 0.0.0.0 --port 8001| Method | Endpoint | Description |
|---|---|---|
| POST | /predict | Predict fraud for a transaction |
| GET | /simulate | Simulate random transaction |
| GET | /analytics | Get fraud analytics |
| GET | /metrics | Real-time metrics |
| GET | /model-health | Model health status |
| POST | /simulate-threshold | Threshold simulation |
| GET | /explain/{id} | Transaction explanation |
- Low: Fraud probability < 20%
- Medium: Fraud probability 20-60%
- High: Fraud probability > 60%
Access the web interface at:
- Main: http://localhost:8001/
- Dashboard: http://localhost:8001/dashboard
- Analysis: http://localhost:8001/analysis
- History: http://localhost:8001/history
API requires an API key header:
X-API-Key: dev-key-001
| Key | Role | Rate Limit |
|---|---|---|
| dev-key-001 | Admin | 1000/hr |
| analyst-key-001 | Analyst | 500/hr |
| auditor-key-001 | Auditor | 200/hr |
curl -X POST http://localhost:8001/predict \
-H "Content-Type: application/json" \
-H "X-API-Key: dev-key-001" \
-d '{
"V1": -1.36, "V2": -0.07, "V3": 2.54, "V4": 1.38,
"V5": -0.34, "V6": 0.48, "V7": 0.08, "V8": -0.74,
"V9": 0.10, "V10": -0.36, "V11": 1.23, "V12": -0.64,
"V13": 0.60, "V14": -0.54, "V15": 0.27, "V16": 0.62,
"V17": -0.26, "V18": 0.14, "V19": -0.18, "V20": 0.27,
"V21": -0.14, "V22": -0.03, "V23": -0.14, "V24": 0.14,
"V25": -0.26, "V26": 0.02, "V27": -0.14, "V28": -0.10,
"Time": 406.0, "Amount": 149.62
}'Create .env from .env.example:
API_PORT=8001
LOG_LEVEL=INFO
API_KEYS=dev-key-001:Admin:1000,analyst-key-001:Analyst:500,auditor-key-001:Auditor:200- Real-time Detection: Sub-10ms inference time
- Drift Detection: Monitors for data distribution changes
- Model Health Monitoring: Continuous performance tracking
- Audit Logging: Complete transaction trail
- Threshold Optimization: Configurable risk thresholds
- SHAP Explainability: Feature-level prediction explanations
- Backend: FastAPI, scikit-learn, XGBoost, SHAP, joblib
- Frontend: HTML5, CSS3, Vanilla JavaScript
- Data: Pandas, NumPy
- Deployment: Docker, uvicorn
This project is for educational purposes. The dataset is from Kaggle's Credit Card Fraud Detection competition.
The FraudSense AI platform has a roadmap for continuous improvement and expansion:
- Deep Learning Integration: Implement neural network models (LSTM, Autoencoders) for enhanced fraud pattern detection
- Ensemble Stacking: Combine multiple model predictions using meta-learners for improved accuracy
- Real-time Learning: Implement online learning to adapt to emerging fraud patterns dynamically
- Transfer Learning: Explore cross-domain knowledge transfer from other fraud detection domains
- Multi-channel Support: Extend detection to cover mobile payments, cryptocurrency transactions, and wire transfers
- User Behavior Analytics: Add behavioral biometrics and device fingerprinting
- Graph Neural Networks: Implement GNN-based detection for identifying fraud rings and coordinated activities
- Federated Learning: Enable privacy-preserving model training across multiple institutions
- Kubernetes Deployment: Full container orchestration with auto-scaling
- Real-time Streaming: Apache Kafka integration for high-throughput transaction processing
- MLOps Pipeline: Automated model retraining, validation, and deployment pipeline
- Model Registry: Version control and model lineage tracking
- Counterfactual Explanations: Generate actionable insights for false positive cases
- Regulatory Reporting: Automated compliance reports for GDPR, PCI-DSS
- Adversarial Robustness: Protection against adversarial attacks on the model
- Fairness Monitoring: Bias detection and mitigation across demographic groups
- Interactive Dashboards: Advanced BI integrations (Tableau, PowerBI)
- Anomaly Visualization: Visual exploration of detected anomalies
- Trend Analysis: Seasonal and long-term fraud trend forecasting
- Campaign Analysis: ROI analysis for fraud prevention strategies
- Synthetic Data Generation: Privacy-preserving synthetic data for model training
- Quantum-ready Algorithms: Prepare for quantum computing advances
- AutoML Integration: Automated hyperparameter tuning and feature engineering
- Benchmarking Suite: Standardized evaluation metrics and comparison frameworks