Dhanush3620/Attrition-Prediction-Explainable-AI

💼 Employee Attrition Intelligence

A production-ready machine learning pipeline and interactive Streamlit dashboard for HR attrition analytics, with an emphasis on Explainable AI (XAI) using LIME and SHAP.

✨ Features

  1. Interactive HR Dashboard: A dark-mode Streamlit application (app.py) for live attrition-risk analysis.
  2. Dual Explainability Engines:
    • Local Risk (LIME): Matplotlib rendering of the factors driving a single employee's turnover risk.
    • Macro Trends (SHAP): Aggregated TreeExplainer visualizations showing company-wide feature impacts.
  3. Advanced ML Pipeline:
    • SimpleImputer for automated missing-value handling.
    • imbalanced-learn SMOTE for synthetic minority over-sampling (addressing the class imbalance: about 84% of employees in the dataset were retained).
    • Embedded GridSearchCV routines that search Random Forest and XGBoost hyperparameter combinations, optimizing for precision-recall AUC.
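
As a rough illustration of the tuning step in the list above, the sketch below runs a grid search scored on average precision, scikit-learn's summary of the precision-recall curve. It is not the project's code: the data is synthetic, and `class_weight="balanced"` is swapped in for SMOTE to keep the example dependent on scikit-learn only. In the real pipeline a SMOTE step would presumably sit inside an imbalanced-learn `Pipeline` so that resampling happens only on the training folds.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the HR data: roughly 84% negatives, 16% positives.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4,
    weights=[0.84, 0.16], random_state=0,
)

# Blank out about 5% of the values so the imputer has something to fill.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    # Assumption: class weighting replaces SMOTE in this sketch.
    ("model", RandomForestClassifier(class_weight="balanced", random_state=0)),
])

# The "model__" prefix routes each parameter to the pipeline's "model" step.
param_grid = {"model__n_estimators": [50, 100], "model__max_depth": [3, 5]}

search = GridSearchCV(
    pipeline,
    param_grid,
    scoring="average_precision",  # precision-recall based metric
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best cross-validated average precision:", round(search.best_score_, 3))
```

Scoring on average precision rather than accuracy matters here because a model that predicts "stays" for everyone would already be about 84% accurate.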

🚀 Quickstart

1. Requirements

Ensure you are running Python 3.10+ and install the dependencies:

pip install -r requirements.txt

2. Launch the Dashboard

The dashboard handles model training, grid-search tuning, and caching automatically on the first run.

streamlit run app.py
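
The train-once-then-cache behaviour can be sketched roughly as follows. This is an assumption about how such caching might work, not the code in app.py: `load_or_train` and `train_fn` are hypothetical names, and the use of artifacts.joblib as the cache file is inferred from the project-structure note. The dashboard may also rely on Streamlit's own caching decorators instead.

```python
from pathlib import Path
from typing import Any, Callable

import joblib


def load_or_train(cache_path: Path, train_fn: Callable[[], Any]) -> Any:
    """Return cached artifacts if the file exists; otherwise train and cache.

    Hypothetical helper: `train_fn` stands in for whatever function runs the
    expensive grid search and returns the fitted models.
    """
    if cache_path.exists():
        # Later runs: skip training and reuse the saved artifacts.
        return joblib.load(cache_path)

    # First run: train, then persist the result for next time.
    artifacts = train_fn()
    joblib.dump(artifacts, cache_path)
    return artifacts
```

Under this scheme, deleting the cache file is enough to force a full retrain on the next launch.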

📁 Project Structure

├── data/
│   └── WA_Fn-UseC_-HR-Employee-Attrition.csv  # IBM HR Dataset
├── notebooks/
│   ├── 01_lime_attrition_API.ipynb            # API Playground
│   ├── 02_lime_attrition_example.ipynb        # EDA and Model Compare Notebook
│   └── 03_lime_attrition_example_v2.ipynb    
├── app.py                                     # Core Streamlit Dashboard UI
├── lime_attrition_utils.py                    # The comprehensive ML Wrapper API
├── requirements.txt
├── .gitignore
└── README.md

(Note: artifacts.joblib and any .mov files are ignored by Git.)


🛠 Developer APIs

lime_attrition_utils.py abstracts away the boilerplate of wiring scikit-learn models into the LIME and SHAP explainers.

Training the Pipeline

import lime_attrition_utils as utils

# Automatically loads, cleans, and splits data (handling target string matching)
config = utils.AttritionDataConfig()
raw_df, X, y, X_train, X_test, y_train, y_test, config = utils.load_and_prep_data()

# Defines a pipeline with SMOTE, Imputers, and GridSearchCV automatically tuning Random Forest
preprocessor = utils.build_preprocessor(X_train)
model_config = utils.ModelConfig(use_random_forest=True)
param_grids = {
    "random_forest": {"model__n_estimators": [100, 300], "model__max_depth": [3, 5]}
}

trained_models = utils.tune_and_train_models(X_train, y_train, preprocessor, model_config, param_grids)

Handling LIME Edge Cases Gracefully

  • Sparse Matrix Crashes: scikit-learn's OneHotEncoder returns sparse matrices by default. Since LIME crashes on sparse matrices, the custom wrapper enforces sparse_output=False throughout the pipeline.
  • Unseen Categoricals: Categories that appear in a validation fold but not in its training fold are handled by setting handle_unknown="ignore", which encodes them as all zeros instead of raising an error during cross-validation tuning.
  • Heavy Native LIME Execution: When LIME builds its own HTML D3.js visualization, the payload can exceed 1 MB per call. The UI avoids this by calling exp.as_pyplot_figure() and rendering the resulting Matplotlib figure in Streamlit.

About

Employee attrition prediction on the IBM HR dataset from Kaggle, using LIME and SHAP.
