
Enable categorical variables as predictors in IV analysis #247

Merged: martinctc merged 12 commits into main from copilot/fix-230 on Aug 26, 2025

Copilot AI commented Aug 4, 2025

Contributor

The create_IV function previously failed when categorical variables (e.g., Gender, Department, Region) were supplied as predictors because it only supported numeric variables. This limitation prevented analysts from including valuable categorical factors in their Information Value analysis.

This PR also updates the package to v1.10.0, which has been submitted to CRAN.

Changes Made

This PR extends the IV analysis pipeline to fully support categorical variables:

Core Implementation

  • Modified calculate_IV function to detect and handle categorical variables using is.character() || is.factor()
  • Updated binning strategy for categorical variables to use the categories themselves as intervals instead of quantile-based binning
  • Enhanced labeling to display actual category names rather than numeric ranges
  • Expanded map_IV function to include categorical variables in automatic predictor selection
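
The detection and binning rule described above can be sketched in base R. This is a minimal illustration, not the code from this PR: `bin_predictor` is a hypothetical helper name, and the quantile-based numeric branch is an assumption about how the existing numeric binning behaves.

```r
# Hypothetical helper (not the package's actual code): assign each value
# of a predictor to a bin.
bin_predictor <- function(x, bins = 5) {
  if (is.character(x) || is.factor(x)) {
    # Categorical: each distinct category is its own bin,
    # so the bin labels are the category names themselves.
    return(factor(x))
  }
  # Numeric (assumed behaviour): quantile-based cut points.
  # unique() guards against duplicate breaks when values are tied.
  breaks <- unique(quantile(x, probs = seq(0, 1, length.out = bins + 1),
                            na.rm = TRUE))
  cut(x, breaks = breaks, include.lowest = TRUE)
}

dept   <- c("Engineering", "G_and_A", "Operations", "Engineering")
tenure <- c(1, 3, 5, 7, 9, 11, 13, 15)

levels(bin_predictor(dept))               # category names as bin labels
nlevels(bin_predictor(tenure, bins = 4))  # number of quantile-based bins
```

Because the categorical branch returns before any cut points are computed, the numeric path is untouched, which matches the stated goal of keeping existing numeric behaviour unchanged.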

Before and After

Before:

# This would fail
create_IV(data, outcome = "high_engagement", predictors = c("Gender", "Department"))

After:

# Now works seamlessly
create_IV(data, outcome = "high_engagement", predictors = c("Gender", "Department"))
create_IV(data, outcome = "promotion_rate", predictors = c("Level", "Region", "Tenure_Years"))
create_IV(data, outcome = "retention", predictors = NULL)  # Auto-includes categoricals

Example Output

For a categorical variable like FunctionType, the function now produces meaningful results:

  FunctionType  N Percent       WOE        IV
1  Engineering 63    0.63  0.673255 0.2612197
2      G_and_A 21    0.21 -1.239064 0.5755636
3   Operations 16    0.16 -0.833599 0.6878052
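
For readers unfamiliar with the columns, here is a rough sketch of how WOE and IV can be computed per category for a binary outcome. It uses the common textbook definitions and assumes that the IV column is a running total, which would be consistent with the monotonically increasing values above. `woe_iv_by_category` is a hypothetical helper, and the package's sign convention and exact implementation may differ.

```r
# Hypothetical helper: WOE and cumulative IV per category.
# Assumes `outcome` is coded 0/1 and every category contains both classes
# (a zero count would produce an infinite WOE).
woe_iv_by_category <- function(category, outcome) {
  events     <- tapply(outcome == 1, category, sum)
  non_events <- tapply(outcome == 0, category, sum)

  # Share of all events / non-events that fall in each category
  pct_events     <- events / sum(events)
  pct_non_events <- non_events / sum(non_events)

  woe <- log(pct_events / pct_non_events)

  data.frame(
    Category = names(events),
    N        = as.vector(events + non_events),
    Percent  = as.vector((events + non_events) / length(outcome)),
    WOE      = as.vector(woe),
    # Running total of each category's contribution to IV
    IV       = cumsum(as.vector((pct_events - pct_non_events) * woe))
  )
}

category <- rep(c("A", "B"), each = 4)
outcome  <- c(1, 1, 1, 0, 1, 0, 0, 0)
res <- woe_iv_by_category(category, outcome)
res
```

In this toy data the two categories are mirror images of each other, so their WOE values are equal in magnitude and opposite in sign, and the last row of the IV column holds the total IV for the variable.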

Business Impact

This enhancement enables analysts to:

  • Include department, role, location, and other categorical factors in IV analysis
  • Discover patterns in categorical variables that were previously inaccessible
  • Build more comprehensive predictive models combining both numeric and categorical predictors
  • Identify high-value categorical predictors for business insights

Testing

Added comprehensive test cases covering:

  • Categorical predictors only
  • Mixed categorical and numeric predictors
  • Automatic inclusion of categorical variables when predictors=NULL
  • Verification that existing numeric functionality remains unchanged
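
The predictors=NULL case depends on which columns are picked up automatically. The sketch below shows one way such a selection rule and its checks could look in base R. `select_predictors` and the toy data frame are hypothetical and are not taken from the package's test suite.

```r
# Hypothetical selection rule: keep numeric, character and factor columns,
# and never treat the outcome itself as a predictor.
select_predictors <- function(data, outcome) {
  keep <- vapply(
    data,
    function(col) is.numeric(col) || is.character(col) || is.factor(col),
    logical(1)
  )
  setdiff(names(data)[keep], outcome)
}

df <- data.frame(
  retention    = c(1, 0, 1),
  Tenure_Years = c(2.5, 4, 1),
  Region       = c("East", "West", "East"),
  Level        = factor(c("L1", "L2", "L1")),
  Date         = as.Date(c("2025-01-05", "2025-01-12", "2025-01-19")),
  stringsAsFactors = FALSE
)

selected <- select_predictors(df, outcome = "retention")

# Categorical columns are included, the outcome and the Date column are not.
stopifnot(all(c("Region", "Level", "Tenure_Years") %in% selected))
stopifnot(!("retention" %in% selected), !("Date" %in% selected))
```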

Fixes #230.


Co-authored-by: martinctc <17925865+martinctc@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Feature: Enable categorical variables as predictors in IV analysis" to "Enable categorical variables as predictors in IV analysis" on Aug 4, 2025
Copilot AI requested a review from martinctc August 4, 2025 11:03
martinctc marked this pull request as ready for review on August 7, 2025 14:55
Extended `create_IV()` and `map_IV()` to include character and factor variables as predictors, not just numeric.
Documented support for categorical predictors in create_IV(), enhanced display control in create_dt(), and improved test coverage. Also noted detection of text missing values in validation_report().
@martinctc

Member

wpa v1.10.0 has been accepted into CRAN, merging this PR now.

martinctc merged commit 0a90757 into main on Aug 26, 2025
3 checks passed
martinctc deleted the copilot/fix-230 branch on August 26, 2025 12:37