Skip to content

Conversation

@damien2012eng
Copy link
Member

I verified the ASAP dataset using RSMTool toolset. The column mechanics has outlier values. Here is an example result (Response 181) for truncating and w/o truncating:
Truncating outlier: 3.213 (Original data) and 3.248 (preprocessed data)
W/O Truncating outlier: 3.213 (for both original data and preprocessed data)
I also verified the outliers range for the Mechanics: [3.248, 5.648]. Because the value 3.213 is lower than the lower bound, it is truncated to be the value of the lower bound.

Closes #660

@codecov
Copy link

codecov bot commented Oct 4, 2023

Codecov Report

All modified lines are covered by tests ✅

Files Coverage Δ
rsmtool/preprocessor.py 96.44% <100.00%> (+0.02%) ⬆️
rsmtool/rsmexplain.py 92.19% <100.00%> (+0.03%) ⬆️
rsmtool/utils/constants.py 100.00% <ø> (ø)

📢 Thoughts on this report? Let us know!.

@damien2012eng damien2012eng force-pushed the 660/truncate_feature_values branch from 3299a63 to 7217c05 Compare October 4, 2023 17:19
@damien2012eng damien2012eng requested a review from tamarl08 October 4, 2023 17:59
@tamarl08
Copy link
Contributor

tamarl08 commented Oct 4, 2023

Looks good @damien2012eng ! just added a couple of docstring nitpicks.

Copy link
Collaborator

@desilinguist desilinguist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@damien2012eng looks pretty good but it's still missing a critical piece. Do you remember how we check that the value of standardize_features specified for rsmexplain and in the original rsmtool experiment match and, if not, we raise a warning? We need to do this same for truncate_outliers as well.

Instead of duplicating code, may be factor out that code as a function in rsmexplain.py and use it for both fields?

@damien2012eng damien2012eng force-pushed the 660/truncate_feature_values branch from b301453 to 091fbfa Compare October 5, 2023 18:53
@damien2012eng damien2012eng force-pushed the 660/truncate_feature_values branch from 091fbfa to 4d31f66 Compare October 5, 2023 19:40
@damien2012eng damien2012eng requested a review from tamarl08 October 5, 2023 19:40
Use a simple `for` loop to verify and overwrite the values of
`standardize_features` and `truncate_outliers` for rsmexplain.
Add new tests to check that `truncate_outliers` values are correctly
overwritten when necessary.
@desilinguist
Copy link
Collaborator

@damien2012eng don't forget to fix the file @tamarl08 pointed out. Otherwise this is ready to merge.

@damien2012eng damien2012eng merged commit e27474d into main Oct 6, 2023
@delete-merged-branch delete-merged-branch bot deleted the 660/truncate_feature_values branch October 6, 2023 01:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature values get truncated when using rsmExplain on SKLL models

4 participants