Would it be worthwhile to add parameters to control missingness in dataset generators?
I need this for benchmarking "[MRG] ENH Add support for missing values to Tree based Classifiers" (#5974). I thought this might come in handy for teaching too.
Typically I would like to add missing_rate, target_correlation_rate, feature_correlation_rate and missing_values.
The target_correlation_rate would control the extent to which the dataset is MNAR* and feature_correlation_rate would control the extent to which the dataset is MAR†.
target_correlation_rate + feature_correlation_rate <= 1
1 - (target_correlation_rate + feature_correlation_rate) would control the extent to which the dataset is MCAR‡.
Does this sound good?
Either in addition to or as an alternative to the dataset-generator parameters, could we have missing-value transformers that take the same parameters?
* - Missing Not At Random (Missingness is correlated with the target)
† - Missing At Random but correlated with the other feature values.
‡ - Missing Completely At Random (No correlation with either the target or the other feature values)
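To make the proposal concrete, here is a minimal sketch of what these parameters could mean. It is not an existing scikit-learn API: the function name inject_missing, the rule of masking the rows with the largest target or neighbouring-feature values, and the use of plain Python lists (so that it runs without dependencies) are all assumptions for illustration. A real implementation would presumably operate on NumPy arrays and might use a softer, probabilistic notion of correlation.

```python
import math
import random


def inject_missing(X, y, missing_rate, target_correlation_rate=0.0,
                   feature_correlation_rate=0.0, missing_values=math.nan,
                   seed=0):
    """Return a copy of X (a list of rows) with entries replaced by missing_values.

    Hypothetical semantics, applied to each column independently:
      * a missing_rate share of the rows is masked;
      * a target_correlation_rate share of those are the rows with the largest y;
      * a feature_correlation_rate share are the rows with the largest value
        of the neighbouring column;
      * the remainder is chosen uniformly at random.
    """
    if not 0.0 <= missing_rate <= 1.0:
        raise ValueError("missing_rate must be in [0, 1]")
    if (target_correlation_rate < 0.0 or feature_correlation_rate < 0.0
            or target_correlation_rate + feature_correlation_rate > 1.0):
        raise ValueError("correlation rates must be non-negative and sum to at most 1")

    rng = random.Random(seed)
    n_samples, n_features = len(X), len(X[0])
    X_out = [list(row) for row in X]  # leave the input untouched

    # Split the per-column budget of masked rows between the three mechanisms.
    n_miss = round(missing_rate * n_samples)
    n_target = int(target_correlation_rate * n_miss)
    n_feature = min(int(feature_correlation_rate * n_miss), n_miss - n_target)
    n_random = n_miss - n_target - n_feature

    for j in range(n_features):
        other = (j + 1) % n_features  # neighbouring column (j itself if only one column)
        rows = list(range(n_samples))

        # Missingness driven by the target: mask the rows with the largest y.
        by_target = sorted(rows, key=lambda i: y[i], reverse=True)[:n_target]
        taken = set(by_target)
        rows = [i for i in rows if i not in taken]

        # Missingness driven by another feature: mask the rows where it is largest.
        by_feature = sorted(rows, key=lambda i: X[i][other], reverse=True)[:n_feature]
        taken = set(by_feature)
        rows = [i for i in rows if i not in taken]

        # Remaining budget: rows picked uniformly at random.
        at_random = rng.sample(rows, n_random)

        for i in by_target + by_feature + at_random:
            X_out[i][j] = missing_values
    return X_out


# Toy usage: 20 samples, 3 features, half of each column masked.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(20)]
y = list(range(20))
X_missing = inject_missing(X, y, missing_rate=0.5,
                           target_correlation_rate=0.4,
                           feature_correlation_rate=0.4)
```

With both correlation rates left at 0 every masked entry is chosen at random, and with target_correlation_rate=1 the mask is fully determined by the target. The same function could sit behind either a dataset-generator parameter or a transformer.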
Ping @agramfort @glouppe @GaelVaroquaux