Some estimators have arbitrary ways to break ties:

- tied splits on different features in histogram gradient boosted trees with redundant features;
- decision tree splits on the same feature with equivalent thresholds: with `X = [[0], [1], [2], [3]]` and `y = [0, 1, 1, 0]`, `X > 0.5` and `X > 2.5` are tied splits but only `X > 0.5` is considered (see the sketch after this list);
- possibly other models (please feel free to edit this list or suggest missing estimators in the comments).
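A minimal sketch of the tied-threshold example from the list above, assuming scikit-learn is installed; the array values come from the report, everything else is illustrative:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# The toy data from the example above: splitting at X > 0.5 or at X > 2.5
# peels off one pure sample either way, so both thresholds yield the same
# impurity decrease and are tied.
X = np.array([[0], [1], [2], [3]])
y = np.array([0, 1, 1, 0])

tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Thresholds chosen at each node (-2.0 marks leaves); per the report above,
# the root always uses 0.5, never the equally good 2.5.
print(tree.tree_.threshold)
```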
If the tie-breaking logic is deterministic, it might introduce a non-controllable bias in a data science pipeline. For instance, when analyzing the feature importance of a histogram gradient boosting model (via permutations or SHAP values), the first feature of a group of redundant features would always deterministically be picked up by the model, which could lead a naive data scientist to believe that the other features of the group are not as predictive.
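As a hedged illustration of that bias, assuming a recent scikit-learn that ships `HistGradientBoostingClassifier` and `permutation_importance` (the dataset below is made up for the sketch):

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.inspection import permutation_importance

# Build two perfectly redundant copies of the same informative feature.
rng = np.random.RandomState(0)
x = rng.rand(500, 1)
X = np.hstack([x, x])
y = (x.ravel() > 0.5).astype(int)

clf = HistGradientBoostingClassifier(random_state=0).fit(X, y)
result = permutation_importance(clf, X, y, n_repeats=10, random_state=0)

# If tie breaking is deterministic, one copy should soak up essentially all
# of the permutation importance while its twin looks uninformative, even
# though the two columns are equally predictive by construction.
print(result.importances_mean)
```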
Note that this is not the case for our traditional `DecisionTreeClassifier`/`Regressor` / `RandomForestClassifier`/`Regressor` and extra trees, because they all do feature shuffling (controllable by `random_state`) by default, even when `max_features == 1.0`. This makes it easy to conduct the same study many times with different seeds to see whether the results are an artifact of arbitrary tie breaking or not.
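A minimal sketch of such a seed-varying study for a plain decision tree (the redundant-feature construction is illustrative, not from the original report):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Two identical columns: any split is tied between features 0 and 1.
rng = np.random.RandomState(0)
x = rng.rand(200, 1)
X = np.hstack([x, x])
y = (x.ravel() > 0.5).astype(int)

# Thanks to the per-tree feature shuffling, the feature chosen at the root
# should vary with random_state, exposing the tie.
root_features = [
    DecisionTreeClassifier(max_depth=1, random_state=seed)
    .fit(X, y)
    .tree_.feature[0]
    for seed in range(10)
]
print(root_features)  # a mix of 0s and 1s rather than a single constant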