Commit e5cc2b0

ENH Uses binned values from training to find missing values (#16883)
* ENH Uses training data to find missing values
* CLN Uses the binned data to find missing value
1 parent 41488fc commit e5cc2b0

1 file changed

Lines changed: 5 additions & 2 deletions

sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py

@@ -192,8 +192,6 @@ def fit(self, X, y, sample_weight=None):
             X_train, y_train, sample_weight_train = X, y, sample_weight
             X_val = y_val = sample_weight_val = None

-        has_missing_values = np.isnan(X_train).any(axis=0).astype(np.uint8)
-
         # Bin the data
         # For ease of use of the API, the user-facing GBDT classes accept the
         # parameter max_bins, which doesn't take into account the bin for
@@ -211,6 +209,11 @@ def fit(self, X, y, sample_weight=None):
         else:
             X_binned_val = None

+        # Uses binned data to check for missing values
+        has_missing_values = (
+            X_binned_train == self.bin_mapper_.missing_values_bin_idx_).any(
+                axis=0).astype(np.uint8)
+
         if self.verbose:
             print("Fitting gradient boosted rounds:")
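The change above swaps a NaN scan of the raw training matrix for a comparison of the binned matrix against the bin index reserved for missing values. A minimal NumPy sketch of that idea follows. It assumes a toy binner that sends every NaN to one dedicated index; `toy_bin` and `MISSING_BIN_IDX` are hypothetical stand-ins for illustration and are not the estimator's real bin mapper or its `missing_values_bin_idx_` attribute.

```python
import numpy as np

# Hypothetical stand-in for self.bin_mapper_.missing_values_bin_idx_.
MISSING_BIN_IDX = 255


def toy_bin(X, n_bins=4, missing_bin_idx=MISSING_BIN_IDX):
    """Toy binner (assumption): equal-width bins per column, NaN -> missing_bin_idx."""
    X = np.asarray(X, dtype=np.float64)
    binned = np.empty(X.shape, dtype=np.uint8)
    for j in range(X.shape[1]):
        col = X[:, j]
        nan_mask = np.isnan(col)
        finite = col[~nan_mask]
        if finite.size:
            # Interior bin edges computed from the finite values only.
            edges = np.linspace(finite.min(), finite.max(), n_bins + 1)[1:-1]
        else:
            edges = np.array([])
        # Replace NaN with a placeholder before searching, then overwrite
        # those positions with the dedicated missing-values bin.
        binned[:, j] = np.searchsorted(edges, np.where(nan_mask, 0.0, col))
        binned[nan_mask, j] = missing_bin_idx
    return binned


X_train = np.array([[1.0, np.nan, 3.0],
                    [2.0, 5.0, 4.0],
                    [3.0, 6.0, 5.0]])
X_binned_train = toy_bin(X_train)

# Check removed by the commit: scan the raw float data for NaN.
old_flags = np.isnan(X_train).any(axis=0).astype(np.uint8)

# Check added by the commit: look for the missing-values bin in the binned data.
new_flags = (X_binned_train == MISSING_BIN_IDX).any(axis=0).astype(np.uint8)

print(old_flags, new_flags)
```

Under this assumption the two per-feature flag vectors agree whenever they are computed from the same training rows, because a column contains the missing-values bin exactly when it contained a NaN before binning.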
