
Added sample weight handling to BinMapper under HGBT #29641

Merged
adrinjalali merged 96 commits into scikit-learn:main from snath-xoc:HGBT_bin_weights
Mar 3, 2026

Conversation

@snath-xoc
Contributor

@snath-xoc snath-xoc commented Aug 8, 2024

Fixes #29640, #27117
Towards #16298

Passes sample_weight into BinMapper and on to _find_binning_thresholds, where the bin thresholds are computed as midpoints using weighted percentiles. Sample weights are also passed to the rng.choice() subsampling in BinMapper for datasets larger than 2e5 samples.

NOTE: when n_bins < the number of distinct values, the best workaround was to set the bins at the midpoints. In the future it may be worth removing this altogether, though at the risk of getting an inhomogeneous array from weighted_percentile. We will need to agree on the best method of trimming.
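The weighted-percentile idea behind the thresholds can be sketched with plain NumPy. Note that `weighted_percentile` below is a hypothetical helper written for illustration (it uses the inverted-CDF definition, not scikit-learn's internal `_weighted_percentile`); the point is only the repetition equivalence that the tests in this PR rely on:

```python
import numpy as np

def weighted_percentile(values, weights, q):
    """Hypothetical helper (not scikit-learn's internal API): q-th
    percentile of `values` under `weights`, inverted-CDF definition."""
    order = np.argsort(values, kind="stable")
    values, weights = values[order], weights[order]
    cum_weights = np.cumsum(weights)
    # Smallest value whose cumulative weight reaches q% of the total.
    idx = np.searchsorted(cum_weights, q / 100.0 * cum_weights[-1])
    return values[idx]

col = np.array([1.0, 2.0, 2.0, 3.0, 10.0])
sw = np.array([1.0, 2.0, 1.0, 1.0, 3.0])

# Integer weights behave like repetition: the weighted percentile matches
# the plain percentile of the weight-repeated column.
repeated = np.repeat(col, sw.astype(int))
for q in (25, 50, 75):
    assert weighted_percentile(col, sw, q) == np.percentile(
        repeated, q, method="inverted_cdf"
    )
```

The actual PR derives thresholds as midpoints between consecutive weighted percentiles; this sketch only demonstrates the weighted/repeated equivalence in its simplest form.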

Major changes proposed:

  • Subsampling is done with weights, after which sample_weight is no longer propagated in _BinMapper (set to None)
  • Tests for sample_weight invariance of _BinMapper in the deterministic case added
  • Some tests failed with the new changes and n_samples had to be increased, e.g., test_interaction_cst_numerically
  • Some tests failed in special cases when sample weights are passed through but the number of distinct values < max_bins. To be checked what to do
  • Statistical test added in the comments below

NOTE: this also allows HGBT to pass weighted tests for more than 256 samples (but still less than 2e6)

TO DO:

  • Update tests under test_gradient_boosting so that sample weight is passed as a positional argument
  • Fix test failures; something fishy changes such that, e.g., test_interaction_cst_numerically is not working
  • Modified tests to have a larger number of samples, otherwise the expected results are not attained (i.e., the r2 value threshold etc. is not reached)... was expecting this to be the other way round, so not sure if I set off another bug.
  • When the number of distinct values is less than max_bins and sample weights are given, we get a discrepancy between the weighted and repeated cases in special situations (see test_zero_sample_weights_classification under test_gradient_boosting.py). Need to see what to do here.
  • Further test failure when max_bins is specified as greater than the number of distinct values, in both the weighted and repeated cases
  • Check that sample weight invariance holds in the deterministic case (test added under test_binning)
  • For the case n_samples > subsample, conduct a statistical analysis to check that the repeated/reweighted equivalence holds for the binning procedure in expectation, similar to issue Incorrect sample weight handling in KBinsDiscretizer #29906. Conduct the analysis for many different random_state values and check for a match of the mean bin edges.

Member

@ogrisel ogrisel left a comment


I did not analyse the test failures yet but here is some early feedback.

For the case n_samples > subsample we would need to conduct a statistical analysis to check that the repeated/reweighted equivalence holds for the binning procedure in expectation.

Once the tests pass for the deterministic case, we should conduct such an analysis (e.g. using a side notebook, not included in the repo, where we rerun the binning for many different values of random_state and then check for a match of the mean bin edges).

Please add a TODO item to the description of this PR not to forget about this.

@snath-xoc
Contributor Author

Statistical tests were conducted for n_samples > subsample (subsample = int(2e5)) using the following code:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import kstest

from sklearn.ensemble._hist_gradient_boosting.binning import _BinMapper

# Number of tests for the Bonferroni correction (the significance level
# is divided by this). To adjust later according to the agreed test dim.
N_TESTS = 1

rng = np.random.RandomState(0)

n_samples = int(3e5)
X = rng.randint(0, 30, size=(n_samples, 3))
# Use random integers (including zero) as weights.
sw = rng.randint(0, 5, size=n_samples)

X_repeated = np.repeat(X, sw, axis=0)
assert len(X_repeated) > int(2e5)

bin_thresholds_weighted = []
bin_thresholds_repeated = []
for seed in np.arange(100):
    est_weighted = _BinMapper(n_bins=6, random_state=seed).fit(
        X, sample_weight=sw
    )
    est_repeated = _BinMapper(n_bins=6, random_state=seed + 500).fit(
        X_repeated, sample_weight=None
    )
    bin_thresholds_weighted.append(est_weighted.bin_thresholds_)
    bin_thresholds_repeated.append(est_repeated.bin_thresholds_)

bin_thresholds_weighted = np.asarray(bin_thresholds_weighted)
bin_thresholds_repeated = np.asarray(bin_thresholds_repeated)

# One panel per (feature, threshold) pair: 3 features x 4 thresholds.
fig, axs = plt.subplots(3, 4, figsize=(14, 12))

j = 0
for i, ax in enumerate(axs.flatten()):
    if i > 0 and i % 4 == 0:
        j += 1
    weighted = bin_thresholds_weighted[:, j, i % 4].flatten()
    repeated = bin_thresholds_repeated[:, j, i % 4].flatten()
    ax.hist(weighted)
    ax.hist(repeated, alpha=0.5)
    pval = kstest(weighted, repeated).pvalue
    if pval < 0.05 / N_TESTS:
        ax.set_title(f"p-value: {pval:.4f}, failed")
    else:
        ax.set_title(f"p-value: {pval:.4f}, passed")

The output is as follows:

[Figure: 3x4 grid of histograms comparing weighted vs. repeated bin thresholds per (feature, threshold) pair, with the KS-test p-value and pass/fail status in each panel title.]

@snath-xoc snath-xoc marked this pull request as ready for review March 21, 2025 13:32
@snath-xoc
Contributor Author

The ARM lint test is still failing due to the test_find_binning_thresholds_small_regular_data assertion error but I can't reproduce it locally.

Member

@ogrisel ogrisel left a comment


Modified tests to have a larger number of samples, otherwise expected results are not attained (i.e., r2 value threshold etc. is not reached)... was expecting this to be the other way so not sure if I set off another bug.

This is annoying, but it's probably caused by the fact that we switched from np.percentile(..., method="linear") to np.percentile(..., method="averaged_inverted_cdf") in the sample_weight=None case. Assuming this change is really the cause and since it's necessary to fix the weight/repetition equivalence, we might consider this behavior change as a bug fix. It should be clearly documented as such in the changelog, possibly with a dedicated note in the "Changed models" section of the release notes to better warn the users about this change.
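The behavioral difference between the two percentile methods can be seen on a tiny example with plain NumPy (no scikit-learn involved; this is just an illustration of the interpolation change, not the PR's code):

```python
import numpy as np

# "linear" interpolates between order statistics, while
# "averaged_inverted_cdf" averages the two nearest order statistics of
# the inverted CDF, so the two can disagree on small samples.
x = np.array([1.0, 2.0, 3.0, 4.0])

print(np.percentile(x, 25, method="linear"))                 # 1.75
print(np.percentile(x, 25, method="averaged_inverted_cdf"))  # 1.5
```

The averaged inverted CDF is the definition compatible with the weight/repetition equivalence, which is why tests calibrated against linear interpolation needed adjusting.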

Another concern is that the _averaged_weighted_percentile implementation is currently very naive and will cause a performance regression whenever the user passes sample_weight != None. We might want to postpone the final merge of this PR to wait for the review and merge of #30945 first.

@snath-xoc
Contributor Author

@antoinebaker can you see if you can reproduce the test_subsampled_weighted_vs_repeated_equivalence failure on your side? I do not get it on my laptop :/

Member

@ogrisel ogrisel left a comment


The failure is probably caused by the fact that the test is not fully deterministic:

@github-actions github-actions bot added the CI:Linter failure The linter CI is failing on this PR label Feb 4, 2026
@github-actions github-actions bot removed the CI:Linter failure The linter CI is failing on this PR label Feb 4, 2026
@snath-xoc
Contributor Author

@antoinebaker this is ready for another pass of review

Contributor

@antoinebaker antoinebaker left a comment


Thanks @snath-xoc, besides a few nitpicks, LGTM!

Member

@ogrisel ogrisel left a comment


@snath-xoc Since this change only impacts estimators from the sklearn.ensemble submodule, the changelog entry should be moved to the doc/whats_new/upcoming_changes/sklearn.ensemble folder. The changed-model folder should be reserved for library-wide changes that impact estimators scattered across many modules.
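For reference, a fragment in that folder would look roughly like the following. The file name and wording here are illustrative assumptions (scikit-learn's upcoming_changes fragments are towncrier-style files named after the PR number and change type), not the PR's actual changelog entry:

```rst
.. doc/whats_new/upcoming_changes/sklearn.ensemble/29641.fix.rst

- :class:`ensemble.HistGradientBoostingClassifier` and
  :class:`ensemble.HistGradientBoostingRegressor` now take
  ``sample_weight`` into account when binning the input features.
```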

Besides this and the following comment, LGTM.

@auguste-probabl auguste-probabl moved this from In progress - High Priority to In progress in Labs Feb 9, 2026
Member

@ogrisel ogrisel left a comment


A few more refinements to better address the feedback in @lorentzenchr's review.

# statistically equivalent to passing unit weights.
subset = rng.choice(
    X.shape[0], self.subsample, p=subsampling_probabilities, replace=True
)
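The statistical equivalence of this weighted subsampling can be checked with a quick NumPy simulation. This is a self-contained sketch (not the PR's test): drawing indices with replacement, with probabilities proportional to the weights, reproduces weighted statistics in expectation:

```python
import numpy as np

# Sampling with replacement, with probabilities proportional to the
# sample weights, matches the weighted distribution in expectation,
# just like uniformly subsampling the weight-repeated dataset would.
rng = np.random.RandomState(0)
x = np.array([0.0, 1.0, 2.0, 3.0])
sw = np.array([1.0, 1.0, 1.0, 5.0])

subsampling_probabilities = sw / sw.sum()
subset = rng.choice(
    x.shape[0], size=100_000, p=subsampling_probabilities, replace=True
)

weighted_mean = np.average(x, weights=sw)  # exactly 2.25
empirical_mean = x[subset].mean()
assert abs(empirical_mean - weighted_mean) < 0.05
```

With replace=False the draws would no longer be independent, which is exactly the kind of discrepancy the equivalence test below is designed to detect.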
Member


I checked that the number of iterations in test_subsampled_weighted_vs_repeated_equivalence is large enough to detect the statistical discrepancy if we either set replace=False (easy) or replace=sample_weight is not None (a more subtle statistical bug).

Copy link
Copy Markdown
Contributor Author

@snath-xoc snath-xoc Feb 19, 2026


When I use replace=sample_weight is not None on my side, test_subsampled_weighted_vs_repeated_equivalence fails as well, so it seems good enough at detecting the bug (from what I understand we would expect replace=sample_weight is not None to fail, unless I misunderstood).

@snath-xoc
Contributor Author

@ogrisel this is ready it seems, I suppose we are waiting for @lorentzenchr?

@adrinjalali
Member

Since this fixes the case with sample weights, and makes the sample_weight=None case consistent with the weighted case, let's merge this and consider it a bug fix, even though it's a slight change in behavior. We can always revert if users come up with major complaints.

@adrinjalali adrinjalali merged commit 5f0f25f into scikit-learn:main Mar 3, 2026
37 checks passed
@github-project-automation github-project-automation bot moved this from In progress to Done in Labs Mar 3, 2026
tonybaloney pushed a commit to tonybaloney/swe-complex-scikit-learn that referenced this pull request Mar 19, 2026

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BinMapper within HGBT does not handle sample weights

6 participants