
Implements sample_weight for sklearn.neighbors.KNeighborsClassifier #33588

Open

JacobHass8 wants to merge 18 commits into scikit-learn:main from JacobHass8:main

Conversation

@JacobHass8

@JacobHass8 JacobHass8 commented Mar 19, 2026

Reference Issues/PRs

Fixes #33457.

What does this implement/fix? Explain your changes.

Implements a sample_weight argument for the fit method of the sklearn.neighbors.KNeighborsClassifier estimator. If the weights parameter is passed to KNeighborsClassifier, we simply multiply the distance weights and sample_weight together.
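To make the approach concrete, here is a minimal pure-numpy sketch of the weighted vote described above: the per-neighbor distance weight is multiplied by that neighbor's sample_weight before class votes are tallied. The function name and signature are illustrative only, not the PR's or scikit-learn's actual API.

```python
import numpy as np

def weighted_vote(neigh_labels, neigh_dist, neigh_sample_weight):
    """Predict the label for one query point (illustrative sketch).

    neigh_labels        : (k,) class labels of the k nearest neighbors
    neigh_dist          : (k,) distances to those neighbors
    neigh_sample_weight : (k,) sample_weight of those neighbors
    """
    # Inverse-distance weighting, as in KNeighborsClassifier(weights="distance"),
    # with a small floor to guard against zero distances.
    dist_w = 1.0 / np.maximum(neigh_dist, 1e-12)
    # The PR's idea: combine distance weight and sample weight by multiplication.
    combined = dist_w * neigh_sample_weight
    classes = np.unique(neigh_labels)
    scores = np.array([combined[neigh_labels == c].sum() for c in classes])
    return classes[np.argmax(scores)]
```

With uniform sample weights this reduces to ordinary distance-weighted voting; a larger sample_weight on a neighbor scales that neighbor's vote proportionally.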

Comments

I haven't added many tests for this argument. The only tests check the following:

  • A constant sample_weight array does not affect predicted values
  • If sample_weight does not have the same size as y in fit(X, y, sample_weight=..), an error is raised
  • If all sample_weight <= 0, an error is raised

I think there should be at least one test with non-uniform weights. Would translating tests from another estimator work?
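The first bullet above (constant weights leave predictions unchanged) follows from the multiplicative combination: scaling every sample weight by the same positive constant scales every class score identically, so the argmax is unchanged. An illustrative pure-numpy check of that property (not scikit-learn test code; names are made up):

```python
import numpy as np

# Mock neighbor data: labels, positive distance weights, positive sample weights.
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=7)
dist_w = rng.random(7) + 0.1
sw = rng.random(7) + 0.1

def winner(w):
    """Class with the largest total weight among the mock neighbors."""
    classes = np.unique(labels)
    scores = np.array([w[labels == c].sum() for c in classes])
    return classes[np.argmax(scores)]

# Scaling all sample weights by a positive constant scales every class score
# by that constant, so the winning class cannot change.
assert winner(dist_w * sw) == winner(dist_w * (5.0 * sw))
```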

AI usage disclosure

N/A

@JacobHass8
Author

JacobHass8 commented Mar 21, 2026

@lorentzenchr would you mind reviewing this PR? If this works, I can use it as a template for the other estimators in sklearn.neighbors. I just multiplied the distance weights by sample_weight to get an overall weight for each point.

One issue I had was with the function _check_sample_weight_equivalence. From the docstring, the function "check(s) that setting sample_weight to zero / integer is equivalent, to removing / repeating corresponding samples." For most estimators this check should pass, but I don't think it should for the KNN estimator. Duplicating a point doesn't just double its weight; it changes how many times that point is counted as a neighbor. Thus, it shouldn't be the same as doubling the weight of the point.
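The non-equivalence can be shown with a small numeric example: with k fixed, repeated copies of a training point occupy extra neighbor slots and push other points out of the neighborhood, which an increased sample_weight does not do. A pure-numpy 1-D sketch (illustrative names, not the PR's code), using inverse-distance weights multiplied by sample weights as in this PR's approach:

```python
import numpy as np

def knn_predict(X, y, sample_weight, query, k):
    """Distance-weighted KNN vote with sample weights (1-D sketch)."""
    d = np.abs(X - query)
    idx = np.argsort(d)[:k]                 # indices of the k nearest points
    w = sample_weight[idx] / d[idx]         # distance weight * sample weight
    classes = np.unique(y)
    scores = np.array([w[y[idx] == c].sum() for c in classes])
    return classes[np.argmax(scores)]

X = np.array([0.5, 2.0, 5.0])
y = np.array([0, 1, 0])
q, k = 0.0, 3

# Give the point at x=2.0 five times the weight ...
pred_weighted = knn_predict(X, y, np.array([1.0, 5.0, 1.0]), q, k)

# ... versus physically repeating it five times with unit weights. The copies
# now fill two of the three neighbor slots and evict the point at x=5.0.
X_dup = np.array([0.5, 2.0, 2.0, 2.0, 2.0, 2.0, 5.0])
y_dup = np.array([0, 1, 1, 1, 1, 1, 0])
pred_dup = knn_predict(X_dup, y_dup, np.ones_like(X_dup), q, k)

print(pred_weighted, pred_dup)  # the two predictions disagree
```

Here the weighted fit predicts class 1 for the query while the duplicated fit predicts class 0, so a check that equates weighting with repetition cannot hold for fixed-k neighbor models.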

One way I could see of reconciling this is by removing any duplicate data points before running the predictions and increasing their sample_weight proportionally. Then _check_sample_weight_equivalence would pass. I'm not sure if that changes the underlying KNN estimator, though. It seems like more of a trade-off.

I'm also unsure how sample_weight=0 should be handled. If all the sample weights are 0, an error is thrown as expected. However, there can still be issues when only a few sample weights are 0. For example, if all k nearest neighbors have a weight of 0, should an error be thrown? Should nan be returned? I think the best thing to do may be to remove these points before getting a point's neighbors.
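One possible shape for that last suggestion is to mask out zero-weight samples before the neighbor search, so they can never be selected as neighbors and the all-zero-neighborhood case cannot arise. A hedged sketch (the helper name is made up, not the PR's code):

```python
import numpy as np

def drop_zero_weight(X, y, sample_weight):
    """Remove zero-weight samples before fitting (illustrative sketch)."""
    mask = sample_weight > 0
    if not mask.any():
        # Mirrors the PR's existing behavior of erroring when no sample
        # carries positive weight.
        raise ValueError("All sample weights are zero.")
    return X[mask], y[mask], sample_weight[mask]

X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0, 1, 1])
sw = np.array([1.0, 0.0, 2.0])

# The zero-weight sample at x=1.0 is dropped; the rest pass through unchanged.
X2, y2, sw2 = drop_zero_weight(X, y, sw)
```

This keeps the voting rule untouched and turns the "all my neighbors have weight 0" question into one that can no longer occur, at the cost of a filtering pass in fit.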



Development

Successfully merging this pull request may close these issues.

META track sample_weight support for all estimators
