partial_dependence ignores sample weights #13192
Closed
Description
When training a GBT with sample weights, the partial dependence plot completely ignores the sample weights.
Steps/Code to Reproduce
Create a dataset with two equally sized subpopulations: one where y = X[:,1] and the other where y = -X[:,1], so that without sample weights the two partial dependences cancel out.
Then give the first subpopulation a much larger sample weight than the second (e.g. a 100:1 ratio), so that the fitted model should mostly reflect the dependence y = X[:,1].
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, partial_dependence

N = 1000000
# Feature 0 is a binary group indicator, feature 1 is uniform on [0, 1]
X = np.vstack((np.random.randint(2, size=N), np.random.rand(N))).T
mask_0 = np.where(X[:, 0] == 0)
mask_1 = np.where(X[:, 0] == 1)
# Group 0: y = X[:,1]; group 1: y = -X[:,1]
y = np.zeros(N)
y[mask_0] = X[:, 1][mask_0]
y[mask_1] = -X[:, 1][mask_1]
# Weight group 0 a hundred times more heavily than group 1
sample_weight = np.zeros(N)
sample_weight[mask_0] = 100.
sample_weight[mask_1] = 1.
gbt = GradientBoostingRegressor()
gbt.fit(X, y, sample_weight=sample_weight)
grid = np.arange(0, 1, 0.01)
pdp = partial_dependence.partial_dependence(gbt, [1], grid=grid)
Expected Results
Partial dependence with sample weights should mostly reflect the points of the dataset where y = X[:,1].
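As a cross-check, the expected behavior can be approximated by brute force: for each grid value, clamp the feature to that value for every row and take the *weighted* average of the model's predictions. The sketch below is an assumption about how weights could be honored, not scikit-learn's implementation; `weighted_partial_dependence` and its arguments are hypothetical names.

```python
import numpy as np

def weighted_partial_dependence(predict, X, feature, grid, sample_weight):
    """Brute-force partial dependence that honors sample weights.

    For each value v in `grid`, set column `feature` of X to v and
    return the sample-weight-weighted average of predict(X).
    (Hypothetical helper; not part of scikit-learn.)
    """
    pd_values = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, feature] = v
        pd_values.append(np.average(predict(X_mod), weights=sample_weight))
    return np.asarray(pd_values)

# Stand-in for the trained model on the dataset above:
# group 0 predicts X[:,1], group 1 predicts -X[:,1].
rng = np.random.default_rng(0)
N = 1000
X = np.column_stack([rng.integers(2, size=N), rng.random(N)]).astype(float)
w = np.where(X[:, 0] == 0, 100.0, 1.0)
predict = lambda Z: np.where(Z[:, 0] == 0, Z[:, 1], -Z[:, 1])

grid = np.array([0.0, 0.5, 1.0])
pd = weighted_partial_dependence(predict, X, 1, grid, w)
# With a 100:1 weight ratio the curve should rise with the grid value,
# instead of cancelling to ~0 as it does without weights.
```

An unweighted average over the same predictions (weights all equal) would give values near zero at every grid point, which is what the current plot shows.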
Actual Results
The partial dependence plot completely ignores the sample weights: the contributions of the two subpopulations cancel out and the curve is essentially flat at zero.
Versions
0.21.dev0