In various parts of the code, we have tests for sample_weight support, including in metrics, and for individual estimators. We have some common estimator checks for class_weight, but not really for sample_weight functionality (only for weight type invariance).
Invariance testing for sample weights should include:
sample_weight=np.ones(len(X)) makes the same model as sample_weight=None
sample_weight=random can make a different model to sample_weight=None
sample_weight=s for integer array s makes the same model as X=np.repeat(X, s, axis=0), y=np.repeat(y, s, axis=0) (although there may be exceptions to this depending on how the estimator defines iteration, convergence, etc., as in #11236, "Test test_weighted_vs_repeated is somehow flaky")
sample_weight=s * k for array s and positive constant k makes the same model as sample_weight=s
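To make the four properties above concrete, here is a minimal sketch using a closed-form weighted least-squares fit in NumPy. The helper `fit_wls` is hypothetical and only stands in for an estimator's `fit`; it is not a scikit-learn function. Estimators with iterative solvers or stochastic steps may satisfy these properties only approximately.

```python
import numpy as np


def fit_wls(X, y, sample_weight=None):
    """Hypothetical stand-in for an estimator: weighted least-squares coefficients."""
    if sample_weight is None:
        sample_weight = np.ones(len(X))
    sw = np.sqrt(sample_weight)
    # Scaling each row by sqrt(weight) reduces weighted least squares to ordinary least squares.
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef


rng = np.random.RandomState(0)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=20)
s = rng.randint(1, 4, size=20)  # integer weights in {1, 2, 3}

unweighted = fit_wls(X, y)
ones = fit_wls(X, y, sample_weight=np.ones(len(X)))
random_fit = fit_wls(X, y, sample_weight=rng.uniform(0.1, 1.0, size=20))
weighted = fit_wls(X, y, sample_weight=s)
repeated = fit_wls(np.repeat(X, s, axis=0), np.repeat(y, s, axis=0))
scaled = fit_wls(X, y, sample_weight=s * 2.5)

assert np.allclose(ones, unweighted)            # unit weights == no weights
assert not np.allclose(random_fit, unweighted)  # random weights change the fit here
assert np.allclose(weighted, repeated)          # integer weights == repeated rows
assert np.allclose(weighted, scaled)            # positive rescaling is a no-op
```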
I wonder if it is possible to establish a generic test for this, e.g. something like:
def check_sample_weight_invariance(data_args, fit, is_equal):
    """
    Parameters
    ----------
    data_args : dict
        Keyword arguments to pass to fit, and which would need to be
        repeated to test equivalence to integer sample weights.
    fit : callable
        Passed data args, returns a model that can be compared with is_equal
    is_equal : callable
        Passed two models returned from fit, returns a bool to indicate
        equality between models
    """
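The issue gives only the signature and docstring, so the body below is one possible sketch rather than the proposed implementation. It assumes that `fit` also accepts an optional `sample_weight` keyword and that every value in `data_args` is an array with one row per sample; the `random_state` parameter and the toy `fit_weighted_mean` are hypothetical additions for illustration. The "random weights can give a different model" property is left out because it is hard to assert generically.

```python
import numpy as np


def check_sample_weight_invariance(data_args, fit, is_equal, random_state=0):
    """Sketch only: assumes fit(**data_args, sample_weight=...) is supported."""
    rng = np.random.RandomState(random_state)
    n_samples = len(next(iter(data_args.values())))

    baseline = fit(**data_args)  # equivalent to sample_weight=None

    # Unit weights must give the same model as no weights.
    assert is_equal(fit(sample_weight=np.ones(n_samples), **data_args), baseline)

    # Integer weights must match repeating each sample that many times.
    s = rng.randint(1, 4, size=n_samples)
    weighted = fit(sample_weight=s, **data_args)
    repeated_args = {k: np.repeat(v, s, axis=0) for k, v in data_args.items()}
    assert is_equal(weighted, fit(**repeated_args))

    # Rescaling all weights by a positive constant must not change the model.
    assert is_equal(fit(sample_weight=s * 3.0, **data_args), weighted)


def fit_weighted_mean(X, sample_weight=None):
    """Toy 'model': the (weighted) column means of X."""
    return np.average(X, axis=0, weights=sample_weight)


X = np.random.RandomState(42).normal(size=(30, 2))
check_sample_weight_invariance({"X": X}, fit_weighted_mean, np.allclose)
```

Passing `fit` and `is_equal` as callables keeps the check independent of what a "model" is, so the same function could in principle cover estimators and evaluation metrics, with `is_equal` using a tolerance where exact equality is too strict.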
Recent implementations of sample_weight include #10933 (KMeans) and #10803 (density estimation). But as well as estimators we have things like common tests for evaluation metrics.