Skip to content

[MRG] ENH validate sample_weight with _check_sample_weight in IsotonicRegression#16203

Closed
marijavlajic wants to merge 2 commits intoscikit-learn:masterfrom
marijavlajic:check_weights
Closed

[MRG] ENH validate sample_weight with _check_sample_weight in IsotonicRegression#16203
marijavlajic wants to merge 2 commits intoscikit-learn:masterfrom
marijavlajic:check_weights

Conversation

@marijavlajic
Copy link
Copy Markdown
Contributor

Reference Issues/PRs

Fixes #15358 for IsotonicRegression.

What does this implement/fix? Explain your changes.

Replaces custom validation logic with standardized method utils.validation._check_sample_weight.

Any other comments?

Worked with @gelavizh1 @lschwetlick at scikit-learn sprint.

Worked also on the same issue in naive_bayes.py / for BaseDiscreteNB but it looks like the appropriate changes have already been made.

CC @adrinjalali @noatamir #WiMLDS

Copy link
Copy Markdown
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @marijavlajic

@glemaitre glemaitre changed the title [MRG] Fix "Use _check_sample_weight to consistently validate sample_weight" in IsotonicRegression [MRG] ENH validate sample_weight with _check_sample_weight in IsotonicRegression Jan 26, 2020
sample_weight = np.ones(len(y), dtype=y.dtype)
else:
sample_weight = np.array(sample_weight[order], dtype=y.dtype)
sample_weight = _check_sample_weight(sample_weight, y, dtype=y.dtype)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

You need to apply the order parameter, maybe as,

    if sample_weight is None:
        sample_weight = np.ones(len(y), dtype=y.dtype)
    else:
        sample_weight = _check_sample_weight(sample_weight, y, dtype=y.dtype)
        sample_weight = sample_weight[order]

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't you think

sample_weight = _check_sample_weight(sample_weight, y, dtype=y.dtype)
sample_weight = sample_weight[order]

is enough ? It makes the code simpler and reordering ones has no effect and efficiency is not a problem there.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes the code simpler and reordering ones has no effect and efficiency is not a problem there.

If you prefer. Slicing an array with int indexing does have some cost, but it should indeed be minimal.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the same :)

@jeremiedbb
Copy link
Copy Markdown
Member

It has been done in #16322. Sorry @marijavlajic for the confusion but someone did the same thing and his PR got merged first. We still need to be better in coordination :/

@jeremiedbb jeremiedbb closed this Jan 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Use _check_sample_weight to consistently validate sample_weight

4 participants