Skip to content

[MRG+1] Fix/pipeline param error msg#13536

Merged
NicolasHug merged 24 commits intoscikit-learn:masterfrom
okz12:fix/pipeline_param_error_msg
Apr 9, 2019
Merged

[MRG+1] Fix/pipeline param error msg#13536
NicolasHug merged 24 commits intoscikit-learn:masterfrom
okz12:fix/pipeline_param_error_msg

Conversation

@okz12
Copy link
Copy Markdown
Contributor

@okz12 okz12 commented Mar 28, 2019

Reference Issues/PRs

Fixes #13534

What does this implement/fix? Explain your changes.

The error message for passing a sample_weight is unclear. A check has been introduced for when the step name is not found in the fit_params_steps that raises a TypeError.

Any other comments?

  • Is the wording of the error clear enough?
  • Will require adding a test case once approach is finalised

Copy link
Copy Markdown
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test to test_pipeline.py. I'm pretty sure this does not fix the issue

@okz12
Copy link
Copy Markdown
Contributor Author

okz12 commented Mar 29, 2019

I added this test case that catches the error:

def test_pipeline_param_error():
    clf = Pipeline(memory=None, steps=[('wrong_step_name', LogisticRegression())])
    assert_raise_message(TypeError,
                         "Use naming convention step__parameter.",
                         clf.fit, [[0], [0]], [0, 1],
                         logisticregression__sample_weight=[1, 1]
                         )

@jnothman
Copy link
Copy Markdown
Member

jnothman commented Mar 29, 2019 via email

@NicolasHug
Copy link
Copy Markdown
Member

@okz12 as described in the original issue #13534 (comment) the problem we are trying to address here is that if a user passes sample_weight to a pipeline as-is, e.g.

pipe.fit([[0], [0]], [0, 1], sample_weight=[1, 1])

they end up with an error message that is not clear at all (Not enough values to unpack...). We want this error message to become something along the lines of

Passing sample_weight directly to a pipeline isn't supported. You can however pass 
sample_weight to specific steps of your pipeline, e.g. 
`pipe.fit(X, y, logisticregression__sample_weight=sample_weight)`.

@okz12
Copy link
Copy Markdown
Contributor Author

okz12 commented Mar 30, 2019

I've added the error message and an accompanying test. I am checking on whether two underscores '__' are within the variable name, and this would catch other parameters besides sample_weight. I made the error generic referring to parameters rather than sample_weight in particular but kept the sample_weight parameter passing example. Let me know if this should be changed.

Copy link
Copy Markdown
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments but LGTM otherwise

@NicolasHug
Copy link
Copy Markdown
Member

Also you might want to change the title to [MRG]

@okz12 okz12 changed the title [WIP] Fix/pipeline param error msg [MRG] Fix/pipeline param error msg Mar 31, 2019
@jnothman
Copy link
Copy Markdown
Member

Not sure if we need to worry about changing a ValueError into a TypeError.

@jnothman
Copy link
Copy Markdown
Member

Or actually, whether we must be consistent and also raise a TypeError in the case of any other parameter names (including __)

@okz12 okz12 changed the title [MRG] Fix/pipeline param error msg [MRG+1] Fix/pipeline param error msg Apr 1, 2019
@okz12
Copy link
Copy Markdown
Contributor Author

okz12 commented Apr 4, 2019

I found examples in gaussian_process/kernels.py and base.py where it splits on '__' and raises a ValueError if the parameter name is invalid. Not sure if I should change pipeline.py to ValueError or the other two to TypeError.

@NicolasHug
Copy link
Copy Markdown
Member

Hm, I guess the most consistent way would be to raise a ValueError here instead of a TypeError?

We raise ValueError when invalid parameters are passed to the estimators, and that's what this check is about

@jnothman
Copy link
Copy Markdown
Member

jnothman commented Apr 4, 2019 via email

@okz12
Copy link
Copy Markdown
Contributor Author

okz12 commented Apr 6, 2019

Changing to ValueError breaks the test for fix #13472. The error message picked up is from my fix rather than the fix in #13472, which causes the error message to mismatch and the test-case to fail. I'm not entirely sure of the implications of this. Is it good that this fix captures both errors, in which case I can update the test case for gradient boosting.

Test Case for #13472:

def test_gradient_boosting_with_init_pipeline():
    # Check that the init estimator can be a pipeline (see issue #13466)

    X, y = make_regression(random_state=0)
    init = make_pipeline(LinearRegression())
    gb = GradientBoostingRegressor(init=init)
    gb.fit(X, y)  # pipeline without sample_weight works fine

    with pytest.raises(
            ValueError,
            match='The initial estimator Pipeline does not support sample '
                  'weights'):
        gb.fit(X, y, sample_weight=np.ones(X.shape[0]))

Output Error:

E           AssertionError: Pattern 'The initial estimator Pipeline does not support sample weights' not found in 'Pipeline.fit does not accept the sample_weight parameter. You can pass parameters to specific steps of your pipeline using the stepname__parameter format, e.g. `Pipeline.fit(X, y, logisticregression__sample_weight=sample_weight)`.'

sklearn/ensemble/tests/test_gradient_boosting.py:1396: AssertionError

@jnothman
Copy link
Copy Markdown
Member

jnothman commented Apr 6, 2019

Yes, IMO, change the gradient boosting check to conform to this.

@jnothman
Copy link
Copy Markdown
Member

jnothman commented Apr 6, 2019

You need to not modify the gradient boosting test, but the gradient boosting code, to check for the right pipeline failure error message here:

if 'not enough values to unpack' in str(e): # pipeline

Copy link
Copy Markdown
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm, thanks @okz12

@NicolasHug NicolasHug merged commit 0e54f44 into scikit-learn:master Apr 9, 2019
@NicolasHug
Copy link
Copy Markdown
Member

Merged, thanks @okz12 !

jeremiedbb pushed a commit to jeremiedbb/scikit-learn that referenced this pull request Apr 25, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

improve error message when passing sample_weight to Pipeline

3 participants