MNT fix test for binary crossentropy in hist-GBDT#16691

Merged
NicolasHug merged 10 commits into scikit-learn:master from lorentzenchr:tst_hgb
May 11, 2020
Conversation

@lorentzenchr
Member

@lorentzenchr lorentzenchr commented Mar 14, 2020

What does this implement/fix? Explain your changes.

This fixes a test for the binary cross entropy loss function in the module for HistGradientBoostingClassifier.

Any other comments?

See added comments in commits.

# Subtract a constant term such that the binary cross entropy
# has its minimum at zero. This only works if 0 < y_true < 1.
return (loss.pointwise_loss(y_true, x)
        - loss.pointwise_loss(y_true, logit(y_true)))
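As a quick sketch of why this shift works (not the actual test code; `bce` is a hypothetical stand-in for `loss.pointwise_loss`, written in terms of the raw score x, i.e. the log-odds):

```python
import numpy as np
from scipy.special import logit

def bce(y, x):
    # binary cross-entropy as a function of the raw score x:
    # log(1 + exp(x)) - y * x, computed stably via logaddexp
    return np.logaddexp(0, x) - y * x

y = 0.3
x = np.linspace(-5, 5, 1001)
# the loss is minimized at x = logit(y), so subtracting its value there
# shifts the minimum to zero; this requires 0 < y < 1, otherwise
# logit(y) is infinite
shifted = bce(y, x) - bce(y, logit(y))
assert shifted.min() >= 0
assert np.isclose(shifted.min(), 0, atol=1e-6)
```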
Member

I feel this is a bit complex.

Can we rather write

if isinstance(loss, _LOSSES['binary_crossentropy']):
    assert_allclose(loss.pointwise_loss(y_true, optimum),
                    loss.pointwise_loss(y_true, logit(y_true)))
else:
    ...

in the test directly?

Member Author

Good idea. Unfortunately, the Newton root finder is allergic to such changes and complains with a "not converged" message, as it can never reach a true zero. I would have to switch from newton to a minimizer. I thought my change was the smaller one, and an advantage of newton is that it also checks, in some way, that the gradient and Hessian are correct.
What do you think?

@rth rth requested a review from NicolasHug March 18, 2020 15:33
Comment on lines +52 to +53
# The argmin of binary_crossentropy for y_true=±1 is ±inf due to logit,
# cf. "complete separation". Therefore we use 0 < y_true < 1.
Member

you mean 0 / 1 instead of -1 / 1?

Also, why did it work with y_true == 1 then?

Member Author

@lorentzenchr lorentzenchr Mar 18, 2020

Yes, I meant y = 0 gives -inf, y = 1 gives +inf.
The reason it worked for y_true = 1 so far was ... luck with numerical precision.

I looked at the solution of newton:

  • For y_true = 1, I find optimum = 35.47761173. Then pointwise_loss calls np.logaddexp(0, optimum) - optimum, which gives 0 (true for all values of optimum > 33.271...).
  • For y_true = 0, the loss becomes np.logaddexp(0, x), which equals zero only for x < -745.1..., a value the Newton/Halley root finder does not find.
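This float64 saturation is easy to reproduce (a sketch, not part of the PR; the thresholds are the ones quoted above, and np.logaddexp(0, x) is log(1 + exp(x))):

```python
import numpy as np

# y_true = 1: the loss is log(1 + exp(x)) - x. For x > ~33.271 the
# correction term exp(-x) drops below half an ulp of x, so the sum
# rounds to exactly x and the loss underflows to exactly 0.0.
assert np.logaddexp(0, 35.47761173) - 35.47761173 == 0.0

# y_true = 0: the loss is log(1 + exp(x)). It only becomes exactly 0.0
# once exp(x) underflows, i.e. for x < ~-745.1; at x = -700 it is still
# a tiny positive number, so newton never reaches a true zero.
assert np.logaddexp(0, -746.0) == 0.0
assert np.logaddexp(0, -700.0) > 0.0
```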

@lorentzenchr
Member Author

I just rebased. Merge conflicts resolved. CI lights are green.
@rth @NicolasHug I addressed your comments.

PS: As you might note, this is a very subtle, small step towards thinking of logistic regression as a regression on the domain [0, 1] :smirk:

Member

@rth rth left a comment

One question otherwise LGTM, thanks!

assert np.allclose(loss.pointwise_loss(y_true, optimum), 0)
assert np.allclose(get_gradients(y_true, optimum), 0, atol=1e-7)
assert_allclose(loss.inverse_link_function(optimum)[0], y_true)
assert_allclose(func(optimum)[0], 0, atol=1e-14)
Member

Just a question on a superficial read: doesn't func return a scalar (I'm a bit confused by the indexing)? And if it doesn't (and returns an ndarray, like loss.pointwise_loss), why does using the newton root finder with it as an objective function work?

Maybe some light type annotations on func could have helped. mypy now runs as part of CI.

Member Author

🤔 No, func returns an array with one element. I tried returning scalars everywhere, which fails badly because the methods of the loss classes often call reshape(-1) internally. (I have not yet completely figured out why, but that is another story.)

I added a comment. I can also add type annotations. Can you provide a short example with ndarrays (of one element)?

Member

Can you provide an short example with ndarrays (of one element)?

The typing for ndarray dimensions is not too standardized yet (https://github.com/numpy/numpy-stubs), but I think even

def func(x: np.ndarray) -> np.ndarray:

would have been useful, as opposed to

def func(x: np.ndarray) -> float:

but anyway, it's just a side comment.

Member

@rth rth left a comment

Thanks!

@lorentzenchr
Member Author

@rth Thanks for the approval and the annotation hint. I was able to make it work mostly with scalars. I hope it is now easier to read (and that your approval still holds; maybe the type annotations are now too much?)

@rth
Member

rth commented May 10, 2020

Thanks! The current version looks good to me.

Member

@NicolasHug NicolasHug left a comment

Thanks @lorentzenchr , a few comments regarding readability but LGTM


def fprime(x):
return get_gradients(y_true, x)
def fprime(x: float) -> float:
Member

Maybe add these as comments for now.
I guess we need a more global decision on whether to use type annotations.

Member

@rth rth May 10, 2020

I guess we need a more global decision on whether to use type annotations

I think the current situation is outlined in https://github.com/scikit-learn/scikit-learn/blame/fb76de72e7560aaa739872e94bee761777e54c0a/doc/developers/contributing.rst#L426

I did mention it in a dev meeting a month or two ago and didn't hear many objections. Again, the idea is not to type everything, but to do so now and then when it helps readability. I think we can keep these in; they can't hurt in any case.

Member

Here in particular I find that, without type annotations (and docstrings), it's very difficult to know the expected input and output of

    def fprime(x):
        return get_gradients(y_true, x).item(0)

Member

Ok.

In this specific case, I think we should actually let numpy deal with arrays in these functions (comment just below), so we'd need to remove the annotations anyway

Member

Also, the input is actually not a float, it's a 2d array (with only one element).

I really think we should just let numpy and scipy properly deal with broadcasting here, and only ravel the arrays at the very end.

Member Author

The argument x of def func(x) is the x0 of @pytest.mark.parametrize('loss, x0...) and is actually a float, not an ndarray. Same for fprime and fprime2.
AFAIU, the newton root finder prefers scalars over ndarrays, as those have a different meaning.

I'll do whatever you prefer here. It's just not clear to me what that is 😏

Member

AFAIU, the newton root finder does prefer scalars

That's true, but it does fine with arrays with a single element

We need to wrap x0 and y_true into arrays because that's what the loss methods expect. newton is OK with these. If we're going to recast these 1-element arrays into floats, it's cleaner and easier to do it once at the very end, i.e. by calling .ravel just before the assertions.

Alternatively, we can also go back to just using assert np.allclose. I know we sort of prefer assert_allclose, but at the end of the day, that change is unrelated to this PR.
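A rough sketch of that suggestion (hypothetical toy functions, not the test code: expit(x) - y_true stands in for get_gradients, since newton finds the root of the gradient; x stays a 1-element ndarray until the end):

```python
import numpy as np
from scipy.special import expit, logit
from scipy.optimize import newton

y_true = np.array([0.3])

def func(x):
    # toy stand-in for the gradient of the binary cross-entropy,
    # which newton drives to zero; x is a 1-element ndarray throughout
    return expit(x) - y_true

def fprime(x):
    # hessian of the loss, used as the derivative of func
    return expit(x) * (1 - expit(x))

# newton accepts the 1-element array x0 and returns an array
optimum = newton(func, x0=np.array([0.0]), fprime=fprime)
# recast to a scalar only once, just before the assertions
assert np.isclose(np.ravel(optimum)[0], logit(y_true[0]))
```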

return loss.pointwise_loss(y_true, x)

def fprime(x):
def fprime(x: np.ndarray) -> np.ndarray:
Member

I don't feel like we need the annotations anymore. Especially given that these expect arrays with a single element. Newton would break otherwise (I think). Maybe a comment is enough?

Member

@NicolasHug NicolasHug left a comment

thanks @lorentzenchr , one last nit

I still think we don't need the annotations, but I'll merge tomorrow anyway

@NicolasHug NicolasHug changed the title [MRG] TST fix test for binary cross entropy in _hist_gradient_boosting MNT fix test for binary crossentropy in hist-GBDT May 11, 2020
@NicolasHug NicolasHug merged commit a6a20f2 into scikit-learn:master May 11, 2020
@NicolasHug
Member

thanks @lorentzenchr !

@lorentzenchr lorentzenchr deleted the tst_hgb branch May 11, 2020 19:54
gio8tisu pushed a commit to gio8tisu/scikit-learn that referenced this pull request May 15, 2020
viclafargue pushed a commit to viclafargue/scikit-learn that referenced this pull request Jun 26, 2020
jayzed82 pushed a commit to jayzed82/scikit-learn that referenced this pull request Oct 22, 2020
