MNT fix test for binary crossentropy in hist-GBDT #16691
NicolasHug merged 10 commits into scikit-learn:master from
Conversation
```python
# Subtract a constant term such that the binary cross entropy
# has its minimum at zero. This only works if 0 < y_true < 1.
return loss.pointwise_loss(y_true, x) \
    - loss.pointwise_loss(y_true, logit(y_true))
```
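The effect of the subtracted constant can be checked in isolation. A minimal sketch, assuming the usual binary cross entropy on raw scores, `log(1 + exp(x)) - y * x` (a toy `bce` helper for illustration, not the estimator's actual `pointwise_loss`):

```python
import numpy as np
from scipy.special import logit

def bce(y_true, x):
    # binary cross entropy on raw scores: log(1 + exp(x)) - y * x
    return np.logaddexp(0, x) - y_true * x

y_true = 0.3
x = np.linspace(-5, 5, 1001)
# shifting by the loss value at x = logit(y_true) moves the minimum to zero
shifted = bce(y_true, x) - bce(y_true, logit(y_true))
```

The minimum of `shifted` over the grid is (numerically) zero, attained near `x = logit(0.3) ≈ -0.85`; for `y_true` equal to exactly 0 or 1, `logit` is infinite and the trick no longer applies.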
I feel this is a bit complex.
Can we rather write

```python
if isinstance(loss, _LOSSES['binary_crossentropy']):
    assert_allclose(loss.pointwise_loss(y_true, optimum),
                    loss.pointwise_loss(y_true, logit(y_true)))
else:
    ...
```

in the test directly?
Good idea. Unfortunately, the newton root finder is allergic to such changes and complains with a "not converged" message, as it can't ever reach a true zero. I would have to switch from newton to a minimizer. I thought my change was the smaller one, and the advantage of newton is that it also checks, in some way, that the gradient and hessian are correct.
What do you think?
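The gradient-and-hessian check that newton gives for free can be sketched on a toy loss (a hypothetical example, not the PR's test code; `scipy.optimize.newton` uses Halley's method when `fprime2` is supplied):

```python
from scipy.optimize import newton

# Toy loss 0.5 * (x - 3)**2: find its minimum by root-finding on the
# gradient. fprime is the hessian of the loss (derivative of grad) and
# fprime2 its third derivative -- wrong derivatives would break or slow
# convergence, so passing them is an implicit correctness check.
grad = lambda x: x - 3.0
hess = lambda x: 1.0
third = lambda x: 0.0
opt = newton(grad, x0=0.0, fprime=hess, fprime2=third)
```

Here `opt` converges to the minimizer `x = 3`.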
```python
# The argmin of binary_crossentropy for y_true=±1 is ±inf due to logit,
# cf. "complete separation". Therefore we use 0 < y_true < 1.
```
You mean 0 / 1 instead of -1 / 1?
Also, why did it work with y_true == 1 then?
Yes, I meant y = 0 gives -inf, y = 1 gives +inf.
The reason it worked for y_true = 1 so far was ... luck with numerical precision.
I looked at the solution of newton:
- For `y_true = 1` I find `optimum = 35.47761173`. Then `pointwise_loss` calls `np.logaddexp(0, optimum) - optimum`, which gives `0` (true for all values of `optimum > 33.271...`).
- Then, for `y_true = 0`, the loss becomes `np.logaddexp(0, x)`, which equals zero for `x < -745.1...`, which the Newton/Halley root finder does not find.
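The float64 saturation described above is easy to reproduce in isolation (a standalone sketch, not the PR's test code):

```python
import numpy as np

# For large x, log(1 + exp(-x)) is smaller than half an ulp of x,
# so logaddexp(0, x) rounds to exactly x in float64.
assert np.logaddexp(0, 35.47761173) - 35.47761173 == 0.0

# For very negative x, exp(x) underflows to 0, so logaddexp(0, x)
# becomes exactly 0 -- a "false" zero the root finder never reaches.
assert np.logaddexp(0, -746.0) == 0.0
assert np.logaddexp(0, -700.0) > 0.0
```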
I just rebased. Merge conflicts resolved. CI lights are green.
PS: As you might note, this is a very subtle, small step towards thinking of logistic regression as a regression on the domain [0, 1] :smirk:
rth
left a comment
One question otherwise LGTM, thanks!
```python
assert np.allclose(loss.pointwise_loss(y_true, optimum), 0)
assert np.allclose(get_gradients(y_true, optimum), 0, atol=1e-7)
assert_allclose(loss.inverse_link_function(optimum)[0], y_true)
assert_allclose(func(optimum)[0], 0, atol=1e-14)
```
Just a question on a superficial read: doesn't `func` return a scalar (I'm a bit confused by the indexing)? And if it doesn't (and returns an ndarray, like `loss.pointwise_loss`), why does using the newton minimizer with it as an objective function work?
Maybe some light type annotations on `func` could have helped. mypy now runs as part of CI.
🤔 No, `func` returns an array with one element. I tried to return scalars everywhere, which fails badly because the methods of the loss classes often call `reshape(-1)` internally. (And I have not yet completely figured out why, but that is another story.)
I added a comment. I can also add type annotations. Can you provide a short example with ndarrays (of one element)?
Can you provide a short example with ndarrays (of one element)?

The typing for ndarray dimensions is not too standardized yet (https://github.com/numpy/numpy-stubs), but I think even

```python
def func(x: np.ndarray) -> np.ndarray:
```

would have been useful as opposed to

```python
def func(x: np.ndarray) -> float:
```

but anyway, it's just a side comment.
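A minimal, hypothetical illustration of the kind of light annotation being suggested (the body is illustrative, not the PR's code):

```python
import numpy as np

def func(x: np.ndarray) -> np.ndarray:
    # one-element array in, one-element array out; the annotation at
    # least documents that arrays, not scalars, flow through here
    return np.logaddexp(0, x)

out = func(np.array([0.0]))  # a shape-(1,) array, close to log(2)
```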
@rth Thanks for the approval and the annotation hint. I was able to make it work mostly with scalars. I hope it is now easier to read (and that your approval still holds; maybe the type annotations are now too much?)
Thanks! The current version looks good to me.
NicolasHug
left a comment
Thanks @lorentzenchr, a few comments regarding readability, but LGTM
```python
def fprime(x):
    return get_gradients(y_true, x)
```

changed to

```python
def fprime(x: float) -> float:
```
maybe add these as comments for now
I guess we need a more global decision on whether to use type annotations
I guess we need a more global decision on whether to use type annotations
I think the current situation is outlined in https://github.com/scikit-learn/scikit-learn/blame/fb76de72e7560aaa739872e94bee761777e54c0a/doc/developers/contributing.rst#L426
I did mention it in a dev meeting a month or two ago and didn't hear much objections. Again the idea is not to type everything, but now and then when it helps readability. I think we can keep these in, they can't hurt in any case.
Here in particular I find that without type annotations (and docstrings), it's very difficult to know what the expected input and output of

```python
def fprime(x):
    return get_gradients(y_true, x).item(0)
```

are.
Ok.
In this specific case, I think we should actually let numpy deal with arrays in these functions (comment just below), so we'd need to remove the annotations anyway
Also, the input is actually not a float, it's a 2d array (with only one element).
I really think we should just let numpy and scipy properly deal with broadcasting here, and only ravel the arrays at the very end.
The argument `x` of `def func(x)` is the `x0` of `@pytest.mark.parametrize('loss, x0...)` and is actually a float, not an ndarray. Same for `fprime` and `fprime2`.
AFAIU, the newton root finder prefers scalars, not ndarrays, as those have a different meaning.
I'll do whatever you prefer here. It's just not clear to me what that is 😏
AFAIU, the newton root finder does prefer scalars
That's true, but it does fine with arrays with a single element
We need to wrap x0 and y_true into arrays because that's what the loss methods expect. newton is OK with these. If we're going to recast these 1-element arrays into floats, it's cleaner and easier to do it once at the very end, i.e. by calling .ravel just before the assertions.
Alternatively, we can also go back to just using assert np.allclose. I know we sort of prefer assert_allclose, but at the end of the day, that change is unrelated to this PR.
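As a standalone sketch of the point above (a toy objective, not the PR's loss functions), SciPy's `newton` does accept a one-element array `x0` — it vectorizes over it since SciPy 1.2 — and returns an array that can be raveled once at the very end:

```python
import numpy as np
from scipy.optimize import newton

# newton vectorizes over an array x0: a one-element array in,
# a one-element array out (here solving x**2 - 2 = 0)
root = newton(lambda x: x**2 - 2.0, x0=np.array([1.0]))
value = root.ravel()[0]  # recast to a scalar once, at the very end
```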
```python
return loss.pointwise_loss(y_true, x)
```

```python
def fprime(x):
```

changed to

```python
def fprime(x: np.ndarray) -> np.ndarray:
```
I don't feel like we need the annotations anymore. Especially given that these expect arrays with a single element. Newton would break otherwise (I think). Maybe a comment is enough?
NicolasHug
left a comment
thanks @lorentzenchr , one last nit
I still think we don't need the annotations, but I'll merge tomorrow anyway
thanks @lorentzenchr!
What does this implement/fix? Explain your changes.
This fixes a test for the binary cross entropy loss function in the module for HistGradientBoostingClassifier.

Any other comments?
See added comments in commits.