Sample-dependent alpha parameter moved to fit() in GaussianProcessRegressor #7985
maka89 wants to merge 10 commits into scikit-learn:master from maka89:gpr-sample_noise
Conversation
- Alpha passed to the constructor of GaussianProcessRegressor must now be a scalar.
- In addition to alpha, a sample_alpha argument in fit() is added to the kernel matrix diagonal: None (default), float, or array of shape (n_samples,).
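As a sketch of the proposed semantics (the helper name `add_noise_to_diagonal` and its validation are illustrative, not the actual patch): the scalar `alpha` from `__init__` and the optional `sample_alpha` from `fit()` both land on the kernel matrix diagonal.

```python
import numpy as np

def add_noise_to_diagonal(K, alpha, sample_alpha=None):
    """Illustrative sketch (hypothetical helper, not the real patch):
    scalar `alpha` from __init__ plus optional `sample_alpha` from fit()
    are both added to the kernel matrix diagonal."""
    K = K.copy()
    diag = np.full(K.shape[0], float(alpha))
    if sample_alpha is not None:
        sample_alpha = np.asarray(sample_alpha, dtype=float)
        if sample_alpha.ndim == 1 and sample_alpha.shape[0] != K.shape[0]:
            raise ValueError("sample_alpha must have one entry per sample")
        diag = diag + sample_alpha  # a scalar or a (n_samples,) array both broadcast
    K[np.diag_indices_from(K)] += diag
    return K

K_noisy = add_noise_to_diagonal(np.ones((3, 3)), alpha=1e-10,
                                sample_alpha=[0.1, 0.2, 0.3])
```

Note that broadcasting lets `sample_alpha` be a float, an array, or None without separate code paths.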
amueller
left a comment
Looks good so far apart from the deprecation. You also need to change / add tests.
| Target values |
| sample_alpha : float or array like, shape = (n_samples,), optional |
Should we allow a float here? What happens if the user specified a float in __init__ and an array here?
Ok, but the docs don't specify ;)
It is specified in the docstring of the fit() method. Should I also specify it in the constructor?
Oh sorry, I read too quickly, never mind.
| raise ValueError("alpha must be a scalar or an array" |
| " with same number of entries as y. (%d != %d)" |
| % (self.alpha.shape[0], y.shape[0])) |
| if np.iterable(self.alpha): |
We need to deprecate the old behavior; we cannot just remove it.
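A minimal sketch of such a deprecation path (the function name `check_alpha` is hypothetical; the actual validation lives inside the estimator): an iterable alpha passed to `__init__` keeps working for now, but warns and is validated against the number of targets.

```python
import warnings
import numpy as np

def check_alpha(alpha, y):
    # Hypothetical sketch of the deprecation path discussed here (not
    # the actual patch): iterable alpha still works, but emits a
    # DeprecationWarning pointing users at sample_alpha in fit().
    if np.iterable(alpha):
        warnings.warn(
            "Passing an array-like alpha to __init__ is deprecated; "
            "pass sample_alpha to fit() instead.",
            DeprecationWarning,
        )
        alpha = np.asarray(alpha, dtype=float)
        if alpha.shape[0] != y.shape[0]:
            raise ValueError(
                "alpha must be a scalar or an array with the same number "
                "of entries as y. (%d != %d)" % (alpha.shape[0], y.shape[0])
            )
    return alpha
```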
sklearn/gaussian_process/gpr.py
Outdated
| if sample_alpha is None: |
| add_alpha = 0.0 |
| else: |
| if np.iterable(sample_alpha) \ |
Why did you move the validation of alpha here? I think the previous place was better.
Ok, so I moved it back and added a deprecation warning.
So one of the tests failed... What does this mean?

It means you used Python 2 syntax. Please use print with parentheses. Whitespace in blank lines, too-long lines, etc.

Still failing tests: one error I don't know what it is, and the other seems to be related to the execution speed of some scripts (I can't possibly have slowed down fit() that much?). Is sample_alpha the best name? (sample_alphas, sample_noise, something else...)

The test failure is because of pep8 issues:

Ok, thanks for the help. First time collaborating on a project on GitHub...

@maka89 no worries, thanks for the fix :)
so sample_alpha is added to the diagonal of K in the log marginal likelihood function as well.

Realized that the LML function needs sample_alpha added to the diagonal of K as well...
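For illustration, here is a condensed numpy-only GP log marginal likelihood (a sketch, not the scikit-learn source) showing where the per-sample term has to enter: the same diagonal that fit() regularizes, otherwise hyperparameter optimization would target a different model than the one being fitted.

```python
import numpy as np

def log_marginal_likelihood(K, y, alpha, sample_alpha=None):
    # Condensed numpy-only LML (illustrative sketch). The key point:
    # sample_alpha must be added to the same diagonal here as in fit().
    K = K.copy()
    K[np.diag_indices_from(K)] += alpha
    if sample_alpha is not None:
        K[np.diag_indices_from(K)] += sample_alpha
    L = np.linalg.cholesky(K)   # K = L L^T
    v = np.linalg.solve(L, y)   # so y^T K^-1 y = ||v||^2
    return float(-0.5 * v @ v
                 - np.log(np.diag(L)).sum()
                 - 0.5 * len(y) * np.log(2 * np.pi))
```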
Quick test again:

The test should be in the unit tests, and if you don't trust the current unit tests, you should improve them ;)

Ok, so none of the test_gpr.py functions gave any errors... I don't know what the procedure with tests usually is... Should I upload the results? I can make a test function of the script I posted above if you think that is useful.

As you only moved a parameter, I don't think we need additional tests.

Ok, great.
Thanks for the PR. I like alpha (or sample_alpha) being moved to fit. However, I think it is rather confusing that we would now have three places where we can specify noise: alpha in

In case it is helpful, here is the minimal test I proposed in #7975: I think the last line might need to be modified a little to pass, though. Maybe
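Along those lines, here is a small runnable check written against the released API, where per-sample noise is still passed as an array-like alpha to `__init__` (under this PR the array would go to `fit(..., sample_alpha=...)` instead). The data and noise levels are made up for illustration: samples flagged with a large per-sample noise should be smoothed over, not interpolated.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.RandomState(0)
X = np.linspace(0, 5, 20)[:, None]
y_true = np.sin(X).ravel()
y_noisy = y_true.copy()
y_noisy[::2] += rng.normal(scale=0.5, size=10)  # corrupt every other sample
# Distrust the corrupted samples via a large per-sample noise term.
per_sample_noise = np.where(np.arange(20) % 2 == 0, 1.0, 1e-10)
gpr = GaussianProcessRegressor(kernel=RBF(1.0), alpha=per_sample_noise)
gpr.fit(X, y_noisy)
pred = gpr.predict(X)
```

The intent is that the high-noise samples barely pull the posterior mean, so the fit tracks the clean curve rather than the corrupted values.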
I agree on removing alpha. I will change the deprecation warning. Is it 0.20 or 0.21 it will be removed in?

It'll be removed in 0.21. I'm not sure about removing
I'm not sure about removing alpha. Is the main thing that it'll be redundant with the WhiteKernel? sample_alpha is much harder to specify than alpha and impossible to grid-search.

Same thought. Aren't we making the most common case harder in the name of a specific case?
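To make the grid-search point concrete: a scalar alpha in `__init__` is an ordinary constructor parameter, so GridSearchCV can tune it directly, whereas a per-sample array passed to fit() has no natural place in a param_grid. The data and grid values below are arbitrary illustrations.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.RandomState(0)
X = np.linspace(0, 5, 30)[:, None]
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=30)

# alpha is a constructor parameter, so it is cross-validated like any other.
search = GridSearchCV(
    GaussianProcessRegressor(),
    param_grid={"alpha": [1e-5, 1e-3, 1e-1]},
    cv=3,
)
search.fit(X, y)
```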
Good point with the grid search... Then you could feed the noise to, say, a WhiteKernel through fit(). I think this would be good because there might be other kernels that take sample-dependent parameters... For instance, I have a GP project on my GitHub where the kernel takes input noise for each data point as a parameter. It is hard to implement in the current system. This seems like a bigger project, though.
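For reference, the WhiteKernel route already works today for a global (non-sample-specific) noise level, which the kernel optimizer then estimates from the data. The data here is synthetic (true noise standard deviation 0.3).

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.RandomState(0)
X = np.linspace(0, 5, 30)[:, None]
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=30)

# WhiteKernel contributes a learnable noise term to the diagonal, so the
# global noise level is fitted by maximizing the marginal likelihood.
kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=1.0)
gpr = GaussianProcessRegressor(kernel=kernel).fit(X, y)
learned_noise = gpr.kernel_.k2.noise_level  # fitted noise variance
```

What WhiteKernel cannot express is exactly the per-sample case this PR is about: a different noise level for each training point.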
yeah and not very aligned with the scikit-learn API, I'm afraid. We could attach more properties to samples but that requires #4497

The thing with
The optimum solution imo would be if there was a general way to pass arguments to a kernel through fit(). Or add an extra kernel through fit().

Isn't it making a simple case hard, though?

I think this would be beyond the scope of this PR.
Which simple case do you mean? Having a fixed, non-sample-specific
Which simple case do you mean? Having a fixed, non-sample-specific alpha? We could support this also by allowing a scalar value for sample_alpha, which would be used for each sample.

I don't like this as it is breaking the scikit-learn conventions. One thing that many people like about scikit-learn is that they don't have to relearn conventions each time they use a different model.
Ok, but what should we do in the following situation: someone does not specify
A question or observation: while the alpha definition of the GP is under discussion in this PR, I was wondering if it makes sense, in the case of a multi-dimensional Y, to allow an alpha per Y column. If alpha is a vector, its size is checked against the number of X rows, to allow a noise definition per observation (or a weight), but what about allowing a matrix definition of alpha?
So I am pretty sure that for all the other hyperparameters, you use the same value of the hyperparameter for every column of y. Maybe one could defend allowing a matrix alpha, since alpha is being treated somewhat differently than the other hyperparameters, but it would not be straightforward to implement, so I don't think it is for this PR. For instance, with a matrix alpha, fitting a GP would require number_of_columns_in_y matrix inversions instead of only one.
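As a workaround for the per-column question, one can always fit one regressor per output column, each with its own alpha, which also makes the cost mentioned above explicit: one kernel-matrix factorization per column. The data and alpha values below are illustrative.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

X = np.linspace(0, 5, 25)[:, None]
Y = np.column_stack([np.sin(X).ravel(), np.cos(X).ravel()])
alphas = [1e-2, 1e-6]  # a different noise level per output column

# One fit (and hence one factorization of the kernel matrix) per column.
models = [GaussianProcessRegressor(alpha=a).fit(X, Y[:, j])
          for j, a in enumerate(alphas)]
preds = np.column_stack([m.predict(X) for m in models])
```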
Well, maybe not in my mind... I am using multiple Y because I am lazy, in the sense that I am building one model to predict multiple outputs that may not be living in the same space... Certainly not something to be done in this PR, as I said, just a question (maybe not the right place). I'll look closer at the implementation and I'll go back to my Rasmussen copy.
Is there any progress on this PR? I think we did not come to a conclusion regarding the
I don't mind the third option you mentioned. There is also something to be said for removing alpha in __init__() altogether, imo. Could we just add a fixed WhiteKernel corresponding to alpha=1e-10 to the default kernel and explain why in the documentation? This would correspond to the current behavior if no kernel is specified by the user, though. If the reason for not removing alpha is grid searching, then consider that (1) one does not typically grid search with GPR, and (2) if one were to do so, it makes little sense to grid search only alpha. Grid searching a scalar length_scale in, say, an RBF kernel AND alpha would be kind of similar to grid searching gamma and C/alpha for kernel ridge/SVR. But the kernel length_scale is unavailable to grid search using the standard tools (if I understand the scikit-learn grid search correctly; I haven't tried it).
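The default kernel floated here would look something like the following; this is a sketch of the proposal, not scikit-learn's actual default.

```python
from sklearn.gaussian_process.kernels import RBF, WhiteKernel, ConstantKernel

# Proposed default (hypothetical): the usual 1.0 * RBF(1.0), plus a fixed
# WhiteKernel playing the role of the current alpha=1e-10 jitter. The
# "fixed" bounds keep the optimizer from treating the jitter as a free
# hyperparameter.
default_kernel = (
    ConstantKernel(1.0) * RBF(length_scale=1.0)
    + WhiteKernel(noise_level=1e-10, noise_level_bounds="fixed")
)
```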
When we remove

I agree, but we had this discussion already further above.
I meant to change the default kernel only (to 1.0 * RBF() + WhiteKernel(1e-10, "fixed")) and explain in the documentation why the fixed WhiteKernel is there, and recommend adding one. If the user, not having read the documentation, specifies a kernel without a WhiteKernel() added, then he is free to do so (and will probably get numerical problems). Dunno, none of this is optimal, and like I said earlier, I don't mind your solution either.
True, changing the default kernel would also work. The problem would be that a user who does not use the default kernel would always have to add a WhiteKernel manually. It would be more explicit but probably also more error-prone, because users would have to remember to do so. Plus: it would break backward compatibility, because everyone would have to adapt their kernels. I think this is not an option, unfortunately.


Reference Issue
Fixes #7975
Fixes #7821
(Duplicate issues)
What does this implement/fix? Explain your changes.
If different samples have different noise levels (i.e., they should be weighted differently), these noise levels should be added in the fit() method, not in the constructor as they are currently.
Any other comments?
- Alpha passed to the constructor of GaussianProcessRegressor must now be a scalar.
- In addition to alpha, a sample_alpha argument in fit() is added to the kernel matrix diagonal: None (default), float, or array of shape (n_samples,).
Quick test:


sample_alpha.zip