
Sample-dependent alpha parameter moved to fit() in GaussianProcessRegressor #7985

Closed
maka89 wants to merge 10 commits into scikit-learn:master from maka89:gpr-sample_noise

Conversation

@maka89

@maka89 maka89 commented Dec 5, 2016

Reference Issue

Fixes #7975
Fixes #7821
(Duplicate issues)

What does this implement/fix? Explain your changes.

If different samples have different noise levels (and should therefore be weighted differently), these noise levels should be passed to the fit() method, not to the constructor as is currently the case.

Any other comments?

- Alpha passed to the constructor of GaussianProcessRegressor must now be a scalar.
- In addition to alpha, a sample_alpha argument in fit() is added to the kernel matrix diagonal. It can be None (default), a float, or an array of shape (n_samples,).

Quick test:
[figures figure_1 and figure_2 attached; test script: sample_alpha.zip]

Member

@amueller amueller left a comment

Looks good so far apart from the deprecation. You also need to change / add tests.

Target values


sample_alpha : float or array-like, shape = (n_samples,), optional
Member

Should we allow a float here? What happens if the user specified a float in __init__ and an array here?

Author

The float is added to the array
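In code, a minimal numpy sketch of that combination (illustrative only, not the actual scikit-learn implementation):

```python
import numpy as np

alpha = 1e-2                               # scalar alpha from __init__
sample_alpha = np.array([0.1, 0.5, 0.2])   # per-sample noise passed to fit()

# The scalar is broadcast against the array, so every sample gets
# alpha + its own sample_alpha entry on the kernel matrix diagonal.
diagonal_noise = alpha + sample_alpha
```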

Member

Ok, but the docs don't specify ;)

Author

It is specified in the docstring of the fit() method. Should it also be specified in the constructor?

Member

Oh sorry, I read too quickly; never mind.

    raise ValueError("alpha must be a scalar or an array"
                     " with same number of entries as y. (%d != %d)"
                     % (self.alpha.shape[0], y.shape[0]))
if np.iterable(self.alpha):
Member

We need to deprecate the old behavior, we can not just remove it.

if sample_alpha is None:
    add_alpha = 0.0
else:
    if np.iterable(sample_alpha) \
Member

Why did you move the validation of alpha here? I think the previous place was better.

Author

OK, so I moved it back and added a deprecation warning.

@maka89
Author

maka89 commented Dec 5, 2016

So one of the tests failed... What does this mean?

@amueller
Member

amueller commented Dec 5, 2016

It means you used Python 2 syntax. Please use print with parentheses.

Also: whitespace in blank lines, too-long lines, etc.
@maka89
Author

maka89 commented Dec 6, 2016

Tests are still failing: one error I can't identify, and the other seems related to the execution speed of some scripts (I can't possibly have slowed down fit() that much?).

Is sample_alpha the best name? (sample_alphas, sample_noise, something else...)

@amueller
Member

amueller commented Dec 6, 2016

test failure is because of pep8 issues:
https://travis-ci.org/scikit-learn/scikit-learn/jobs/181514957#L488

@maka89
Author

maka89 commented Dec 6, 2016

Ok, thanks for the help. This is my first time collaborating on a project on GitHub...

@amueller
Member

amueller commented Dec 6, 2016

@maka89 no worries, thanks for the fix :)

@maka89 maka89 changed the title Sample dependent alpha parameter moved to fit() Sample-dependent alpha parameter moved to fit() in GaussianProcessRegressor Dec 6, 2016
so sample_alpha is added to diagonal of K in log marginal likelihood function as well.
@maka89
Author

maka89 commented Dec 7, 2016

Realized that the LML function needs sample_alpha added to the diagonal of K as well...
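As a sketch (hypothetical helper, assuming the additive alpha + sample_alpha convention from this PR), the same diagonal update could then be shared by fit() and the LML computation:

```python
import numpy as np


def add_noise_to_diagonal(K, alpha, sample_alpha=None):
    # Hypothetical helper: add the noise term to a copy of the kernel
    # matrix diagonal, as needed both in fit() and in the
    # log-marginal-likelihood.
    K = K.copy()
    noise = alpha if sample_alpha is None else alpha + np.asarray(sample_alpha)
    K[np.diag_indices_from(K)] += noise
    return K
```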

@maka89
Author

maka89 commented Dec 7, 2016

Quick test again:

[figures figure_1-1 and figure_2-1 attached; test script: sample_alpha.zip]

@amueller
Member

amueller commented Dec 7, 2016

Tests should be in the unit tests, and if you don't trust the current unit tests, you should improve them ;)

@maka89
Author

maka89 commented Dec 7, 2016

Ok, so none of the test_gpr.py functions gave any errors... I don't know what the usual procedure with tests is... Should I upload the results? I can make a test function out of the script I posted above if you think that would be useful.

@amueller
Member

amueller commented Dec 7, 2016

As you only moved a parameter, I don't think we need additional tests.

@maka89
Author

maka89 commented Dec 7, 2016

Ok, great.

@jmetzen
Member

jmetzen commented Dec 9, 2016

Thanks for the PR. I like alpha (or sample_alpha) being moved to fit. However, I think it is rather confusing that we would now have three places where we can specify noise: alpha in __init__(), sample_alpha in fit(), and the WhiteKernel. In particular, adding alpha and sample_alpha in fit() could be confusing to users. I would prefer removing alpha (after deprecation) and using 1e-10 rather than 0 as default for sample_alpha.

@aflaxman
Contributor

aflaxman commented Dec 9, 2016

In case it is helpful, here is the minimal test I proposed in #7975:

import numpy as np
import sklearn.gaussian_process

X = [[0], [1]]
y = [.5, .5]
se = np.array([.1, .5])

kernel = sklearn.gaussian_process.kernels.Matern(length_scale=1.0, nu=1.5)
gp = sklearn.gaussian_process.GaussianProcessRegressor(kernel=kernel,
                                                       optimizer=None)

gp.fit(X, y, sample_alpha=se**2)

y_mean, y_std = gp.predict(X, return_std=True)
assert y_std[0] <= y_std[1] / 5.0

I think the last line might need to be modified a little to pass, though. Maybe y_std[0] will be slightly larger than y_std[1]/5.0.

@maka89
Author

maka89 commented Dec 9, 2016

I agree on removing alpha. I will change the deprecation warning. Is it in 0.20 or 0.21 that it will be removed?
Also agree that sample_alpha is a bit confusing... Are we sticking with sample_alpha for now?

@amueller
Member

amueller commented Dec 9, 2016

It'll be removed in 0.21

I'm not sure about removing alpha. Is the main thing that it'll be redundant with the WhiteKernel? sample_alpha is much harder to specify than alpha and impossible to grid-search.
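For illustration, a scalar alpha in the constructor plugs straight into the standard grid-search tools, while a per-sample sample_alpha in fit() has no comparable hook (quick sketch with synthetic data):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.model_selection import GridSearchCV

# Toy data; optimizer=None keeps the example fast.
X = np.linspace(0, 1, 20).reshape(-1, 1)
y = np.sin(4 * X).ravel()

# Grid search only varies constructor parameters, so alpha is searchable.
grid = GridSearchCV(GaussianProcessRegressor(optimizer=None),
                    {"alpha": [1e-10, 1e-5, 1e-2]}, cv=3)
grid.fit(X, y)
```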

@GaelVaroquaux
Member

GaelVaroquaux commented Dec 9, 2016 via email

@maka89
Author

maka89 commented Dec 9, 2016

Good point about the grid search...
Yes, setting alpha=0.1 is equivalent to WhiteKernel(0.1, "fixed")...
The optimal solution, IMO, would be a general way to pass arguments to a kernel through fit(), or to add an extra kernel function through fit().

Then you could feed the noise to, say, a WhiteKernel through fit(). I think this would be good because there might be other kernels that take sample-dependent parameters... For instance, I have a GP project on my GitHub where the kernel takes input noise for each data point as a parameter. That is hard to implement in the current system.

This seems like a bigger project, though.

@amueller
Member

amueller commented Dec 9, 2016

Yeah, and not very aligned with the scikit-learn API, I'm afraid. We could attach more properties to samples, but that requires #4497.

@jmetzen
Member

jmetzen commented Dec 10, 2016

The thing with alpha though is that you would not grid-search it. If you use a WhiteKernel as part of your kernel you can optimize the noise level (i.e., alpha) along with all other kernel parameters like length-scales etc. by optimizing the log-marginal-likelihood. The main point of treating the noise level differently than other kernel parameters was that it can be sample-specific. But as noted in the related issues, it is actually much cleaner to specify a sample-specific alpha in fit(). If we change that, I don't see why we should still allow specifying a non-sample-specific noise level in __init__() if that is perfectly possible via the WhiteKernel (except for consistency with Ridge). "There should be one-- and preferably only one --obvious way to do it." ;-)

@GaelVaroquaux
Member

GaelVaroquaux commented Dec 10, 2016 via email

@jmetzen
Member

jmetzen commented Dec 10, 2016

The optimum solution imo would be if there was a general way to pass arguments to a kernel through fit(). Or add an extra kernel through fit().

I think this would be beyond the scope of this PR

Isn't it making a simple case hard, though?

Which simple case do you mean? Having a fixed, non-sample-specific alpha? We could support this also by allowing a scalar value for sample_alpha, which would be used for each sample.
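A sketch of that option (illustrative only, hypothetical helper name): a scalar sample_alpha would simply be expanded to one value per sample:

```python
import numpy as np


def expand_sample_alpha(sample_alpha, n_samples):
    # Hypothetical sketch: a scalar is used for every sample, while an
    # array must already provide one entry per sample.
    if np.isscalar(sample_alpha):
        return np.full(n_samples, sample_alpha)
    return np.asarray(sample_alpha)
```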

@GaelVaroquaux
Member

GaelVaroquaux commented Dec 10, 2016 via email

@jmetzen
Member

jmetzen commented Dec 10, 2016

Ok, but what should we do in the following situation: someone does not specify alpha in __init__(), so its default value of 1e-10 is used. When fitting, the user sets sample_alpha to [1e-10, 1e-12]. The implementation in this PR would use alpha + sample_alpha internally, thus effectively [2e-10, 1e-10 + 1e-12] is used. I doubt that this is what the user intended. Changing the default of alpha to 0 would be problematic for numerical reasons. I am not sure how to resolve this situation.
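The arithmetic in question (just numpy, restating the numbers above):

```python
import numpy as np

alpha = 1e-10                        # the __init__ default
sample_alpha = np.array([1e-10, 1e-12])

# The PR adds the two internally, which doubles the jitter on the
# first sample relative to what the user asked for.
effective = alpha + sample_alpha
```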

@fdeheeger

A question or observation while the alpha definition of the GP is under discussion in this PR: I was wondering whether it makes sense, in the case of a multi-dimensional Y, to allow an alpha per Y column.

If alpha is a vector, its size is checked against the rows of X, to allow a noise definition per observation (or a weight); but what about allowing a matrix definition of alpha?

@maka89
Author

maka89 commented Jan 26, 2017

So I am pretty sure that, for all the other hyperparameters, the same value of the hyperparameter is used for every column of y. Maybe one could defend allowing a matrix alpha, since alpha is treated somewhat differently from the other hyperparameters, but it would not be straightforward to implement, so I don't think it is for this PR. For instance, with a matrix alpha, fitting a GP would require number_of_columns_in_y matrix inversions instead of only one.
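To illustrate the cost (a toy numpy sketch, not scikit-learn code): with a matrix alpha, each output column gets its own regularized Gram matrix, and thus its own solve/factorization:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 1))
K = np.exp(-0.5 * (X - X.T) ** 2)        # toy RBF Gram matrix
alpha = np.abs(rng.normal(size=(5, 3)))  # hypothetical per-column noise
y = rng.normal(size=(5, 3))

# One linear solve (i.e. one factorization) per output column, instead
# of a single shared one when alpha is a scalar or vector.
coefs = np.column_stack([
    np.linalg.solve(K + np.diag(alpha[:, j]), y[:, j])
    for j in range(y.shape[1])
])
```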

@fdeheeger

So I am pretty sure that for all the other hyperparameters, you use the same value of the hyperparameter for every column of y.

Well, maybe not in my mind... I am using multiple Y because I am lazy, in the sense that I am building one model to predict multiple outputs that may not live in the same space...
So, in the current implementation, using multiple outputs is closer to multiple observations of the same Y than to multiple Ys living in their own spaces?

Certainly not something to be done in this PR, as I said; just a question (maybe not the right place). I'll look closer at the implementation and go back to my Rasmussen copy.

@jmetzen
Member

jmetzen commented Feb 28, 2017

Is there any progress on this PR? I think we did not come to a conclusion regarding the alpha vs. sample_alpha issue. Based on @GaelVaroquaux's judgement (if I get it correctly, correct me if I get you wrong), we should both allow a scalar alpha in __init__ (the easy case) and a vector-valued sample_alpha. sample_alpha should default to None (according to sklearn conventions). We cannot let alpha default to 0 at the same time, because then the defaults would result in numerical instabilities. Thus, we need to keep the default alpha=1e-10. What should we do when sample_alpha is set by the user:

  • add alpha and sample_alpha (current behavior, but probably not what the user intended)
  • use sample_alpha and ignore alpha (probably what the user intended (?) if they did not set alpha explicitly but kept the default; problematic if alpha was set explicitly)
  • raise an error (relatively inconvenient, because the user has to set alpha explicitly to 0 when specifying sample_alpha, but at least fully transparent and avoids a mismatch between what happens and what the user intended)

I would prefer the last. What are the other opinions?
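A minimal sketch of the third option (hypothetical validation code, not part of the PR):

```python
import numpy as np


def check_alpha(alpha, sample_alpha=None):
    # Hypothetical sketch of the third option: force the user to set
    # alpha=0 explicitly before a per-sample sample_alpha is accepted.
    if sample_alpha is not None and alpha != 0:
        raise ValueError(
            "Set alpha=0 explicitly when specifying sample_alpha; "
            "combining both is ambiguous.")
    return alpha if sample_alpha is None else np.asarray(sample_alpha)
```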

@maka89
Author

maka89 commented Mar 1, 2017

Is there any progress on this PR?

I don't mind the third option you mentioned.

There is also something to be said for removing alpha in __init__() altogether, IMO. We could just add a fixed WhiteKernel corresponding to alpha=1e-10 to the default kernel and explain why in the documentation. This would correspond to the current behavior if no kernel is specified by the user, though.

If the reason for not removing alpha is grid searching, then consider that (1) one does not typically grid search with GPR, and (2) if one does, it makes little sense to grid search only alpha. Grid searching a scalar length_scale in, say, an RBF kernel AND alpha would be similar to grid searching gamma and C/alpha for kernel ridge/SVR. But the kernel length_scale is unavailable to grid search using the standard tools (if I understand the scikit-learn grid search correctly; I haven't tried it).

@jmetzen
Member

jmetzen commented Mar 2, 2017

There is also something to be said for removing alpha in __init__() altogether, IMO. We could just add a fixed WhiteKernel corresponding to alpha=1e-10 to the default kernel and explain why in the documentation. This would correspond to the current behavior if no kernel is specified by the user, though.

When we remove alpha and always add a WhiteKernel, the problem would be that we could not set alpha=0.

If the reason for not removing alpha is grid searching, then consider that (1) One does not typically grid search with GPR. (2) If one is to do so, then it makes little sense to grid search only alpha. Grid searching a scalar length_scale in, say, a RBF kernel AND alpha would be kind of similar to grid searching gamma and C/alpha for kernel ridge/SVR. But the kernel length_scale is unavailable to grid search using the standard tools (if I understand the scikit learn grid search correctly, I haven't tried it).

I agree but we had this discussion already further above.

@maka89
Author

maka89 commented Mar 2, 2017

I meant to change only the default kernel (to 1.0 * RBF() + WhiteKernel(1e-10, "fixed")), explain in the documentation why the fixed WhiteKernel is there, and recommend adding one. If the user, not having read the documentation, specifies a kernel without a WhiteKernel added, then they are free to do so (and will probably get numerical problems). I don't know; none of this is optimal, and like I said earlier, I don't mind your solution either.
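Concretely, the proposed default would look like this (a sketch using the existing kernel API; whether it should actually become the default is exactly what is under discussion):

```python
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Sketch of the proposed default kernel: the usual 1.0 * RBF(1.0) plus a
# fixed WhiteKernel supplying the 1e-10 jitter that alpha provides today.
default_kernel = 1.0 * RBF(1.0) + WhiteKernel(noise_level=1e-10,
                                              noise_level_bounds="fixed")
```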

@jmetzen
Member

jmetzen commented Mar 6, 2017

True, changing the default kernel would also work. The problem is that a user who does not use the default kernel would always have to add a WhiteKernel manually. It would be more explicit but probably also more error-prone, because users would have to remember to do so. Plus, it would break backward compatibility, because everyone would have to adapt their kernels. I think this is not an option, unfortunately.



Development

Successfully merging this pull request may close these issues.

A possible alternative to alpha at fit-time for GaussianProcessRegressor
Allow alpha in GaussianProcessRegressor at fit time?

7 participants