
Sample-dependent alpha parameter moved to fit() in GaussianProcessRegressor #7985

Closed
maka89 wants to merge 10 commits into scikit-learn:master from maka89:gpr-sample_noise

Conversation

@maka89

@maka89 maka89 commented Dec 5, 2016

Reference Issue

Fixes #7975
Fixes #7821
(Duplicate issues)

What does this implement/fix? Explain your changes.

If different samples have different noise levels (and should therefore be weighted differently), these noise levels should be passed to the fit() method, not to the constructor as is currently the case.

Any other comments?

- Alpha passed to the constructor of GaussianProcessRegressor must now be a scalar.
- In addition to alpha, a sample_alpha argument in fit() is added to the kernel matrix diagonal. It can be None (default), a float, or an array of shape (n_samples,).

Quick test:
[figures figure_1 and figure_2 attached; test script: sample_alpha.zip]

Member

@amueller amueller left a comment

Looks good so far apart from the deprecation. You also need to change / add tests.

Target values


sample_alpha : float or array-like, shape = (n_samples,), optional
Member

Should we allow a float here? What happens if the user specified a float in __init__ and an array here?

Author

The float is added to the array
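In code, a minimal numpy sketch of that combination (illustrative only, not the actual scikit-learn implementation):

```python
import numpy as np

alpha = 1e-2                               # scalar alpha from __init__
sample_alpha = np.array([0.1, 0.5, 0.2])   # per-sample noise passed to fit()

# The scalar is broadcast against the array, so every sample gets
# alpha + its own sample_alpha entry on the kernel matrix diagonal.
diagonal_noise = alpha + sample_alpha
```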

Member

Ok, but the docs don't specify ;)

Author

It is specified in the docstring of the fit() method. Should it also be specified in the constructor?

Member

Oh sorry, I read too quickly; never mind.

    raise ValueError("alpha must be a scalar or an array"
                     " with same number of entries as y. (%d != %d)"
                     % (self.alpha.shape[0], y.shape[0]))
if np.iterable(self.alpha):
Member

We need to deprecate the old behavior, we can not just remove it.

if sample_alpha is None:
    add_alpha = 0.0
else:
    if np.iterable(sample_alpha) \
Member

Why did you move the validation of alpha here? I think the previous place was better.

Author

OK, so I moved it back and added a deprecation warning.

@maka89
Author

maka89 commented Dec 5, 2016

So one of the tests failed... What does this mean?

@amueller
Member

amueller commented Dec 5, 2016

It means you used Python 2 syntax. Please use print with parentheses.

Also: whitespace in blank lines, too-long lines, etc.
@maka89
Author

maka89 commented Dec 6, 2016

Tests are still failing: one error I can't identify, and the other seems related to the execution speed of some scripts (I can't possibly have slowed down fit() that much?).

Is sample_alpha the best name? (sample_alphas, sample_noise, something else...)

@amueller
Member

amueller commented Dec 6, 2016

test failure is because of pep8 issues:
https://travis-ci.org/scikit-learn/scikit-learn/jobs/181514957#L488

@maka89
Author

maka89 commented Dec 6, 2016

Ok, thanks for the help. This is my first time collaborating on a project on GitHub...

@amueller
Member

amueller commented Dec 6, 2016

@maka89 no worries, thanks for the fix :)

@maka89 maka89 changed the title Sample dependent alpha parameter moved to fit() Sample-dependent alpha parameter moved to fit() in GaussianProcessRegressor Dec 6, 2016
so sample_alpha is added to diagonal of K in log marginal likelihood function as well.
@maka89
Author

maka89 commented Dec 7, 2016

Realized that the LML function needs sample_alpha added to the diagonal of K as well...
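As a sketch (hypothetical helper, assuming the additive alpha + sample_alpha convention from this PR), the same diagonal update could then be shared by fit() and the LML computation:

```python
import numpy as np


def add_noise_to_diagonal(K, alpha, sample_alpha=None):
    # Hypothetical helper: add the noise term to a copy of the kernel
    # matrix diagonal, as needed both in fit() and in the
    # log-marginal-likelihood.
    K = K.copy()
    noise = alpha if sample_alpha is None else alpha + np.asarray(sample_alpha)
    K[np.diag_indices_from(K)] += noise
    return K
```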

@maka89
Author

maka89 commented Dec 7, 2016

Quick test again:

[figures figure_1-1 and figure_2-1 attached; test script: sample_alpha.zip]

@amueller
Member

amueller commented Dec 7, 2016

Tests should be in the unit tests, and if you don't trust the current unit tests, you should improve them ;)

@maka89
Author

maka89 commented Dec 7, 2016

Ok, so none of the test_gpr.py functions gave any errors... I don't know what the usual procedure with tests is... Should I upload the results? I can make a test function out of the script I posted above if you think that would be useful.

@amueller
Member

amueller commented Dec 7, 2016

As you only moved a parameter, I don't think we need additional tests.

@maka89
Author

maka89 commented Dec 7, 2016

Ok, great.

@jmetzen
Member

jmetzen commented Dec 9, 2016

Thanks for the PR. I like alpha (or sample_alpha) being moved to fit. However, I think it is rather confusing that we would now have three places where we can specify noise: alpha in __init__(), sample_alpha in fit(), and the WhiteKernel. In particular, adding alpha and sample_alpha in fit() could be confusing to users. I would prefer removing alpha (after deprecation) and using 1e-10 rather than 0 as default for sample_alpha.

@aflaxman
Contributor

aflaxman commented Dec 9, 2016

In case it is helpful, here is the minimal test I proposed in #7975:

import numpy as np
import sklearn.gaussian_process

X = [[0], [1]]
y = [.5, .5]
se = np.array([.1, .5])

kernel = sklearn.gaussian_process.kernels.Matern(length_scale=1.0, nu=1.5)
gp = sklearn.gaussian_process.GaussianProcessRegressor(kernel=kernel,
                                                       optimizer=None)

gp.fit(X, y, sample_alpha=se**2)

y_mean, y_std = gp.predict(X, return_std=True)
assert y_std[0] <= y_std[1] / 5.0

I think the last line might need to be modified a little to pass, though. Maybe y_std[0] will be slightly larger than y_std[1]/5.0.

@maka89
Author

maka89 commented Dec 9, 2016

I agree on removing alpha. I will change the deprecation warning. Is it in 0.20 or 0.21 that it will be removed?
Also agree that sample_alpha is a bit confusing... Are we sticking with sample_alpha for now?

@amueller
Member

amueller commented Dec 9, 2016

It'll be removed in 0.21

I'm not sure about removing alpha. Is the main thing that it'll be redundant with the WhiteKernel? sample_alpha is much harder to specify than alpha and impossible to grid-search.
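For illustration, a scalar alpha in the constructor plugs straight into the standard grid-search tools, while a per-sample sample_alpha in fit() has no comparable hook (quick sketch with synthetic data):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.model_selection import GridSearchCV

# Toy data; optimizer=None keeps the example fast.
X = np.linspace(0, 1, 20).reshape(-1, 1)
y = np.sin(4 * X).ravel()

# Grid search only varies constructor parameters, so alpha is searchable.
grid = GridSearchCV(GaussianProcessRegressor(optimizer=None),
                    {"alpha": [1e-10, 1e-5, 1e-2]}, cv=3)
grid.fit(X, y)
```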

@GaelVaroquaux
Member

GaelVaroquaux commented Dec 9, 2016 via email

@maka89
Author

maka89 commented Dec 9, 2016

Good point about the grid search...
Yes, setting alpha=0.1 is equivalent to WhiteKernel(0.1, "fixed")...
The optimal solution, IMO, would be a general way to pass arguments to a kernel through fit(), or to add an extra kernel function through fit().

Then you could feed the noise to, say, a WhiteKernel through fit(). I think this would be good because there might be other kernels that take sample-dependent parameters... For instance, I have a GP project on my GitHub where the kernel takes input noise for each data point as a parameter. That is hard to implement in the current system.

This seems like a bigger project, though.

@amueller
Member

amueller commented Dec 9, 2016

Yeah, and not very aligned with the scikit-learn API, I'm afraid. We could attach more properties to samples, but that requires #4497.

@jmetzen
Member

jmetzen commented Dec 10, 2016

The thing with alpha though is that you would not grid-search it. If you use a WhiteKernel as part of your kernel you can optimize the noise level (i.e., alpha) along with all other kernel parameters like length-scales etc. by optimizing the log-marginal-likelihood. The main point of treating the noise level differently than other kernel parameters was that it can be sample-specific. But as noted in the related issues, it is actually much cleaner to specify a sample-specific alpha in fit(). If we change that, I don't see why we should still allow specifying a non-sample-specific noise level in __init__() if that is perfectly possible via the WhiteKernel (except for consistency with Ridge). "There should be one-- and preferably only one --obvious way to do it." ;-)

@GaelVaroquaux
Member

GaelVaroquaux commented Dec 10, 2016 via email

@jmetzen
Member

jmetzen commented Dec 10, 2016

The optimum solution imo would be if there was a general way to pass arguments to a kernel through fit(). Or add an extra kernel through fit().

I think this would be beyond the scope of this PR

Isn't it making a simple case hard, though?

Which simple case do you mean? Having a fixed, non-sample-specific alpha? We could support this also by allowing a scalar value for sample_alpha, which would be used for each sample.
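A sketch of that option (illustrative only, hypothetical helper name): a scalar sample_alpha would simply be expanded to one value per sample:

```python
import numpy as np


def expand_sample_alpha(sample_alpha, n_samples):
    # Hypothetical sketch: a scalar is used for every sample, while an
    # array must already provide one entry per sample.
    if np.isscalar(sample_alpha):
        return np.full(n_samples, sample_alpha)
    return np.asarray(sample_alpha)
```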

@GaelVaroquaux
Member

GaelVaroquaux commented Dec 10, 2016 via email

@jmetzen
Member

jmetzen commented Dec 10, 2016

Ok, but what should we do in the following situation: someone does not specify alpha in __init__(), so its default value of 1e-10 is used. When fitting, the user sets sample_alpha to [1e-10, 1e-12]. The implementation in this PR would use alpha + sample_alpha internally, thus effectively [2e-10, 1e-10 + 1e-12] is used. I doubt that this is what the user intended. Changing the default of alpha to 0 would be problematic for numerical reasons. I am not sure how to resolve this situation.
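The arithmetic in question (just numpy, restating the numbers above):

```python
import numpy as np

alpha = 1e-10                        # the __init__ default
sample_alpha = np.array([1e-10, 1e-12])

# The PR adds the two internally, which doubles the jitter on the
# first sample relative to what the user asked for.
effective = alpha + sample_alpha
```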

@fdeheeger

A question or observation while the alpha definition of the GP is under discussion in this PR: I was wondering whether it makes sense, in the case of a multi-dimensional Y, to allow an alpha per Y column.

If alpha is a vector, its size is checked against the rows of X, to allow a noise definition per observation (or a weight); but what about allowing a matrix definition of alpha?

@maka89
Author

maka89 commented Jan 26, 2017

So I am pretty sure that, for all the other hyperparameters, the same value of the hyperparameter is used for every column of y. Maybe one could defend allowing a matrix alpha, since alpha is treated somewhat differently from the other hyperparameters, but it would not be straightforward to implement, so I don't think it is for this PR. For instance, with a matrix alpha, fitting a GP would require number_of_columns_in_y matrix inversions instead of only one.
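To illustrate the cost (a toy numpy sketch, not scikit-learn code): with a matrix alpha, each output column gets its own regularized Gram matrix, and thus its own solve/factorization:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 1))
K = np.exp(-0.5 * (X - X.T) ** 2)        # toy RBF Gram matrix
alpha = np.abs(rng.normal(size=(5, 3)))  # hypothetical per-column noise
y = rng.normal(size=(5, 3))

# One linear solve (i.e. one factorization) per output column, instead
# of a single shared one when alpha is a scalar or vector.
coefs = np.column_stack([
    np.linalg.solve(K + np.diag(alpha[:, j]), y[:, j])
    for j in range(y.shape[1])
])
```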

@fdeheeger

So I am pretty sure that for all the other hyperparameters, you use the same value of the hyperparameter for every column of y.

Well, maybe not in my mind... I am using multiple Y because I am lazy, in the sense that I am building one model to predict multiple outputs that may not live in the same space...
So, in the current implementation, using multiple outputs is closer to multiple observations of the same Y than to multiple Ys living in their own spaces?

Certainly not something to be done in this PR, as I said; just a question (maybe not the right place). I'll look closer at the implementation and go back to my Rasmussen copy.

@jmetzen
Member

jmetzen commented Feb 28, 2017

Is there any progress on this PR? I think we did not come to a conclusion regarding the alpha vs. sample_alpha issue. Based on @GaelVaroquaux's judgement (if I get it correctly, correct me if I get you wrong), we should both allow a scalar alpha in __init__ (the easy case) and a vector-valued sample_alpha. sample_alpha should default to None (according to sklearn conventions). We cannot let alpha default to 0 at the same time, because then the defaults would result in numerical instabilities. Thus, we need to keep the default alpha=1e-10. What should we do when sample_alpha is set by the user:

  • add alpha and sample_alpha (current behavior, but probably not what the user intended)
  • use sample_alpha and ignore alpha (probably what the user intended (?) if they did not set alpha explicitly but kept the default; problematic if alpha was set explicitly)
  • raise an error (relatively inconvenient, because the user has to set alpha explicitly to 0 when specifying sample_alpha, but at least fully transparent and avoids a mismatch between what happens and what the user intended)

I would prefer the last. What are the other opinions?
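A minimal sketch of the third option (hypothetical validation code, not part of the PR):

```python
import numpy as np


def check_alpha(alpha, sample_alpha=None):
    # Hypothetical sketch of the third option: force the user to set
    # alpha=0 explicitly before a per-sample sample_alpha is accepted.
    if sample_alpha is not None and alpha != 0:
        raise ValueError(
            "Set alpha=0 explicitly when specifying sample_alpha; "
            "combining both is ambiguous.")
    return alpha if sample_alpha is None else np.asarray(sample_alpha)
```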

@maka89
Author

maka89 commented Mar 1, 2017

Is there any progress on this PR?

I don't mind the third option you mentioned.

There is also something to be said for removing alpha in __init__() altogether, IMO. We could just add a fixed WhiteKernel corresponding to alpha=1e-10 to the default kernel and explain why in the documentation. This would correspond to the current behavior if no kernel is specified by the user, though.

If the reason for not removing alpha is grid searching, then consider that (1) one does not typically grid search with GPR, and (2) if one does, it makes little sense to grid search only alpha. Grid searching a scalar length_scale in, say, an RBF kernel AND alpha would be similar to grid searching gamma and C/alpha for kernel ridge/SVR. But the kernel length_scale is unavailable to grid search using the standard tools (if I understand the scikit-learn grid search correctly; I haven't tried it).

@jmetzen
Member

jmetzen commented Mar 2, 2017

There is also something to be said for removing alpha in __init__() altogether, IMO. We could just add a fixed WhiteKernel corresponding to alpha=1e-10 to the default kernel and explain why in the documentation. This would correspond to the current behavior if no kernel is specified by the user, though.

When we remove alpha and always add a WhiteKernel, the problem would be that we could not set alpha=0.

If the reason for not removing alpha is grid searching, then consider that (1) One does not typically grid search with GPR. (2) If one is to do so, then it makes little sense to grid search only alpha. Grid searching a scalar length_scale in, say, a RBF kernel AND alpha would be kind of similar to grid searching gamma and C/alpha for kernel ridge/SVR. But the kernel length_scale is unavailable to grid search using the standard tools (if I understand the scikit learn grid search correctly, I haven't tried it).

I agree but we had this discussion already further above.

@maka89
Author

maka89 commented Mar 2, 2017

I meant to change only the default kernel (to 1.0 * RBF() + WhiteKernel(1e-10, "fixed")), explain in the documentation why the fixed WhiteKernel is there, and recommend adding one. If the user, not having read the documentation, specifies a kernel without a WhiteKernel added, then they are free to do so (and will probably get numerical problems). I don't know; none of this is optimal, and like I said earlier, I don't mind your solution either.
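Concretely, the proposed default would look like this (a sketch using the existing kernel API; whether it should actually become the default is exactly what is under discussion):

```python
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Sketch of the proposed default kernel: the usual 1.0 * RBF(1.0) plus a
# fixed WhiteKernel supplying the 1e-10 jitter that alpha provides today.
default_kernel = 1.0 * RBF(1.0) + WhiteKernel(noise_level=1e-10,
                                              noise_level_bounds="fixed")
```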

@jmetzen
Member

jmetzen commented Mar 6, 2017

True, changing the default kernel would also work. The problem is that a user who does not use the default kernel would always have to add a WhiteKernel manually. It would be more explicit but probably also more error-prone, because users would have to remember to do so. Plus, it would break backward compatibility, because everyone would have to adapt their kernels. I think this is not an option, unfortunately.



Development

Successfully merging this pull request may close these issues.

A possible alternative to alpha at fit-time for GaussianProcessRegressor
Allow alpha in GaussianProcessRegressor at fit time?

7 participants