
[MRG] Adding LinearKernel to GaussianProcesses#8373

Closed
dalmia wants to merge 5 commits into scikit-learn:main from dalmia:8359

Conversation

@dalmia
Contributor

@dalmia dalmia commented Feb 16, 2017

Reference Issue

Fixes #8359

What does this implement/fix? Explain your changes.

This adds LinearKernel to GaussianProcesses as per the link mentioned in the issue thread.

Any other comments?

  • Implementation of sigma_v and sigma_b
  • Add diag(X)
  • Add documentation
  • Implementation of c

@codecov

codecov bot commented Feb 16, 2017

Codecov Report

Merging #8373 into master will increase coverage by <.01%.
The diff coverage is 97.05%.

@@            Coverage Diff             @@
##           master    #8373      +/-   ##
==========================================
+ Coverage   94.75%   94.75%   +<.01%     
==========================================
  Files         342      342              
  Lines       60801    60835      +34     
==========================================
+ Hits        57609    57642      +33     
- Misses       3192     3193       +1
Impacted Files Coverage Δ
sklearn/gaussian_process/tests/test_kernels.py 98.83% <ø> (ø)
sklearn/gaussian_process/kernels.py 91.86% <97.05%> (+0.32%)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 80c1bf1...6bdc3fe. Read the comment docs.

@jmetzen
Member

jmetzen commented Feb 17, 2017

I am not totally convinced that we should add the functionality as a new kernel (LinearKernel) rather than extending the existing DotProduct kernel. Or, alternatively, change the implementation of DotProduct such that it is a specialization of the new LinearKernel. We should try to avoid code duplication here.

@dalmia
Contributor Author

dalmia commented Feb 17, 2017

I agree. For the most part, this seems like an extension of the DotProduct kernel, where we can add sigma_v and c, defaulting them to 1 and 0 respectively.

@dalmia
Contributor Author

dalmia commented Feb 18, 2017

@jmetzen Could you please explain why the gradient of K(X, X') with respect to sigma_0 in the DotProduct kernel is not 2 * sigma_0 but 2 * sigma_0 ** 2?

@jmetzen
Member

jmetzen commented Feb 18, 2017

This is a good question. It looks like a bug. But the unittest in test_kernels.py, which compares the implemented gradient with its numerical approximation, does not fail. When I change the gradient to 2 * sigma_0, however, it does fail. A similar situation exists for the ConstantKernel. I don't remember why this should be the case.

@dalmia
Contributor Author

dalmia commented Feb 18, 2017

Yes, I observed the same thing when I was working on the eval_gradient part of LinearKernel. Maybe @MechCoder can help us here? :)

@jmetzen
Member

jmetzen commented Feb 18, 2017

Ah, I remember. It is because theta is actually always the logarithm of the hyperparameter values. And the gradient of the kernel is taken with respect to theta and not with respect to the hyperparameter value like sigma_0. Let us consider the DotProduct kernel as an example. Writing as a function of theta instead of sigma_0 with sigma_0 = exp(theta) gives us:
k(x_i, x_j) = sigma_0 ^ 2 + x_i \cdot x_j = exp(theta) ^ 2 + x_i \cdot x_j = exp(2*theta) + x_i \cdot x_j
Now the gradient with respect to theta is:
2*exp(2*theta) = 2*exp(theta)^2 = 2*\sigma_0^2
So the implemented gradient is correct (same for the ConstantKernel). However, the docstring is wrong when it says "The gradient of the kernel k(X, X) with respect to the hyperparameter of the kernel". It should be "The gradient of the kernel k(X, X) with respect to the log-transformed hyperparameter of the kernel" or something like that.

As an addendum: Why is theta the log-transformed value? From the docstring of the theta property:
"Note that theta are typically the log-transformed values of the
kernel's hyperparameters as this representation of the search space
is more amenable for hyperparameter search, as hyperparameters like
length-scales naturally live on a log-scale."

We should create a separate issue in which we can make the documentation and the comments more explicit about this.
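The log-parameterization explained above can be checked numerically. A minimal sketch in plain NumPy (not using the scikit-learn kernel API): it writes the DotProduct kernel as a function of theta = log(sigma_0) and compares the implemented analytic gradient 2 * sigma_0 ** 2 against a central finite difference.

```python
import numpy as np

# DotProduct kernel value for a fixed pair of inputs, written as a
# function of theta = log(sigma_0): k(theta) = exp(2*theta) + x_i . x_j
x_i = np.array([1.0, 2.0])
x_j = np.array([0.5, -1.0])

def k(theta):
    return np.exp(2 * theta) + x_i @ x_j

theta = np.log(1.5)            # i.e. sigma_0 = 1.5
sigma_0 = np.exp(theta)

analytic = 2 * sigma_0 ** 2    # the gradient as implemented in kernels.py
eps = 1e-7
numeric = (k(theta + eps) - k(theta - eps)) / (2 * eps)

print(np.isclose(analytic, numeric, rtol=1e-5))  # True
```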

@dalmia
Contributor Author

dalmia commented Feb 18, 2017

Thank you so much for the clear explanation. This finally makes sense, and I can proceed with adding the c parameter. But you are also right that this isn't obvious from the docstring and needs to be documented better.

@dalmia
Contributor Author

dalmia commented Feb 19, 2017

Just reporting: there is a bug in the tests. They assume that:

len(kernel.hyperparameters) == kernel.theta.size

For the LinearKernel case when c.ndim == X.ndim, suppose c.ndim = 2. Now:

>>> len(kernel.hyperparameters)
3
>>> kernel.theta.size
4
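The same kind of mismatch can be reproduced with an existing kernel: an anisotropic RBF bundles all per-dimension length scales into a single Hyperparameter entry, while theta holds one log value per dimension.

```python
from sklearn.gaussian_process.kernels import RBF

# Anisotropic RBF: ONE Hyperparameter entry (length_scale, n_elements=3),
# but THREE entries in theta (one log-transformed value per dimension).
kernel = RBF(length_scale=[1.0, 2.0, 3.0])

print(len(kernel.hyperparameters))  # 1
print(kernel.theta.size)            # 3
```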


@codecov-io

codecov-io commented Feb 19, 2017

Codecov Report

Merging #8373 into master will decrease coverage by 1.44%.
The diff coverage is 96.96%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #8373      +/-   ##
==========================================
- Coverage   96.19%   94.75%   -1.45%     
==========================================
  Files         348      342       -6     
  Lines       64645    60895    -3750     
==========================================
- Hits        62187    57700    -4487     
- Misses       2458     3195     +737
Impacted Files Coverage Δ
sklearn/gaussian_process/tests/test_kernels.py 98.94% <100%> (-1.06%) ⬇️
sklearn/gaussian_process/kernels.py 91.94% <95.31%> (+0.39%) ⬆️
sklearn/feature_extraction/tests/test_image.py 4.78% <0%> (-95.22%) ⬇️
sklearn/feature_extraction/image.py 58.16% <0%> (-41.84%) ⬇️
sklearn/utils/arpack.py 42.5% <0%> (-32.5%) ⬇️
sklearn/utils/random.py 59.29% <0%> (-32.2%) ⬇️
sklearn/datasets/tests/test_kddcup99.py 29.16% <0%> (-10.84%) ⬇️
sklearn/manifold/mds.py 84.46% <0%> (-9.71%) ⬇️
sklearn/utils/tests/test_estimator_checks.py 88.23% <0%> (-8.96%) ⬇️
sklearn/linear_model/tests/test_bayes.py 83.01% <0%> (-8.16%) ⬇️
... and 262 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3e29334...078bba7. Read the comment docs.

@dalmia dalmia changed the title [WIP] Adding LinearKernel to GaussianProcesses [MRG] Adding LinearKernel to GaussianProcesses Feb 19, 2017
_approx_fprime(kernel.theta, eval_kernel_for_theta, 1e-10)

-    assert_almost_equal(K_gradient, K_gradient_approx, 4)
+    assert_almost_equal(K_gradient, K_gradient_approx, 3)
Contributor Author

Reduced the precision. Is there an issue with that?

Member

This should not be an issue.

for i, hyperparameter in enumerate(kernel.hyperparameters):
assert_equal(theta[i],
np.log(getattr(kernel, hyperparameter.name)))
i = 0
Contributor Author

The earlier implementation of the tests assumes that:

len(kernel.hyperparameters) == kernel.theta.size

Member

Thanks for fixing this problem. I was surprised that it was not an issue so far, because the anisotropic RBF/Matern kernels also violate the condition. But this is because the test skips the non-basic kernels at the beginning. Could you check if you can now also remove lines 74-76:

if isinstance(kernel, KernelOperator) \
     or isinstance(kernel, Exponentiation):  # skip non-basic kernels
      continue

Contributor Author

Yes, that's what prompted me to check how RBF got through in the first place, and I then found that those kernels are being skipped. Nevertheless, removing the lines causes a failure:

AssertionError: Items in the first set but not the second:
'k1__constant_value'
'k2__length_scale'
    """Fail immediately, with the given message."""
>>  raise self.failureException("Items in the first set but not the second:\n'k1__constant_value'\n'k2__length_scale'")

I'll look into the issue and try to resolve it so that we can run the tests without ignoring any.

assert_not_equal(id(kernel), id(kernel_cloned))

# Check that all constructor parameters are equal.
assert_equal(kernel.get_params(), kernel_cloned.get_params())
Contributor Author

Cannot compare the two dictionaries when the value of c is an array.
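A minimal illustration of why the dictionary comparison breaks when a parameter value is an array: dict equality compares values element-wise with ==, and the resulting boolean NumPy array cannot be reduced to a single truth value.

```python
import numpy as np

params_a = {"c": np.array([1.0, 2.0])}
params_b = {"c": np.array([1.0, 2.0])}

# Comparing the dicts compares the arrays element-wise; converting the
# resulting boolean array to a single True/False raises ValueError.
try:
    params_a == params_b
except ValueError as err:
    print("comparison raised:", type(err).__name__)
```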

@dalmia
Contributor Author

dalmia commented Feb 22, 2017

It turns out these lines can't simply be removed:

if isinstance(kernel, KernelOperator) \
     or isinstance(kernel, Exponentiation):  # skip non-basic kernels
      continue

This is because, for example, KernelOperator adds k1__- and k2__-prefixed versions of each sub-kernel's parameter names to the list of hyperparameters:

for hyperparameter in self.k1.hyperparameters:
            r.append(Hyperparameter("k1__" + hyperparameter.name,
                                    hyperparameter.value_type,
                                    hyperparameter.bounds,
                                    hyperparameter.n_elements))

Whereas theta simply concatenates each sub-kernel's theta:

return np.append(self.k1.theta, self.k2.theta)
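The mismatch described above can be seen directly on a composed kernel: the hyperparameter names are re-exported with k1__/k2__ prefixes, while theta is just the concatenation of the sub-kernels' log-parameters, so the counts disagree as soon as one sub-kernel is anisotropic.

```python
from sklearn.gaussian_process.kernels import ConstantKernel, RBF

kernel = ConstantKernel(1.0) * RBF(length_scale=[1.0, 2.0])

# Two Hyperparameter entries, prefixed per sub-kernel...
print([h.name for h in kernel.hyperparameters])
# ['k1__constant_value', 'k2__length_scale']

# ...but three theta entries: one for k1, two for the anisotropic k2.
print(len(kernel.hyperparameters), kernel.theta.size)  # 2 3
```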

return length_scale


def _check_offset(X, c):

This seems to be identical to _check_length_scale? Essentially it is checking whether the kernel is anisotropic, by checking length_scale, or c in your case.

Member

It would make sense to unify the two functions; there is no essential difference besides the string in the second ValueError.

self.c_bounds = c_bounds

@property
def non_uniform_offset(self):

Would it be better to name it anisotropic, as in RBF and Matern?

@MechCoder
Member

I agree that we should add a hyperparameter offset to the DotProduct kernel instead of creating a new class (with offset_scale_bounds set to "fixed" and offset set to zero for backward compatibility). Sorry for being late and leading you down the wrong road :/

sigma_0 and c. The parameters of the Linear kernel are about specifying the
origin. The kernel is given by:

k(x_i, x_j) = sigma_b ^ 2 + sigma_v ^ 2 * ((x_i - c) \cdot (x_j - c))
Member

In any case, there is no need for sigma_b and sigma_v as hyperparameters. The same effect can be achieved with something like ConstantKernel() + ConstantKernel() * LinearKernel()
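Assuming sigma_0 can be fixed at zero so that DotProduct becomes purely linear, the sigma_b/sigma_v structure is expressible through composition with existing kernels (a sketch; with both constants left at their default of 1, the composed kernel reduces to 1 + x_i . x_j):

```python
import numpy as np
from sklearn.gaussian_process.kernels import ConstantKernel, DotProduct

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# sigma_b^2 + sigma_v^2 * (x_i . x_j), built from existing kernels:
# the first ConstantKernel plays sigma_b^2, the second sigma_v^2.
kernel = ConstantKernel() + ConstantKernel() * DotProduct(sigma_0=0.0)

print(np.allclose(kernel(X), 1.0 + X @ X.T))  # True
```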

@MechCoder
Member

So the gradient is with respect to log(hyperparameter). I don't remember why exactly I was convinced in #6701 that it is with respect to the hyperparameter itself :/

@jmetzen
Member

jmetzen commented Mar 8, 2017

The docstring was also misleading, sorry for that...

@sklearn-lgtm

This pull request introduces 1 alert - view on lgtm.com

new alerts:

  • 1 for __eq__ not overridden when adding attributes

Comment posted by lgtm.com

@agramfort
Member

@dalmia @jmargeta @MechCoder do we want / need this or not?

@jmargeta
Contributor

Hi @agramfort, I do not have sufficient expertise with GPs to have a relevant opinion. But from looking at the issue and the PR, extending the DotProduct kernel indeed seems to be enough.

...and you probably meant to mention the other Jan (@jmetzen) anyway :)

@thomasjpfan
Member

I am closing this PR in favor of #8373, because that PR builds on top of this one and has more progress. Thank you for working on this PR.

@thomasjpfan thomasjpfan closed this Aug 4, 2022

Labels

module:gaussian_process · Stalled · Superseded (PR has been replaced by a newer PR)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Linear Kernel for Gaussian Processes

10 participants