
[MRG] Adding LinearKernel to GaussianProcesses#8373

Closed
dalmia wants to merge 5 commits into scikit-learn:main from dalmia:8359

Conversation

@dalmia
Contributor

@dalmia dalmia commented Feb 16, 2017

Reference Issue

Fixes #8359

What does this implement/fix? Explain your changes.

This adds LinearKernel to GaussianProcesses as per the link mentioned in the issue thread.

Any other comments?

  • Implementation of sigma_v and sigma_b
  • Add diag(X)
  • Add documentation
  • Implementation of c

@codecov

codecov bot commented Feb 16, 2017

Codecov Report

Merging #8373 into master will increase coverage by <.01%.
The diff coverage is 97.05%.

@@            Coverage Diff             @@
##           master    #8373      +/-   ##
==========================================
+ Coverage   94.75%   94.75%   +<.01%     
==========================================
  Files         342      342              
  Lines       60801    60835      +34     
==========================================
+ Hits        57609    57642      +33     
- Misses       3192     3193       +1
Impacted Files Coverage Δ
sklearn/gaussian_process/tests/test_kernels.py 98.83% <ø> (ø)
sklearn/gaussian_process/kernels.py 91.86% <97.05%> (+0.32%)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 80c1bf1...6bdc3fe. Read the comment docs.

@jmetzen
Member

jmetzen commented Feb 17, 2017

I am not totally convinced that we should add the functionality as a new kernel (LinearKernel) rather than extending the existing DotProduct kernel. Or, alternatively, change the implementation of DotProduct such that it is a specialization of the new LinearKernel. We should try to avoid code duplication here.

@dalmia
Contributor Author

dalmia commented Feb 17, 2017

I agree. For the most part, this seems like an extension of the DotProduct kernel, where we can add sigma_v and c, defaulting them to 1 and 0 respectively.

@dalmia
Contributor Author

dalmia commented Feb 18, 2017

@jmetzen Could you please explain why the gradient of K(X, X') with respect to sigma_0 in the DotProduct kernel is not 2 * sigma_0 but 2 * sigma_0 ** 2?

@jmetzen
Member

jmetzen commented Feb 18, 2017

This is a good question. It looks like a bug. But the unittest in test_kernels.py, which compares the implemented gradient with its numerical approximation, does not fail. When I change the gradient to 2 * sigma_0, however, it does fail. A similar situation exists for the ConstantKernel. I don't remember why this should be the case.

@dalmia
Contributor Author

dalmia commented Feb 18, 2017

Yes, I observed the same thing when I was working on the eval_gradient part of LinearKernel. Maybe @MechCoder can help us here? :)

@jmetzen
Member

jmetzen commented Feb 18, 2017

Ah, I remember. It is because theta is actually always the logarithm of the hyperparameter values. And the gradient of the kernel is taken with respect to theta and not with respect to the hyperparameter value like sigma_0. Let us consider the DotProduct kernel as an example. Writing as a function of theta instead of sigma_0 with sigma_0 = exp(theta) gives us:
k(x_i, x_j) = sigma_0 ^ 2 + x_i \cdot x_j = exp(theta) ^ 2 + x_i \cdot x_j = exp(2*theta) + x_i \cdot x_j
Now the gradient with respect to theta is:
2*exp(2*theta) = 2*exp(theta)^2 = 2*\sigma_0^2
So the implemented gradient is correct (same for the ConstantKernel). However, the docstring is wrong when it says "The gradient of the kernel k(X, X) with respect to the hyperparameter of the kernel". It should be "The gradient of the kernel k(X, X) with respect to the log-transformed hyperparameter of the kernel" or something like that.

As an addendum: Why is theta the log-transformed value? From the docstring of the theta property:
"Note that theta are typically the log-transformed values of the
kernel's hyperparameters as this representation of the search space
is more amenable for hyperparameter search, as hyperparameters like
length-scales naturally live on a log-scale."

We should create a separate issue in which we can make the documentation and the comments more explicit about this.
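The log-parameterization explained above can be checked numerically. A minimal sketch in plain NumPy (not using the scikit-learn kernel API): it writes the DotProduct kernel as a function of theta = log(sigma_0) and compares the implemented analytic gradient 2 * sigma_0 ** 2 against a central finite difference.

```python
import numpy as np

# DotProduct kernel value for a fixed pair of inputs, written as a
# function of theta = log(sigma_0): k(theta) = exp(2*theta) + x_i . x_j
x_i = np.array([1.0, 2.0])
x_j = np.array([0.5, -1.0])

def k(theta):
    return np.exp(2 * theta) + x_i @ x_j

theta = np.log(1.5)            # i.e. sigma_0 = 1.5
sigma_0 = np.exp(theta)

analytic = 2 * sigma_0 ** 2    # the gradient as implemented in kernels.py
eps = 1e-7
numeric = (k(theta + eps) - k(theta - eps)) / (2 * eps)

print(np.isclose(analytic, numeric, rtol=1e-5))  # True
```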

@dalmia
Contributor Author

dalmia commented Feb 18, 2017

Thank you so much for the clear explanation. This finally makes sense, and I can proceed with adding the c parameter. But you are also right that this isn't obvious from the docstring and needs to be documented better.

@dalmia
Contributor Author

dalmia commented Feb 19, 2017

Just reporting: there is a bug in the tests. They assume that:

len(kernel.hyperparameters) == kernel.theta.size

For the LinearKernel case when c.ndim == X.ndim, suppose c.ndim = 2. Now:

>>> len(kernel.hyperparameters)
3
>>> kernel.theta.size
4
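The same kind of mismatch can be reproduced with an existing kernel: an anisotropic RBF bundles all per-dimension length scales into a single Hyperparameter entry, while theta holds one log value per dimension.

```python
from sklearn.gaussian_process.kernels import RBF

# Anisotropic RBF: ONE Hyperparameter entry (length_scale, n_elements=3),
# but THREE entries in theta (one log-transformed value per dimension).
kernel = RBF(length_scale=[1.0, 2.0, 3.0])

print(len(kernel.hyperparameters))  # 1
print(kernel.theta.size)            # 3
```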


@codecov-io

codecov-io commented Feb 19, 2017

Codecov Report

Merging #8373 into master will decrease coverage by 1.44%.
The diff coverage is 96.96%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #8373      +/-   ##
==========================================
- Coverage   96.19%   94.75%   -1.45%     
==========================================
  Files         348      342       -6     
  Lines       64645    60895    -3750     
==========================================
- Hits        62187    57700    -4487     
- Misses       2458     3195     +737
Impacted Files Coverage Δ
sklearn/gaussian_process/tests/test_kernels.py 98.94% <100%> (-1.06%) ⬇️
sklearn/gaussian_process/kernels.py 91.94% <95.31%> (+0.39%) ⬆️
sklearn/feature_extraction/tests/test_image.py 4.78% <0%> (-95.22%) ⬇️
sklearn/feature_extraction/image.py 58.16% <0%> (-41.84%) ⬇️
sklearn/utils/arpack.py 42.5% <0%> (-32.5%) ⬇️
sklearn/utils/random.py 59.29% <0%> (-32.2%) ⬇️
sklearn/datasets/tests/test_kddcup99.py 29.16% <0%> (-10.84%) ⬇️
sklearn/manifold/mds.py 84.46% <0%> (-9.71%) ⬇️
sklearn/utils/tests/test_estimator_checks.py 88.23% <0%> (-8.96%) ⬇️
sklearn/linear_model/tests/test_bayes.py 83.01% <0%> (-8.16%) ⬇️
... and 262 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3e29334...078bba7. Read the comment docs.

@dalmia dalmia changed the title [WIP] Adding LinearKernel to GaussianProcesses [MRG] Adding LinearKernel to GaussianProcesses Feb 19, 2017
_approx_fprime(kernel.theta, eval_kernel_for_theta, 1e-10)

-    assert_almost_equal(K_gradient, K_gradient_approx, 4)
+    assert_almost_equal(K_gradient, K_gradient_approx, 3)
Contributor Author

Reduced the precision. Is there an issue with that?

Member

This should not be an issue.

for i, hyperparameter in enumerate(kernel.hyperparameters):
assert_equal(theta[i],
np.log(getattr(kernel, hyperparameter.name)))
i = 0
Contributor Author

The earlier implementation of the tests assumes that:

len(kernel.hyperparameters) == kernel.theta.size

Member

Thanks for fixing this problem. I was surprised that it was not an issue so far, because the anisotropic RBF/Matern kernels also violate the condition. But this is because the test skips the non-basic kernels at the beginning. Could you check if you can now also remove lines 74-76:

if isinstance(kernel, KernelOperator) \
     or isinstance(kernel, Exponentiation):  # skip non-basic kernels
      continue

Contributor Author

Yes, that's what prompted me to check how RBF got through in the first place, and I then found that those kernels are being skipped. Nevertheless, removing the lines causes a failure:

AssertionError: Items in the first set but not the second:
'k1__constant_value'
'k2__length_scale'
    """Fail immediately, with the given message."""
>>  raise self.failureException("Items in the first set but not the second:\n'k1__constant_value'\n'k2__length_scale'")

I'll look into the issue and try to resolve it so that we can run the tests without ignoring any.

assert_not_equal(id(kernel), id(kernel_cloned))

# Check that all constructor parameters are equal.
assert_equal(kernel.get_params(), kernel_cloned.get_params())
Contributor Author

Cannot compare the two dictionaries when the value of c is an array.
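A minimal illustration of why the dictionary comparison breaks when a parameter value is an array: dict equality compares values element-wise with ==, and the resulting boolean NumPy array cannot be reduced to a single truth value.

```python
import numpy as np

params_a = {"c": np.array([1.0, 2.0])}
params_b = {"c": np.array([1.0, 2.0])}

# Comparing the dicts compares the arrays element-wise; converting the
# resulting boolean array to a single True/False raises ValueError.
try:
    params_a == params_b
except ValueError as err:
    print("comparison raised:", type(err).__name__)
```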

@dalmia
Contributor Author

dalmia commented Feb 22, 2017

It turns out these lines can't simply be removed:

if isinstance(kernel, KernelOperator) \
     or isinstance(kernel, Exponentiation):  # skip non-basic kernels
      continue

This is because, for example, KernelOperator adds k1__- and k2__-prefixed versions of each sub-kernel's parameter names to the list of hyperparameters:

for hyperparameter in self.k1.hyperparameters:
            r.append(Hyperparameter("k1__" + hyperparameter.name,
                                    hyperparameter.value_type,
                                    hyperparameter.bounds,
                                    hyperparameter.n_elements))

Whereas theta simply concatenates each sub-kernel's theta:

return np.append(self.k1.theta, self.k2.theta)
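The mismatch described above can be seen directly on a composed kernel: the hyperparameter names are re-exported with k1__/k2__ prefixes, while theta is just the concatenation of the sub-kernels' log-parameters, so the counts disagree as soon as one sub-kernel is anisotropic.

```python
from sklearn.gaussian_process.kernels import ConstantKernel, RBF

kernel = ConstantKernel(1.0) * RBF(length_scale=[1.0, 2.0])

# Two Hyperparameter entries, prefixed per sub-kernel...
print([h.name for h in kernel.hyperparameters])
# ['k1__constant_value', 'k2__length_scale']

# ...but three theta entries: one for k1, two for the anisotropic k2.
print(len(kernel.hyperparameters), kernel.theta.size)  # 2 3
```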

return length_scale


def _check_offset(X, c):

This seems to be identical to _check_length_scale? Essentially it is checking whether the kernel is anisotropic, by checking length_scale, or c in your case.

Member

It would make sense to unify the two functions; there is no essential difference besides the string in the second ValueError.

self.c_bounds = c_bounds

@property
def non_uniform_offset(self):

Would it be better to name it anisotropic, as in RBF and Matern?

@MechCoder
Member

I agree that we should add a hyperparameter offset to the DotProduct kernel instead of creating a new class (with offset_scale_bounds set to "fixed" and offset set to zero for backward compatibility). Sorry for being late and leading you down the wrong road :/

sigma_0 and c. The parameters of the Linear kernel are about specifying the
origin. The kernel is given by:

k(x_i, x_j) = sigma_b ^ 2 + sigma_v ^ 2 * ((x_i - c) \cdot (x_j - c))
Member

In any case, there is no need for sigma_b and sigma_v as hyperparameters. The same effect can be achieved with something like ConstantKernel() + ConstantKernel() * LinearKernel()
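Assuming sigma_0 can be fixed at zero so that DotProduct becomes purely linear, the sigma_b/sigma_v structure is expressible through composition with existing kernels (a sketch; with both constants left at their default of 1, the composed kernel reduces to 1 + x_i . x_j):

```python
import numpy as np
from sklearn.gaussian_process.kernels import ConstantKernel, DotProduct

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# sigma_b^2 + sigma_v^2 * (x_i . x_j), built from existing kernels:
# the first ConstantKernel plays sigma_b^2, the second sigma_v^2.
kernel = ConstantKernel() + ConstantKernel() * DotProduct(sigma_0=0.0)

print(np.allclose(kernel(X), 1.0 + X @ X.T))  # True
```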

@MechCoder
Member

So the gradient is with respect to log(hyperparameter). I don't remember why exactly I was convinced in #6701 that it is with respect to the hyperparameter itself :/

@jmetzen
Member

jmetzen commented Mar 8, 2017

The docstring was also misleading, sorry for that...

@sklearn-lgtm

This pull request introduces 1 alert - view on lgtm.com

new alerts:

  • 1 for __eq__ not overridden when adding attributes

Comment posted by lgtm.com

@agramfort
Member

@dalmia @jmargeta @MechCoder do we want / need this or not?

@jmargeta
Contributor

Hi @agramfort, I do not have sufficient expertise with GPs to have a relevant opinion. But from looking at the issue and the PR, extending the DotProduct kernel indeed seems to be enough.

...and you probably meant to mention the other Jan (@jmetzen) anyway :)

@thomasjpfan
Member

I am closing this PR in favor of #8373, because that PR builds on top of this one and has more progress. Thank you for working on this PR.

@thomasjpfan thomasjpfan closed this Aug 4, 2022

Labels

module:gaussian_process · Stalled · Superseded (PR has been replaced by a newer PR)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Linear Kernel for Gaussian Processes

10 participants