Hello,
I am looking at the gradient computation in the Gaussian process kernel module. My understanding is that we are trying to compute $$\frac{\partial K(x,x')}{\partial \theta}$$, where $$\theta$$ is a hyperparameter. However, I am not sure that this is what the code computes:
For example, the ConstantKernel has:
scikit-learn/sklearn/gaussian_process/kernels.py, line 1013 in f339609
I think that, instead of filling in the constant value, we should fill in just 1.
Another example is the RBF kernel, in the branch `elif not self.anisotropic or length_scale.shape[0] == 1:`
scikit-learn/sklearn/gaussian_process/kernels.py, line 1232 in f339609
I think the gradient should be further divided by the length_scale.
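To spell out that expectation, here is a short worked derivation. It assumes the standard isotropic RBF form, with distance $$d = \lVert x - x' \rVert$$ and length scale $$\ell$$; both symbols are introduced here for illustration and are not taken from the code:

$$k(x,x') = \exp\left(-\frac{d^2}{2\ell^2}\right), \qquad \frac{\partial k}{\partial \ell} = k(x,x')\,\frac{d^2}{\ell^3}$$

Multiplying by $$\ell$$ gives

$$\ell\,\frac{\partial k}{\partial \ell} = k(x,x')\,\frac{d^2}{\ell^2} = \frac{\partial k}{\partial \log \ell}$$

so an expression of the form $$k \cdot d^2/\ell^2$$ differs from the plain derivative by exactly one factor of $$\ell$$.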
It seems to me that rather than computing the gradient, we are computing gradient * hyperparameter here. Am I missing something?
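A quick numerical sanity check of this observation, as a minimal sketch in plain Python rather than scikit-learn's own code. It assumes the standard RBF form, and the helper names (`rbf`, `central_diff`) and example values are hypothetical. By the chain rule, gradient * hyperparameter equals the derivative with respect to the log of the hyperparameter, which may be what the module intends if it optimizes hyperparameters in log space (an assumption here, worth checking against the docstrings).

```python
import math


def rbf(d, length_scale):
    """Standard RBF kernel value for two points at Euclidean distance d (assumed form)."""
    return math.exp(-d ** 2 / (2.0 * length_scale ** 2))


def central_diff(f, t, h=1e-6):
    """Central finite-difference approximation of df/dt at t."""
    return (f(t + h) - f(t - h)) / (2.0 * h)


# Arbitrary example values, chosen only for illustration.
d, length_scale, constant = 1.5, 0.7, 3.0

# RBF kernel: derivative w.r.t. length_scale vs. w.r.t. log(length_scale).
grad_theta = central_diff(lambda l: rbf(d, l), length_scale)
grad_log_theta = central_diff(lambda u: rbf(d, math.exp(u)), math.log(length_scale))

# Closed form K * d^2 / length_scale^2, i.e. the "gradient * hyperparameter" expression.
scaled_form = rbf(d, length_scale) * d ** 2 / length_scale ** 2

# Constant kernel k(x, x') = c: derivative w.r.t. c vs. w.r.t. log(c).
const_grad_theta = central_diff(lambda c: c, constant)
const_grad_log_theta = central_diff(lambda u: math.exp(u), math.log(constant))

print("RBF      dk/dtheta        :", grad_theta)
print("RBF      dk/dlog(theta)   :", grad_log_theta)
print("RBF      K * d^2 / l^2    :", scaled_form)
print("Constant dk/dtheta        :", const_grad_theta)
print("Constant dk/dlog(theta)   :", const_grad_log_theta)
```

In this sketch the log-space derivative of the constant kernel comes out as the constant itself rather than 1, and the log-space derivative of the RBF kernel matches the expression without the extra division by the length scale.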
Thanks!
Junteng