Description
As discovered in #8373, the docstring states that if `eval_gradient` is true, the gradient with respect to the kernel hyperparameters is returned, whereas it actually returns the gradient with respect to the log-transformed hyperparameters. As an example:
Writing the DotProduct kernel as a function of theta instead of sigma_0, with sigma_0 = exp(theta), gives:

`k(x_i, x_j) = sigma_0^2 + x_i \cdot x_j = exp(theta)^2 + x_i \cdot x_j = exp(2*theta) + x_i \cdot x_j`
The gradient with respect to theta is then:

`d k / d theta = 2*exp(2*theta) = 2*exp(theta)^2 = 2*sigma_0^2`

whereas the gradient with respect to sigma_0 itself would be `2*sigma_0`. It is the former, `2*sigma_0^2`, that the kernel actually returns.
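The mismatch can be checked numerically; a minimal sketch (the choice of `sigma_0 = 2.0` and the input `X` are arbitrary illustration values):

```python
import numpy as np
from sklearn.gaussian_process.kernels import DotProduct

# With sigma_0 = 2.0, the gradient w.r.t. sigma_0 would be
# 2 * sigma_0 = 4, while the gradient w.r.t. theta = log(sigma_0)
# is 2 * sigma_0 ** 2 = 8.
sigma_0 = 2.0
kernel = DotProduct(sigma_0=sigma_0)
X = np.array([[1.0], [2.0], [3.0]])

K, K_gradient = kernel(X, eval_gradient=True)

# Every entry of the returned gradient matches 2 * sigma_0 ** 2,
# i.e. the derivative with respect to log(sigma_0), not sigma_0.
print(np.allclose(K_gradient[..., 0], 2 * sigma_0 ** 2))  # True
print(np.allclose(K_gradient[..., 0], 2 * sigma_0))       # False
```

So the values in `K_gradient` agree with the log-transformed derivation above, not with the gradient with respect to `sigma_0` that the docstring suggests.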