[MRG+2] Allowing Gaussian process kernels on structured data - updated#15557
Conversation
…een kernels that operate on explicit feature vectors or structured objects.
updated corresponding tests
…n variable-length sequences
…base `Kernel` class. Removed `VectorKernelMixin` which is redundant in presence of a default `requires_vector_input()` method.
|
Why do you regard this as WIP? What work remains other than review? |
I have one more commit that updates the changelog that has not yet been pushed : ) Should be there in a few minutes. |
Co-Authored-By: Joel Nothman <joel.nothman@gmail.com>
|
@jnothman How could I restart a failed pipeline caused by network error? |
|
@jnothman Could you please let me know the next steps to do? |
|
Await another reviewer... unfortunately, this may miss the cutoff for 0.22. |
I see... Do I need to update the PR name to MRG + 1 etc.? |
TomDLT
left a comment
There was a problem hiding this comment.
Only nitpicks, thanks for this nice contribution !
Co-Authored-By: Tom Dupré la Tour <tom.dupre-la-tour@m4x.org>
Co-Authored-By: Tom Dupré la Tour <tom.dupre-la-tour@m4x.org>
Thanks for reviewing! |
Co-Authored-By: Tom Dupré la Tour <tom.dupre-la-tour@m4x.org>
|
@adrinjalali are you keeping track of things that will either need their change log entries moved to 0.23 or will need to be picked into 0.22 final? (You don't necessarily need to keep track, just to be aware when selecting commits for release) |
|
@NicolasHug Hi Nicolas, the primary motivation that I brought up the PR was due to a need to perform Gaussian process regressions on an ensemble of graphs. While the PR intends to bridge scikit-learn's GPR module to data including and beyond variable-length sequences, the actual changes involves no more than allowing non-vectorial data to be passed through the GP regressors/classifiers --- without being touched at all. Thanks to the kernel trick, as discussed in the original issue, the logic of computing the kernel matrix from samples is delegated to a kernel, which will be user-supplied for sequence and generic data. As such, I would argue that this introduces a very minimal amount of burden to the development and maintenance of the GP module and does not disrupt the API and/or future development, while at the same time greatly extends the applicability of the module. |
Fixes #13936.
This PR is a from-scratch refresh of #13954 (with all previous modifications incorporated) based on the latest master branch.
Made the Gaussian process regressor and classifier compatible with generic kernels on arbitrary data. Specifically:
requires_vector_inputproperty to all the built-in GP kernels ingaussian_process/kernels.py.GenericKernelMixinbase classes to overload therequires_vector_inputproperty for kernels that can work on structured and/or generic data.check_arrayandcheck_X_yparameters for vector/generic input. I.e., do not force the X and Y array to be at least 2D and numeric if the kernel can operate on generic input. Updated docstrings accordingly.plot_gpr_on_structured_data.py.