-
-
Notifications
You must be signed in to change notification settings - Fork 212
Extension interface #647
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extension interface #647
Conversation
PGijsbers
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! First impression is very good :) after thinking it over a bit more I might find something to bicker about but I think it's fine to go ahead with these changes. In the process of working things out/implementing extensions for other modules we might find short-comings, but I say we can tackle them as they come.
|
I really like where this is going. Additional bonus: introduction of type hints. I marked down my first comments. Feel free to ping me for a next review. |
Codecov Report
@@ Coverage Diff @@
## develop #647 +/- ##
===========================================
+ Coverage 90.94% 90.99% +0.05%
===========================================
Files 32 36 +4
Lines 3391 3522 +131
===========================================
+ Hits 3084 3205 +121
- Misses 307 317 +10
Continue to review full report at Codecov.
|
|
Okay, I checked the coverage, and the coverage diff is mostly in the dataset submodule and thereby nothing which should be addressed by this pull request. @PGijsbers @joaquinvanschoren @janvanrijn @Bilgecelik this pull request is ready for review. |
PGijsbers
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! No major remarks other than the multiple inheritance suggestion (which should be easy nonetheless) 👍
| import sklearn.base | ||
|
|
||
| from openml.extensions import Extension, register_extension | ||
| from openml.extensions.sklearn.flow_functions import ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it might make sense to use double inheritance instead? E.g. basically make all functions in run_functions.py part of a SklearnRunExtension-class and all functions in flow_functions part of a SklearnFlowExtension-class and then simply use inheritance instead (both of these would inherit from Extension). Functions which are implemented in neither can be implemented here (currently only can_handle_model). So the rewritten extension would look like:
class SklearnExtension(SklearnRunExtension, SklearnFlowExtension):
@staticmethod
def can_handle_model(model: Any) -> bool:
"""Check whether a model is an instance of ``sklearn.base.BaseEstimator``.
Parameters
----------
model : Any
Returns
-------
bool
"""
return isinstance(model, sklearn.base.BaseEstimator)
?
I don't have much experience with multiple inheritance but I think it would work (I tried a toy example and that worked). It would help by:
- Removing a lot of double documentation, since now you document the function twice, e.g. here and here. This can lead to inconsistencies (indeed, even the linked example has inconsistencies, though only a minor ones regarding spacing.
- making it much clearer which functions are actually implemented in this file.
- maintenance when e.g. function signatures change. Currently we should change the
Extensioninterface, the method implementation in e.g.flow_functions.pyand the redirect inextension.py. Using multiple inheritance saves us one step (since the method is not redeclared inextension.py)
janvanrijn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this great effort @mfeurer, this looks really great!
I left a lot of comments (learned from the best ;) ) but I think most of them can be resolved easily. The ones that are a bit bigger are
- implement functions for extension interface in the appropriate class itself
- try to simplify
run_model_on_foldinterface.
As the PR is getting already quite big, I agree with moving these to a next PR, as long as we can all agree to resolve these before releasing a new version. (as it might impose quite some backward compatibility problems)
Additionally, the unit tests can probably be modularized better (moving extension dependent tests elsewhere) but also this does not seem like a priority to me.
All together this PR looks very good and with some effort it can be merged as soon as possible :)
tests/test_extensions/test_sklearn_extension/test_sklearn_flow_functions.py
Outdated
Show resolved
Hide resolved
tests/test_extensions/test_sklearn_extension/test_sklearn_run_functions.py
Outdated
Show resolved
Hide resolved
|
Okay, I have addressed all issues raised so for, please go ahead for a (hopefully) final review. Regarding failing CI:
|
|
I would accept but I don't feel this issue is resolved yet. Would like some clarification. |
PGijsbers
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the open issue, as said, it's good enough for me if it's good enough for Jan. But I don't want this to be of a negative note, because the changes in this PR are great 🎉 thanks for all the effort!
janvanrijn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had another look through this PR and personally have a feeling we can proceed and merge. If anything, this PR will make the library better maintainable, more stable and flexible.
One more thing that I want to check is the list of open TODO's after this PR, as I mentioned some in my previous pass. It would be good to make sure that this list is complete and that we are all on the same page regarding not releasing before we have solved all of them (or argued why we won't / can't solve them)
I would also add dependency structure to this list (my last open comment)
|
Here's the list of TODOs:
|
1 similar comment
|
Here's the list of TODOs:
|
* draft extensions interface * Change to new advised style of defining abstract base class. * incorporate @pgijbers' feedback * incorporate Jan's comments * (hopefully) make the tests run again * make more tests work again * fix more tests? * Move all files for the sklearn converter to a single location * fix tests * TST fix function call * slight reorganization of the files * TST fix wrong path * TST fix wrong path * MAINT add type hints to all methods touched in this PR * factor a lot of extension functions to new file * fix a few broken tests * rename test files to reflect previous refactor * fix unit tests * fix unit tests * add extension plugin mechanism * pep8 & mypy * save docstring progress * fix? * finish docstrings & simplify interface * add extension interface to documentation * PEP8 & doc building * Address comments by Jan and Pieter * progress dump * tests, pep8, shuffle functions and tests around
This PR implements an extension interface to OpenML-Python which abstracts away the deep connection with scikit-learn and allows using OpenML-Python with other machine learning libraries as well.
The main interface can be found in
openml.extensions.extension_interface. A possible use for scikit-learn is given inopenml.extensions.sklearn_extension.The whole PR is very raw and is only intended to provide feedback to me. Missing parts are:
This supersedes #543
@janvanrijn @joaquinvanschoren @amueller @ArlindKadra @PGijsbers @Bilgecelik