[MRG] Add partial_fit function to DecisionTreeClassifier #18889
PSSF23 wants to merge 58 commits into scikit-learn:main
Thanks for the PR. Can you show that this is faster than building the tree from scratch?

@amueller I don't think speed is what I have in mind. The VFDT name might cause some confusion, but this PR is more of a preliminary step that allows future algorithms to focus on streaming data. In those cases, data samples would arrive continuously, and saving all of them to wait for a batch fit would be quite expensive. I will check the time differences in benchmarks though. Thanks for the advice!

If the goal is online learning, this should be implemented as partial_fit,
but you would need to show that multiple calls to partial_fit roughly
equate to fitting in batch.
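
A minimal sketch of what such a check could look like, using only the standard library. `ToyCentroidClassifier` is a hypothetical toy estimator invented for illustration; it is not the tree implementation from this PR and not a scikit-learn class. It only shows the shape of the comparison: fit one model on all the data, feed a second model the same data in chunks through `partial_fit`, then compare predictions.

```python
import random


class ToyCentroidClassifier:
    """Hypothetical toy estimator (not scikit-learn, not this PR's tree)."""

    def __init__(self):
        self.sums_ = {}    # class label -> per-feature running sum
        self.counts_ = {}  # class label -> number of samples seen

    def partial_fit(self, X, y):
        # Update the running statistics with one chunk; earlier chunks are kept.
        for row, label in zip(X, y):
            if label not in self.sums_:
                self.sums_[label] = [0.0] * len(row)
                self.counts_[label] = 0
            self.sums_[label] = [s + v for s, v in zip(self.sums_[label], row)]
            self.counts_[label] += 1
        return self

    def fit(self, X, y):
        # Batch fitting discards previous state, then sees all data at once.
        self.sums_, self.counts_ = {}, {}
        return self.partial_fit(X, y)

    def predict(self, X):
        centroids = {
            label: [s / self.counts_[label] for s in sums]
            for label, sums in self.sums_.items()
        }

        def nearest(row):
            return min(
                centroids,
                key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])),
            )

        return [nearest(row) for row in X]


# Synthetic two-class data, shuffled to mimic an unordered stream.
rng = random.Random(0)
X = [[rng.gauss(3.0 * label, 1.0), rng.gauss(-3.0 * label, 1.0)]
     for label in (0, 1) for _ in range(200)]
y = [label for label in (0, 1) for _ in range(200)]
order = list(range(len(X)))
rng.shuffle(order)
X = [X[i] for i in order]
y = [y[i] for i in order]

# Batch model: one call to fit on everything.
batch = ToyCentroidClassifier().fit(X, y)

# Online model: the same samples, delivered 50 at a time.
online = ToyCentroidClassifier()
for start in range(0, len(X), 50):
    online.partial_fit(X[start:start + 50], y[start:start + 50])

# Fraction of samples on which the two models predict the same label.
agreement = sum(
    b == o for b, o in zip(batch.predict(X), online.predict(X))
) / len(X)
print(f"batch vs. online agreement: {agreement:.3f}")
```

For this toy model the two fits accumulate identical statistics, so the predictions match exactly. An incrementally grown tree would probably agree only approximately, so a real check would more likely compare accuracy on held-out data than require identical predictions.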

@jnothman Thank you for the advice. So another I am working on benchmarking with
Yes,
The estimators in the tree module sets scikit-learn/sklearn/tree/_classes.py Lines 165 to 167 in f33fb0a
PSSF23
left a comment
The test error in test_different_endianness_joblib_pickle doesn't seem to be related:
ValueError: Big-endian buffer not supported on little-endian compiler
This is actually related although in a not so trivial way. One likely fix would be to change More details:

@lesteve Thanks!
PSSF23
left a comment
This line:
X, y = fetch_california_housing(return_X_y=True)
causes the following error, which is definitely unrelated this time:
urllib.error.HTTPError: HTTP Error 403: Forbidden
This might be a temporary issue. We are planning to make a release with a retry mechanism at some point.
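
As a rough illustration of what a retry mechanism for a flaky download might look like, here is a generic standard-library sketch. The names `call_with_retries`, `n_retries`, and `delay` are assumptions made for this example and do not describe the actual scikit-learn implementation; `flaky_download` is a stand-in that simulates two transient failures.

```python
import time
import urllib.error


def call_with_retries(func, n_retries=3, delay=1.0):
    """Call func(); on a URL/HTTP error, wait and try again.

    Hypothetical helper for illustration only. HTTPError is a subclass
    of URLError, so both are caught here.
    """
    for attempt in range(n_retries + 1):
        try:
            return func()
        except urllib.error.URLError:
            if attempt == n_retries:
                raise  # out of retries: surface the last error
            time.sleep(delay)


# Simulated download that fails twice before succeeding.
attempts = []


def flaky_download():
    attempts.append(1)
    if len(attempts) < 3:
        raise urllib.error.URLError("temporary failure")
    return "data"


result = call_with_retries(flaky_download, n_retries=3, delay=0.0)
print(result, "after", len(attempts), "attempts")
```

Whether a 403 response should be retried at all is a judgment call; this sketch retries any `URLError` for simplicity.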
Reference Issues/PRs
First step for #18888
What does this implement/fix? Explain your changes.
Add partial_fit function to DecisionTreeClassifier
Any other comments?
Collaboration of @neurodata
Thank you for the feedback!