[WIP] Add TimeSeriesCV and HomogeneousTimeSeriesCV#6351
[WIP] Add TimeSeriesCV and HomogeneousTimeSeriesCV#6351yenchenlin wants to merge 2 commits intoscikit-learn:masterfrom
Conversation
75e7401 to
04b9e79
Compare
| """ | ||
| return super(StratifiedKFold, self).split(X, y, labels) | ||
|
|
||
| class HomogeneousTimeSeriesCV(_BaseKFold): |
There was a problem hiding this comment.
It is convenient to make HomogeneousTimeSeriesCV subclass _BaseKFold since __init__ in _BaseKFold can help HomogeneousTimeSeriesCV to check whether input parameter n_folds is valid.
However, in order to make HomogeneousTimeSeriesCV work, HomogeneousTimeSeriesCV needs to override function split which is defined in its superclass _BaseKFold and its super-superclass BaseCrossValidator due to their implementation detail of split.
I don't think override split in HomogeneousTimeSeriesCV is a good solution since other subclasses of _BaseKFold didn't override split but override _iter_test_indices and _iter_test_masks instead, can @rvraghav93 provide some suggestions on this?
73f5661 to
62bafb4
Compare
62bafb4 to
78c3dcd
Compare
|
Please split the PR into two for now. Will be easier to review. |
|
|
||
| Notes | ||
| ----- | ||
| The first ``n_samples % n_folds`` folds have size |
There was a problem hiding this comment.
This note is confusing, to say the least.
You should mention that for the first n_samples % n_folds, the number of samples in each fold are "incremented" by n_samples // n_folds + 1.
There was a problem hiding this comment.
Sorry maybe I'm too dumb.
the number of samples in each fold are "incremented" by n_samples // n_folds + 1
Why is "incremented" used here?
I think the number of samples in the first n_samples % n_folds folds is exactly n_samples // n_folds + 1,
which is "incremented" by 1 compared to other folds?
There was a problem hiding this comment.
Oh, this is related to the prev comment. Sorry I meant in "each split the number of samples are incremented by"
You can move this to the n_folds documentation to avoid such confusion
|
Just gave a first pass. I also agree that this would be highly useful!! |
|
@MechCoder Thanks! |
|
Close this PR since I plan to separate this into two PRs:
|
This PR is an implementation of time series cross validation.
Based on #6322 , there are basically two cases:
For brevity,
I'll use HomoTSCV to represent homogeneous time series and HeteroTSCV to represent heterogeneous time series in the following checklist.
Check List:
Reference:
Using k-fold cross-validation for time-series model selection