[MRG+1] Fix #9743: Adding parameter information to docstring.#9757
[MRG+1] Fix #9743: Adding parameter information to docstring.#9757jnothman merged 4 commits intoscikit-learn:masterfrom
Conversation
|
Thanks. You should use the formal Parameters section like everywhere else in the codebase. |
|
@jnothman Ah, I see. I think I now have it formatted properly. |
| Splits the data into training/test set folds according to a predefined | ||
| scheme. Each sample can be assigned to at most one test set fold, as | ||
| specified by the user through the ``test_fold`` parameter. | ||
| Provides train/test indices to split data into train/test sets using a |
There was a problem hiding this comment.
I think "provides" is a poor word choice here given that in this case the user provides the indices
There was a problem hiding this comment.
I think it's a bit tricky. The user provides via test_fold what I want to call a "test-fold assignment" or "test-fold membership." This implicitly defines a collection of arrays which represent the indices of the data points split into each (train-fold, test-fold) pair. (There are as many folds as there are unique values (different from -1) in test_fold.) The explicit representation of the train/test folds in the form of the collection of arrays is being "provided" by the function.
A quick way fix without introducing this colloquial term I've used above to (hopefully) help distinguish the implicit vs. explicit ways of representing a collection of test/train folds is to use "computes" insted of "provides"? Or do nothing? Another possibility is making the example in the documentation a bit larger to possibly clear up confusion, but it may be overkill. I'm open to suggestions as my only rationale to previously use "provides" is the docstring for KFold.
|
okay. I can accept it as is
…On 15 Sep 2017 12:25 am, "Kye Taylor" ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In sklearn/model_selection/_split.py
<#9757 (comment)>
:
> @@ -1706,12 +1706,19 @@ def _validate_shuffle_split(n_samples, test_size, train_size):
class PredefinedSplit(BaseCrossValidator):
"""Predefined split cross-validator
- Splits the data into training/test set folds according to a predefined
- scheme. Each sample can be assigned to at most one test set fold, as
- specified by the user through the ``test_fold`` parameter.
+ Provides train/test indices to split data into train/test sets using a
It's subtle, but the user doesn't provide the indices that make up each
test/train fold. Rather, the user provides what I want to call a "fold
assignment" that is converted to said indices. So, instead of "provides" I
could put "converts" or "computes"... I'm open to suggestions as I
previously decided to use "provides" because it models the docstring for
KFold.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#9757 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz6y3kwKW7PbvyY-zHqbOjDOQXVNFEks5siTdlgaJpZM4PWz9->
.
|
|
I don't think this needs a second LGTM |
|
Thanks @taylorkm |
|
Cool! Thanks @jnothman |
…ing. (scikit-learn#9757) * Adding parameter information to docstring. * Removing trailing whitespace from lines. * Adding details of parameter to formal Parameters section. * Shortened lines to meet requirements.
* tag '0.19.1': (117 commits) TST Improve SelectFromModel tests (scikit-learn#9733) Name in what's new [MRG+1] Raise error when SparseSeries is passed into classification metrics (scikit-learn#7373) Fix LogisticRegressionCV default solver value in docstring (scikit-learn#9962) [MRG+1] DOC fix sign in GBRT mathematical formulation (scikit-learn#9885) [MRG+1] DOC fix sign in GBRT mathematical formulation (scikit-learn#9885) DOC fix a typo (scikit-learn#9892) [MRG+1] Ledoit-Wolf behavior explanation (scikit-learn#9500) [MRG+1] Fix typos in documentation (scikit-learn#9878) DOC: Use setattr(self, ...) instead of self.setattr(...) (scikit-learn#9866) DOC Removed a duplicate occurrence of a word in 'sklearn.neighbors.KNeighborsRegressor' docs (scikit-learn#9862) FIX docstring of negative_outlier_factor_ in LOF (scikit-learn#9809) [MRG+1] Fix scikit-learn#9743: Adding parameter information to docstring. (scikit-learn#9757) DOC: fix docstring of Imputer.fit (scikit-learn#9769) various minor spelling tweaks (scikit-learn#9783) MAINT comment on apparent inconsistency [MRG+1] DOC fix headers level in cross_validation.rst (scikit-learn#9679) Fix mailmap format (scikit-learn#9620) DOC Fix typos (scikit-learn#9577) Typo (scikit-learn#9571) ...
* releases: (117 commits) TST Improve SelectFromModel tests (scikit-learn#9733) Name in what's new [MRG+1] Raise error when SparseSeries is passed into classification metrics (scikit-learn#7373) Fix LogisticRegressionCV default solver value in docstring (scikit-learn#9962) [MRG+1] DOC fix sign in GBRT mathematical formulation (scikit-learn#9885) [MRG+1] DOC fix sign in GBRT mathematical formulation (scikit-learn#9885) DOC fix a typo (scikit-learn#9892) [MRG+1] Ledoit-Wolf behavior explanation (scikit-learn#9500) [MRG+1] Fix typos in documentation (scikit-learn#9878) DOC: Use setattr(self, ...) instead of self.setattr(...) (scikit-learn#9866) DOC Removed a duplicate occurrence of a word in 'sklearn.neighbors.KNeighborsRegressor' docs (scikit-learn#9862) FIX docstring of negative_outlier_factor_ in LOF (scikit-learn#9809) [MRG+1] Fix scikit-learn#9743: Adding parameter information to docstring. (scikit-learn#9757) DOC: fix docstring of Imputer.fit (scikit-learn#9769) various minor spelling tweaks (scikit-learn#9783) MAINT comment on apparent inconsistency [MRG+1] DOC fix headers level in cross_validation.rst (scikit-learn#9679) Fix mailmap format (scikit-learn#9620) DOC Fix typos (scikit-learn#9577) Typo (scikit-learn#9571) ...
* dfsg: (117 commits) TST Improve SelectFromModel tests (scikit-learn#9733) Name in what's new [MRG+1] Raise error when SparseSeries is passed into classification metrics (scikit-learn#7373) Fix LogisticRegressionCV default solver value in docstring (scikit-learn#9962) [MRG+1] DOC fix sign in GBRT mathematical formulation (scikit-learn#9885) [MRG+1] DOC fix sign in GBRT mathematical formulation (scikit-learn#9885) DOC fix a typo (scikit-learn#9892) [MRG+1] Ledoit-Wolf behavior explanation (scikit-learn#9500) [MRG+1] Fix typos in documentation (scikit-learn#9878) DOC: Use setattr(self, ...) instead of self.setattr(...) (scikit-learn#9866) DOC Removed a duplicate occurrence of a word in 'sklearn.neighbors.KNeighborsRegressor' docs (scikit-learn#9862) FIX docstring of negative_outlier_factor_ in LOF (scikit-learn#9809) [MRG+1] Fix scikit-learn#9743: Adding parameter information to docstring. (scikit-learn#9757) DOC: fix docstring of Imputer.fit (scikit-learn#9769) various minor spelling tweaks (scikit-learn#9783) MAINT comment on apparent inconsistency [MRG+1] DOC fix headers level in cross_validation.rst (scikit-learn#9679) Fix mailmap format (scikit-learn#9620) DOC Fix typos (scikit-learn#9577) Typo (scikit-learn#9571) ...
…ing. (scikit-learn#9757) * Adding parameter information to docstring. * Removing trailing whitespace from lines. * Adding details of parameter to formal Parameters section. * Shortened lines to meet requirements.
…ing. (scikit-learn#9757) * Adding parameter information to docstring. * Removing trailing whitespace from lines. * Adding details of parameter to formal Parameters section. * Shortened lines to meet requirements.
Reference Issue
Example: Fixes #9743
What does this implement/fix? Explain your changes.
Using docstring for
sklearn.model_selection.KFoldas a model, I added a description of how thetest_foldparameter can be used.Any other comments?
There are no checks to see if
test_foldparameter is valid, and it is possible that it has more/less entries than the number of samples.