Every estimator class should have a complete docstring.
Docstrings should be worked on one estimator at a time; feel free to complete only individual rubrics if it is unclear what to fill in for the others.
A good estimator docstring should include the following rubrics:
- one-liner description (top), starting capitalized and ending with a period "."
- description paragraph - what is the algorithm?
- Components block - only if the estimator has components. The list of components should be identical with the constructor arguments that are estimators (inheriting from BaseClassifier, BaseForecaster, etc.).
- Parameters block - individual parameters listed as "param_name : type", followed by an explanation. The explanation should include the value/structure convention if the expectation is more specific than the type alone, e.g., "n : int, integer between 0 and 42". The list of parameters should be identical with the constructor arguments that are not estimators.
- Attributes block - the most important attributes of object instances that are neither parameters nor components. This should include the attributes that correspond to the "fitted model".
- Notes block - details, formulae, academic references.
- Examples block - a self-contained example on sktime internal toy data that runs.
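As a quick reference, the rubrics above can be sketched as a minimal fill-in skeleton. This is a hypothetical toy class (MyEstimator is not part of sktime), and the snippet also runs the skeleton's own Examples block through the standard-library doctest module, to illustrate the "example that runs" requirement:

```python
import doctest


class MyEstimator:
    """One-line description, capitalized and ending with a period.

    Description paragraph: explain in prose what the algorithm does.

    Parameters
    ----------
    n : int, default=1
        Integer between 0 and 42; the value convention is stated,
        not just the type.

    Attributes
    ----------
    model_ : object
        The "fitted model"; populated only after fitting.

    Notes
    -----
    Details, formulae, and academic references go here.

    Examples
    --------
    >>> est = MyEstimator(n=3)
    >>> est.n
    3
    """

    def __init__(self, n=1):
        self.n = n


# The Examples block is itself runnable - verify it with doctest:
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
tests = finder.find(MyEstimator, name="MyEstimator", globs={"MyEstimator": MyEstimator})
for test in tests:
    runner.run(test)
print(runner.failures)  # 0 - both doctest examples pass
```

Keeping the Examples block executable this way means the documented behaviour is checked automatically whenever doctests are run.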
For formatting, we use the numpy style, though note that the rubrics are slightly different (because we are dealing with algorithms/estimators).
Also look at the extension templates for the algorithm scitype for a "fill-in template" that algorithm implementers are using (or should be using).
Here's an example of a good class docstring:
class BOSSEnsemble(BaseClassifier):
    """Ensemble of bag of Symbolic Fourier Approximation Symbols (BOSS).

    Implementation of BOSS Ensemble from Schäfer (2015). [1]_

    Overview: Input "n" series of length "m", and BOSS performs a grid search over
    a set of parameter values, evaluating each with a LOOCV. It then retains
    all ensemble members within 92% of the best, by default, for use in the ensemble.
    There are three primary parameters:

    - alpha: alphabet size
    - w: window length
    - l: word length

    For any combination, a single BOSS slides a window of length "w" along the
    series. The "w"-length window is shortened to an "l"-length word by
    taking a Fourier transform and keeping the first l/2 complex coefficients.
    These "l" coefficients are then discretized into "alpha" possible values,
    to form a word of length "l". A histogram of words for each
    series is formed and stored.

    Fit involves finding "n" histograms.
    Predict uses 1 nearest neighbor with a bespoke BOSS distance function.

    Parameters
    ----------
    threshold : float, default=0.92
        Threshold used to determine which classifiers to retain. All classifiers
        within percentage `threshold` of the best one are retained.
    max_ensemble_size : int or None, default=500
        Maximum number of classifiers to retain. Will limit the number of retained
        classifiers even if more than `max_ensemble_size` are within `threshold`.
    max_win_len_prop : int or float, default=1
        Maximum window length as a proportion of the series length.
    min_window : int, default=10
        Minimum window size.
    n_jobs : int, default=1
        The number of jobs to run in parallel for both `fit` and `predict`.
        ``-1`` means using all processors.
    random_state : int or None, default=None
        Seed for the random number generator.

    Attributes
    ----------
    n_classes : int
        Number of classes. Extracted from the data.
    n_instances : int
        Number of instances. Extracted from the data.
    n_estimators : int
        The final number of classifiers used. Will be <= `max_ensemble_size` if
        `max_ensemble_size` has been specified.
    series_length : int
        Length of all series (assumed equal).
    classifiers : list
        List of DecisionTree classifiers.

    See Also
    --------
    IndividualBOSS, ContractableBOSS

    Notes
    -----
    For the Java version, see
    `TSML <https://github.com/uea-machine-learning/tsml/blob/master/src/
    main/java/tsml/classifiers/dictionary_based/BOSS.java>`_.

    References
    ----------
    .. [1] Patrick Schäfer, "The BOSS is concerned with time series classification
       in the presence of noise", Data Mining and Knowledge Discovery, 29(6): 2015
       https://link.springer.com/article/10.1007/s10618-014-0377-7

    Examples
    --------
    >>> from sktime.classification.dictionary_based import BOSSEnsemble
    >>> from sktime.datasets import load_italy_power_demand
    >>> X_train, y_train = load_italy_power_demand(split="train", return_X_y=True)
    >>> X_test, y_test = load_italy_power_demand(split="test", return_X_y=True)
    >>> clf = BOSSEnsemble()
    >>> clf.fit(X_train, y_train)
    BOSSEnsemble(...)
    >>> y_pred = clf.predict(X_test)
    """