Choice of words in documentation #13639
Description
Hi all,
I was browsing through the start of the documentation and didn't get far before noticing that quite a lot of synonyms are used in modules/linear_model.rst (html and source).
Example 1
In the introductory paragraph for the Lasso, three different words are used to describe the model coefficients: coefficients, parameter values, and weights.
The Lasso is a linear model that estimates sparse coefficients. It is useful in some contexts due to its tendency to prefer solutions with fewer parameter values, effectively reducing the number of variables upon which the given solution is dependent. For this reason, the Lasso and its variants are fundamental to the field of compressed sensing. Under certain conditions, it can recover the exact set of non-zero weights (see Compressive sensing: tomography reconstruction with L1 prior (Lasso)).
Example 2
The word features has a lot of synonyms. From the descriptions of Ridge, Lasso, and other related models, I found sentences such as:
- "...datasets with many collinear regressors..."
- "...linear combination of the input variables..."
- "...selected features are the same ..."
- "...it finds the predictor most correlated with the response..."
- "...when the number of dimensions is significantly greater than..."
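To get a rough sense of how often each synonym appears, something like the following could be run over the text of the file. This is only a sketch: the `SYNONYMS` groups and the `count_terms` helper are hypothetical names made up for illustration, and the groupings are my guess at which words are interchangeable.

```python
import re
from collections import Counter

# Hypothetical synonym groups; each key is the word I assume is preferred.
SYNONYMS = {
    "features": ["features", "regressors", "variables", "predictors", "dimensions"],
    "samples": ["samples", "observations", "points"],
    "coefficients": ["coefficients", "weights", "parameters"],
}


def count_terms(text, synonyms=SYNONYMS):
    """Count case-insensitive whole-word matches, grouped by preferred word.

    Only the exact plural forms listed above are counted; singular forms
    such as "predictor" would need to be added to the lists explicitly.
    """
    words = Counter(re.findall(r"[a-z_]+", text.lower()))
    return {
        preferred: {term: words[term] for term in terms if words[term]}
        for preferred, terms in synonyms.items()
    }


if __name__ == "__main__":
    # Fragments taken from the sentences quoted above.
    sample = (
        "datasets with many collinear regressors. "
        "linear combination of the input variables. "
        "selected features are the same. "
        "when the number of dimensions is significantly greater than"
    )
    for preferred, counts in count_terms(sample).items():
        print(preferred, counts)
```

Passing the contents of modules/linear_model.rst to `count_terms` instead of the sample string would give per-word counts for the whole page, assuming the file is read from a local checkout.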
Questions
I worry that the synonyms might make the documentation unnecessarily complicated.
- Do you agree with this? Should I clean it up in a PR?
If you agree, can you help me decide on preferred words? I get the impression that:
- Features is preferred over regressors, variables, predictors and dimensions.
- Samples is preferred over observations, points.
- Values stored in `coef_` are denoted w in the documentation. Should we prefer coefficients or weights? The word parameter is also used, but I prefer to reserve that for regularization (hyper)parameters such as `alpha` in Ridge.
I apologize if this has been discussed in another Issue/PR, but I didn't find any related discussions.