Creation of a 'Reading scikit-learn code' section in the docs? #12869

Would you welcome a new subsection in (e.g.) the contributing guide, giving pointers on how to actually read and digest the existing code base? This could be a mix of general and sklearn-specific tips.

Lots of things may seem obvious to experienced programmers and contributors, but less experienced people might find the code base quite overwhelming at first. For example, even simple utilities like `scale()` take more than 50 lines, while the bulk of the work could fit in a one-liner. It's easy to get lost in the details.
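To make the `scale()` point concrete, here is a minimal sketch, assuming a dense float array with no constant columns and the default arguments of `scale()`. The data and variable names are made up for illustration; the one-liner is my reading of the core computation, not a copy of the library's code.

```python
import numpy as np
from sklearn.preprocessing import scale

# Made-up data: 20 samples, 3 features (illustrative only)
rng = np.random.RandomState(0)
X = rng.normal(loc=5.0, scale=2.0, size=(20, 3))

# The "one-liner": center each column, then divide by its standard deviation
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

# The library function, which also handles input validation, sparse input,
# constant features, copying, and so on
X_sklearn = scale(X)

same = np.allclose(X_manual, X_sklearn)
print(same)
```

Most of the extra lines in the real function deal with cases this sketch ignores, which is exactly what makes it harder to read at first.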

I was thinking of something like this (very roughly):

- It takes time and experience to efficiently read code. It's normal if it seems hard, because it is.
- Get acquainted with the estimator API: `fit`, `predict`, `fit_predict`...
- Identify the important parts and ignore the rest. In particular, **a lot** of the code (especially at the beginning of `fit()` methods) only does input checking. If you just need to understand what an algorithm does, this part isn't worth focusing on. Make sure you can tell those parts apart from the actual ML algorithm.
- Before reading a function or class, skim the parameter descriptions in its docstring to get at least a rough idea of what each parameter is used for. Do the same for the attributes.
- Explain the `check_blahblah()` functions, e.g. `check_random_state()` or `check_cv()`: they accept several kinds of input but always return an object of a predefined type.
- We use Cython to make things fast (`.pyx` and `.pxd` files). Those files usually contain low-level routines that can probably be ignored during the first "reading sessions".
- ...
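
Several of the points above (the `fit` / `predict` API, input checking at the top of `fit()`, and the `check_*()` pattern) could be illustrated with a toy like the one below. This is only a sketch using the standard library: `check_rng` and `ToySubsampledMean` are hypothetical names invented for this example and are not part of scikit-learn. `check_rng` merely mimics the idea behind `check_random_state()`.

```python
import random
from numbers import Integral


def check_rng(seed):
    """Hypothetical check_*()-style helper: accept None, an int, or a
    random.Random instance, and always return a random.Random instance."""
    if seed is None:
        return random.Random()
    if isinstance(seed, Integral):
        return random.Random(seed)
    if isinstance(seed, random.Random):
        return seed
    raise ValueError(f"{seed!r} cannot be used to seed a random.Random")


class ToySubsampledMean:
    """Toy estimator: predicts the mean of a random subsample of y."""

    def __init__(self, n_samples=3, random_state=None):
        self.n_samples = n_samples
        self.random_state = random_state

    def fit(self, X, y):
        # ---- Input checking: safe to skim on a first read ----
        X = [list(row) for row in X]
        y = [float(value) for value in y]
        if len(X) == 0:
            raise ValueError("X is empty")
        if len(X) != len(y):
            raise ValueError("X and y have inconsistent lengths")
        if not 1 <= self.n_samples <= len(y):
            raise ValueError("n_samples must be between 1 and len(y)")
        rng = check_rng(self.random_state)

        # ---- The actual "algorithm": a single line ----
        self.mean_ = sum(rng.sample(y, self.n_samples)) / self.n_samples
        return self

    def predict(self, X):
        # Same prediction for every row; only the number of rows matters
        return [self.mean_ for _ in X]


X_train = [[0], [1], [2], [3]]
y_train = [1.0, 2.0, 3.0, 4.0]
model = ToySubsampledMean(n_samples=2, random_state=0).fit(X_train, y_train)
predictions = model.predict([[10], [20]])
```

Even in this toy, `fit()` spends more lines validating its input than computing anything, and the fitted value is stored in an attribute with a trailing underscore (`mean_`), following the scikit-learn naming convention for fitted attributes.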

Any suggestions are welcome.
