Creation of a 'Reading scikit-learn code' section in the docs? #12869
Closed
Description
Would you welcome a new subsection, e.g. in the contributing guide, giving pointers on how to actually read and digest the existing code base? It could mix general and scikit-learn-specific tips.
Many things may seem obvious to experienced programmers and contributors, but less experienced people can find the code base overwhelming at first. For example, even a simple utility like `scale()` takes more than 50 lines, while the bulk of the work could fit in a one-liner. It's easy to get lost in the details.
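To make the `scale()` example concrete, here is a minimal sketch comparing the one-liner with `sklearn.preprocessing.scale`. It assumes dense NumPy input with no missing values and no constant columns. The extra lines in `scale()` handle cases the one-liner ignores, such as input validation, sparse matrices, NaNs and zero-variance features.

```python
import numpy as np
from sklearn.preprocessing import scale

# Random dense data: no NaNs, no constant columns (assumption for this sketch)
rng = np.random.RandomState(0)
X = rng.normal(loc=5.0, scale=2.0, size=(100, 3))

# The core computation: center each column, then divide by its (population) std
X_one_liner = (X - X.mean(axis=0)) / X.std(axis=0)

# scale() with default arguments gives the same result on this input,
# plus input validation and handling of edge cases
X_sklearn = scale(X)

print(np.allclose(X_one_liner, X_sklearn))  # True
```

On data with a constant column the one-liner divides by zero, which is exactly the kind of edge case that makes the real function longer.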
I was thinking of something like this (very roughly):
- It takes time and experience to read code efficiently. It's normal if it seems hard, because it is.
- Get acquainted with the estimator API: `fit`, `predict`, `fit_predict`...
- Identify the important parts and ignore the rest. In particular, a lot of the code (especially at the beginning of the `fit()` methods) is just input checking. Focusing on it isn't worth it if you only need to understand what an algorithm is doing. Make sure you can tell those parts apart from the actual ML algorithm.
- Before trying to read a function or class, briefly read the docstrings of its parameters to get at least a vague idea of what each one is used for. Same for the attributes.
- Explain the `check_blahblah()` functions, e.g. `check_random_state()` or `check_cv()`: they accept various kinds of input but return an object of a predefined type.
- We use Cython to make things fast (`.pyx` and `.pxd` files). Those files usually contain low-level routines that can probably be ignored during the first "reading sessions".
- ...
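The points about the estimator API and about input checking can be illustrated with a toy estimator. This is a minimal sketch: `TinyNearestCentroid` is a hypothetical class written only for this example (it is not scikit-learn's own `NearestCentroid`), built on the public `BaseEstimator`/`ClassifierMixin` base classes and the `check_X_y`, `check_array` and `check_is_fitted` validation helpers. The comments mark which lines are validation and which are the algorithm.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.utils.validation import check_array, check_is_fitted, check_X_y


class TinyNearestCentroid(ClassifierMixin, BaseEstimator):
    """Hypothetical toy classifier: predict the class whose mean is closest."""

    def __init__(self, metric="euclidean"):
        # __init__ only stores parameters; no validation or computation here
        self.metric = metric

    def fit(self, X, y):
        # --- input checking (safe to skim on a first read) ---
        X, y = check_X_y(X, y)
        if self.metric != "euclidean":
            raise ValueError(f"Unsupported metric: {self.metric!r}")

        # --- the actual algorithm: one centroid per class ---
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self  # fit returns self, so calls can be chained

    def predict(self, X):
        # --- input checking ---
        check_is_fitted(self)
        X = check_array(X)

        # --- the actual algorithm: distance to every centroid, pick the closest ---
        # dists has shape (n_samples, n_classes)
        dists = np.linalg.norm(X[:, np.newaxis, :] - self.centroids_[np.newaxis, :, :], axis=2)
        return self.classes_[np.argmin(dists, axis=1)]


X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
y = np.array([0, 0, 1, 1])

clf = TinyNearestCentroid().fit(X, y)
print(clf.predict([[0.5, 0.5], [5.5, 4.0]]))  # [0 1]
print(clf.score(X, y))  # 1.0, score() comes from ClassifierMixin
print(clf.get_params())  # {'metric': 'euclidean'}, from BaseEstimator
```

In real scikit-learn estimators the validation part is usually longer (sample weights, sparse input, dtype handling), but the split between "check the input" and "do the math" tends to look similar.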
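The `check_*` helpers mentioned in the list follow the same contract: accept several kinds of input and return one predictable type, so the code that follows only has to deal with that type. Here is a minimal sketch using two real helpers, `sklearn.utils.check_random_state` and `sklearn.model_selection.check_cv`. The conversions shown reflect recent scikit-learn releases.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold, check_cv
from sklearn.utils import check_random_state

# check_random_state: None, an int seed, or an existing RandomState
# all come back as a numpy RandomState instance.
for seed in (None, 0, np.random.RandomState(42)):
    print(type(check_random_state(seed)).__name__)  # RandomState

# An existing RandomState is passed through unchanged.
rs = np.random.RandomState(42)
print(check_random_state(rs) is rs)  # True

# check_cv: an int becomes a cross-validator object; for classifiers
# with a binary or multiclass y it picks stratified folds.
print(isinstance(check_cv(5), KFold))  # True
print(isinstance(check_cv(5, y=[0, 1] * 10, classifier=True), StratifiedKFold))  # True

# An object that already has a split() method is returned as-is.
kf = KFold(n_splits=3)
print(check_cv(kf) is kf)  # True
```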
Any suggestions are welcome.