[MRG] _tree.pyx documentation additions#5010
Conversation
sklearn/tree/_tree.pyx
Outdated
|
General comment: we should agree on a common terminology regarding outputs. So far the code was mentioning "targets" or "outputs". Now you have added/replaced with "responses". I dont know what is best but we should try to limit ourselves -- it is otherwise more confusing than helpful. |
sklearn/tree/_tree.pyx
Outdated
There was a problem hiding this comment.
again, not sure if the added verbosity is so helpful
There was a problem hiding this comment.
+1, this code is quite straight forward. You might want to rename total to total_entropy if you want to make it fully explicit. No need for inline comments in this function.
|
Thanks for this PR! In general, I agree with your proposal, in particular at places where docstrings were missing. However, I am not totally happy with the inline comments you added. It is sometimes adding too much verbosity for nothing. In my opinion, inline comments should be used with parsimony, at places where the code may be difficult to understand by itself -- adding too many of them may have an overall negative impact. I dont know what is others opinion on this? |
|
I can tone down the inline comments, especially in places where they were not helpful. I think 'targets' is the right way to talk about y, so I can change that as well. |
I agree in general. When the "what" is not clear from the Python code, then generally renaming the local variable to have more explicit names is a better solution than adding an inline comment. When the "why" of a particular line is not clear, then an inline comment is the good solution. |
sklearn/tree/_tree.pyx
Outdated
There was a problem hiding this comment.
I think probability_mass_k is more explicit. The statements are short so we can afford a long variable name here.
There was a problem hiding this comment.
alternatively density_k or frequency_k are fine as well.
|
General remark for docstrings: it is recommend to follow PEP 257 to be nice with developer tools like IDE tooltips and sphinx API doc. In particular that means following the structure: def my_function(argument_1, argument_2):
"""One-line short summary of the function in less than 80 cols
Extended paragraph that give a detailed description of what the function
does. Note that the extended paragraph is separated from the one line
short summary by a blank line.
... (parameters doc and so on)
"""
# body of the function here |
sklearn/tree/_tree.pyx
Outdated
There was a problem hiding this comment.
I would use SIZE_t as type definition.
sklearn/tree/_tree.pyx
Outdated
|
Did my round of review, should be good to go once they are fixed along with Olivier's. |
sklearn/tree/_tree.pyx
Outdated
There was a problem hiding this comment.
If you want to add the fact that this is a place holder, I would say
"""Reset the criterion at pos=start
This methods must be implemented by the sub-class.
"""
|
Instead of saying that the method is a placeholder, I would find more informative to say that the method |
sklearn/tree/_tree.pyx
Outdated
There was a problem hiding this comment.
and also set the stride for that target-> and also compute the maximal stride of all targets
|
All suggestions should be incorporated. Thank you! |
|
Apparently this commit broke Gaussian processes! But only for Python 3.4 on Travis. I always knew that Gaussian processes were secretly opposed to documentation. |
|
Yes, this is due to #5045. Dont worry about it :) |
|
_gradient_boosting.c decided to tag along for the ride, so I accidentally pushed it too, but I went back and fixed the issue (I believe). All changes should be incorporated now. |
|
Merging this and making a few minor changes. Thanks for your patience! |
[MRG] _tree.pyx documentation additions
|
Thanks @jmschrei ! |
This is the documentation aspect of #5007 alone. No functional changes to the code are made, and it passes all nosetests. This documentation covers all criterion, and the dense splitters, and includes both docstrings and inline comments.