Skip to content

[MRG] _tree.pyx documentation additions#5010

Merged
glouppe merged 8 commits intoscikit-learn:masterfrom
jmschrei:_tree_doc
Jul 31, 2015
Merged

[MRG] _tree.pyx documentation additions#5010
glouppe merged 8 commits intoscikit-learn:masterfrom
jmschrei:_tree_doc

Conversation

@jmschrei
Copy link
Copy Markdown
Member

This is the documentation aspect of #5007 alone. No functional changes to the code are made, and it passes all nosetests. This documentation covers all criterion, and the dense splitters, and includes both docstrings and inline comments.

@amueller
Copy link
Copy Markdown
Member

This looks great! ping @glouppe @arjoly

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

classifier -> criterion

@glouppe
Copy link
Copy Markdown
Contributor

glouppe commented Jul 21, 2015

General comment: we should agree on a common terminology regarding outputs. So far the code was mentioning "targets" or "outputs". Now you have added/replaced with "responses". I dont know what is best but we should try to limit ourselves -- it is otherwise more confusing than helpful.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, not sure if the added verbosity is so helpful

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, this code is quite straight forward. You might want to rename total to total_entropy if you want to make it fully explicit. No need for inline comments in this function.

@glouppe
Copy link
Copy Markdown
Contributor

glouppe commented Jul 21, 2015

Thanks for this PR! In general, I agree with your proposal, in particular at places where docstrings were missing. However, I am not totally happy with the inline comments you added. It is sometimes adding too much verbosity for nothing. In my opinion, inline comments should be used with parsimony, at places where the code may be difficult to understand by itself -- adding too many of them may have an overall negative impact. I dont know what is others opinion on this?

@jmschrei
Copy link
Copy Markdown
Member Author

I can tone down the inline comments, especially in places where they were not helpful. I think 'targets' is the right way to talk about y, so I can change that as well.

@ogrisel
Copy link
Copy Markdown
Member

ogrisel commented Jul 21, 2015

In my opinion, inline comments should be used with parsimony, at places where the code may be difficult to understand by itself -- adding too many of them may have an overall negative impact. I dont know what is others opinion on this?

I agree in general. When the "what" is not clear from the Python code, then generally renaming the local variable to have more explicit names is a better solution than adding an inline comment.

When the "why" of a particular line is not clear, then an inline comment is the good solution.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think probability_mass_k is more explicit. The statements are short so we can afford a long variable name here.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alternatively density_k or frequency_k are fine as well.

@ogrisel
Copy link
Copy Markdown
Member

ogrisel commented Jul 21, 2015

General remark for docstrings: it is recommend to follow PEP 257 to be nice with developer tools like IDE tooltips and sphinx API doc. In particular that means following the structure:

def my_function(argument_1, argument_2):
    """One-line short summary of the function in less than 80 cols

    Extended paragraph that give a detailed description of what the function
    does. Note that the extended paragraph is separated from the one line
    short summary by a blank line.

    ... (parameters doc and so on)
    """
    # body of the function here

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use SIZE_t as type definition.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this

@glouppe
Copy link
Copy Markdown
Contributor

glouppe commented Jul 24, 2015

Did my round of review, should be good to go once they are fixed along with Olivier's.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to add the fact that this is a place holder, I would say


"""Reset the criterion at pos=start

This methods must be implemented by the sub-class.
"""

@arjoly
Copy link
Copy Markdown
Member

arjoly commented Jul 24, 2015

Instead of saying that the method is a placeholder, I would find more informative to say that the method
must be implemented by the sub-class.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and also set the stride for that target-> and also compute the maximal stride of all targets

@glouppe glouppe mentioned this pull request Jul 28, 2015
30 tasks
@jmschrei
Copy link
Copy Markdown
Member Author

All suggestions should be incorporated. Thank you!

@jmschrei
Copy link
Copy Markdown
Member Author

Apparently this commit broke Gaussian processes! But only for Python 3.4 on Travis. I always knew that Gaussian processes were secretly opposed to documentation.

@glouppe
Copy link
Copy Markdown
Contributor

glouppe commented Jul 29, 2015

Yes, this is due to #5045. Dont worry about it :)

@jmschrei
Copy link
Copy Markdown
Member Author

_gradient_boosting.c decided to tag along for the ride, so I accidentally pushed it too, but I went back and fixed the issue (I believe). All changes should be incorporated now.

@glouppe
Copy link
Copy Markdown
Contributor

glouppe commented Jul 31, 2015

Merging this and making a few minor changes. Thanks for your patience!

glouppe added a commit that referenced this pull request Jul 31, 2015
[MRG] _tree.pyx documentation additions
@glouppe glouppe merged commit d857349 into scikit-learn:master Jul 31, 2015
@arjoly
Copy link
Copy Markdown
Member

arjoly commented Jul 31, 2015

Thanks @jmschrei !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants