[WIP/RFC] Test docstring parameters (with order)#9023
[WIP/RFC] Test docstring parameters (with order)#9023agramfort wants to merge 24 commits intoscikit-learn:masterfrom
Conversation
|
See also #7793. I think there's no doubt that something like this could improve documentation quality. One big challenge is handling exceptional cases, including deprecation and *args ( I had grand plans to make something like this more descriptive, outputting a diff between the parameters and the docstring, so that it effectively fixes things for you. I have some code somewhere. Let me know if I should pull it out and share it. (Note that diffing docstrings is only feasible in some cases, e.g. where it appears directly in the function in question and But what I'd really like to see a test for is that fitted model attributes correspond to those in the docstring. |
+1 but maybe for another PR once this or #7793 is reviewed and merged. |
|
@ogrisel do you think it's reasonable to fix all docstrings in this PR?
otherwise we need to do the hack of flake8
|
| raise SkipTest( | ||
| "numpydoc is required to test the docstrings") | ||
|
|
||
| from numpydoc import docscrape |
There was a problem hiding this comment.
The codecov browser extension tells me that this line is never executed by our CI servers. It seems that we need to add the numpydoc module to at least a Python 3.6 and probably to a Python 2.7 build job in travis to properly cover this test code (and actually run the tests).
I think we can merge a first PR with the infrastructure to do the checks as long as it's sufficiently tested. Speaking of which, it would be great to have couple of unittests to check the
As @jnothman said there are probably functions that are exception to the behavioral rules encoded in your test function (e.g. |
|
numpydoc? anyone wanna review #7355 ;) [currently it coughs up lots of errors but should still work] |
| The probability that a coefficient is zero (see notes). Larger values | ||
| enforce more sparsity. | ||
|
|
||
| norm_diag : boolean, optional (default=False) |
There was a problem hiding this comment.
I guess we need to respect the order exactly?
| '__neg__', '__hash__') | ||
|
|
||
| public_modules = [ | ||
| # the list of modules users need to access for all functionality |
There was a problem hiding this comment.
why not all? too much to do? leave for future PRS?
There was a problem hiding this comment.
you want a huge PR to review? :)
There was a problem hiding this comment.
sklearn.base should be there, no? Why not use pkgutils?
| ] | ||
|
|
||
|
|
||
| def check_parameters_match(func, doc=None): |
There was a problem hiding this comment.
This maybe should move to utils.testing? If it deserves its own tests, I don't think it should live in a test file.
There was a problem hiding this comment.
This file currently contains three things from what I can see: a function to check docstrings, tests for this function, and tests that run this function on sklearn. I feel they should live in two or three different files since they are conceptually very separate.
|
|
||
| def test_check_parameters_match(): | ||
| check_parameters_match(f_ok) | ||
| assert_raise_message(RuntimeError, 'Unknown section Results', |
There was a problem hiding this comment.
Nitpick: it would be better to have: 'Unknown section: Results'
There was a problem hiding this comment.
Or even:
'Invalid section: Results'There was a problem hiding this comment.
this is a numpydoc string not mine
|
refactoring done |
| # versions of numpy, scipy with ATLAS that comes with Ubuntu Trusty 14.04 | ||
| - env: DISTRIB="ubuntu" PYTHON_VERSION="2.7" CYTHON_VERSION="0.23.4" | ||
| COVERAGE=true | ||
| COVERAGE=true TEST_DOCSTRINGS="false" |
There was a problem hiding this comment.
No need to set TEST_DOCSTRINGS if you don't want to test the docstrings (similar to what we do with COVERAGE)
Update with master and fix merge conflicts
|
Closing this in favor of #9206 |
Reference Issue
Fixes #7758
What does this implement/fix? Explain your changes.
This adds a unit test to check that all parameters are documented
and in the right order.
Any other comments?
For now I only fixes datasets module and I think it's the least controversial.