Fix errors reported by docstub (II).#8011
Conversation
|
I'd like to get inputs on how I should fix this invalid syntax: The 'lazy' way would be: But maybe the more consistent way (with the rest of the (knowing that this syntax is not valid docstub-wise). Any preferences? |
e218083 to
2b94664
Compare
|
@mkcor, I just noticed that our typing workflow is still using an old docstub version 0.3! See scikit-image/.github/workflows/typing.yml Line 52 in 9d8daba That explains your confusion and perhaps the discrepancies your are seeing in #8011 (comment). After the discussion during our summit I removed the skimage specific array syntax (scientific-python/docstub@9d546a8). We opted to adopt the less ambiguous one from scikit-learn. So we need to update docstub to the newest version. Do you want to do that in this PR? Otherwise I'm happy to so quickly in another one (tomorrow if I don't get a response). |
|
Wait, I'll do the PR right now. Bumping the version separately is useful, because the error number will go up. There'll be many more expression that docstub doesn't allow. With that separation it's clear how much this PR reduces the error rate. |
## Description This updates docstub to the most recent version: 0.6.0. I've allowed for patch releases but excluded >=0.7.0, because those might introduce new features. By pinning we can more closely control our CI. After the discussion during our summit I removed the skimage specific array syntax (scientific-python/docstub@9d546a8). We opted to adopt the less ambiguous one from scikit-learn. This is the reason the number of reported errors goes up. There'll be many more expression that docstub doesn't allow. I recommend addressing these in a separate PR. So it's clearer how the allowed number of errors changes. This was brought up in #8011 (comment). @mkcor feel free to merge this if you approve. ## Checklist <!-- Before pull requests can be merged, they should provide: --> - A descriptive but concise pull request title - [Docstrings for all functions](https://github.com/numpy/numpydoc/blob/main/doc/example.py) - [Unit tests](https://scikit-image.org/docs/dev/development/contribute.html#testing) - A gallery example in `./doc/examples` for new features - [Contribution guide](https://scikit-image.org/docs/dev/development/contribute.html) is followed ## Release note For maintainers and optionally contributors, please refer to the [instructions](https://scikit-image.org/docs/stable/development/contribute.html#documenting-changes) on how to document this PR for the release notes. ```release-note ... ```
|
Oh, thanks, @lagru! Somehow I checked the docstub version when running it locally, but it didn't strike me as outdated... 🤦♀️ |
| footprint : ndarray | ||
| The neighborhood expressed as an ndarray of 1's and 0's. | ||
| out : ([P,] M, N) array (same dtype as input) | ||
| out : ndarray of shape ([P,] M, N), same dtype as input `image` |
There was a problem hiding this comment.
@lagru I took the liberty of changing 'array' to 'ndarray' because the rest of the docstring mentions ndarray's everywhere else!
There was a problem hiding this comment.
Good thinking but that's not the logical along which this should be evaluated.
- "ndarray" must be an actual NumPy array
- "array" can be any array, for example a Dask array
- "array-like" is anything that can be coerced into an array
With that and looking at the implementation:
- image -> "array-like"
- mask -> must support memoryview, probably best to use "ndarray" here
- out -> same
- mask -> same
...
I'm afraid when fixing errors, we'll also have to do the harder work determining the correct type by looking at the implementation.
I think we can use types that are narrower than what the implementation supports. That can often be a good idea. But there's a trade-off:
- Excessively narrow types can be very annoying to fix downstream and defeat the flexibility of Python
- Broad types might catch less errors.
I think where we currently are. We should almost never use "array". Use array-like if asarray (or similar) is used on the parameter, otherwise use ndarray.
There was a problem hiding this comment.
@lagru ok, so it's a NumPy array because it's the output image (return value out); I fixed line 465 on this occasion.
| Returns | ||
| ------- | ||
| haar_features : (n_features,) ndarray of int or float | ||
| haar_features : ndarray of shape (n_features,) and dtype (int or float) |
There was a problem hiding this comment.
I hope you interpreted properly this (n_features,) prefix!
There was a problem hiding this comment.
This reads a bit oddly. I expect the term following shape to be a tuple, but not the term following dtype.
There was a problem hiding this comment.
Sorry, typo, I meant to write "I hope I interpreted this properly!"
@stefanv but your comment bears on the second part, right? (int or float) is not a tuple... Even if it were parsed in plain Python (which it shouldn't be... since we are in the docstring...), it can't be a tuple because it contains no comma.
There was a problem hiding this comment.
Maybe using ... dtype (int | float) might be a bit clearer?
Also technically this dtype might be wrong. I'm pretty sure that the actual dtypes are sized ones from NumPy...
There was a problem hiding this comment.
Maybe that's okay and docstub should simply translate int to numpy.int_ in context of dtype? Though I'd prefer not to hardcode that skimage-specific decision. So maybe we can handle that as part of a hook.
There was a problem hiding this comment.
Personally I have no preference between | and or.
Right, it looks like the dtype should be int; more specifically, this haar_features array is actually a "labels" array...
There was a problem hiding this comment.
Sorry, I'm not sure what you're referring to:
Maybe that's okay and docstub should simply translate
int...
okay to use dtype int? In general? With this doctype in particular?
| footprint : ndarray of dtype (int or float) | ||
| The neighborhood expressed as a 2-D array of 1's and 0's. | ||
| out : 2-D array (integer or float), optional | ||
| out : ndarray of shape (H, W, N) and dtype (int or float), optional |
There was a problem hiding this comment.
According to the Returns section, out is a 3D array, not 2D.
| Returns | ||
| ------- | ||
| haar_features : (n_features,) ndarray of int or float | ||
| haar_features : ndarray of shape (n_features,) and dtype (int or float) |
There was a problem hiding this comment.
This reads a bit oddly. I expect the term following shape to be a tuple, but not the term following dtype.
In plain English, yes, for sure. Sometimes the parameter description is (partly) redundant with the typing and we have sentences ending with "... array of dtype int or float." But since the typing is meant to be both human-readable and machine-parseable, we need a grammar to express what takes precedence, as @lagru already clarified: means either an array or an integer. Otherwise, would you suggest completely changing the syntax and, for example, reserving if we mean "array or int" and if we mean two possible dtypes? Considering all the work @lagru has put into docstub, I wouldn't ask for such changes. Plus, I don't think the parentheses read badly (but that's a matter of brain diversity), they encapsulate what they should, after all. |
lagru
left a comment
There was a problem hiding this comment.
As pointed out in #8011 (comment) I think we need to look at the implementation a bit. A lot of the current type descriptions might be very vague or flat out wrong because nothing really forced us to pay close attention.
Yes. I realized immediately that I couldn't just do search and replace's... I'm trying to strike a compromise though. I'm not double-checking everything super thoroughly; but I'm not 'changing stuff randomly' or 'just guessing' (I'll go back to the one instance of |
|
Along what @stefanv would like, I'm seeing that scikit-learn don't use parentheses, for example: weights : ndarray of shape (n_samples,) or (n_features,), default=None(from https://github.com/scikit-learn/scikit-learn/blob/66fbe2dcf91d8f9ff3e827856fb2e4f90b57ddc2/sklearn/utils/sparsefuncs.py#L112). Maybe it could/should be allowed when there's no possible ambiguity? But that wouldn't apply to dtypes. |
## Description Follows PR #8011. ## Checklist <!-- Before pull requests can be merged, they should provide: --> - A descriptive but concise pull request title - [Docstrings for all functions](https://github.com/numpy/numpydoc/blob/main/doc/example.py) - [Unit tests](https://scikit-image.org/docs/dev/development/contribute.html#testing) - A gallery example in `./doc/examples` for new features - [Contribution guide](https://scikit-image.org/docs/dev/development/contribute.html) is followed ## Release note For maintainers and optionally contributors, please refer to the [instructions](https://scikit-image.org/docs/stable/development/contribute.html#documenting-changes) on how to document this PR for the release notes. ```release-note ... ``` --------- Co-authored-by: Stefan van der Walt <stefanv@berkeley.edu> Co-authored-by: Lars Grüter <lagru+github@mailbox.org> Co-authored-by: Lars Grüter <lagru@mailbox.org>
Description
Follows on #7853.
Checklist
./doc/examplesfor new featuresRelease note
For maintainers and optionally contributors, please refer to the instructions on how to document this PR for the release notes.