Skip to content

Fix errors reported by docstub (II).#8011

Merged
lagru merged 22 commits intoscikit-image:mainfrom
mkcor:docstub-fixes-II
Jan 13, 2026
Merged

Fix errors reported by docstub (II).#8011
lagru merged 22 commits intoscikit-image:mainfrom
mkcor:docstub-fixes-II

Conversation

@mkcor
Copy link
Copy Markdown
Member

@mkcor mkcor commented Jan 8, 2026

Description

Follows on #7853.

Checklist

Release note

For maintainers and optionally contributors, please refer to the instructions on how to document this PR for the release notes.

Fix array doctype syntax & other small docstub fixes

@mkcor mkcor requested a review from lagru January 8, 2026 12:24
@mkcor mkcor added the 🧩 typing Type annotations, type checking, and stubs label Jan 8, 2026
@mkcor mkcor added 🔧 type: Maintenance Refactoring and maintenance of internals and removed 🔧 type: Maintenance Refactoring and maintenance of internals labels Jan 8, 2026
@mkcor
Copy link
Copy Markdown
Member Author

mkcor commented Jan 8, 2026

I'd like to get inputs on how I should fix this invalid syntax:

2-D array (integer or float)

The 'lazy' way would be:

2-D array, dtype int or float

But maybe the more consistent way (with the rest of the src/skimage/filters/rank/generic.py file) would be:

array of shape (M, N) and dtype int or float

(knowing that this syntax

2-D array of dtype int or float

is not valid docstub-wise). Any preferences?

@mkcor mkcor force-pushed the docstub-fixes-II branch from e218083 to 2b94664 Compare January 8, 2026 14:40
@lagru
Copy link
Copy Markdown
Member

lagru commented Jan 8, 2026

@mkcor, I just noticed that our typing workflow is still using an old docstub version 0.3! See

pip install docstub==0.3.0.post0

That explains your confusion and perhaps the discrepancies your are seeing in #8011 (comment).

After the discussion during our summit I removed the skimage specific array syntax (scientific-python/docstub@9d546a8). We opted to adopt the less ambiguous one from scikit-learn.

So we need to update docstub to the newest version. Do you want to do that in this PR? Otherwise I'm happy to so quickly in another one (tomorrow if I don't get a response).

@lagru
Copy link
Copy Markdown
Member

lagru commented Jan 8, 2026

Wait, I'll do the PR right now. Bumping the version separately is useful, because the error number will go up. There'll be many more expression that docstub doesn't allow. With that separation it's clear how much this PR reduces the error rate.

mkcor pushed a commit that referenced this pull request Jan 8, 2026
## Description

This updates docstub to the most recent version: 0.6.0. I've allowed for
patch releases but excluded >=0.7.0, because those might introduce new
features. By pinning we can more closely control our CI.

After the discussion during our summit I removed the skimage specific
array syntax
(scientific-python/docstub@9d546a8).
We opted to adopt the less ambiguous one from scikit-learn.

This is the reason the number of reported errors goes up. There'll be
many more expression that docstub doesn't allow. I recommend addressing
these in a separate PR. So it's clearer how the allowed number of errors
changes.

This was brought up in
#8011 (comment).

@mkcor feel free to merge this if you approve.

## Checklist

<!-- Before pull requests can be merged, they should provide: -->

- A descriptive but concise pull request title
- [Docstrings for all
functions](https://github.com/numpy/numpydoc/blob/main/doc/example.py)
- [Unit
tests](https://scikit-image.org/docs/dev/development/contribute.html#testing)
- A gallery example in `./doc/examples` for new features
- [Contribution
guide](https://scikit-image.org/docs/dev/development/contribute.html) is
followed

## Release note

For maintainers and optionally contributors, please refer to the
[instructions](https://scikit-image.org/docs/stable/development/contribute.html#documenting-changes)
on how to document this PR for the release notes.

```release-note
...
```
@mkcor
Copy link
Copy Markdown
Member Author

mkcor commented Jan 8, 2026

Oh, thanks, @lagru! Somehow I checked the docstub version when running it locally, but it didn't strike me as outdated... 🤦‍♀️

footprint : ndarray
The neighborhood expressed as an ndarray of 1's and 0's.
out : ([P,] M, N) array (same dtype as input)
out : ndarray of shape ([P,] M, N), same dtype as input `image`
Copy link
Copy Markdown
Member Author

@mkcor mkcor Jan 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lagru I took the liberty of changing 'array' to 'ndarray' because the rest of the docstring mentions ndarray's everywhere else!

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good thinking but that's not the logical along which this should be evaluated.

  • "ndarray" must be an actual NumPy array
  • "array" can be any array, for example a Dask array
  • "array-like" is anything that can be coerced into an array

With that and looking at the implementation:

  • image -> "array-like"
  • mask -> must support memoryview, probably best to use "ndarray" here
  • out -> same
  • mask -> same
    ...

I'm afraid when fixing errors, we'll also have to do the harder work determining the correct type by looking at the implementation.

I think we can use types that are narrower than what the implementation supports. That can often be a good idea. But there's a trade-off:

  • Excessively narrow types can be very annoying to fix downstream and defeat the flexibility of Python
  • Broad types might catch less errors.

I think where we currently are. We should almost never use "array". Use array-like if asarray (or similar) is used on the parameter, otherwise use ndarray.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lagru ok, so it's a NumPy array because it's the output image (return value out); I fixed line 465 on this occasion.

Returns
-------
haar_features : (n_features,) ndarray of int or float
haar_features : ndarray of shape (n_features,) and dtype (int or float)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope you interpreted properly this (n_features,) prefix!

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reads a bit oddly. I expect the term following shape to be a tuple, but not the term following dtype.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, typo, I meant to write "I hope I interpreted this properly!"

@stefanv but your comment bears on the second part, right? (int or float) is not a tuple... Even if it were parsed in plain Python (which it shouldn't be... since we are in the docstring...), it can't be a tuple because it contains no comma.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe using ... dtype (int | float) might be a bit clearer?

Also technically this dtype might be wrong. I'm pretty sure that the actual dtypes are sized ones from NumPy...

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe that's okay and docstub should simply translate int to numpy.int_ in context of dtype? Though I'd prefer not to hardcode that skimage-specific decision. So maybe we can handle that as part of a hook.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I have no preference between | and or.

Right, it looks like the dtype should be int; more specifically, this haar_features array is actually a "labels" array...

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I'm not sure what you're referring to:

Maybe that's okay and docstub should simply translate int ...

okay to use dtype int? In general? With this doctype in particular?

footprint : ndarray of dtype (int or float)
The neighborhood expressed as a 2-D array of 1's and 0's.
out : 2-D array (integer or float), optional
out : ndarray of shape (H, W, N) and dtype (int or float), optional
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the Returns section, out is a 3D array, not 2D.

Copy link
Copy Markdown
Member

@stefanv stefanv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lagru See question about and dtype (X or Y), which I think would read more intuitively as and dtype X or Y.

Returns
-------
haar_features : (n_features,) ndarray of int or float
haar_features : ndarray of shape (n_features,) and dtype (int or float)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reads a bit oddly. I expect the term following shape to be a tuple, but not the term following dtype.

@mkcor
Copy link
Copy Markdown
Member Author

mkcor commented Jan 9, 2026

@lagru See question about and dtype (X or Y), which I think would read more intuitively as and dtype X or Y.

In plain English, yes, for sure. Sometimes the parameter description is (partly) redundant with the typing and we have sentences ending with "... array of dtype int or float." But since the typing is meant to be both human-readable and machine-parseable, we need a grammar to express what takes precedence, as @lagru already clarified:

array of dtype float or int

means either an array or an integer. Otherwise, would you suggest completely changing the syntax and, for example, reserving | for containers and plain or for nested stuff? Like

array of dtype float | int

if we mean "array or int" and

array of dtype float or int

if we mean two possible dtypes? Considering all the work @lagru has put into docstub, I wouldn't ask for such changes. Plus, I don't think the parentheses read badly (but that's a matter of brain diversity), they encapsulate what they should, after all.

Copy link
Copy Markdown
Member

@lagru lagru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As pointed out in #8011 (comment) I think we need to look at the implementation a bit. A lot of the current type descriptions might be very vague or flat out wrong because nothing really forced us to pay close attention.

@mkcor
Copy link
Copy Markdown
Member Author

mkcor commented Jan 9, 2026

As pointed out in #8011 (comment) I think we need to look at the implementation a bit. A lot of the current type descriptions might be very vague or flat out wrong because nothing really forced us to pay close attention.

Yes. I realized immediately that I couldn't just do search and replace's... I'm trying to strike a compromise though. I'm not double-checking everything super thoroughly; but I'm not 'changing stuff randomly' or 'just guessing' (I'll go back to the one instance of array I turned into ndarray -- I agree that, to be on the safe side, I should change it to array-like, but then we should do it in the entire file for all the very many similar instances).

@mkcor
Copy link
Copy Markdown
Member Author

mkcor commented Jan 9, 2026

Along what @stefanv would like, I'm seeing that scikit-learn don't use parentheses, for example:

weights : ndarray of shape (n_samples,) or (n_features,), default=None

(from https://github.com/scikit-learn/scikit-learn/blob/66fbe2dcf91d8f9ff3e827856fb2e4f90b57ddc2/sklearn/utils/sparsefuncs.py#L112). Maybe it could/should be allowed when there's no possible ambiguity? But that wouldn't apply to dtypes.

Copy link
Copy Markdown
Member

@lagru lagru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mkcor. Clear improvement and as we discussed on Zulip, we can worry about getting dtypes correct at a later stage.

@lagru lagru merged commit bfded80 into scikit-image:main Jan 13, 2026
25 checks passed
@stefanv stefanv added this to the 0.27 milestone Jan 13, 2026
@mkcor mkcor deleted the docstub-fixes-II branch January 13, 2026 19:28
mkcor added a commit that referenced this pull request Jan 16, 2026
## Description

Follows PR #8011.

## Checklist

<!-- Before pull requests can be merged, they should provide: -->

- A descriptive but concise pull request title
- [Docstrings for all
functions](https://github.com/numpy/numpydoc/blob/main/doc/example.py)
- [Unit
tests](https://scikit-image.org/docs/dev/development/contribute.html#testing)
- A gallery example in `./doc/examples` for new features
- [Contribution
guide](https://scikit-image.org/docs/dev/development/contribute.html) is
followed

## Release note

For maintainers and optionally contributors, please refer to the
[instructions](https://scikit-image.org/docs/stable/development/contribute.html#documenting-changes)
on how to document this PR for the release notes.

```release-note
...
```

---------

Co-authored-by: Stefan van der Walt <stefanv@berkeley.edu>
Co-authored-by: Lars Grüter <lagru+github@mailbox.org>
Co-authored-by: Lars Grüter <lagru@mailbox.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

🧩 typing Type annotations, type checking, and stubs 🔧 type: Maintenance Refactoring and maintenance of internals

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants