Add image sequence of in-vivo human cornea to data registry. by mkcor · Pull Request #6858 · scikit-image/scikit-image

mkcor · 2023-03-28T13:03:08Z

Description

This dataset is meant to be used in #6853.

@ana42742 @decorouz once this PR is merged, we'll be able to load the image data just with image_seq = data.cornea().

fyi @JulesScholler

Checklist

Docstrings for all functions
Gallery example in ./doc/examples (new features only)
Benchmark in ./benchmarks, if your changes aren't covered by an
existing benchmark
Unit tests
Clean style in the spirit of PEP8
Descriptive commit messages (see below)

For reviewers

Check that the PR title is short, concise, and will make sense 1 year
later.
Check that new functions are imported in corresponding __init__.py.
Check that new features, API changes, and deprecations are mentioned in
doc/release/release_dev.rst.
There is a bot to help automate backporting a PR to an older branch. For
example, to backport to v0.19.x after merging, add the following in a PR
comment: @meeseeksdev backport to v0.19.x
To run benchmarks on a PR, add the run-benchmark label. To rerun, the label
can be removed and then added again. The benchmark output can be checked in
the "Actions" tab.

Release note

Summarize the introduced changes in the code block below in one or a few sentences. The
summary will be included in the next release notes automatically:

Add new image sequence `skimage.data.palisades_of_vogt`
showing in-vivo tissue of the palisades of Vogt.
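Here is a minimal sketch of loading and inspecting the sequence. It assumes scikit-image 0.22 or later (the milestone this PR was assigned to) and network access, because non-bundled datasets are downloaded and cached on first use through the optional `pooch` dependency. The helpers `summarize_sequence` and `load_and_summarize` are hypothetical and exist only for illustration. They assume the array is laid out as (frames, rows, columns).

```python
import numpy as np


def summarize_sequence(seq):
    """Summarize a 3D image sequence laid out as (frames, rows, cols).

    Hypothetical helper for illustration; not part of scikit-image.
    """
    seq = np.asarray(seq)
    if seq.ndim != 3:
        raise ValueError(f"expected a 3D array, got {seq.ndim} dimension(s)")
    n_frames, n_rows, n_cols = seq.shape
    return {
        "n_frames": n_frames,
        "frame_shape": (n_rows, n_cols),
        "dtype": str(seq.dtype),
        # Mean intensity of each frame (NumPy returns float64 for integer input).
        "frame_means": seq.reshape(n_frames, -1).mean(axis=1),
    }


def load_and_summarize():
    """Fetch the dataset and summarize it (hypothetical convenience wrapper).

    Assumes scikit-image >= 0.22 with pooch installed; the file is
    downloaded and cached on the first call.
    """
    from skimage import data

    image_seq = data.palisades_of_vogt()
    return summarize_sequence(image_seq)
```

Calling `load_and_summarize()` triggers the download on first use. `summarize_sequence` itself works on any 3D array, so it can be tried out without network access.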

skimage/data/_fetchers.py

lagru · 2023-03-30T13:03:52Z


+    -----
+    See info under `in-vivo-cornea-spots.tif` at
+    https://gitlab.com/scikit-image/data/-/blob/master/README.md#data.
+


I'd prefer a reference for the cornea or palisades of Vogt here. Perhaps

Goldberg MF, Bron AJ (1982). "Limbal palisades of Vogt". Transactions of the American Ophthalmological Society. 80: 155–71. PMC 1312261. PMID 7182957.

I don't agree; otherwise we would need to rethink the data registry altogether.

Arguably, this kind of reference would go in the tutorial where the image is being used (or possibly under the in-vivo-cornea-spots.tif entry at https://gitlab.com/scikit-image/data/-/blob/master/README.md#data).

PS: I can elaborate on my thought process if you wish 😉

Please do. I'm confused why adding a reference here wouldn't be a good thing. The docstring explains what the palisades of Vogt are so why not back that up with a proper resource? Note that the resource is not about this particular data file or its origin but about the palisades of Vogt (in case that's the source of the confusion).

otherwise we would need to rethink the data registry altogether.

What would we need to rethink?

Ok, I'll break it down into multiple posts.

First, I consider the purely {technical, formal, legal} perspective. Since our data registry is distributed as part of a software package, we want to document the data provenance (origin, author, license, ...).

Do you think it's {fine, better, unnecessary, [other]} to duplicate this info wrt https://gitlab.com/scikit-image/data/-/blob/master/README.md? (Btw, maybe this should be a permalink for each entry.) I find it unnecessary, but you could convince me otherwise...

Then, of course, the sample datasets are meant to be used. We want to document their shapes and sizes, so that users know what to expect (memory usage, suitable image readers, ...).
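Documenting shape and dtype matters because the in-memory footprint of an uncompressed array is just the number of elements times the bytes per element. Users can therefore estimate memory usage before downloading anything. Below is a minimal sketch; the helper names are hypothetical, and the stack shape is illustrative rather than taken from this PR.

```python
import math

import numpy as np


def expected_nbytes(shape, dtype):
    """Bytes needed to hold an uncompressed array of this shape and dtype."""
    # Number of elements times the size of one element.
    return math.prod(shape) * np.dtype(dtype).itemsize


def format_size(n_bytes):
    """Format a byte count with binary (1024-based) units."""
    size = float(n_bytes)
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    for unit in units[:-1]:
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} {units[-1]}"


# Illustrative only: a hypothetical stack of 60 frames of 1440 x 1440 uint16 pixels.
n_bytes = expected_nbytes((60, 1440, 1440), "uint16")
print(n_bytes, format_size(n_bytes))  # prints: 248832000 237.3 MiB
```

For an array that is already loaded, NumPy's `ndarray.nbytes` gives the same number directly.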

Most importantly, you might think at this point, we want to document what the image is showing. Yes, obviously! The single-line short summary should describe what the image dataset is about.

Based on the current content of our data registry, I would argue that we want rather high-level descriptions, because:

our data registry offers a collection of sample image datasets which are (well, not completely random, I'll admit that!) unrelated to one another, very diverse in {category, subject, field}, for users to load easily and try out (resp. build) image processing tasks (resp. workflows);

my impression is that we are not (planning on) curating a (structured, consistent, field-specific) database of images;

I see the skimage.data submodule as data-centric (as opposed to contextualized), which is why "the resource [not being] about this particular data file or its origin" is an argument against including this resource;

I find that 'context & motivation' belong to narrative documentation and/or any place discussing applications (where the data are actually being used).

To be clear, I won't spend energy opposing the inclusion of a link to an article; I don't think it hurts that much. It's just that people who are particularly interested in the object will search for the term and find the exact same article by themselves; the link makes the description less self-contained (without much justification: in scikit-image, as far as I know, we don't use any info derived from this article...?); and... all of the above.

I'm not sure I follow your reasoning entirely but I'm okay with not including the reference to the paper. But you are right in the sense that it is not a required reference to use the function. And if anyone is curious maybe they can find the reference included elsewhere or discover it on their own. :)

More off-topic:

I find that 'context & motivation' belong to narrative documentation and/or any place discussing applications (where the data are actually being used).

I'm not sure I disagree but I don't think a gallery example (do you mean this by narrative documentation) is a good replacement for a well written short summary or notes section in the docstring. Gallery examples are often not directly focused on the function itself. So while some context and motivation is often included in the example's introduction it's not as visible there and might not be directly relevant. Especially if it is distributed among X different gallery examples.

It probably depends on personal preference and in this case on what precisely we understand as "context & motivation". 😄

I don't think a gallery example (do you mean this by narrative documentation) is a good replacement for a well written short summary or notes section in the docstring.

I don't think so either! Btw, yes, by narrative documentation I mean the tutorial-like gallery examples. They are not supposed to be a replacement for a docstring; we are dealing with different levels here (I mean 'levels' as in high-level vs. low-level, general-purpose vs. domain-specific, ...).

Gallery examples are often not directly focused on the function itself.

Exactly, here you go. That's why, when writing the docstring, we shouldn't assume a very particular use of the function itself. The second sentence in the abstract of the paper you were pointing to goes like: "Our clinical studies indicate that they are more discrete in younger and in more heavily pigmented individuals, ..." This is super specific (e.g., clinical trials, comparison between younger and older individuals, ...) and completely unrelated to the image dataset we are sharing...

Especially if it is distributed among X different gallery examples.

Exactly, again. The docstring of the function itself is the right place to discuss neither a) very, very specific applications of the function nor (worse) b) applications which are not even related to the actual function (here, data). As far as we know, the dataset at hand isn't part of the research presented in that paper.

It probably depends on personal preference

Well, I would argue that there is a rationale. At first, I thought that pointing to that paper would be somewhat equivalent to linking to the 'astronaut' entry in the dictionary for the data.astronaut() function. So I thought 'why not?' (while still leaning towards 'no', on the grounds that it's pretty much unnecessary; see above).

Then I realized it was even worse (logically speaking), because at least the 'astronaut' entry in the dictionary is generic so, even if unnecessary, it's not properly off-topic. But this paper investigates the potential role of the palisades of Vogt in the aging/diseases of the cornea... (??)

Btw, the current (WIP) gallery example which uses this image of 'palisades of Vogt' is about dark spots caused by dust on the microscope's mirror (!!), which proves that, indeed and as you have pointed out yourself, what's found in a function's docstring should have nothing to do (a priori) with the introductory section of a given tutorial/paper.

what precisely we understand as "context & motivation"

Sure, there can be different levels of "context & motivation;" in a scientific setting, by "context & motivation" I mean basically the first section of a research paper.

mkcor

Thank you for your review, @lagru! Pushing my changes according to your suggestions.

skimage/data/_fetchers.py

mkcor · 2023-04-15T09:15:01Z


lagru

Looks good. Thank you, Marianne!

mkcor · 2023-04-26T12:44:47Z

Thanks for reviewing again, @lagru.

@jni “final boarding call 🚄” label 😉

If a pull request's body (description) contains a code block like this, ```release-note A short summary. ```, its content is used instead of the PR title in the release notes. See scikit-image#6858 for a PR to which the appropriate block was added.
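The rule above can be sketched in a few lines: use the content of a `release-note` block if the PR body has one, and otherwise fall back to the PR title. This is a hypothetical re-implementation for illustration only, not the code of `generate_release_notes.py` from #6961. The function name `release_note_summary` and the regex are assumptions, and only backtick fences are handled.

```python
import re

# Matches a backtick fence whose info string is "release-note" and captures
# its body (assumption: tilde fences and nested fences are not handled).
_RELEASE_NOTE_RE = re.compile(
    r"^```release-note[ \t]*\n(.*?)^```[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def release_note_summary(title, body):
    """Return the release-note block's content if present, else the PR title."""
    # PR bodies from GitHub often use CRLF line endings; normalize them first.
    body = (body or "").replace("\r\n", "\n")
    match = _RELEASE_NOTE_RE.search(body)
    if match:
        # Collapse line breaks and repeated spaces into a single line.
        summary = " ".join(match.group(1).split())
        if summary:
            return summary
    return title
```

An empty `release-note` block also falls back to the title, so a forgotten summary never produces a blank release-notes entry.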

Add image sequence of in-vivo human cornea to data registry

6e7a5ad

lagru reviewed Mar 30, 2023


lagru added the 👶 type: New feature label Apr 6, 2023

mkcor added 2 commits April 15, 2023 10:57

Merge branch 'main' into add-cornea-data

f230a7d

Rename cornea image into palisades_of_vogt

32f1b3c

mkcor commented Apr 15, 2023


Comply with PEP 257

9698693

lagru approved these changes Apr 25, 2023


mkcor added this to the 0.22 milestone Jun 3, 2023

Merge branch 'main' into add-cornea-data

0783ec1

jni merged commit f2fde3d into scikit-image:main Jun 5, 2023

lagru mentioned this pull request Jun 9, 2023

Rework generate_release_notes.py and add PR summary parsing #6961

Merged

lagru added the 🏆 type: Highlight Highlight in next release notes label Oct 3, 2023

mkcor commented Mar 28, 2023 • edited by lagru

lagru Apr 22, 2023 • edited

mkcor left a comment

lagru left a comment

mkcor commented Apr 26, 2023

3 participants