Add image sequence of in-vivo human cornea to data registry.#6858
Add image sequence of in-vivo human cornea to data registry.#6858jni merged 5 commits intoscikit-image:mainfrom
Conversation
| ----- | ||
| See info under `in-vivo-cornea-spots.tif` at | ||
| https://gitlab.com/scikit-image/data/-/blob/master/README.md#data. | ||
|
|
There was a problem hiding this comment.
I'd prefer a reference for the cornea or palisades of Vogt here. Perhaps
Goldberg MF, Bron AJ (1982). "Limbal palisades of Vogt". Transactions of the American Ophthalmological Society. 80: 155–71. PMC 1312261. PMID 7182957.
There was a problem hiding this comment.
I don't agree; otherwise we would need to rethink the data registry altogether.
Arguably, this kind of reference would go in the tutorial where the image is being used (or possibly under the in-vivo-cornea-spots.tif entry at https://gitlab.com/scikit-image/data/-/blob/master/README.md#data).
There was a problem hiding this comment.
PS: I can elaborate on my thought process if you wish 😉
There was a problem hiding this comment.
Please do. I'm confused why adding a reference here wouldn't be a good thing. The docstring explains what the palisades of Vogt are so why not back that up with a proper resource? Note that the resource is not about this particular data file or its origin but about the palisades of Vogt (in case that's the source of the confusion).
otherwise we would need to rethink the data registry altogether.
What would we need to rethink?
There was a problem hiding this comment.
Ok, I'll break it down into multiple posts.
First, I consider the purely {technical, formal, legal} perspective. Since our data registry is distributed as part of a software package, we want to document the data provenance (origin, author, license, ...).
Do you think it's {fine, better, unnecessary, [other]} to duplicate this info wrt https://gitlab.com/scikit-image/data/-/blob/master/README.md? (Btw, maybe this should be a permalink for each entry.) I find it unnecessary, but you could convince me otherwise...
There was a problem hiding this comment.
Then, of course, the sample datasets are meant to be used. We want to document their shapes and sizes, so that users know what to expect (memory usage, suitable image readers, ...).
There was a problem hiding this comment.
Most importantly, you might think at this point, we want to document what the image is showing. Yes, obviously! The single-line short summary should describe what the image dataset is about.
Based on the current content of our data registry, I would argue that we want rather high-level descriptions, because:
- our data registry offers a collection a sample image datasets which are (well, not completely random, I'll admit that!) unrelated to one another, very diverse in {category, subject, field}, for users to load easily and try out (resp. build) image processing tasks (resp. workflows);
- my impression is that we are not (planning on) curating a (structured, consistent, field-specific) database of images;
- I see the
skimage.datasubmodule as data-centric (as opposed to contextualized), which is why "the resource [not being] about this particular data file or its origin" is an argument against including this resource; - I find that 'context & motivation' belong to narrative documentation and/or any place discussing applications (where the data are actually being used).
To be clear, I won't spend energy opposing the inclusion of a link to an article; I don't think it hurts that bad. It's just that people who are particularly interested in the object will search for the term and find the exact same article by themselves; the link makes the description less self-contained (without much justification: in scikit-image, as far as I know, we don't use any info derived from this article...?); and... all of the above.
There was a problem hiding this comment.
I'm not sure I follow your reasoning entirely but I'm okay with not including the reference to the paper. But you are right in the sense that it is not a required reference to use the function. And if anyone is curious maybe they can find the reference included elsewhere or discover it on their own. :)
More off-topic:
I find that 'context & motivation' belong to narrative documentation and/or any place discussing applications (where the data are actually being used).
I'm not sure I disagree but I don't think a gallery example (do you mean this by narrative documentation) is a good replacement for a well written short summary or notes section in the docstring. Gallery examples are often not directly focused on the function itself. So while some context and motivation is often included in the example's introduction it's not as visible there and might not be directly relevant. Especially if it is distributed among X different gallery examples.
It probably depends on personal preference and in this case on what precisely we understand as "context & motivation". 😄
There was a problem hiding this comment.
I don't think a gallery example (do you mean this by narrative documentation) is a good replacement for a well written short summary or notes section in the docstring.
I don't think either! Btw, yes, by narrative documentation I mean the tutorial-like gallery examples. They are not supposed to be a replacement for a docstring; we are dealing with different levels here (I mean 'levels' as in high-level vs. low-level, general-purpose vs. domain-specific, ...).
Gallery examples are often not directly focused on the function itself.
Exactly, here you go. That's why, when writing the docstring, we shouldn't assume a very particular use of the function itself. The second sentence in the abstract of the paper you were pointing to goes like: "Our clinical studies indicate that they are more discrete in younger and in more heavily pigmented individuals, ..." This is super specific (e.g., clinical trials, comparison between younger and older individuals, ...) and completely unrelated to the image dataset we are sharing...
Especially if it is distributed among X different gallery examples.
Exactly, again. The docstring of the function itself is not the right place to discuss a) very very specific applications of the function nor (worse) b) applications which are not even related to the actual function (here, data). As far as we know, the dataset at hand isn't part of the research presented in that paper.
It probably depends on personal preference
Well, I would argue that there is a rationale. At first, I thought that pointing to that paper would be somewhat equivalent to linking to the 'astronaut' entry in the dictionary for the data.astronaut() function. So I thought 'why not?' (replying rather no, with the argument that it's pretty much unnecessary; see above).
Then I realized it was even worse (logically speaking), because at least the 'astronaut' entry in the dictionary is generic so, even if unnecessary, it's not properly off-topic. But this paper investigates the potential role of the palisades of Vogt in the aging/diseases of the cornea... (??)
Btw, the current (WIP) gallery example which uses this image of 'palisades of Vogt' is about dark spots caused by dust on the microscope's mirror (!!) -- which proves that, indeed and as you have pointed out yourself, what's found in a function's docstring should have nothing to do (a priori) with the introductory section of a given tutorial/paper.
what precisely we understand as "context & motivation"
Sure, there can be different levels of "context & motivation;" in a scientific setting, by "context & motivation" I mean basically the first section of a research paper.
| ----- | ||
| See info under `in-vivo-cornea-spots.tif` at | ||
| https://gitlab.com/scikit-image/data/-/blob/master/README.md#data. | ||
|
|
There was a problem hiding this comment.
I don't agree; otherwise we would need to rethink the data registry altogether.
Arguably, this kind of reference would go in the tutorial where the image is being used (or possibly under the in-vivo-cornea-spots.tif entry at https://gitlab.com/scikit-image/data/-/blob/master/README.md#data).
lagru
left a comment
There was a problem hiding this comment.
Looks good. Thank you Marianne!
If a pull requests's body (description) contains a code block like this ```release-note A short summary. ``` its content is used instead of the PR title in the release notes. See scikit-image#6858 for a PR to which the appropriate block was added.
Description
This dataset is meant to be used in #6853.
@ana42742 @decorouz once this PR is merged, we'll be able to load the image data just with
image_seq = data.cornea().fyi @JulesScholler
Checklist
./doc/examples(new features only)./benchmarks, if your changes aren't covered by anexisting benchmark
For reviewers
later.
__init__.py.doc/release/release_dev.rst.example, to backport to v0.19.x after merging, add the following in a PR
comment:
@meeseeksdev backport to v0.19.xrun-benchmarklabel. To rerun, the labelcan be removed and then added again. The benchmark output can be checked in
the "Actions" tab.
Release note
Summarize the introduced changes in the code block below in one or a few sentences. The
summary will be included in the next release notes automatically: