Update docs for img_as_ubyte by almarklein · Pull Request #1229 · scikit-image/scikit-image

almarklein · 2014-11-21T21:36:15Z

The line in `img_as_ubyte` needs correction; ubyte cannot hold negative data. From a quick experiment it seems like both `img_as_uint` and `img_as_ubyte` clip negative values (and not shift them to the positive domain).

JDWarner · 2014-11-21T21:38:29Z

skimage/util/dtype.py

Agreed about the prior removal, but I think these two lines are still correct and reasonably relevant.

almarklein · 2014-11-21

I found it confusing to say that the result only has positive values, since there is no way it can have negative values.

jni · 2014-11-23

I also find this confusing, because the output image will have only positive values even if the input data is negative. Why mention this?

tonysyu · 2014-11-23

I prefer @almarklein's re-wording on this; it seems clear and concise.

JDWarner · 2014-11-25

All right, I'm convinced!

coveralls · 2014-11-21T22:47:45Z

Coverage remained the same when pulling 9bccefb on almarklein:patch-1 into 7c66797 on scikit-image:master.

jni · 2014-11-23T12:47:17Z

afaic this is ready to be merged, but since @JDWarner had some concerns, I'll let him weigh in again. But 👍 from me!

blink1073 · 2014-11-23T13:00:11Z

It seems like we need to come to a consensus on how to handle the int -> uint conversion in general. This has shown up in #1228, and our docs also mention that we "rescale" the values.
In C you get blind-sided when doing this conversion (e.g. -3 -> 65533 for a 16-bit unsigned type). Our friend Matlab remaps 0 to the center of the uint scale.
There is no single right solution; it depends on the source of the data in the image. At the very least, we should give an explicit default behaviour and some examples of how to handle different situations using exposure.rescale and friends.

cc @scikit-image/core
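The conventions mentioned in this comment can be compared on a single value. The following is a minimal sketch on plain Python integers, not scikit-image code. All three function names are hypothetical, and the offset variant is only an assumption about what "remaps 0 to the center" means for a 16-bit type.

```python
import ctypes


def wrap_c_style(value):
    """Reinterpret a signed 16-bit value as unsigned, i.e. modulo 2**16 as a C cast does."""
    return ctypes.c_uint16(value).value


def shift_to_center(value):
    """Add 2**15 so that a signed 0 lands in the middle of the unsigned scale."""
    return value + 2 ** 15


def clip_negative(value):
    """Set negative values to 0 and leave everything else unchanged."""
    return max(value, 0)


print(wrap_c_style(-3))     # 65533
print(shift_to_center(-3))  # 32765
print(clip_negative(-3))    # 0
```

The same input ends up near the top, near the middle, or at the bottom of the output range depending on the convention, which is why an explicit default matters.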

tonysyu · 2014-11-23T21:33:01Z

@blink1073: I don't actually see what conflicts there are with regard to int -> uint conversion. The dtype docs seem pretty clear about what happens when converting from int to uint. The issue in #1228 is, in my opinion, quite different: Conversion to uint was done out of laziness (I think that was my fault) more than anything else.

blink1073 · 2014-11-24T04:12:51Z

I can't for the life of me figure out the rhyme or reason to this:

In [6]: t1 = np.array([-1, 100], dtype=np.int8)

In [7]: img_as_ubyte(t1)
Out[7]: array([  0, 201], dtype=uint8)

In [8]: img_as_uint(t1)
Out[8]: array([    0, 51603], dtype=uint16)

In [9]: t2 = np.array([-1, 100], dtype=np.int16)

In [10]: img_as_uint(t2)
Out[10]: array([  0, 200], dtype=uint16)

In [11]: img_as_ubyte(t2)
Out[11]: array([0, 0], dtype=uint8)

blink1073 · 2014-11-24T04:15:04Z

Looking at it in context, I might say that it clips to the new range, and then scales it?

tonysyu · 2014-11-24T04:41:22Z

Yup. Converting from int to uint clips negative values and scales from the positive range of the input to the positive range of the output. So, for example, 127 for int8 maps to 255 in uint8; i.e. input max maps to output max, and everything in-between is scaled linearly.

That last example looks confusing, but that's just what happens when your resolution decreases. Unfortunately, that data loss is unavoidable.
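The clip-then-scale rule described above can be sketched with integer arithmetic. This is an illustration only: `clip_and_scale` is a hypothetical helper, it works on plain Python integers rather than arrays, and because it uses a simple floor division its results can differ by one from what scikit-image returns (it gives 200 where the session above shows 201).

```python
def clip_and_scale(values, in_bits, out_bits):
    """Clip negatives, then map [0, 2**(in_bits - 1) - 1] linearly onto [0, 2**out_bits - 1].

    in_bits is the width of the signed input type, out_bits the width of the
    unsigned output type.
    """
    in_max = 2 ** (in_bits - 1) - 1  # e.g. 127 for int8
    out_max = 2 ** out_bits - 1      # e.g. 255 for uint8
    return [max(v, 0) * out_max // in_max for v in values]


print(clip_and_scale([-128, 0, 127], 8, 8))  # [0, 0, 255]
print(clip_and_scale([-1, 100], 8, 8))       # [0, 200]
print(clip_and_scale([-1, 100], 16, 8))      # [0, 0]
```

The last call shows the resolution loss: 100 is a small fraction of the int16 range, so it collapses to 0 once only 8 bits are available.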

almarklein · 2014-11-24T08:57:27Z

> Converting from int to uint clips negative values and scales from the positive range of the input to the positive range of the output. So, for example, 127 for int8 maps to 255 in uint8

What is the point in this?

>>> a = array([-128, 0, 127], 'int8')

# I could understand if it did this (scale to full range):
>>> img_as_ubyte(a)
array([0, 128, 255], dtype=uint8)

# And this would seem reasonable too (clip, and keep positive values the same):
>>> img_as_ubyte(a)
array([0, 0, 127], dtype=uint8)

# But the current behavior confuses me (clip and scale):
>>> img_as_ubyte(a)
array([  0,   0, 255], dtype=uint8)
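The three candidate behaviours listed above can be written side by side for the int8 to uint8 case. This is a sketch on plain Python lists with hypothetical function names, meant only to make the arithmetic of each option explicit, not to reproduce the library's implementation.

```python
INT8_MIN, INT8_MAX, UINT8_MAX = -128, 127, 255


def scale_full_range(values):
    """Shift the whole int8 range [-128, 127] onto [0, 255]."""
    return [v - INT8_MIN for v in values]


def clip_only(values):
    """Zero out negatives and keep positive values as they are."""
    return [max(v, 0) for v in values]


def clip_then_scale(values):
    """Zero out negatives, then stretch [0, 127] onto [0, 255]."""
    return [max(v, 0) * UINT8_MAX // INT8_MAX for v in values]


a = [-128, 0, 127]
print(scale_full_range(a))  # [0, 128, 255]
print(clip_only(a))         # [0, 0, 127]
print(clip_then_scale(a))   # [0, 0, 255]
```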

tonysyu · 2014-11-24T15:14:03Z

I think the clipping issue is different from the scaling issue. The thought was that negative values are unusual, but sometimes necessary. If you have negative values, though, you should deal with them based on the context. (Note, I wasn't the one who made the decision about clipping negative values, but I think it's the right decision.)

Forgetting about negative values, scaling is important so we can have reliable intensity values. If your example was a conversion to uint16, would a simple clip be desirable? To me, values of 127 in a uint16 image should be treated as background noise (assuming 0 is background). Also, clipping would fail hard for conversion to float.
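The point about reliable intensity values can be made concrete by looking at what fraction of full scale a value represents before and after conversion. A rough sketch with a hypothetical helper and plain Python numbers:

```python
INT8_MAX, UINT16_MAX = 127, 65535


def fraction_of_full_scale(value, max_value):
    """Return how bright a value is relative to the maximum its type can hold."""
    return value / max_value


# Clip only: 127 stays 127, so the brightest int8 pixel becomes near-background in uint16.
print(fraction_of_full_scale(127, INT8_MAX))                # 1.0
print(round(fraction_of_full_scale(127, UINT16_MAX), 4))    # 0.0019

# Scale: the maximum of the input maps to the maximum of the output.
scaled = 127 * UINT16_MAX // INT8_MAX
print(scaled, fraction_of_full_scale(scaled, UINT16_MAX))   # 65535 1.0
```

Dividing by the type's maximum is also what a conversion to float amounts to under the assumption that float images are expected to lie in [0, 1]; a plain clip would leave values such as 127.0 far outside that range.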

blink1073 · 2014-11-24T15:22:45Z

@tonysyu, I agree with the principle, provided we explain it better and show examples, of how it works and how to get other desired behaviour.

almarklein · 2014-11-24T15:38:20Z

@tonysyu I see. Maybe this scaling thing should also be mentioned in the docs? I can make the change here if you like.

jni · 2014-11-24T22:47:37Z

@almarklein I'd favor adding the scaling in, yes.

Ultimately, I'm less and less enamored with our data policy and would favor overhauling it in the medium term. One suggestion would be to call the functions "rescale_to_{uint,ubyte,float}" to make it clearer that we are quite definitely messing with the data. But for now, these doc changes are very useful.

tonysyu · 2014-11-25T05:21:14Z

@almarklein: Explicit mention of the rescaling is a good idea.

@jni: I like the idea of renaming to rescale_to_*.

almarklein · 2014-11-25T07:45:29Z

Ok, I mentioned the scaling in the docs for all integer conversions.

coveralls · 2014-11-25T08:23:51Z

Coverage increased (+0.01%) when pulling 0e7daf4 on almarklein:patch-1 into 7c66797 on scikit-image:master.

jni · 2014-11-25T23:39:35Z

Ok I'm merging this. I'll also open a new issue for renaming the functions rescale_to_*. Thanks @almarklein!

stefanv · 2014-11-26T23:21:08Z

@jni I'd like to hear more about your concerns with the dtype policy. The current policy satisfies two demands:

- Easy pipeline building for users
- Optimal algorithmic choices w.r.t. the output (author always produces best suited output)

The problem lies on the boundary of two functions in a pipeline--users prefer types to be conserved, and in some cases the algorithm "doesn't care". In other cases, though, the algorithms do care.

Perhaps it would help if we had a "de facto" internal representation (floating point), but somewhere along the line you have to make compromises w.r.t. copies.

jni · 2014-11-26T23:36:29Z

@stefanv except that because of our data policy, many of our functions unnecessarily convert to uint8 or float then uint8 just because of lazy programming. Morphology is the major example. There is no reason for erosion to not work on all datatypes, other than at some point it was more convenient to code it that way in Cython. But I think this is being careless with users' data. We should be explicit in our policy that we will (henceforth) make an effort to not rescale, clip, or otherwise mess with users' data (as our img_as functions do), unless required by the algorithm. Once we have that policy, we can review PRs with it in mind. (mean filter is an example where I would consider a float image is necessary.)

ahojnnes · 2014-11-26T23:48:13Z

@jni I do not understand why you call it careless / lazy programming. Some time ago it was not easy to program Cython functions for multiple data types. Fused types have not always been there...

jni · 2014-11-26T23:56:07Z

@ahojnnes sorry, I know it's hard. But it's far from impossible. (For example, you can always write several functions and switch in a Python wrapper.) At least it might have been considered if my proposed data policy had been in place. Anyway I don't know enough about skimage pre-2012 to judge these developments — I'm just saying in the future, I want to encourage contributors to spend extra effort to avoid conversions.
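The "several functions and a Python wrapper" pattern mentioned above might look roughly like the following. This is a sketch under stated assumptions, not scikit-image code: it uses the standard-library `array` module in place of NumPy arrays, a 1-D three-element minimum as a stand-in for erosion, and hypothetical function names throughout. In a real setting each `_erode_*` function would be a separately compiled, type-specific routine; here they share one body.

```python
from array import array


def _min_filter3(data):
    """1-D erosion with a 3-wide window: the minimum over each neighbourhood."""
    n = len(data)
    return [min(data[max(i - 1, 0):min(i + 2, n)]) for i in range(n)]


# Stand-ins for per-type routines; the output keeps the input's type.
def _erode_int8(data):
    return array("b", _min_filter3(data))


def _erode_uint8(data):
    return array("B", _min_filter3(data))


def _erode_float64(data):
    return array("d", _min_filter3(data))


_DISPATCH = {"b": _erode_int8, "B": _erode_uint8, "d": _erode_float64}


def erode(data):
    """Pick the implementation matching the input type; no rescaling, no clipping."""
    try:
        impl = _DISPATCH[data.typecode]
    except KeyError:
        raise TypeError(f"unsupported typecode {data.typecode!r}") from None
    return impl(data)


result = erode(array("b", [-5, 3, 7, -1]))
print(result.typecode, list(result))  # b [-5, -5, -1, -1]
```

Negative values survive untouched and the output type matches the input type. The cost is one table entry and one function per supported type, which is the maintenance concern raised in the replies that follow.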

ahojnnes · 2014-11-26T23:59:11Z

Which is a good proposal that I strongly support. But writing separate functions for each datatype is and imo was not feasible.

stefanv · 2014-11-27T00:18:13Z

Maintenance burden is a nasty business. If we can find ways of doing this without placing too much strain on developers, I'd be in favor, but I agree with Johannes that that suggestion (multiple functions dispatched from Python), e.g., is exactly the kind of thing we want to avoid. I am in favor of building constructs that will allow developers to easily handle the issue, hopefully at minimal cognitive overhead, all the while keeping the code base simple.

almarklein · 2014-11-27T07:45:51Z

What about http://docs.cython.org/src/userguide/fusedtypes.html ?

stefanv · 2014-11-27T08:19:58Z

Fused types provide one component of the solution. But it by no means trivialises writing code compatible with all types.

Update docs for img_as_ubyte

9bccefb

JDWarner reviewed Nov 21, 2014

jni mentioned this pull request Nov 24, 2014

Protect exposure.histogram from integer overflow #1232

Merged

Update docs in dtype.py: mention scaling of values

0e7daf4

jni added a commit that referenced this pull request Nov 25, 2014

Merge pull request #1229 from almarklein/patch-1

802b22a

Update docs for img_as_ubyte

jni merged commit 802b22a into scikit-image:master Nov 25, 2014

jni mentioned this pull request Nov 25, 2014

Rename conversion functions rescale_to_* #1234

Open

almarklein deleted the patch-1 branch November 26, 2014 09:27

soupault mentioned this pull request Sep 5, 2016

misleading warning in img_as_uint #2215

Open

8 participants