Skip to content

Update docs for img_as_ubyte#1229

Merged
jni merged 2 commits intoscikit-image:masterfrom
almarklein:patch-1
Nov 25, 2014
Merged

Update docs for img_as_ubyte#1229
jni merged 2 commits intoscikit-image:masterfrom
almarklein:patch-1

Conversation

@almarklein
Copy link
Copy Markdown
Contributor

The line in img_as_ubyte needs correction; ubyte cannot hold negative data. From a quick experiment it seems like both img_as_uint and img_as_ubyte clip negative values (and not shift them to the positive domain).

The line in `img_as_ubyte` needs correction; ubyte cannot hold negative data. From a quick experiment it seems like both `img_as_uint` and `img_as_ubyte` clip negative values (and not shift them to the positive domain).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed about the prior removal, but I think these two lines are still correct and reasonably relevant.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found it confusing to say that the result only has positive values, since there is no way it can have negative values.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also find this confusing, because the output image will have only positive values also if the input data is negative. Why mention this?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer @almarklein's re-wording on this; it's seems clear and concise.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All right, I'm convinced!

@coveralls
Copy link
Copy Markdown

Coverage Status

Coverage remained the same when pulling 9bccefb on almarklein:patch-1 into 7c66797 on scikit-image:master.

@jni
Copy link
Copy Markdown
Member

jni commented Nov 23, 2014

afaic this is ready to be merged, but since @JDWarner had some concerns, I'll let him weigh in again. But 👍 from me!

@blink1073
Copy link
Copy Markdown
Contributor

It seems like we need to come to a consensus on how to handle the int -> unint conversion in general. This has shown up in #1228, and our docs also mention that we "rescale" the values.
In C you get blind-sided when doing this conversion (i.e. -3 -> 65532). Our friend Matlab remaps 0 to the center of the uint scale.
There is no right solution, it depends on the source of the data in the image. At the very least, we should give an explicit default behaviour and some examples of how to handle different situations using exposure.rescale and friends.

cc @scikit-image/core

@tonysyu
Copy link
Copy Markdown
Member

tonysyu commented Nov 23, 2014

@blink1073: I don't actually see what conflicts there are with regards to int -> uint conversion. The dtype docs seem pretty clear about what happens when converting from int to uint. The issue in #1228 is, in my opinion, quite different: Conversion to uint was done out of laziness (I think that was my fault) more than anything else.

@blink1073
Copy link
Copy Markdown
Contributor

I can't for the life of me figure out the rhyme or reason to this:

In [6]: t1 = np.array([-1, 100], dtype=np.int8)

In [7]: img_as_ubyte(t1)
Out[7]: array([  0, 201], dtype=uint8)

In [8]: img_as_uint(t1)
Out[8]: array([    0, 51603], dtype=uint16)

In [9]: t2 = np.array([-1, 100], dtype=np.int16)

In [10]: img_as_uint(t2)
Out[10]: array([  0, 200], dtype=uint16)

In [11]: img_as_ubyte(t2)
Out[11]: array([0, 0], dtype=uint8)

@blink1073
Copy link
Copy Markdown
Contributor

Looking at it in context, I might say that it clips to the new range, and then scales it?

@tonysyu
Copy link
Copy Markdown
Member

tonysyu commented Nov 24, 2014

Yup. Converting from int to uint clips negative values and scales from the positive range of the input to the positive range of the output. So, for example, 127 for int8 maps to 255 in uint8; i.e. input max maps to output max, and everything in-between is scaled linearly.

That last example looks confusing, but that's just what happens when your resolution decreases. Unfortunately, that data loss is unavoidable.

@almarklein
Copy link
Copy Markdown
Contributor Author

Converting from int to uint clips negative values and scales from the positive range of the input to the positive range of the output. So, for example, 127 for int8 maps to 255 in uint8

What is the point in this?

>>> a = array([-128, 0, 127], 'int8')

# I could understand if it did this (scale to full range):
>>> img_as_ubyte(a)
array([0, 128, 255], dtype=uint8)

# And this would seem reasonable too (clip, and keep positive values the same):
>>> img_as_ubyte(a)
array([0, 0, 127], dtype=uint8)

# But the current behavior confuses me (clip and scale):
>>> img_as_ubyte(a)
array([  0,   0, 255], dtype=uint8)

@tonysyu
Copy link
Copy Markdown
Member

tonysyu commented Nov 24, 2014

I think the clipping issue is different from the scaling issue. The thought was that negative values are unusual, but sometimes necessary. If you have negative values, though, you should deal with them based on the context. (Note, I wasn't the one who made the decision about clipping negative values, but I think it's the right decision.)

Forgetting about negative values, scaling is important so we can have reliable intensity values. If your example was a conversion to uint16, would a simple clip be desirable? To me, values of 127 in a uint16 image should be treated as background noise (assuming 0 is background). Also, clipping would fail hard for conversion to float.

@blink1073
Copy link
Copy Markdown
Contributor

@tonysyu, I agree with the principle, provided we explain it better and show examples, of how it works and how to get other desired behaviour.

@almarklein
Copy link
Copy Markdown
Contributor Author

@tonysyu I see. Maybe this scaling thing should also be mentioned in the docs? I can make the change here if you like.

@jni
Copy link
Copy Markdown
Member

jni commented Nov 24, 2014

@almarklein I'd favor adding the scaling in, yes.

Ultimately, I'm less and less enamored with our data policy and would favor overhauling it in the medium term. One suggestion would be to call the functions "rescale_to_{uint,ubyte,float}" to make it clearer than we are quite definitely messing with the data. But for now, these doc changes are very useful.

@tonysyu
Copy link
Copy Markdown
Member

tonysyu commented Nov 25, 2014

@almarklein: Explicit mention of the rescaling is a good idea.

@jni: I like the idea of renaming to rescale_to_*.

@almarklein
Copy link
Copy Markdown
Contributor Author

Ok, I mentioned the scaling in the docs for all integer conversions.

@coveralls
Copy link
Copy Markdown

Coverage Status

Coverage increased (+0.01%) when pulling 0e7daf4 on almarklein:patch-1 into 7c66797 on scikit-image:master.

@jni
Copy link
Copy Markdown
Member

jni commented Nov 25, 2014

Ok I'm merging this. I'll also open a new issue for renaming the functions rescale_to_*. Thanks @almarklein!

jni added a commit that referenced this pull request Nov 25, 2014
Update docs for img_as_ubyte
@jni jni merged commit 802b22a into scikit-image:master Nov 25, 2014
@almarklein almarklein deleted the patch-1 branch November 26, 2014 09:27
@stefanv
Copy link
Copy Markdown
Member

stefanv commented Nov 26, 2014

@jni I'd like to hear more about your concerns with the dtype policy. The current policy satisfies two demands:

  1. Easy pipeline building for users
  2. Optimal algorithmic choices w.r.t. the output (author always produces best suited output)

The problem lies on the boundary of two functions in a pipeline--users prefer types to be conserved, and in some cases the algorithm "doesn't care". In other cases, though, the algorithms do care.

Perhaps it would help if we had a "de facto" internal representation (floating point), but somewhere along the line you have to make compromises w.r.t. copies.

@jni
Copy link
Copy Markdown
Member

jni commented Nov 26, 2014

@stefanv except that because of our data policy, many of our functions unnecessarily convert to uint8 or float then uint8 just because of lazy programming. Morphology is the major example. There is no reason for erosion to not work on all datatypes, other than at some point it was more convenient to code it that way in Cython. But I think this is being careless with users' data. We should be explicit in our policy that we will (henceforth) make an effort to not rescale, clip, or otherwise mess with users' data (as our img_as functions do), unless required by the algorithm. Once we have that policy, we can review PRs with it in mind. (mean filter is an example where I would consider a float image is necessary.)

@ahojnnes
Copy link
Copy Markdown
Member

@jni I do not understand why you call it careless / lazy programming. Some time ago it was not easy to program Cython functions for multiple data types. Fused types have not always been there...

@jni
Copy link
Copy Markdown
Member

jni commented Nov 26, 2014

@ahojnnes sorry, I know it's hard. But it's far from impossible. (For example, you can always write several functions and switch in a Python wrapper.) At least it might have been considered if my proposed data policy had been in place. Anyway I don't know enough about skimage pre-2012 to judge these developments — I'm just saying in the future, I want to encourage contributors to spend extra effort to avoid conversions.

@ahojnnes
Copy link
Copy Markdown
Member

Which is a good proposal that I strongly support. But writing separate functions for each datatype is and imo was not feasible.

@stefanv
Copy link
Copy Markdown
Member

stefanv commented Nov 27, 2014

Maintenance burden is a nasty business. If we can find ways of doing this
without placing too much strain on developers, I'd be in favor, but I agree
with Johannes that that suggestion (multiple functions dispatched from
Python), e.g., is exactly the kind of thing we want to avoid. I am in favor
of building constructs that will allow developers to easily handle the
issue, hopefully at minimal cognitive overhead, all the while keeping the
code base simple.

@almarklein
Copy link
Copy Markdown
Contributor Author

@stefanv
Copy link
Copy Markdown
Member

stefanv commented Nov 27, 2014

Fused types provide one component of the solution. But it by no means
trivialises writing code compatible with all types.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants