Update docs for img_as_ubyte#1229
Conversation
The line in `img_as_ubyte` needs correction; ubyte cannot hold negative data. From a quick experiment it seems like both `img_as_uint` and `img_as_ubyte` clip negative values (and not shift them to the positive domain).
There was a problem hiding this comment.
Agreed about the prior removal, but I think these two lines are still correct and reasonably relevant.
There was a problem hiding this comment.
I found it confusing to say that the result only has positive values, since there is no way it can have negative values.
There was a problem hiding this comment.
I also find this confusing, because the output image will have only positive values also if the input data is negative. Why mention this?
There was a problem hiding this comment.
I prefer @almarklein's re-wording on this; it's seems clear and concise.
There was a problem hiding this comment.
All right, I'm convinced!
|
afaic this is ready to be merged, but since @JDWarner had some concerns, I'll let him weigh in again. But 👍 from me! |
|
It seems like we need to come to a consensus on how to handle the cc @scikit-image/core |
|
@blink1073: I don't actually see what conflicts there are with regards to |
|
I can't for the life of me figure out the rhyme or reason to this: In [6]: t1 = np.array([-1, 100], dtype=np.int8)
In [7]: img_as_ubyte(t1)
Out[7]: array([ 0, 201], dtype=uint8)
In [8]: img_as_uint(t1)
Out[8]: array([ 0, 51603], dtype=uint16)
In [9]: t2 = np.array([-1, 100], dtype=np.int16)
In [10]: img_as_uint(t2)
Out[10]: array([ 0, 200], dtype=uint16)
In [11]: img_as_ubyte(t2)
Out[11]: array([0, 0], dtype=uint8) |
|
Looking at it in context, I might say that it clips to the new range, and then scales it? |
|
Yup. Converting from int to uint clips negative values and scales from the positive range of the input to the positive range of the output. So, for example, 127 for That last example looks confusing, but that's just what happens when your resolution decreases. Unfortunately, that data loss is unavoidable. |
What is the point in this? |
|
I think the clipping issue is different from the scaling issue. The thought was that negative values are unusual, but sometimes necessary. If you have negative values, though, you should deal with them based on the context. (Note, I wasn't the one who made the decision about clipping negative values, but I think it's the right decision.) Forgetting about negative values, scaling is important so we can have reliable intensity values. If your example was a conversion to |
|
@tonysyu, I agree with the principle, provided we explain it better and show examples, of how it works and how to get other desired behaviour. |
|
@tonysyu I see. Maybe this scaling thing should also be mentioned in the docs? I can make the change here if you like. |
|
@almarklein I'd favor adding the scaling in, yes. Ultimately, I'm less and less enamored with our data policy and would favor overhauling it in the medium term. One suggestion would be to call the functions "rescale_to_{uint,ubyte,float}" to make it clearer than we are quite definitely messing with the data. But for now, these doc changes are very useful. |
|
@almarklein: Explicit mention of the rescaling is a good idea. @jni: I like the idea of renaming to |
|
Ok, I mentioned the scaling in the docs for all integer conversions. |
|
Ok I'm merging this. I'll also open a new issue for renaming the functions rescale_to_*. Thanks @almarklein! |
|
@jni I'd like to hear more about your concerns with the dtype policy. The current policy satisfies two demands:
The problem lies on the boundary of two functions in a pipeline--users prefer types to be conserved, and in some cases the algorithm "doesn't care". In other cases, though, the algorithms do care. Perhaps it would help if we had a "de facto" internal representation (floating point), but somewhere along the line you have to make compromises w.r.t. copies. |
|
@stefanv except that because of our data policy, many of our functions unnecessarily convert to uint8 or float then uint8 just because of lazy programming. Morphology is the major example. There is no reason for erosion to not work on all datatypes, other than at some point it was more convenient to code it that way in Cython. But I think this is being careless with users' data. We should be explicit in our policy that we will (henceforth) make an effort to not rescale, clip, or otherwise mess with users' data (as our img_as functions do), unless required by the algorithm. Once we have that policy, we can review PRs with it in mind. (mean filter is an example where I would consider a float image is necessary.) |
|
@jni I do not understand why you call it careless / lazy programming. Some time ago it was not easy to program Cython functions for multiple data types. Fused types have not always been there... |
|
@ahojnnes sorry, I know it's hard. But it's far from impossible. (For example, you can always write several functions and switch in a Python wrapper.) At least it might have been considered if my proposed data policy had been in place. Anyway I don't know enough about skimage pre-2012 to judge these developments — I'm just saying in the future, I want to encourage contributors to spend extra effort to avoid conversions. |
|
Which is a good proposal that I strongly support. But writing separate functions for each datatype is and imo was not feasible. |
|
Maintenance burden is a nasty business. If we can find ways of doing this |
|
What about http://docs.cython.org/src/userguide/fusedtypes.html ? |
|
Fused types provide one component of the solution. But it by no means |
The line in
img_as_ubyteneeds correction; ubyte cannot hold negative data. From a quick experiment it seems like bothimg_as_uintandimg_as_ubyteclip negative values (and not shift them to the positive domain).