Enable warping with more than just doubles.#3253

Closed
hmaarrfk wants to merge 5 commits into scikit-image:master from hmaarrfk:feature_fused_warping

Conversation

@hmaarrfk
Member

@hmaarrfk hmaarrfk commented Jul 7, 2018

Fixes #1287

With complex types too! That was more difficult than it sounds; too difficult to do without help from the compiler.

This breaks things because people might have expected an implicit cast to float. Needs somebody to make a decision on how to deprecate this.

I think that calling anything like img_as_*, or doing any kind of range checking, without converting the data back to the original type should be considered a mistake.
Personally, I think that the parameter should be ripped out. Instead, people should have to think about what type they want their data to be represented as.
Edit: I just think that types and units are different. Using the img_as_* functions within algorithms is a mistake, as it assumes a very specific unit system: namely, that uint8 is in ADC counts and that floats are normalized ADC counts. That might be true for most cameras, but at the end of the day it is just one unit system and isn't universally true, especially for other data that might have ND correlations.

I probably need some help writing tests for this, but for now it passes, so that is a good thing right??? Maybe there is an implicit cast to float somewhere.

Edit: clarification on what I think should be considered a mistake.

Checklist

[It's fine to submit PRs which are a work in progress! But before they are merged, all PRs should provide:]

For reviewers

(Don't remove the checklist below.)

  • Check that the PR title is short, concise, and will make sense 1 year
    later.
  • Check that new functions are imported in corresponding __init__.py.
  • Check that new features, API changes, and deprecations are mentioned in
    doc/release/release_dev.rst.
  • Consider backporting the PR with @meeseeksdev backport to v0.14.x

@hmaarrfk hmaarrfk force-pushed the feature_fused_warping branch from afde931 to 689245e on July 7, 2018 02:36
@hmaarrfk hmaarrfk changed the title from "Enable warping with more than just doubles." to "WIP: Needs deprecation decision: Enable warping with more than just doubles." on Jul 7, 2018
@jni
Member

jni commented Jul 7, 2018

@hmaarrfk

Cool stuff!

I think that calling anything like img_as_*, or doing any kind of range checking, without converting the data back to the original type should be considered a mistake.

No, repeated uint8 -> float -> uint8 -> float... conversions result in artifacts. See one demonstration here. As you know I am not a fan of our type and range conversions, but the conversion to float is there for a good reason and converting float -> uint8 should only happen at the very end of a pipeline — and we have no way of knowing when that is. So the user has to be responsible here, one way or another.
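The information loss jni refers to can be seen with a minimal numpy sketch (plain arithmetic, not skimage's actual img_as_* converters): quantizing a float image to 256 uint8 levels is lossy, so each round trip can perturb values.

```python
import numpy as np

# Minimal numpy sketch (not skimage's img_as_* converters) of why
# float -> uint8 should only happen once, at the end of a pipeline:
# quantization to 256 levels is lossy.
x = np.linspace(0.0, 1.0, 1000)           # float image values in [0, 1]
q = np.round(x * 255).astype(np.uint8)    # quantize to uint8
back = q / 255.0                          # convert back to float
err = np.abs(back - x).max()              # up to ~0.5/255 per round trip
```

Each conversion cycle can therefore move a pixel by up to about half a gray level, and repeated cycles inside a pipeline compound the artifacts.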

We also have to think about how to combine this with #3148...

Will have a chat with @stefanv about this, as well as the deprecation path.

for now it passes, so that is a good thing right???

Yes, very good! =)

@hmaarrfk
Member Author

hmaarrfk commented Jul 8, 2018 via email

@pep8speaks

pep8speaks commented Jul 10, 2018

Hello @hmaarrfk! Thanks for updating the PR.

Line 521:34: E241 multiple spaces after ','
Line 522:22: E241 multiple spaces after ','
Line 522:39: E241 multiple spaces after ','
Line 546:34: E241 multiple spaces after ','
Line 547:22: E241 multiple spaces after ','
Line 547:39: E241 multiple spaces after ','

Comment last updated on October 25, 2018 at 02:48 Hours UTC

@codecov-io

codecov-io commented Jul 10, 2018

Codecov Report

Merging #3253 into master will decrease coverage by 0.94%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master    #3253      +/-   ##
==========================================
- Coverage   87.77%   86.82%   -0.95%     
==========================================
  Files         323      341      +18     
  Lines       27138    27439     +301     
==========================================
+ Hits        23820    23825       +5     
- Misses       3318     3614     +296
Impacted Files Coverage Δ
skimage/transform/radon_transform.py 92.54% <100%> (ø) ⬆️
skimage/transform/_warps.py 99.48% <100%> (+0.01%) ⬆️
skimage/transform/tests/test_warps.py 100% <100%> (ø) ⬆️
skimage/io/_plugins/imread_plugin.py 69.23% <0%> (-15.39%) ⬇️
skimage/io/tests/test_fits.py 83.01% <0%> (-13.21%) ⬇️
skimage/io/_plugins/fits_plugin.py 81.13% <0%> (-9.44%) ⬇️
skimage/io/tests/test_imread.py 70.58% <0%> (-3.93%) ⬇️
skimage/viewer/tests/test_tools.py 97.34% <0%> (-2.66%) ⬇️
skimage/filters/tests/test_lpi_filter.py 87.5% <0%> (-2.09%) ⬇️
skimage/feature/texture.py 77.57% <0%> (-1.01%) ⬇️
... and 49 more

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ef1d91a...2126022. Read the comment docs.

@hmaarrfk hmaarrfk force-pushed the feature_fused_warping branch from 1cd0abe to 8262d73 on July 10, 2018 03:32
@hmaarrfk hmaarrfk changed the title from "WIP: Needs deprecation decision: Enable warping with more than just doubles." to "Enable warping with more than just doubles." on Jul 10, 2018
@hmaarrfk
Member Author

@jni: I just improved the PR taking your feedback into consideration. Honestly, it's up to you and the BDFL team with regards to how you want to deal with types; I just wanted to add flexibility to the algorithms, which I think this PR achieves without imposing my point of view on the codebase.

  1. warping now allows you to specify the dtype of the output. This enables higher level algorithms to choose the precision of this "intermediate step".
  2. The default is still to upcast everything to float64 and not to preserve the range.
  3. The intermediate computation type is determined by the homography matrix. This is a weird hack. I wish I could just let the compiler decide what the appropriate intermediate type is. For example: uint8 computation can probably benefit from potential speedups from single precision arithmetic. Though maybe that is "premature optimization". Specifically, this is commit 8262d73. I'm kinda indifferent about this, so if you want, I can take it out.
  4. I backed away from complex interpolation. To avoid code duplication, it really needs to be done with C/C++ macros so that the compiler can choose the appropriate types. I don't know how to do that in Cython. Apparently you can use Jinja or something, but maybe just writing straight C code is easier. I also don't know how to combine this with point Fix tv denoise alt #1 as you cannot have complex->float casting.
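A hedged sketch of what point 1 looks like from the caller's side. The `dtype` parameter and the `allocate_warp_output` helper name here are illustrations of this PR's proposal, not a released skimage API:

```python
import numpy as np

# Hypothetical sketch of point 1 above (this PR's proposed dtype
# parameter, not a released skimage API): the output buffer is
# allocated in a caller-chosen type, so higher-level code controls the
# precision of the intermediate step instead of always getting float64.
def allocate_warp_output(image, output_shape=None, dtype=np.float64):
    if output_shape is None:
        return np.empty_like(image, dtype=dtype)
    return np.empty(shape=tuple(output_shape), dtype=dtype)

img = np.zeros((4, 4), dtype=np.uint8)
out32 = allocate_warp_output(img, dtype=np.float32)    # float32 intermediate
out_same = allocate_warp_output(img, dtype=img.dtype)  # preserve uint8
```

The default `dtype=np.float64` mirrors point 2: existing callers still get float64 unless they opt in to something else.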

@hmaarrfk
Member Author

hmaarrfk commented Sep 7, 2018

Plot twist: this actually speeds up float64->float64 somehow, but slows down uint8->uint8 significantly. Fun times!

@hmaarrfk hmaarrfk force-pushed the feature_fused_warping branch from 7c51f55 to a0e2a5d on September 7, 2018 11:12
Member

@jni jni left a comment


@hmaarrfk I like this a lot! Thanks! Mostly just style and documentation comments.



def _local_binary_pattern(double[:, ::1] image,
def _local_binary_pattern(cnp.float64_t[:, ::1] image,
Member


Why change this?

Member Author


i don't know. gardening. Raymond Hettinger taught me not to do it. I'll revert.

texture[i] = bilinear_interpolation(&image[0, 0], rows, cols,
r + rp[i], c + cp[i],
'C', 0)
bilinear_interpolation[cnp.float64_t, double, double](
Member


Whoa, what's going on here? Do you have a reference for this syntax?

Member Author


https://cython.readthedocs.io/en/latest/src/userguide/fusedtypes.html#indexing

fused type indexing. I didn't want to make changes to this function.
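For context, the indexing syntax on that page lets you pick a specialization explicitly instead of relying on argument-type inference. A generic sketch (not this PR's actual signatures):

```cython
# Generic Cython sketch of fused-type indexing (illustration only,
# not this PR's actual function signatures).
ctypedef fused my_float:
    float
    double

cdef my_float scale(my_float x) nogil:
    return x * 2

# Explicitly select the `double` specialization by indexing, rather
# than letting Cython infer it from the argument types:
cdef double y = scale[double](1.5)
```

This is what makes it possible to call an existing fused-type helper with a fixed set of types without changing the helper itself.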

output_shape : tuple (rows, cols), optional
Shape of the output image generated (default None).
dtype :
The numric type of the output image.
Member


This docstring is pretty woefully inaccurate...

Member Author


What needs to be added?


ctypedef fused np_numeric:
np_real_numeric
np_complexes
Member


I love this file, and have wanted it for a long time. Thank you!!!

Member Author


This should really be part of Cython....

Member Author


Also, if you want this, you should take it out of this PR since I'm not sure where the performance drop is coming from here. Cython casting?

Member


I agree about Cython building this in. Perhaps that should be your next PR? ;) In the meantime I'm happy to wait for you to hunt down the errant cast... And I'm also happy to merge, since ~90% of our warping is probably float-based.

Member Author


i probably need a bit of time to polish it up. it will get there.

Member

@soupault soupault Sep 8, 2018


Perhaps that should be your next PR? ;)

I'd also suggest writing a book on Practical Cython 😅. Seems like @hmaarrfk is a rare expert in this topic.

cdef inline void nearest_neighbour_interpolation(
np_real_numeric* image, Py_ssize_t rows, Py_ssize_t cols,
np_floats r, np_floats c, char mode, np_real_numeric cval,
np_real_numeric_out* out) nogil:
Member


Can you indent this 8 spaces to differentiate from a code block?

cdef inline double quadratic_interpolation(double x, double[3] f) nogil:
top = (1 - dc) * top_left + dc * top_right
bottom = (1 - dc) * bottom_left + dc * bottom_right
out[0] = <np_real_numeric_out> ((1 - dr) * top + dr * bottom)
Member


Looking at this code now, it might not be as hard as I thought to generalise to nD... =) (Not related to this PR, I just meant to say that your code has made me wonder about this long-standing goal.)
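The quoted lines are the core of 2D bilinear interpolation. In plain Python/numpy the same blend looks like this (a sketch only: no boundary handling, and none of the `mode`/`cval` logic from the Cython version):

```python
import numpy as np

def bilinear(image, r, c):
    # Sketch of the interpolation in the quoted hunk: blend the four
    # neighbouring pixels by the fractional offsets (dr, dc).
    r0, c0 = int(np.floor(r)), int(np.floor(c))
    dr, dc = r - r0, c - c0
    top = (1 - dc) * image[r0, c0] + dc * image[r0, c0 + 1]
    bottom = (1 - dc) * image[r0 + 1, c0] + dc * image[r0 + 1, c0 + 1]
    return (1 - dr) * top + dr * bottom

img = np.array([[0.0, 1.0],
                [2.0, 3.0]])
center = bilinear(img, 0.5, 0.5)  # average of all four pixels: 1.5
```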


# To compute the variance features
cdef double sum_, var_, texture_i
cdef cnp.float64_t zero = 0
Member Author


Why do i have this? I'm going to try and take it off.

Member Author

@hmaarrfk hmaarrfk left a comment


Needs extra tests.

if output_shape is None:
out = np.empty_like(image, dtype=dtype)
else:
out = np.empty(shape=output_shape[:2], dtype=dtype)
Member Author


I'm not sure what happens if you pass a 3D image and specify output shape.
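The question can be checked empirically for the quoted allocation, assuming that branch is reached unchanged for 3D inputs: `output_shape[:2]` drops everything past the first two entries, so a multichannel image would lose its channel axis.

```python
import numpy as np

# Empirical check of the question above: with a 3D (multichannel)
# image and an explicit 2-tuple output_shape, the quoted branch
# allocates a 2D buffer, so the channel axis is silently dropped
# (assuming the allocation is reached unchanged for 3D inputs).
image = np.zeros((4, 4, 3), dtype=np.uint8)
output_shape = (8, 8)
out = np.empty(shape=output_shape[:2], dtype=np.float64)
# out.ndim == 2: no channel dimension survives.
```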

@hmaarrfk
Member Author

hmaarrfk commented Sep 7, 2018

@jni, I'm glad you like this, but the performance is about 30% worse on uint8, and 20% better on float64.

Now that I got asv running, I think I need to be a little more systematic about the development of this.

edit: specified worse

@soupault soupault added this to the 0.15 milestone Sep 7, 2018
@soupault soupault added the ⏩ type: Enhancement Improve existing features label Sep 7, 2018
Member

@stefanv stefanv left a comment


I suspect the slowdown comes from situations such as when your image is of a different dtype than your cval. Then, a cast has to happen every time that the cval is assigned to the output image. I doubt there is a way to tell Cython: the image can be of any numeric type, but then we also want the cval to be the same type, otherwise cast it to the same type before continuing.

cdef inline Py_ssize_t fmin(Py_ssize_t one, Py_ssize_t two) nogil:
return one if one < two else two

ctypedef fused np_real_numeric_out:
Member


What's happening here?

Member Author


I'm forcing cross-type compilation too: uint8->uint16, but more importantly uint8->float64.

Member Author


I added a note about this.

@hmaarrfk
Member Author

hmaarrfk commented Sep 7, 2018

@stefanv, the fact that np_real_numeric is the same fused type for cval and image means that they are necessarily the same type.

np_real_numeric_out allows the output to be of a different dtype following standard c type conversion.

However, I think you are correct in thinking that it is related to an extra cast somewhere.
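For concreteness, "standard C type conversion" on the output means truncation when narrowing, which numpy scalar casts reproduce. This is a generic illustration of the casting rule, not this PR's code path:

```python
import numpy as np

# Generic illustration (not this PR's code path) of "standard C type
# conversion" on the output: narrowing float -> integer truncates
# toward zero rather than rounding to nearest.
as_c_cast = np.uint8(np.float64(3.9))  # truncates to 3
rounded = np.uint8(np.round(3.9))      # rounding first gives 4
```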

@hmaarrfk hmaarrfk force-pushed the feature_fused_warping branch from e79e15a to 4b8a863 on September 10, 2018 00:45
@hmaarrfk
Member Author

Final closing remarks on my experience adding fused types in this:

It seems that it might actually have sped up uint8 by a little bit. A real benchmark on a dedicated computer would be necessary for this, with interpolation order=0.

Interpolation order=3 seems to be compute bound, not memory bound. To speed that one up, we need to make use of the AVX instructions. Unfortunately, I'm not a gcc guru, so I don't know if setting export CFLAGS=-march=native was enough to enable them in Cython.

It did help by about 20% in the CPU bound case of a 4096x4096 matrix.

This might hit the limitations of what Cython can do. Not too sure.

This might be a good demo for Pythran or Numba to attack.

@hmaarrfk
Member Author

I don't know why no speedup is seen with the float32 operations. It seems like you should be able to get a 2x speedup from going to float32, because SSE(2) can do 2x as many vectorized operations.

Currently, this is an O(NxM) algorithm, with memory order 1.

Maybe for vectorization, we can make this a memory order (NxM) and computational order (NxM) algorithm.

Anyway, just some thoughts.

@jni jni mentioned this pull request Oct 8, 2018
@hmaarrfk hmaarrfk force-pushed the feature_fused_warping branch 2 times, most recently from 2126022 to f88a814 on October 11, 2018 19:38
@hmaarrfk hmaarrfk force-pushed the feature_fused_warping branch 4 times, most recently from ee4ad77 to 0ca1d93 on October 15, 2018 04:08
@jni jni mentioned this pull request Oct 19, 2018
Member

@stefanv stefanv left a comment


This looks ready to go? @hmaarrfk Were there still noticeable slowdowns?

Parameters
----------
x, y : double
x, y : np_float
Member


Considering our other discussion, do you think np_float is unambiguous here?

Member Author


haha totally, i'll revert the notation!

@hmaarrfk
Member Author

@stefanv I'm really not sure how to interpret them :/

       before           after         ratio
     [6f45aaa0]       [e4864e44]
     <master>         <feature_fused_warping>
+         149±9μs        302±100μs     2.03  benchmarks_transform_warp.WarpSuite.time_same_type(<class 'numpy.uint16'>, 128, 0)
+         125±9μs          193±5μs     1.55  benchmarks_transform_warp.WarpSuite.time_to_float64(<class 'numpy.uint8'>, 128, 0)
+        375±20μs        581±200μs     1.55  benchmarks_transform_warp.WarpSuite.time_same_type(<class 'numpy.uint8'>, 128, 1)
+         122±2μs          188±4μs     1.54  benchmarks_transform_warp.WarpSuite.time_to_float64(<class 'numpy.float64'>, 128, 0)
+     1.13±0.05ms       1.73±0.2ms     1.53  benchmarks_transform_warp.WarpSuite.time_same_type(<class 'numpy.uint16'>, 128, 3)
+        433±10μs        647±100μs     1.49  benchmarks_transform_warp.WarpSuite.time_same_type(<class 'numpy.uint16'>, 128, 1)
+         375±9μs         535±20μs     1.43  benchmarks_transform_warp.WarpSuite.time_to_float64(<class 'numpy.uint16'>, 128, 1)
+         372±6μs         500±30μs     1.34  benchmarks_transform_warp.WarpSuite.time_to_float64(<class 'numpy.uint8'>, 128, 1)
+         376±9μs         505±10μs     1.34  benchmarks_transform_warp.WarpSuite.time_same_type(<class 'numpy.float64'>, 128, 1)
+         153±3μs         201±20μs     1.31  benchmarks_transform_warp.WarpSuite.time_same_type(<class 'numpy.uint8'>, 128, 0)
+        145±10μs          188±4μs     1.29  benchmarks_transform_warp.WarpSuite.time_same_type(<class 'numpy.float32'>, 128, 0)
+     1.02±0.01ms      1.21±0.03ms     1.19  benchmarks_transform_warp.WarpSuite.time_to_float64(<class 'numpy.uint16'>, 128, 3)
-        561±20ms         483±10ms     0.86  benchmarks_transform_warp.WarpSuite.time_same_type(<class 'numpy.float32'>, 4096, 1)
-         266±6ms          219±7ms     0.82  benchmarks_transform_warp.WarpSuite.time_same_type(<class 'numpy.float64'>, 4096, 0)
-        267±30ms          220±3ms     0.82  benchmarks_transform_warp.WarpSuite.time_to_float64(<class 'numpy.float64'>, 4096, 0)
-        258±20ms          196±3ms     0.76  benchmarks_transform_warp.WarpSuite.time_to_float64(<class 'numpy.float32'>, 4096, 0)
-      10.2±0.3ms       7.63±0.2ms     0.75  benchmarks_transform_warp.WarpSuite.time_to_float64(<class 'numpy.uint16'>, 1024, 0)
-        15.4±1ms       10.5±0.7ms     0.68  benchmarks_transform_warp.WarpSuite.time_same_type(<class 'numpy.float64'>, 1024, 0)
-      10.9±0.8ms       7.36±0.2ms     0.68  benchmarks_transform_warp.WarpSuite.time_to_float64(<class 'numpy.uint8'>, 1024, 0)
-        264±40ms          173±4ms     0.66  benchmarks_transform_warp.WarpSuite.time_to_float64(<class 'numpy.uint8'>, 4096, 0)
-         295±8ms          176±4ms     0.60  benchmarks_transform_warp.WarpSuite.time_same_type(<class 'numpy.float32'>, 4096, 0)
-      13.2±0.7ms       7.86±0.2ms     0.60  benchmarks_transform_warp.WarpSuite.time_same_type(<class 'numpy.float32'>, 1024, 0)
-         286±6ms         150±10ms     0.52  benchmarks_transform_warp.WarpSuite.time_same_type(<class 'numpy.uint16'>, 4096, 0)
-         279±9ms          121±8ms     0.43  benchmarks_transform_warp.WarpSuite.time_same_type(<class 'numpy.uint8'>, 4096, 0)

@hmaarrfk hmaarrfk force-pushed the feature_fused_warping branch from 9fa5823 to fc4e60a on October 25, 2018 02:48
@hmaarrfk
Member Author

I think @jni is further ahead of me on this one. I just want to see the results of a command like

asv continuous

@hmaarrfk hmaarrfk closed this Feb 28, 2019
@jni
Member

jni commented Feb 28, 2019

@stefanv see #3486, which aims to supersede this one.

@jni
Member

jni commented Feb 28, 2019

@hmaarrfk those benchmarks look like small sizes might be dominated by the dispatch time, whereas big sizes benefit from type-specific computing?

What asv command did you use to run that?

@hmaarrfk
Member Author

I wish I remembered.....


Labels

⏩ type: Enhancement Improve existing features


6 participants