Skip to content

implement rescaleData_blocked#1648

Merged
outofculture merged 4 commits intopyqtgraph:masterfrom
pijyoi:rescale_blocked
Apr 6, 2021
Merged

implement rescaleData_blocked#1648
outofculture merged 4 commits intopyqtgraph:masterfrom
pijyoi:rescale_blocked

Conversation

@pijyoi
Copy link
Copy Markdown
Contributor

@pijyoi pijyoi commented Mar 21, 2021

This PR continues the rescaleData_blocked() function implementation spun off from #1617.
A standalone test file depending only on numpy has been put temporarily in benchmarks/time_rescale.py.

Some improvements have been made to the benchmarking script:

  1. tests both uint8 and float32
  2. ensures that there are values outside both clipping bounds
    • this exercises any branch prediction that may be taking place

A slowdown in the fits_int32 codepath was found

  1. for int32 data types, the clip limits provided to the clip function should be integer-valued. otherwise, it will be slower, particularly so on Windows. Probably different versions of the ufunc get selected based on its input arguments.

From #1641 (comment), it was found that:

  1. on Linux, numpy 1.20 runs faster than 1.17, probably due to SIMD implemented on 1.20.
  2. on macOS, numpy 1.20 and 1.17 are equally fast
  3. using umath.minimum(umath.maximum()) instead of umath.clip() is only an improvement on Windows. It is in fact a pessimization to use it on Linux and macOS. (For Linux, rescaleData_blocked() already takes umath.clip() codepath)
    These findings have not been incorporated yet, pending more verification.

@pijyoi
Copy link
Copy Markdown
Contributor Author

pijyoi commented Mar 21, 2021

@j9ac9k , could I trouble you to run benchmarks/time_rescale.py on macOS? Once without changes and once more after making 'darwin' to belong to fast_float_clip (within the file benchmarks/time_rescale.py)

@j9ac9k
Copy link
Copy Markdown
Member

j9ac9k commented Mar 21, 2021

@j9ac9k , could I trouble you to run benchmarks/time_rescale.py on macOS? Once without changes and once more after making 'darwin' to belong to fast_float_clip (within the file benchmarks/time_rescale.py)

f() dtype alpha runtime (standard) runtime (darwin in fast_float_clip)
RD uint8 0.1 0.436254924 0.44594586799999997
RDB uint8 0.1 0.2032192060000001 0.19190403199999995
RD uint8 0.25 0.42308679299999996 0.41383199300000006
RDB uint8 0.25 0.18000498899999995 0.19028552100000007
RD float32 0.1 0.40843685100000027 0.41462677700000006
RDB float32 0.1 0.452220552 0.39034072399999964
RD float32 0.25 0.3910386429999999 0.40077267699999997
RDB float32 0.25 0.45443687899999974 0.39704211099999975

Hope this helps!

@pijyoi
Copy link
Copy Markdown
Contributor Author

pijyoi commented Mar 21, 2021

Hope this helps!

From the RDB float32 values, macOS benefits from using umath.clip over minmax. (also seen in #1641 (comment))

@j9ac9k
Copy link
Copy Markdown
Member

j9ac9k commented Mar 21, 2021

Hope this helps!

From the RDB float32 values, macOS benefits from using umath.clip over minmax. (also seen in #1641 (comment))

I saw in the files you added you put just a copy/paste of the console output, if you'd like that (so you don't have to manipulate the markdown syntax used to generate tables) let me know.

@j9ac9k
Copy link
Copy Markdown
Member

j9ac9k commented Mar 22, 2021

so... I may have come across a bug here in the most random of manners... I happened to be on this branch when evaluating some potentially old/stale issues to see if I can close them, and then I came across issue #590 which has a really short and sweet example

import pyqtgraph as pg
from pyqtgraph.Qt import QtCore, QtGui
import numpy

plt = pg.image(numpy.random.normal(size=(100,100))*10**50) 

if __name__ == '__main__':
    import sys
    if sys.flags.interactive != 1 or not hasattr(pg.QtCore, 'PYQT_VERSION'):
        pg.QtGui.QApplication.exec_()

which generated the following exception:

Traceback (most recent call last):
  File "/Users/ogi/Developer/pyqtgraph/pyqtgraph/graphicsItems/HistogramLUTItem.py", line 224, in imageChanged
    h = self.imageItem().getHistogram()
  File "/Users/ogi/Developer/pyqtgraph/pyqtgraph/graphicsItems/ImageItem.py", line 515, in getHistogram
    mn = self._xp.nanmin(stepData).item()
AttributeError: 'float' object has no attribute 'item'
Traceback (most recent call last):
  File "/Users/ogi/Developer/pyqtgraph/pyqtgraph/graphicsItems/ImageItem.py", line 462, in paint
    self.render()
  File "/Users/ogi/Developer/pyqtgraph/pyqtgraph/graphicsItems/ImageItem.py", line 451, in render
    fn.makeARGB(image, lut=lut, levels=levels, output=self._processingBuffer)
  File "/Users/ogi/Developer/pyqtgraph/pyqtgraph/functions.py", line 1287, in makeARGB
    data = rescaleData_blocked(data, scale/rng, minVal, dtype=dtype)
  File "/Users/ogi/Developer/pyqtgraph/pyqtgraph/functions.py", line 1106, in rescaleData_blocked
    it = np.nditer([data_in, data_out],
TypeError: Iterator operand or requested dtype holds references, but the REFS_OK flag was not enabled

To be fair, an exception was generated on the master branch as well... but the image displayed...

Traceback (most recent call last):
  File "/Users/ogi/Developer/pyqtgraph/pyqtgraph/graphicsItems/HistogramLUTItem.py", line 224, in imageChanged
    h = self.imageItem().getHistogram()
  File "/Users/ogi/Developer/pyqtgraph/pyqtgraph/graphicsItems/ImageItem.py", line 515, in getHistogram
    mn = self._xp.nanmin(stepData).item()
AttributeError: 'float' object has no attribute 'item'

While experimenting here, an error is generated on the master and rescale_blocked branches when the exponent is 20, but no error occurs if the exponential value is 19.

Not sure this is actually an error here with this implementation, but something that caught my attention.

@pijyoi
Copy link
Copy Markdown
Contributor Author

pijyoi commented Mar 22, 2021

change

plt = pg.image(numpy.random.normal(size=(100,100))*10**50) 

to

plt = pg.image(numpy.random.normal(size=(100,100))*1e50) 

and it will work.

In [89]: (np.array([1., 2.])*10**50).dtype
Out[89]: dtype('O')

In [90]: (np.array([1., 2.])*1e50).dtype
Out[90]: dtype('float64')

In [94]: (np.array([1., 2.])*(2**64-1)).dtype
Out[94]: dtype('float64')

In [95]: (np.array([1., 2.])*(2**64)).dtype
Out[95]: dtype('O')

Probably has to do with the multiplicand being an integer that no longer fits within 64-bits and then some implementation details leak out.

@NilsNemitz
Copy link
Copy Markdown
Contributor

dtype('O') is 'object'. Numpy is desperate enough to pack oversized python ints into the array.

Do you think dtype('O') should maybe raise a helpful error message?
Because (np.array([1., 2.])*(2**64)) gives
array([1.8446744073709552e+19, 3.6893488147419103e+19], dtype=object)
which looks fine (except for the dtype), and it would be annoying to debug why the array prints, but doesn't plot.

@pijyoi
Copy link
Copy Markdown
Contributor Author

pijyoi commented Mar 22, 2021

For info, I can get the same behavior as master by adding the "refs_ok" flag to nditer. i.e. Image displays.
Whether that causes any side-effects, I wouldn't know.

@NilsNemitz
Copy link
Copy Markdown
Contributor

If that doesn't break anything for the rest of your code, then adding the "refs_ok" flag seems reasonable to me.

@j9ac9k
Copy link
Copy Markdown
Member

j9ac9k commented Mar 22, 2021

@pijyoi if you ever decide to teach some numpy training course sometime, please let me know, I would sign up.

I would have never thought to differentiate between 10**50 and 1e50

In terms of refs_ok if it were any other part of the library I would say we should probably enable it, but this part of the library we make some decisions based on numpy dtypes here, and I'm not sure having it work with object dtypes is in our best interest.

@pijyoi
Copy link
Copy Markdown
Contributor Author

pijyoi commented Mar 25, 2021

It is now a fair comparison between rescaleData() and rescaleData_blocked().
Previously, rescaleData() was being penalized by using a slower clipping path.

Note: if testing on a laptop, remember to plug in the power.

@j9ac9k
Copy link
Copy Markdown
Member

j9ac9k commented Mar 25, 2021

@pijyoi since you're not on your our slack, I couldn't thank you directly there, so instead I'm going to post some completely off-topic to this issue post pull request; to explicitly thank you for your contributions that have made the 0.12.0 release pretty awesome. If you're interested in hoping on our slack, let me know and I'll send you an invite.

@pijyoi
Copy link
Copy Markdown
Contributor Author

pijyoi commented Mar 28, 2021

Timings for time_rescale.py executed on the same machine, so the timings are directly comparable.
Numpy version is relevant.
Overall, nditer implementation is a win.

WSL2 numpy 1.20.2 Windows numpy 1.20.2
RD uint8 0.1 0.276985 0.579306
RDB uint8 0.1 0.106553 0.299663
RD uint8 0.25 0.26142 0.865689
RDB uint8 0.25 0.107774 0.302405
RD float32 0.1 0.250691 0.537989
RDB float32 0.1 0.123221 0.441421
RD float32 0.25 0.250870 0.826956
RDB float32 0.25 0.120945 0.746094
WSL2 numpy 1.17.4 Windows numpy 1.17.5
RD uint8 0.1 0.367519 0.573480
RDB uint8 0.1 0.157658 0.389136
RD uint8 0.25 0.478767 0.861280
RDB uint8 0.25 0.154246 0.481840
RD float32 0.1 0.351139 0.532844
RDB float32 0.1 0.232889 0.439551
RD float32 0.25 0.463790 0.821704
RDB float32 0.25 0.347609 0.733802

@pijyoi
Copy link
Copy Markdown
Contributor Author

pijyoi commented Mar 29, 2021

I have removed benchmarks/time_rescale.py.
It is included below (although no longer standalone.)

import time
import numpy as np
from pyqtgraph.functions import rescaleData, rescaleData_blocked

shape = 3072, 3072
data_uint8 = np.random.randint(256, size=shape, dtype=np.uint8)
data_float32 = np.random.random(shape).astype(np.float32)
niters = 10

for data in [data_uint8, data_float32]:
    for alpha in [0.1, 0.25]:
        if data.dtype.kind == 'u':
            minval, maxval = alpha * 255, (1 - alpha) * 255
        else:
            minval, maxval = alpha, 1 - alpha
        scale = 255 / (maxval - minval)
        offset = minval
        params = data, scale, offset, np.uint8

        t0 = time.perf_counter()
        for _ in range(niters):
            output = rescaleData(*params)
        t1 = time.perf_counter()
        print('RD ', data.dtype, alpha, t1 - t0)

        t2 = time.perf_counter()
        for _ in range(niters):
            output = rescaleData_blocked(*params)
        t3 = time.perf_counter()
        print('RDB', data.dtype, alpha, t3 - t2)

@pijyoi pijyoi marked this pull request as ready for review March 29, 2021 01:08
@j9ac9k
Copy link
Copy Markdown
Member

j9ac9k commented Apr 2, 2021

@pijyoi this PR now has some merge conflicts; would you mind resolving them?

@pijyoi pijyoi force-pushed the rescale_blocked branch from 29c9870 to 11ad335 Compare April 2, 2021 04:52
@j9ac9k
Copy link
Copy Markdown
Member

j9ac9k commented Apr 2, 2021

This LGTM, I'm going to give @outofculture some time to look over this.

@outofculture
Copy link
Copy Markdown
Contributor

Of course this is broken for CuPy (ugh; the lack of cupy CI is once again felt).

I looked into it, and it's not trivial to fix ( see this bug ). Best we can do for now is reference that bug and use the old code with CuPy.

Copy link
Copy Markdown
Contributor

@outofculture outofculture left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into it, and it's not trivial to fix

Though feel free to prove me wrong in this respect!

return data


def rescaleData_blocked(data_in, scale, offset, dtype=None, clip=None):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than the need for CuPy compatibility, this new version should always be used, yes? This code is meant to completely replace the old version, for all behaviors. Let's make that more explicit, and name things accordingly. We should bring over the docstring at the same time, and we should give some guidance as to which version of rescaleData a user should use.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, np.nditer would be hardly worth implementing for Cupy. It would be an anti-pattern for CUDA to process block by block.

During the initial implementation, it was indeed meant to be a replacement. However, I can guess from the optional arguments that many functions in pyqtgraph must have grown from their original implementation to add additional functionality. e.g. for rescaleData(), clip argument must have been added for the levels + lut combine usage in ImageItem.py. Having to add support for all these optional functionalities makes it more difficult to optimize for specific use-cases that do have bottlenecks.

rng = maxVal-minVal
rng = 1 if rng == 0 else rng
data = rescaleData(data, scale/rng, minVal, dtype=dtype)
data = rescaleData_dispatch(xp, data, scale/rng, minVal, dtype, nanMask is not None)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You didn't also use the new function for pyqtgraph/graphicsItems/ImageItem.py:431. Is this a case where the new code is legitimately not appropriate? I don't see how that's the case, seeing as it's nearly identical data to what applyLookupTable/makeARGB is dealing with.

I definitely agree that protecting narrow use-case optimizations is something we need to do in this library. Your new version of the code should be the current first choice of any user trying to rescale data. It's completely functionally equivalent, so far as I can tell, and generally performs better. Trying to preemptively protect the code from later pessimizations is still a good idea, but let's do it while still giving the users a best first choice.

Long-term, I think the right way to protect ( here and elsewhere ) is with continuous benchmarking.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I guess more explicitly, I recommend you take over the rescaleData function with rescaleData_dispatch ( probably also restoring the signature and just copying the xp = ... code ).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I guess more explicitly, I recommend you take over the rescaleData function with rescaleData_dispatch

I have taken over rescaleData() and made _rescaleData_nditer() a private helper function.

You didn't also use the new function for pyqtgraph/graphicsItems/ImageItem.py:431

That's the case where only 256 elements get scaled. But due to the above change, all uses of rescaling within pyqtgraph now go through the same rescaleData().

@outofculture outofculture merged commit aa57c7a into pyqtgraph:master Apr 6, 2021
@pijyoi pijyoi deleted the rescale_blocked branch April 7, 2021 08:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants