Performance enhancement: use CUDA in ImageItem #1466
j9ac9k merged 42 commits into pyqtgraph:master
Conversation
This should improve performance under Windows.

Qt4 :shakes-fist-at-cloud:

Oh, testing has happened on 3 systems so far:
- Linux (Pop!OS), CUDA 10.2, GeForce GTX 1660 Ti Mobile
- Windows 10 Pro, CUDA 11.1, Quadro P1000
- Windows 10 Pro, CUDA 11.1, GeForce RTX 2060 SUPER
@outofculture thanks for the PR! I am 100% onboard w/ making cupy an optional dependency. There is another PR that @campagnola has been working on to improve image item performance ( #669 ), on which it appears you authored some commits. Can you expand on how the two PRs compare?

Yeah, @j9ac9k I originally forked off of Luke's work there, but then I sorta ignored what was there already and went off in a different direction. I was able to rebase and isolate this changeset to just the cupy and pre-alloc stuff. I would totally be willing to group them together into a single PR to ease the testing burden; I would just need to confer with @campagnola to move #669 out of wip status.
I'll try and ping him, but I don't want to hold off for too long in case we don't hear back. Thanks again for this PR!

Woo! Checks pass! I have no idea why macOS was failing there for a few runs, and I did nothing to fix anything, but the build failures went away on their own. 🤷🏽‍♀️
Hi @outofculture, we've had some intermittent macOS failures on openGL-related examples/tests that I'm pretty sure are CI-system related, so yeah, don't sweat those. I'll try and dive into this tonight. I'm likely going to have some pretty simple/dumb questions; is this your preferred medium to communicate on, or should we just stick to this pull request/issue tracker?

@j9ac9k github is fine. Depending on exactly when you look at it, I could be available on a higher-bandwidth chat platform ( live text, audio, or video ) of your choosing. I'll DM my contact info.
pyqtgraph/functions.py (outdated)

    data = data.astype(data.dtype.newbyteorder('='))
    if not dtype.isnative:
        weaveDtype = dtype.newbyteorder('=')
    # p = np.poly1d([scale, -offset*scale])
can we remove commented out lines?
I didn't know if this project had a strong preference to keep old bits of code like this around, but I would be happy to drop unused comments.
I don't mind leaving some old bits like this in, but probably should add a note as to why you're wanting to keep them around
pyqtgraph/functions.py (outdated)

    ----------
    lut : ndarray
        Either cupy or numpy arrays are accepted, though this function has
        problems in cupy on windows with some versions of the cuda toolkit.
is this with cuda versions < 11.0? ...probably should try and be a bit more specific here (doesn't have to be exhaustive, but documenting what's been observed would be good).
All comments should be addressed.
There is something the CI doesn't like on this PR... given it's failing on multiple platforms, I need to investigate further. @outofculture can you replicate the issue on ubuntu/pyside2/python3.9?
Yeah, the tests fail for me in 3.9, but I'm getting failures in master as well. On my branch, the segfault and timeout reproduced the first time I ran the tests, but since then I can't get it to happen again. I do also get the same failures as master.
Those errors I'm not too concerned about. I'll run locally on various platforms (due to my hidpi laptop display, the test suite fails horribly on Windows) and see if I'm able to replicate...
Arg; I can't reproduce the segfault. I've tried repeatedly re-running the tests, switching branches, rebuilding the env, and rebooting. It happened the first time for me, but never since. The stack in the logs doesn't look familiar to me at all; it's not in any of the code I worked on.
For the record, I'm not suggesting the issue is in your code, but with segfaults, from what I understand, changes in one place can reveal issues elsewhere.
Conflicts:
- README.md: same line, unrelated changes
- examples/VideoSpeedTest.py: new pyside6 import vs. cupy and args at same line
Thanks so much for this feature @outofculture !
cupy is a mostly-compatible drop-in replacement for numpy that performs its work on a CUDA-enabled GPU. This PR lets `ImageItem.setImage` accept an image on either substrate, and otherwise behaves identically.

Also, Windows memory allocation requests are slow, so pre-allocating buffers for processing improves performance ( for both cpu and gpu ). Since writing this, I've identified other places that could benefit from this treatment, but every bit helps.
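The pre-allocation idea can be sketched in plain numpy; the names, shapes, and scaling step below are illustrative, not the PR's actual code:

```python
import numpy as np

# Allocate the scratch buffer once, up front, sized for the image.
buf = np.empty((480, 640), dtype=np.float64)

def rescale(frame, scale, out):
    # numpy ufuncs accept an `out=` argument, writing into the
    # pre-allocated buffer instead of requesting a fresh allocation
    # on every frame (the slow path on Windows).
    np.multiply(frame, scale, out=out)
    return out

frame = np.full((480, 640), 2, dtype=np.uint16)
result = rescale(frame, 0.25, buf)
# `result` is the same object as `buf`: no per-frame allocation occurred.
```

cupy ufuncs accept the same `out=` keyword, which is part of why the two substrates can share this code path.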
The `examples/VideoSpeedTest.py` script was improved to accept command line arguments and to test the cuda-enabled processing.

Only one cupy function was found to be anything other than identical ( `cupy.take` does not support `mode="clip"` and needed to be clipped explicitly ), but behavior will of course need to be verified everywhere. Thankfully, this PR shouldn't significantly alter the way numpy-based images are processed, so we can safely leave cuda as an optional/experimental feature while we feel out how it behaves on more diverse systems than I was able to use for my tests.

In testing, we found one issue. On Windows systems with CUDA Toolkit < 11.1, an int16-dtype image with a lookup table will be incorrectly processed ( the result gets entirely mapped to 255; it whites out ). I didn't know where to document or enforce this requirement.
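The `mode="clip"` workaround can be illustrated with numpy, which supports both spellings; under cupy only the explicit-clip form works. This is a sketch of the equivalence, not the PR's exact code:

```python
import numpy as np

lut = np.array([10, 20, 30, 40], dtype=np.uint8)
indices = np.array([-2, 1, 3, 9])  # some indices fall outside the table

# numpy can clamp out-of-range indices on its own:
via_mode = np.take(lut, indices, mode="clip")

# cupy.take lacks mode="clip", so clamp the indices explicitly first;
# the same spelling behaves identically under numpy:
via_clip = np.take(lut, np.clip(indices, 0, len(lut) - 1))
```

Both forms map -2 to index 0 and 9 to the last index, so a lookup-table application produces the same result on either substrate.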
This also raises the question of what else would benefit from becoming cupy/numpy-agnostic, but we should leave that until we've proven this small slice of functionality.
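On the "optional dependency" point discussed above: keeping cupy optional usually comes down to a guarded import plus a small dispatch helper, roughly like this generic sketch (not this PR's implementation):

```python
import numpy as np

try:
    import cupy  # only importable on CUDA-capable installs
except ImportError:
    cupy = None

def array_module(arr):
    # Return the module (numpy or cupy) that owns this array, so the
    # same processing code can run on either substrate.
    if cupy is not None and isinstance(arr, cupy.ndarray):
        return cupy
    return np

xp = array_module(np.arange(4))  # numpy fallback when cupy is absent
```

Because cupy mirrors most of the numpy API, downstream code can call `xp.clip(...)`, `xp.take(...)`, etc. without caring which library it got back.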