Skip to content

Performance enhancement: use CUDA in ImageItem#1466

Merged
j9ac9k merged 42 commits intopyqtgraph:masterfrom
outofculture:cupy-rebase
Jan 20, 2021
Merged

Performance enhancement: use CUDA in ImageItem#1466
j9ac9k merged 42 commits intopyqtgraph:masterfrom
outofculture:cupy-rebase

Conversation

@outofculture
Copy link
Copy Markdown
Contributor

cupy is a mostly-compatible drop-in replacement for numpy that performs its work on a CUDA-enabled GPU. This PR lets ImageItem.setImage accept an image on either substrate, and otherwise behaves identically.

Also, windows memory allocation requests are slow, so pre-allocating buffers for processing improves performance ( for both cpu and gpu ). Since writing this, I've identified other places that could benefit from this treatment, but every bit helps.

The examples/VideoSpeedTest.py script was improved to accept command line arguments and to test the cuda-enabled processing.

Only one cupy function was found to be anything other than identical ( cupy.take does not support mode="clip" and needed to be clipped explicitly ), but behavior will of course need to be verified everywhere. Thankfully, this PR shouldn't significantly alter the way numpy-based images are processed, so we can safely leave cuda as an optional/experimental feature while we feel out how it behaves on more diverse systems than I was able to use for my tests.

In testing, we found one issue. On windows systems with CUDA Toolkit < 11.1, an int16-dtype image with a lookup table will be incorrectly processed ( the result gets entirely mapped to 255; it whites out ). I didn't know where to document or enforce this requirement.

This also begs the question of what else would benefit from becoming cupy/numpy-agnostic, but we should leave that until we've proven this small slice of functionality.

@j9ac9k
Copy link
Copy Markdown
Member

j9ac9k commented Dec 12, 2020

Qt4 :shakes-fist-at-cloud:

@outofculture
Copy link
Copy Markdown
Contributor Author

outofculture commented Dec 12, 2020

Oh, testing has happened on 3 systems so far.

Linux, Pop!OS, cuda:10.2, GeForce GTX 1660 Ti Mobile

Windows 10 Pro, cuda:11.1, Quadro P1000

Windows 10 Pro, cuda:11.1, GeForce RTX 2060 SUPER

@j9ac9k
Copy link
Copy Markdown
Member

j9ac9k commented Dec 12, 2020

@outofculture thanks for the PR! I am 100% onboard w/ making cupy an optional dependency.

There is another PR that @campagnola has been working on to improve image item performance ( #669 ) ...which it appears you are the author of some commits on that PR. Can you expand on how the two PRs compare?

@outofculture
Copy link
Copy Markdown
Contributor Author

Yeah, @j9ac9k I originally forked off of Luke's work there, but then I sorta ignored what was there already and went off in a different direction. I was able to rebase and isolate this changeset to just the cupy and pre-alloc stuff. I would totally be willing to group them together into a single PR to ease the testing burden; I would just need to confer with @campagnola to move #669 out of wip status.

@j9ac9k
Copy link
Copy Markdown
Member

j9ac9k commented Dec 12, 2020

I'll try and ping him; but I don't want to hold off for too long in case we don't hear back. Thanks again for this PR!

@outofculture
Copy link
Copy Markdown
Contributor Author

outofculture commented Jan 12, 2021

Woo! Checks pass! I have no idea why macos was failing there for a few runs, and I did nothing to try to fix anything, but the build fails went away on their own. 🤷🏽‍♀️

@j9ac9k
Copy link
Copy Markdown
Member

j9ac9k commented Jan 12, 2021

hi @outofculture we've had some intermittent macOS failures on openGL related examples/tests, that I'm pretty sure are CI system related; so yeah, don't sweat those...

I'll try and dive into this tonight... I'm likely going to have some pretty simple/dumb questions, is this your preferred medium to communicate on; or should we just stick here to this pull request/issue tracker?

@outofculture
Copy link
Copy Markdown
Contributor Author

@j9ac9k github is fine. Depending on exactly when you look at it, I could be available on a higher-bandwidth chat platform ( live text, audio or video ) of your choosing. I'll dm my contact info.

data = data.astype(data.dtype.newbyteorder('='))
if not dtype.isnative:
weaveDtype = dtype.newbyteorder('=')
# p = np.poly1d([scale, -offset*scale])
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we remove commented out lines?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know if this project had a strong preference to keep old bits of code like this around, but I would be happy to drop unused comments.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind leaving some old bits like this in, but probably should add a note as to why you're wanting to keep them around

----------
lut : ndarray
Either cupy or numpy arrays are accepted, though this function has
problems in cupy on windows with some versions of the cuda toolkit.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this with cuda versions < 11.0? ...probably should try and be a bit more specific here (doesn't have to be exhaustive, but documenting what's been observed would be good).

@outofculture
Copy link
Copy Markdown
Contributor Author

All comments should be addressed.

@j9ac9k
Copy link
Copy Markdown
Member

j9ac9k commented Jan 13, 2021

There is something the CI doesn't like on this PR...given it's failing on multiple platforms I need to investigate further. @outofculture can you replicate the issue on ubuntu/pyside2/python3.9?

@outofculture
Copy link
Copy Markdown
Contributor Author

Yeah, the tests fail for me in 3.9, but I'm getting failures in master as well.

On master, test_Viewbox is failing intermittently, and test_ErrorBarItem_defer_data is failing consistently.

On my branch, the segfault and timeout reproduced the first time I ran the tests, but since then, I can't get it to happen again. I do also get the same failures as master.

@j9ac9k
Copy link
Copy Markdown
Member

j9ac9k commented Jan 15, 2021

Those errors I'm not too concerned about (test_ErrorBarItem_defer_data has been intermittent if I remember right, haven't had any failures with `test_Viewbox before); the issues I'm concerned with are the timeouts/segfaults; historically, some segfaults have come up with small/unrelated changes in the code. Annoyingly, I have never been able to replicate them locally, they just happen on CI platforms periodically.

I'll run locally on some various platforms (due to my hidpi laptop display the test suite fails horribly on windows), and see if I'm able to replicate...

@outofculture
Copy link
Copy Markdown
Contributor Author

Arg; I can't reproduce the segfault. I've tried repeatedly re-running the tests, switching branches, rebuilding the env, and rebooting. It happened the first time for me, but never since. The stack in the logs doesn't look familiar to me at all; it's not in any of the code I worked on.

@j9ac9k
Copy link
Copy Markdown
Member

j9ac9k commented Jan 15, 2021

For the record, I'm not suggesting the issue is one of your code, but with segfaults, from what I understand, changes in one place can reveal issues elsewhere.

Conflicts:
	README.md - same line, unrelated changes
	examples/VideoSpeedTest.py - new pyside6 import v. cupy and args at same line
@j9ac9k
Copy link
Copy Markdown
Member

j9ac9k commented Jan 20, 2021

Thanks so much for this feature @outofculture !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants