
Fix dtype casting for Texture objects#2350

Merged
brisvag merged 32 commits into vispy:main from brisvag:fix/dtype-casting-texture
Sep 6, 2022

Conversation

@brisvag
Collaborator

@brisvag brisvag commented Jul 4, 2022

This PR tries to streamline interacting with Texture objects by improving and adding some guards to the datatype casting/downcasting necessary to update data on the GPU.

On current main, setting the data of a texture (or a visual) to new data with a different dtype will either fail silently by blindly passing the wrong dtype to the GPU, or result in messy tracebacks. To fix this, I improved the casting logic to ensure that we always end up with at most 32-bit datatypes, and that new data is coerced to the previous dtype of the Texture.
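The downcasting guard could be sketched roughly like this (names here are illustrative, not vispy's actual API):

```python
import numpy as np

# Hypothetical sketch of the guard described above: GPU textures can't
# take 64-bit data, so wide dtypes are narrowed to their 32-bit
# counterparts, while already-narrow data passes through untouched.
_DOWNCASTS = {
    np.dtype(np.float64): np.dtype(np.float32),
    np.dtype(np.int64): np.dtype(np.int32),
    np.dtype(np.uint64): np.dtype(np.uint32),
}

def downcast_to_32bit(data, copy=False):
    """Return `data` with any 64-bit dtype narrowed to 32 bits."""
    target = _DOWNCASTS.get(data.dtype, data.dtype)
    # astype(copy=False) returns the input array itself when no cast is needed
    return data.astype(target, copy=copy)

arr = np.arange(4, dtype=np.float64)
assert downcast_to_32bit(arr).dtype == np.float32
```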

The new tests should clarify the kind of behaviour that this PR allows.

Member

@djhoese djhoese left a comment


I like the general idea of what you're doing here, but man does it scare me. I'm really worried arrays are going to be copied when we don't want them to be. I see the tests fail so I'll save future reviewing for when those pass.

```python
if should_cast_to_f32(data.dtype):
    data = data.astype(np.float32)
data = downcast_to_32(data, copy=copy)
```
Member

I suppose there is no way to include this in your updated textures so it isn't needed here?

Collaborator Author

This is needed here because Image saves self._data, which should be saved in the "correct" version, I think.

I did remove it in #2351.

@brisvag brisvag force-pushed the fix/dtype-casting-texture branch from 5474d4a to d9b31d0 on July 21, 2022 16:05
brisvag and others added 3 commits July 21, 2022 18:09
Co-authored-by: melonora <michiel.vierdag@embl.de>
Co-authored-by: melonora <michiel.vierdag@embl.de>
Co-authored-by: melonora <michiel.vierdag@embl.de>
Member

@djhoese djhoese left a comment


A few last comments.

```python
# Apply to texture
if should_cast_to_f32(vol.dtype):
    vol = vol.astype(np.float32)
vol = downcast_to_32(vol, copy=copy)
```
Member

If using the CPU-scaled texture, don't we not need this downcasting to happen? I mean, if we were given 64-bit floats, the CPU-based scaling would deal with the conversion to a GPU-compatible dtype, right?

Collaborator Author

Uhm, the problem is that self._data is kept for later and used again... If we don't keep a "correct" (downcast) version of the data, then we'll get a million warnings every time the texture needs to be rebuilt (like when clim gets changed). I find it more correct to simply hold the data that was actually sent to the GPU, and only warn once.
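The "store what was sent to the GPU, warn once" idea could be sketched like this (illustrative stand-in, not vispy's actual classes):

```python
import warnings
import numpy as np

# Hypothetical sketch: the downcast array itself is cached, so later
# texture rebuilds (e.g. on a clim change) reuse it without re-warning.
class CachedTextureData:
    def __init__(self):
        self._data = None

    def set_data(self, data):
        if data.dtype == np.float64:
            warnings.warn("float64 data downcast to float32 for the GPU")
            data = data.astype(np.float32)
        self._data = data  # already GPU-safe

    def rebuild(self):
        # no dtype check here, hence no repeated warning on rebuild
        return self._data
```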

Collaborator Author

(this is true for both volume and image)

Member

that was actually sent to the gpu

But in the CPU-scaled texture case this casting should never happen and shouldn't be needed so no warning was going to be generated anyway. Now, with this PR, it would produce an unnecessary warning and make a copy of the data.

Collaborator Author

I see. Let's see if tests pass without that casting.

Collaborator Author

Things seem to be doing fine!

@brisvag brisvag force-pushed the fix/dtype-casting-texture branch from 6be2252 to 3753e98 on July 26, 2022 09:32
```diff
 # update left panes rolling upwards
 noise = np.roll(noise, 1, axis=0)
-image1.set_data(noise)
+image1.set_data(noise, copy=True)
```
Collaborator Author

This was needed to fix a test failure. The reason it worked before is that we were doing unnecessary copies, which meant copy=False didn't matter here. Now that we respect the copy argument properly, this example was failing because noise was getting changed between this line and the next.

Member

I'm not sure I understand. So one of the Visuals (the image?) was modifying the noise array? Doesn't that suggest it wasn't copying internally when it should have, since copying was necessary to avoid modifying the input data?

Collaborator Author

@brisvag brisvag Jul 28, 2022


Kinda the opposite: note that by default copy=False, and, despite that, noise was clearly getting copied or the example would have crashed. With the current changes, the copy kwarg is getting respected, so without copy=True we get the failure.

Member

So wait, with copy=False (the default), who is modifying noise in this example? Shouldn't the Visual copy if it is going to modify the data? Or does copy=False in these Visuals mean that they will modify the data in place no matter what?

Collaborator Author

So wait, with copy=False (the default), who is modifying noise in this example?

image1

Shouldn't the Visual copy if it is going to modify the data? Or does copy=False in these Visuals that it will modify the data inplace no matter what?

It will modify in place no matter what if it can.

I think the confusion here comes from 2 different use cases for specifying copy=True:

  1. "Please copy it because I might change it externally before you send it to the gpu and I want you to send THIS EXACT data" (this is the use specified in the docs)
  2. "Please copy it before altering it because I will need it elsewhere"

This is case 2.
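The two use cases can be shown with a small stand-in for the visual (hypothetical function, mimicking the in-place scaling vispy does when allowed):

```python
import numpy as np

# copy=False lets the visual scale in place, mutating the caller's
# array; copy=True (case 2) protects data the caller still needs.
def set_data(data, clim, copy=False):
    if copy:
        data = data.copy()
    data -= clim[0]                      # in-place arithmetic
    data *= 1.0 / (clim[1] - clim[0])
    return data

noise = np.array([0.0, 5.0, 10.0])
set_data(noise, clim=(0, 10))            # copy=False: noise is now scaled!
# → noise == [0.0, 0.5, 1.0]

noise = np.array([0.0, 5.0, 10.0])
set_data(noise, clim=(0, 10), copy=True) # case 2: noise left untouched
# → noise == [0.0, 5.0, 10.0]
```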

Member

I also noticed that in your screencast that the bottom two boxes are the ones that don't display at all, but those are only Isocurves, not the ImageVisuals created in the example.

Collaborator Author

Lol, you're right, thanks for double checking... I could swear I saw the same error. I must be wrong though, I can't reproduce it anymore. Maybe I was on this branch and thought it was main 🤦

The error above is indeed happening on main without any changes, and it happens regularly if I rerun the example in the same ipython session. Definitely a bug, and definitely unrelated.

Collaborator Author

I know I said this before, but I think that I really found the spot now xD

With some good-old print-debugging, I singled out this PR's equivalent of this line (as you expected, @djhoese):

```python
if clim[0] != clim[1]:
    data -= clim[0]
    data *= 1.0 / (clim[1] - clim[0])
```

This is where data was being modified. I simply added an explicit copy just before, in cases where we want to modify and a copy was not already made.
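A minimal sketch of that fix, with hypothetical names (not the actual vispy code), could look like this:

```python
import numpy as np

# Copy just before the in-place clim scaling, unless a copy was already
# made earlier in the pipeline, so the caller's array is never mutated.
def scale_for_texture(data, clim, copied_already=False):
    if clim[0] != clim[1]:
        if not copied_already:
            data = data.copy()  # protect the caller's array
        data -= clim[0]
        data *= 1.0 / (clim[1] - clim[0])
    return data

orig = np.array([1.0, 2.0, 3.0])
scaled = scale_for_texture(orig, clim=(1.0, 3.0))
# orig is left unchanged; scaled == [0.0, 0.5, 1.0]
```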

Member

Ok, looking at the diff now I see how we raised an error before this PR if updating couldn't be done without copying. I think copying when we have to is a reasonable assumption, but we may have to triple check the docstrings for the ImageVisual and the textures. Looks good to me though.

Member

...except for the test failures 😉

@brisvag brisvag requested a review from djhoese July 28, 2022 08:52
@brisvag
Collaborator Author

brisvag commented Aug 1, 2022

Dare I say... it's fixed? o.o

Comment on lines +721 to +730
```python
def test_texture_set_data_different_dtype(input_dtype, output_dtype):
    data2D = np.random.rand(20, 20).astype(input_dtype)
    tex2D = Texture2D(data2D)
    tex2D[:10] = np.array(1, dtype=output_dtype)
    tex2D.set_data(data2D.astype(output_dtype))

    data3D = np.random.rand(20, 20, 20).astype(input_dtype)
    tex3D = Texture3D(data3D)
    tex2D[:10] = np.array(1, dtype=output_dtype)
    tex3D.set_data(data3D.astype(output_dtype))
```
Member

Is this test just checking that a certain data type can be ingested by the Texture? Is there something we should be checking after it is set to confirm validity or is the fact that it doesn't raise an exception good enough?

Otherwise, the 2D versus 3D could/should be made into another pytest parameter, with an if statement to handle whether a Texture2D or a Texture3D should be made.

Also, typo, tex2D[:10] in the 3D block of code.

Collaborator Author

This is just testing that no error arises. The image and volume tests make sure that the rendering goes as expected as well, since Texture does not provide a way to get back the data from the GPU afaik. Updated the test!

Comment on lines +383 to +388
```python
# out of bounds should wrap around cleanly
new_data = np.array([[0, 128]], dtype=np.float64)
image.set_data(new_data)
render = c.render()
assert np.allclose(render[left], black)
assert np.allclose(render[right], black)
```
Member

Out of bounds in the sense that 128 is over the 127 limit? Shouldn't 128 then be clipped to 127 and made white?

Collaborator Author

No, because:

```python
In [1]: import numpy as np

In [2]: np.array(128, dtype=np.int8)
Out[2]: array(-128, dtype=int8)
```

We can add extra machinery to clip instead, if you think that would be better, though I'm not sure how that would work. This was intended more as a "do not do crazy stuff like passing the wrong dtype" kind of thing.

Member

But this is a CPU scaled ImageVisual and this new_data array is a 64-bit float. The initial data dtype doesn't matter here, does it? The CPU-scaling should be scaling the data, based on the clims, to a uint8 texture and sending it to the GPU.

Collaborator Author

It's kinda the opposite; the CPUScaledTexture always sends np.float32 to the GPU, because the scaling happens on the CPU side. So the flow in the test above is:

  • image last time had int8 data
  • image.set_data(): data is saved as-is (float64) in image._data
  • image._build_texture()
  • texture.scale_and_set_data(): here data is converted to the previous texture._data_dtype (int8) so we can correctly interpret/build clims
  • texture._scale_data_on_cpu(): converts data to float32 if necessary
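That flow could be sketched roughly like this (a simplified, hypothetical class, not vispy's actual implementation):

```python
import numpy as np

# New data is first coerced back to the dtype the texture remembers,
# then scaled to float32 on the CPU before being sent to the GPU.
class CPUScaledTextureSketch:
    def __init__(self, data):
        self._data_dtype = data.dtype  # e.g. int8 from the first set_data

    def scale_and_set_data(self, data, clim):
        # coerce to the previous dtype so the clims keep their meaning
        data = np.asarray(data).astype(self._data_dtype)
        return self._scale_data_on_cpu(data, clim)

    def _scale_data_on_cpu(self, data, clim):
        data = data.astype(np.float32)  # the GPU always receives float32
        if clim[0] != clim[1]:
            data -= clim[0]
            data *= 1.0 / (clim[1] - clim[0])
        return data

tex = CPUScaledTextureSketch(np.array([0, 100], dtype=np.int8))
out = tex.scale_and_set_data(np.array([0.0, 127.0]), clim=(0, 127))
# out is float32, scaled to the range [0.0, 1.0]
```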

Collaborator Author

So, in case we wanted to do the clipping, it should happen in the scale_and_set_data step.

Member

Ah ok, so that casting to the old/original dtype is new behavior for CPU scaling from what I can tell. Lines like:

```python
data = np.array(data, dtype=self._data_dtype, copy=copy)
```

didn't exist in the old version from what I can tell in the github diff. I'm ok with that change but it should probably be documented somewhere (in the ImageVisual or VolumeVisual or at least in the texture classes). Based on your comments in the code and your comment above I'm guessing the odd behavior that happens if we don't do this casting is that the limits (and interpolation?) are messed up?

I wasn't suggesting that we add clipping; I think there is clipping in the shader and I thought that's what this test was assuming/using to verify the output. It is actually wrapping/overflowing the 64-bit float value 128 to an 8-bit signed integer value of -128. That makes sense. I think overall the confusing thing for me was not knowing that the data type is now preserved between .set_data calls. Again, that is fine, just not what I expected.

Collaborator Author

the odd behavior that happens if we don't do this casting is that the limits (and interpolation?) are messed up?

Yes, I wasn't very clear, my bad. I think it makes sense to clip as you say, actually. We could use np.iinfo and np.finfo to get min/max and clip before casting.
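The np.iinfo/np.finfo idea could be sketched like this (hypothetical helper, not something added in this PR):

```python
import numpy as np

# Look up the target dtype's representable range and clip before casting,
# so out-of-range values saturate instead of wrapping around.
def clip_to_dtype(data, dtype):
    dtype = np.dtype(dtype)
    if np.issubdtype(dtype, np.integer):
        info = np.iinfo(dtype)
    else:
        info = np.finfo(dtype)
    return np.clip(data, info.min, info.max).astype(dtype)

vals = np.array([-300.0, 0.0, 128.0])
clip_to_dtype(vals, np.int8)
# → array([-128, 0, 127], dtype=int8)
```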

Member

Nah, I think any clipping should happen on the GPU/shader. I'd like to avoid any unnecessary processing on the CPU. Especially since this won't apply to most cases where people are using the same data type and they probably have data in the proper range.

Collaborator Author

Saw the answer late, already tried an implementation :P See what you think about it. I agree that ideally you'd do clipping on the GPU, but with a CPU-scaled texture we don't really have a choice: either we clip, or the result is going to be nonsense to the user.

Comment on lines +534 to +539
```python
# out of bounds should wrap around cleanly
new_data = np.array([[[0, 128]]], dtype=np.float64)
volume.set_data(new_data)
render = c.render()
assert np.allclose(render[left], black)
assert np.allclose(render[right], black)
```
Member

Same argument as for the ImageVisual test, shouldn't 128 get clipped to 127 and be white on the output?

@brisvag
Collaborator Author

brisvag commented Sep 6, 2022

Ok, I think everything was addressed and I have @djhoese's green light to merge. Doing it now!

@brisvag brisvag merged commit a8d9c0a into vispy:main Sep 6, 2022
@brisvag brisvag deleted the fix/dtype-casting-texture branch September 6, 2022 09:52
Comment on lines +313 to +314
```python
if self._data_dtype is None:
    data.dtype == self._data_dtype
```
Contributor

Suggested change:

```diff
 if self._data_dtype is None:
-    data.dtype == self._data_dtype
+    self._data_dtype = data.dtype
```

@brisvag sorry for digging up this old code, but I got here looking through warnings that come up in tests. Should this be as above? Otherwise I'm not sure what this line is meant to be doing.

Collaborator Author

oof... It's been some time, so I'm not sure I remember much about these changes. Your change definitely looks like the most likely intended code ^^' That probably means we're missing a test that should catch a problem here.

brisvag pushed a commit that referenced this pull request Jun 19, 2024
…2601)

I noticed this warning when running tests, and I think it traces back to
the change in this PR.

```
vispy/visuals/tests/test_colormap.py: 6 warnings
vispy/visuals/tests/test_image.py: 39 warnings
vispy/visuals/tests/test_scalable_textures.py: 4 warnings
  /Users/aandersoniii/src/vispy/vispy/gloo/texture.py:18: DeprecationWarning: finfo() dtype cannot be None. This behavior will raise an error in the future. (Deprecated in NumPy 1.25)
    info = np.finfo(dtype)
```

See #2350 (comment) for
some context.

I am partly guessing what this code meant to be doing, but here the
`_data_dtype` is set the first time `scale_and_set_data` is called.
@brisvag at your suggestion I am trying to think of a test that could
have caught this, but need to think about it a little more.

5 participants