Make auto downsample factor calculation more robust #2253

j9ac9k merged 4 commits into pyqtgraph:master
Conversation
The proposed changes enable ignoring the infinite and NaN values of `x` and avoiding a division by zero if `self.opts['autoDownsampleFactor'] == 0`.
Hi @StSav012 Thanks for the PR, this will definitely make an improvement. I'm a little concerned with the equality check at the bottom of your diff:

```python
if width*self.opts['autoDownsampleFactor'] != 0.0 and np.isfinite(x1-x0):
```

I know the code you're replacing was
Hi @j9ac9k It's a good point. I'm not confident how to tell a sufficient tolerance from a bad one. After your question, I'd rather test:

```python
if width * self.opts['autoDownsampleFactor'] != 0.0:
    ds_float = (x1 - x0) / (width * self.opts['autoDownsampleFactor'])
    if np.isfinite(ds_float):
        ds = int(max(1, int(ds_float)))
```

And do we require both
Also for singular values, I generally prefer to use
```diff
 width = self.getViewBox().width()
-if width != 0.0:
-    ds = int(max(1, int((x1-x0) / (width*self.opts['autoDownsampleFactor']))))
+if width != 0.0:  # autoDownsampleFactor _should_ be > 1.0
```
Hi @StSav012
I think for this equality comparison to 0.0, we should use `math.isclose(width, 0.0, abs_tol=1e-9)`
With a `not` in front of it.
Hi, @j9ac9k @outofculture
I can't agree with the approach you propose. Not yet, at least.
The check for infinity takes care of the cases when `width` is not exactly zero, but close to it. I'm afraid that with `isclose`, the behavior would not be straightforward; it would depend on a tolerance picked essentially at random. Say, if the x span is huge (like 1e308), we might still get infinity even if `width` is significantly non-zero. However, if `width` is exactly `0.0`, the function fails with `ZeroDivisionError`, and the rest of the function gets ignored. On the other hand, if `view_range.right() - view_range.left()` (why do we need `x0` and `x1` at all?) is small (say, equal to the `isclose` tolerance), the picked tolerance might be far too coarse.
My understanding is that `width` here isn't in view-range coordinates, but corresponds to `QGraphicsWidget.geometry().width()`, which returns the `QRectF` representing the position and size of the widget, not the view space.
That said, the tolerance is picked effectively at random, so you're not wrong there...
Hmm. Looking further, `QRect.width()` should be an integer, so I don't think this protection will ever matter either way. We could wrap it in a `try/except ZeroDivisionError: pass`, even, if that's all we're guarding against, with the added possible bonus of protecting against `self.opts['autoDownsampleFactor'] == 0`, too. That leaves the question of whether we need to treat e.g. `1 / ((0.3 - 0.2) - 0.1)` as a divide-by-zero, which we would normally want to do. In this case, if an application sets `self.opts['autoDownsampleFactor']` to e.g. `1e-15`, they're going to get bad behavior ("please up-sample my data"?), so we can probably safely ignore this aspect of the problem. Our knee-jerk resistance to comparing floats to specific values likely doesn't matter either way here, but the sentiment is a good one, so it might be worthwhile to put in the `isclose` just to uphold that principle.
> Hmm. Looking further, `QRect.width()` should be an integer, so I don't think this protection will ever matter either way.

`QRect.width()` should absolutely be an integer, but we're dealing w/ a `QRectF` here ;)
https://doc-snapshots.qt.io/qt6-dev/qgraphicslayoutitem.html#geometry
> We could wrap it in a `try/except ZeroDivisionError: pass`, even, if that's all we're guarding against, with the added possible bonus of protecting against `self.opts['autoDownsampleFactor'] == 0`, too.
nit-pick level 100; if we do that we should use

```python
import contextlib
...

with contextlib.suppress(ZeroDivisionError):
    # do the division here
```

> That leaves the question of whether we need to treat e.g. `1 / ((0.3 - 0.2) - 0.1)` as a divide-by-zero, which we would normally want to do. In this case, if an application sets `self.opts['autoDownsampleFactor']` to e.g. `1e-15`, they're going to get bad behavior ("please up-sample my data"?), so we can probably safely ignore this aspect of the problem. Our knee-jerk resistance to comparing floats to specific values likely doesn't matter either way here, but the sentiment is a good one, so it might be worthwhile to put in the `isclose` just to uphold that principle.
Yeah, @StSav012 is modifying existing code here that has some code I absolutely hate, and I'm requesting they change something else that was already in the library; probably not the most fair ask. I'm good w/ leaving it for this PR, and addressing the float equality comparisons in another PR.
Hi @StSav012 I chatted w/ @outofculture today, and we agreed the equality comparison to 0.0 is problematic and settled on a tolerance we felt was appropriate. If you don't mind making that change, I'll merge this right after 👍🏻
Consider the following lines:
```python
x0 = (view_range.left()-finite_x[0]) / dx
x1 = (view_range.right()-finite_x[0]) / dx
```
And later,
```python
ds_float = max(1.0, (x1 - x0) / (width * self.opts['autoDownsampleFactor']))
```
See that `(x1 - x0)` holds the width of `view_range`, divided by `dx`. The values of `x0` and `x1` are not used anywhere else. The values of `x0` and `x1` might be infinite after the division by `dx`, whereas the difference might be finite. So, firstly, I calculate the difference `view_range.right() - view_range.left()`, and then I divide it by `dx`.
The initial code relies on the direction of the axis implicitly, since `(x1 - x0)` is always non-negative. I enclose the expression in `abs` to ensure the same behavior.
Closing/Opening to trigger a new CI run.
CI all green, merging. Sorry for the wait @StSav012, and thanks for the PR!
If you have a very large dataset (in my case 10^7 pts) and are navigating into small portions of it using `clipToView` and auto-downsampling, you do not want to recompute `np.isfinite` of the whole dataset every time you change the x range. In my case, removing the `isfinite` check saves 0.4 s when updating the region of my dataset I am looking at. Replacing the two occurrences of `searchsorted` over the whole dataset removes another 0.4 s (but only works if x is proportional to the index). Those two functions are the main performance problem when zooming in and navigating very large datasets. Maybe