
Make auto downsample factor calculation more robust #2253

Merged

j9ac9k merged 4 commits into pyqtgraph:master from StSav012:patch-8 on Aug 28, 2022

Conversation

@StSav012
Contributor

The proposed changes enable ignoring the infinite and NaN values of `x` and avoiding a division by zero if `self.opts['autoDownsampleFactor'] == 0`.
@j9ac9k
Member

j9ac9k commented Apr 20, 2022

Hi @StSav012

Thanks for the PR, this will definitely make an improvement. I'm a little concerned with the equality check at the bottom of your diff

```python
if width*self.opts['autoDownsampleFactor'] != 0.0 and np.isfinite(x1-x0):
```

I know the code you're replacing was `if width != 0.0`, but I'm wondering if you think using something like `not math.isclose()` would be more appropriate here. I'm ok with merging as is, but wanted to get your thoughts on the equality check first.

@StSav012
Contributor Author

StSav012 commented Apr 20, 2022

Hi @j9ac9k

It's a good point. I'm not confident how to tell a sufficient tolerance from a bad one. After your question, I'd rather test that `(x1-x0) / (width*self.opts['autoDownsampleFactor'])` is still finite. This way there is no need for any tolerance. Kind of like this:

```python
if width * self.opts['autoDownsampleFactor'] != 0.0:
    ds_float = (x1 - x0) / (width * self.opts['autoDownsampleFactor'])
    if np.isfinite(ds_float):
        ds = int(max(1, int(ds_float)))
```

And do we require both `int` conversions on the last line above?

@j9ac9k
Member

j9ac9k commented Apr 20, 2022

`width` should always be of type `float`, and `autoDownsampleFactor` can be `float` or `int`. `autoDownsampleFactor` should be > 1.0, so we shouldn't run into any weirdness; this is likely a non-issue.

Also, for singular values I generally prefer to use `math.isfinite` vs. `numpy.isfinite`, but there was an issue some time ago where it turned out the two behave differently (I should try and search the issue/PR tracker for more specifics).

```python
width = self.getViewBox().width()
if width != 0.0:
    ds = int(max(1, int((x1-x0) / (width*self.opts['autoDownsampleFactor']))))
if width != 0.0:  # autoDownsampleFactor _should_ be > 1.0
```
Member

Hi @StSav012

I think for this equality comparison to 0.0 we should use `math.isclose(width, 0.0, abs_tol=1e-9)`.

Contributor

With a `not` in front of it.

Contributor Author

Hi, @j9ac9k @outofculture

I can't agree with the approach you propose. Not yet, at least.

The check for infinity takes care of the cases when `width` is not exactly zero but close to it. I'm afraid that with `isclose` the behavior would not be straightforward; it would depend on a tolerance picked essentially at random. Say, if the x span is huge (like 1e308), we might still get infinity even if `width` is significantly non-zero. However, if `width` is exactly 0.0, the function fails with `ZeroDivisionError`, and the rest of the function gets ignored. On the other hand, if `view_range.right() - view_range.left()` (why do we need `x0` and `x1` at all?) is small (say, equal to the `isclose` tolerance), the picked tolerance might be far too rough.
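The two failure modes described above can be seen with plain Python floats (illustrative values, not taken from the PR):

```python
import math

span = 1e308            # a huge x span, as in the example above
width = 0.5             # clearly non-zero, yet...
ratio = span / width
print(ratio)            # inf: the quotient overflows even though width != 0
print(math.isfinite(ratio))   # False, so a finiteness check catches it

try:
    span / 0.0          # an exactly-zero divisor raises instead of returning inf
except ZeroDivisionError as exc:
    print(exc)          # float division by zero
```

So the finiteness check covers the overflow case without any hand-picked tolerance, while exact zero must still be guarded separately.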

Member

My understanding is that `width` here isn't in view-range coordinates, but corresponds to `QGraphicsWidget.geometry().width()`, which returns the `QRectF` that represents the position and size of the widget, not the view space.

That said, the tolerance is picked effectively at random, so you're not wrong there...

Contributor

Hmm. Looking further, `QRect.width()` should be an integer, so I don't think this protection will ever matter either way. We could even wrap it in a `try/except ZeroDivisionError: pass`, if that's all we're guarding against, with the added possible bonus of protecting against `self.opts['autoDownsampleFactor'] == 0`, too. That leaves the question of whether we need to treat e.g. `1 / ((0.3 - 0.2) - 0.1)` as a divide-by-zero, which we would normally want to do. In this case, if an application sets `self.opts['autoDownsampleFactor']` to e.g. 1e-15, they're going to get bad behavior ("please up-sample my data"?), so we can probably safely ignore this aspect of the problem. Our knee-jerk resistance to comparing floats to specific values likely doesn't matter either way here, but the sentiment is a good one, so it might be worthwhile to put in the `isclose` just to uphold that principle.
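The `(0.3 - 0.2) - 0.1` example can be checked directly; this standalone demo (not code from the PR) shows why an exact comparison to 0.0 misses the case while `math.isclose` with the proposed tolerance catches it:

```python
import math

d = (0.3 - 0.2) - 0.1     # algebraically zero, but a tiny residue in binary floats
print(d)                  # on the order of 1e-17, not 0.0
print(1 / d)              # a huge but finite quotient, no ZeroDivisionError raised

print(d != 0.0)                            # True: exact comparison says "non-zero"
print(math.isclose(d, 0.0, abs_tol=1e-9))  # True: isclose treats it as zero
```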

Member

> Hmm. Looking further, QRect.width() should be an integer, so I don't think this protection will ever matter either way.

`QRect.width()` should absolutely be an integer, but we're dealing w/ a `QRectF` here ;)

https://doc-snapshots.qt.io/qt6-dev/qgraphicslayoutitem.html#geometry

> We could wrap it in a try/except ZeroDivisionError: pass, even, if that's all we're guarding against, with the added possible bonus of protecting against self.opts['autoDownsampleFactor'] == 0, too.

nit-pick level 100; if we do that we should use

```python
import contextlib

...

with contextlib.suppress(ZeroDivisionError):
    # do the division here
```

> That leaves the question of whether we need to treat e.g. 1 / ((0.3 - 0.2) - 0.1) as a divide-by-zero, which we would normally want to do. In this case, if an application sets self.opts['autoDownsampleFactor'] to e.g. 1e-15, they're going to get bad behavior ("please up-sample my data"?), so we can probably safely ignore this aspect of the problem. Our knee-jerk resistance to comparing floats to specific values likely doesn't matter either way here, but the sentiment is a good one, so it might be worthwhile to put in the isclose just to uphold that principle.

Yeah, @StSav012 is modifying existing code here that has some code I absolutely hate, and I'm requesting they change something else that was already in the library; probably not the most fair ask. I'm good w/ leaving it for this PR, and addressing the float equality comparisons in another PR.
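For reference, the `contextlib.suppress` idiom suggested above behaves like this (a standalone demo, not code from the PR; the variable names are placeholders):

```python
import contextlib

ds = 1                # default downsample factor
denominator = 0.0     # e.g. width * autoDownsampleFactor came out as exactly 0.0

with contextlib.suppress(ZeroDivisionError):
    ds = max(1, int(10.0 / denominator))  # raises, so the assignment is skipped

print(ds)  # 1: the error was swallowed and the default kept
```

Note that this only helps for an exactly-zero denominator; a tiny non-zero one still divides "successfully", which is the float-residue concern raised above.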

@j9ac9k
Member

j9ac9k commented May 2, 2022

Hi @StSav012

I chatted w/ @outofculture today, and we agreed the equality comparison to 0.0 is problematic and settled on a tolerance we felt was appropriate. If you don't mind making that change, I'll merge this right after 👍🏻

@StSav012
Contributor Author

Consider the following lines:

```python
                    x0 = (view_range.left()-finite_x[0]) / dx
                    x1 = (view_range.right()-finite_x[0]) / dx
```

And later,

```python
                        ds_float = max(1.0, (x1 - x0) / (width * self.opts['autoDownsampleFactor']))
```

See that `(x1 - x0)` holds the width of `view_range`, divided by `dx`. The values of `x0` and `x1` are not used anywhere else. The values of `x0` and `x1` might be infinite after the division by `dx`, whereas the difference might be finite. So, firstly, I calculate the difference `view_range.right() - view_range.left()`, and then I divide it by `dx`.

The initial code relies implicitly on the direction of the axis, with `(x1 - x0)` always non-negative. I enclose the expression in `abs` to ensure the same behavior.
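Put together, the reworked calculation described above might look like this sketch (the function and its parameters are stand-ins for PlotDataItem internals, named after the snippets quoted in the thread, not the actual merged code):

```python
import math

def downsample_factor(view_left, view_right, dx, width, auto_ds_factor):
    # Take the view-range width first and divide by dx once: the individual
    # x0/x1 quotients could overflow to inf while their difference is finite.
    # abs() keeps the span non-negative for reversed axes, matching the old
    # (x1 - x0) behaviour.
    span = abs(view_right - view_left) / dx
    denom = width * auto_ds_factor
    if denom != 0.0:
        ds_float = span / denom
        if math.isfinite(ds_float):
            return max(1, int(ds_float))
    return 1  # fall back to no downsampling

print(downsample_factor(0.0, 100.0, 1.0, 10.0, 5.0))  # 2
print(downsample_factor(0.0, 100.0, 1.0, 0.0, 5.0))   # 1 (zero denominator)
```

The finiteness check makes the zero-width and huge-span cases degrade gracefully to "no downsampling" instead of raising or producing a nonsensical factor.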
@j9ac9k
Member

j9ac9k commented Aug 28, 2022

Closing/reopening to trigger a new CI run.

@j9ac9k j9ac9k closed this Aug 28, 2022
@j9ac9k j9ac9k reopened this Aug 28, 2022
@j9ac9k
Member

j9ac9k commented Aug 28, 2022

CI all green, merging, sorry for the wait @StSav012 thanks for the PR!

@j9ac9k j9ac9k merged commit befa5de into pyqtgraph:master Aug 28, 2022
@Mriv31

Mriv31 commented May 3, 2023

`np.isfinite` is actually extremely slow and is needlessly called each time we recompute the `ds` parameter (i.e. update the view range).

If you have a very large dataset (in my case 10^7 points) and are navigating small portions of it using `clipToView` and `autoDownsample`, you do not want to recompute `np.isfinite` over the whole dataset every time you change the x range.

In my case, removing the `isfinite` check saves 0.4 s when updating the region of the dataset I am looking at. Replacing the two occurrences of `searchsorted` over the whole dataset removes another 0.4 s (but that only works if x is proportional to the index).

Those two functions are the main performance problem when zooming in and navigating very large datasets.

Maybe `isfinite` could be checked only once for a given dataset, upon loading, and only the finite values kept for what happens next? What do you think?
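One way to realize that suggestion, sketched with hypothetical names (this is not pyqtgraph API, just an illustration of filtering non-finite values once at load time):

```python
import numpy as np

class FiniteCachedCurve:
    """Sketch: drop non-finite x values once when the data is set, so every
    later view-range update skips the repeated np.isfinite scan."""

    def set_data(self, x, y):
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        mask = np.isfinite(x)        # paid once per dataset
        self.x = x[mask]
        self.y = y[mask]

    def visible(self, x0, x1):
        # Cheap per-update work; assumes x is sorted, as searchsorted requires.
        i0 = np.searchsorted(self.x, x0, side='left')
        i1 = np.searchsorted(self.x, x1, side='right')
        return self.x[i0:i1], self.y[i0:i1]

curve = FiniteCachedCurve()
curve.set_data([0.0, 1.0, np.nan, 2.0, np.inf, 3.0], [0, 1, 2, 3, 4, 5])
print(curve.visible(1.0, 2.0))  # only the finite points in [1, 2]
```

The trade-off is that the cached arrays must be rebuilt whenever the data changes, but for static datasets the per-update cost drops to two binary searches.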
