Skip to content

Traditional log10 mode for PlotDataItem (by adding "mapped data" stage)#1992

Merged
j9ac9k merged 9 commits intopyqtgraph:masterfrom
NilsNemitz:limiter_crash_on_inf
Oct 7, 2021
Merged

Traditional log10 mode for PlotDataItem (by adding "mapped data" stage)#1992
j9ac9k merged 9 commits intopyqtgraph:masterfrom
NilsNemitz:limiter_crash_on_inf

Conversation

@NilsNemitz
Copy link
Copy Markdown
Contributor

@NilsNemitz NilsNemitz commented Sep 15, 2021

This attempts to address the problems ( #1988 ) with correct log-scaling mode in PlotDataItem.

The failure mode was that the dataRect() function describes the bounds of the original, non-log-scale data set. As a result, the dynamic range limiting function of PlotDataItem was working with very wrong assumptions and this resulted in an infinite loop of clipping changes which locked up the entire code.

The fix attempted here is to add additional data structures that hold information on the "mapped" data. Right now, "mapped" data just means log-scaled, but I hope this is compatible with a more modular approach in the future.

When log-scale is off, these xMapped and yMapped arrays should just be an additional reference to the existing data.
From the data held in xMapped an yMapped, a new dataRectMapped is calculated, and the dynamic range limiter seems to work as before when using this data (but see below...).

Additional changes:

  • I changed the log-scale functions to simply be np.log10(), since I think this is what we want to use for now.
  • I silenced a warning in the scaling of data coordinates to screen coordinates (in functions) which was triggered when working with infinities in the data. We accept Nan as silently missing points here, and I think we want the same for inf.
  • The dataRect functions now return the bounds of the plottable data: This is the same as before when the data does not contain any infinities. But it does not return infinite bounds.
  • I removed the "in-place clipping" method from the dynamic range limiter. I was able to repeatedly trigger a sequence that would set the display cache to the original data, and then clip a display buffer that shared memory with the original data. As far as I can tell, that bug was already there before.
  • The clipping operation is now skipped when the view height is zero, which might possibly fix Pyqtgraph internal error handling #1989 . Or not.

I would appreciate it if someone could try and break this, I am a little short on time currently 😢

This needs a little more cleanup, removing of comments and print statements, and probably some tests, so I am leaving it as a draft for now.

A known "feature" is that normally, the PlotCurveItem lines skip the infinite points. But when the limiter triggers, it that clips these points to a far-away but finite value, and the lines towards "infinity" suddenly become visible. If that requires fixing, it would probably be the best to map all infinities to Nan once. Otherwise we end up searching the entire data set for such values repeatedly.

@NilsNemitz NilsNemitz marked this pull request as draft September 15, 2021 17:36
@NilsNemitz
Copy link
Copy Markdown
Contributor Author

This is what I have been using to test for now (Thank you @pijyoi !)

import numpy as np
import pyqtgraph as pg
app = pg.mkQApp()
pw = pg.PlotWidget()
pw.show()
pi = pw.getPlotItem()

# turn log mode on and off:
pi.setLogMode(x=False, y=True)
x = range(-20, 21)
x = np.array(x)
yp = np.array([(10.**p) for p in x])
yp[1] = 0.
yp[2] = np.inf
yp[3] = np.nan
yp[4] = -np.inf

print(yp)

pdi = pg.PlotDataItem(x=x, y=yp, symbol='x', dynamicRangeLimit=1e6)
# No limiter: 
# pdi = pg.PlotDataItem(x=x, y=yp, symbol='x', dynamicRangeLimit=1e6)
pi.addItem(pdi)

app.exec()

xmax = np.nan
return QtCore.QRectF(
QtCore.QPointF(xmin,ymin),
QtCore.QPointF(xmax,ymax) )
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just testing this in the interpreter, thinking that constructing a QRectF with QPoints that have np.nan values would blow up, but it seems to work. 👍🏻

@pijyoi
Copy link
Copy Markdown
Contributor

pijyoi commented Sep 16, 2021

The following script belongs more to the fuzzing category than to an actual use-case.
After the plot appears, try to pan or zoom it. The scatter plots points will get corrupted / disappear.
This effect is present in both master and this PR.

import numpy as np
import pyqtgraph as pg
app = pg.mkQApp()
pw = pg.PlotWidget()
pw.show()

pw.setLogMode(x=False, y=True)
x = np.arange(-20, 21, dtype=np.float64)
yp = 10**x
x[20] = np.inf    # nan is fine, however

pw.plot(x, yp, symbol='x')
app.exec()

Note:
I don't think people have an expectation that non-finites would be handled by a plotting library to begin with.
If such non-finite values are not generated by the library itself as a result of transforming otherwise reasonable inputs, the library need not do anything.

@j9ac9k
Copy link
Copy Markdown
Member

j9ac9k commented Sep 16, 2021

The fix attempted here is to add additional data structures that hold information on the "mapped" data. Right now, "mapped" data just means log-scaled, but I hope this is compatible with a more modular approach in the future.

Noticed this as I was going through the code, this is great for supporting different log-bases, sqrt, cubed-root and all sorts of transformations. I know we've talked about having this be stored in ViewBox.py before, but I think this implementation belongs in PlotDataItem, just as you have it 👍🏻

A known "feature" is that normally, the PlotCurveItem lines skip the infinite points. But when the limiter triggers, it that clips these points to a far-away but finite value, and the lines towards "infinity" suddenly become visible. If that requires fixing, it would probably be the best to map all infinities to Nan once. Otherwise we end up searching the entire data set for such values repeatedly.

First off, I'm not sure we can even draw a line with a point at +/- inf even if we wanted to ... like, I assume that would be a vertical line from the previous point going either upward or downward, and a second vertical line to the point after... I'm curious what other plotting libraries do.

I would rather not get bogged down with feature creep here, if there isn't an obvious solution for drawing lines with points at -inf or inf, I'm all with going w/ @pijyoi 's input:

I don't think people have an expectation that non-finites would be handled by a plotting library to begin with.

and move on...

@NilsNemitz
Copy link
Copy Markdown
Contributor Author

Dear @j9ac9k ,

I was going to agree with you on not overcomplicating things, but somehow the code went in a different direction...

@pijyoi 's fuzzing attack showed something pretty relevant:
The ScatterPlotItem code doesn't seem to handle infinities well. We can basically set a "no infinities" policy, but then we shouldn't violate it ourselves when log scaling is turned on. The code now searches the mapped data and replaces any non-finites with np.nan, which seems to be handled appropriately.

From this search, we now know if the entire (log-scale mapped) dataset is finite, or if there are NaN values. We can use this information to automatically switch from the fast connection mode "all" to "finite", which can indicate missing points by interruptions in the plotted line. This has bugged me about the previous implementation: The great appeal of having a context menu switch to log scaling is data exploration. But then it doesn't seem like a good idea to silently drop the invalid points with no indication.

Keeping track of the search for non-finites required a new property containsNonfinite, that is True/False when the result is know, and None if the search hasn't been done. But with this added, the code started to look way too much like basic or assembler, with a growing number of arbitrary variables scattered all over the name space. To handle this, all the xData, yData, x/yDisp and x/yMapped properties have been moved to a new PlotDataset object that also manages the dataRect data and the new containsNonfinite flag. Compatibility methods and @property decorators make sure that the old interface still works.

Summary:

  • log mode is now simple log10
  • datasets for original data, mapped data, and display data are now handled by three instances of a new PlotDataset object
  • dataset object can keep track of metadata, like the presence of non-finite data.
  • at calculation of log-scaled data, Inf values are replaced by NaN, and information on non-finites is populated from the search.
  • the line connection mode in PlotDataItem now defaults to 'auto': This falls back from 'all' to 'finite' when there are nonfinites.

Please let me know what you think. I believe these changes should not affect performance in normal, non-log-scale operation, but I am not really set up for testing that... In log mode, there is some additional overhead, but everything should be a lot lighter than the log10 calculation itself.

I am open to a less invasive implementation, but I think this adds functionality without making the code a lot messier. Tests also seem to pass :)

To do list:

  • Remove all debug outputs and dead code.
  • Better tests are still needed for non-finite data, and for the modified data structures
  • The 'auto' mode for connection in PlotDataItem needs documentation
  • I lost the suppression of numpy warnings in the actual log10 calculation, that needs to go back in.

def __init__(self, x, y):
super().__init__()
assert x is not None
assert y is not None
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

asset statements in functional code is generally a bad idea; they are completely ignored in the case of python -O so best to keep them strictly for test code. If we want checks here; probably should raise a RuntimeError.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These were added to test if any part of the test suite manages to instantiate a PlotDataset with data set to None.
To avoid replicating unneeded code everywhere, the convention throughout PlotDataItem is that a PlotDataset will only be instantiated when there is actual data. Otherwise the reference is kept as None instead.

I think this can simply be removed. I added comments to the code explaining the convention, which should only be relevant to developers working inside PlotDataItem.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good w/ removing them 👍🏻

sigPointsClicked = QtCore.Signal(object, object, object)
sigPointsHovered = QtCore.Signal(object, object, object)

class PlotDataset(object):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might be the first place in the library we a re using nested classes. These are much tougher to test, so unless there is a good reason to test them, we should probably break this out into its own separate class.

Also, if there is a chance we might want this class to emit signals, we should inherit from QObject as well, but since I'm not seeing a signal being emitted, probably no need to modify that.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--> moved out, and documented as "intended for internal use".

logmode is a two-element list of Booleans requesting log mode for x and y axis.
"""
all_x_finite = False
if logMode[0]:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wonder if we should move this if block into it's own "private" function, since the code is just repeated right below.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. This moved to PlotDataset rather late, and this copy should no longer be needed.

# self.yData = None
self.dataset = None # will hold a PlotDataset for the original data
self.datasetMapped = None # will hold a PlotDataset for data after mapping transforms (e.g. log scale)
self.datasetDisp = None # will hold a PlotDataset for data downsampled and limited for display
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit-pick level 1001

I'm trying to avoid introducing new variable names that are abbreviated; may I suggest renaming to datasetDisplay ?

Also, these 3 datasets are closely coupled, I keep wondering if they should be members of a dictionary... but I'm not strongly opinionated here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--> changed to datasetDisplay.

I was trying not to jump the degree of encapsulation from zero to max in a single step. I think the individual properties are a little easier to track (compared to stuffing them all into one dictionary) while the refactored structures are still somewhat fragile. I think these are not really user-facing, and we should still be able to clean them up later.
--> renamed to "private" namespace as _dataset, _datasetMapped and _datasetDisplay

@j9ac9k
Copy link
Copy Markdown
Member

j9ac9k commented Sep 22, 2021

Just did my first pass-through the diff. Looks like you took care to preserve API backwards compatibility 👍🏻

Other issues are mostly involve commented out code, I think there is an unused import, but that's about it.

Next up would definitely be to try and break this some more by playing w/ it.

@j9ac9k
Copy link
Copy Markdown
Member

j9ac9k commented Sep 22, 2021

The following script belongs more to the fuzzing category than to an actual use-case.
After the plot appears, try to pan or zoom it. The scatter plots points will get corrupted / disappear.
This effect is present in both master and this PR.

so... I got a bit curious about this, and realized that in 0.11.0, while it still failed, the scatter plots would display fine if I didn't pan to the right (only panned to the left, or panned to the right up to the original starting point). If I panned further to the right than when I started, they would disappear on me.

I git bisected when the scatter points still render when going to the left vs. not, and zero'd in on this commit:

71a92a6

While the diff is small here, the diff involves some changes to ViewBox.py which is very much known for "here be dragons" territory.

I'm curious if what broke it was my change to Viewbox.resizeEvent, the change to padding from 0 to 0.0 or the change in how I compute the bounds in mapFromViewToItem

EDIT: specifically the bit needed to make the scatter plots remain visible if panning to the left is to add the following to ViewBox.py

    def update(self, *args, **kwargs):
        self.prepareForPaint()
        GraphicsWidget.update(self, *args, **kwargs)




#class TableData:
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I read the git log correctly, this has been commented out since 2012.

assert( (xtest == xgood) or (np.isnan(xtest) and np.isnan(xgood) ) )

x = np.array([-np.inf, 0.0, 1.0, 2.0 , np.nan, 4.0 , np.inf])
y = np.array([ 1.0, 0.0,-1.0, np.inf, 2.0 , np.nan, 0.0 ])
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should look for crashes with combinations of infs, NaNs and regular values in normal and log mode.


def setPointMode(self, mode):
if self.opts['pointMode'] == mode:
def setPointMode(self, state):
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any memory what this is supposed to do?
I could not find any actual function, but PlotItem seems to actually call it.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it references a checkbox in PlotItem's ui template in the plotConfigTemplate_<binding>, something involving pointsGroup, supported values are None, True and False.

None pointsGroup.isChecked() and autoPointsCheck.isChecked()
True pointsGroup.isChecked() and not autoPointsCheck.isChecked()
False not pointsGroup.isChecked()

always hated those template files, not sure what uses that particular one.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try and take a deeper dive into the git history and figure out the source of those, and see if it's something left-behind from some past cleanup effort.

@NilsNemitz NilsNemitz marked this pull request as ready for review September 23, 2021 17:33
from .. import getConfigOption

__all__ = ['PlotDataItem']
__all__ = ['PlotDataItem', 'PlotDataset']
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If PlotDataset is not used outside of this file, we should probably remove it from __all__ (as __all__ handles the case where you input *)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was primarily added to make sphinx find it when coming from the TOC list.
And without the TOC list entry, a sphinx warning ("no TOC entry") failed the CI tests.
--> Flagging the PlotDataset docstring as :orphan: seems to avoid this.

print( 'good x', x_log)
print( 'data x', dataset.x)
print( 'good y', y_log)
print( 'data y', dataset.y)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want the test suite printing stuff? (I know it does in other places), often pytest can suppress output, and I often have pytest not suppress output when debugging. Not saying this is a deal breaker; but just want to verify that the extra noise in the console here is worthwhile

Copy link
Copy Markdown
Contributor Author

@NilsNemitz NilsNemitz Sep 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Next time, just say "Hey! Print statements!" (facepalm)

No, these are there because it took me a while to figure out why arrays are never considered equal when there are any np.nan values in there: np.nan != np.nan.
And then I became blind to them and missed stripping them out.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haha well right now print statements is how we do our logging 😬 could be completely legit reasons they're there.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--> stripped out

@NilsNemitz NilsNemitz changed the title make PlotDataItem aware of mapped data Traditional log10 mode for PlotDataItem (by adding "mapped data" stage) Sep 25, 2021
@NilsNemitz
Copy link
Copy Markdown
Contributor Author

I think this now has all the changes I wanted to make at this time. I would now appreciate attempts to break it :)

@pijyoi, do you have some time to fuzz it some more?

@pijyoi
Copy link
Copy Markdown
Contributor

pijyoi commented Sep 25, 2021

I tried out the various snippets that had been posted before, and they are well behaved. (no multiple cycles to stabilize, infinites get clipped)

I am not sure if #1992 (comment) was intended to be fixed, but I see in the diff that the snippet #1992 (comment) got included.
However, I still see corruption when panning left and when zooming.
Note: I had no expectation that there was anything to fix for that script.

I do consider log10(0) as a valid use case, and it does plot nicely now.

import numpy as np
import pyqtgraph as pg
pg.mkQApp()
pw = pg.PlotWidget()
pw.show()

pw.setLogMode(x=False, y=True)

xxx = np.linspace(-3, 3, 601)
yyy = np.sinc(xxx)**2

# quantize
yyy = (yyy * 16384).astype(np.int16).astype(np.float32) / 16384
print(yyy[::100])  # nulls

item = pg.PlotDataItem()
# item = pg.PlotCurveItem()

item.setData(xxx, yyy)
pw.addItem(item)
pg.exec()

I tend to use PlotCurveItem more often than PlotDataItem though.

@j9ac9k
Copy link
Copy Markdown
Member

j9ac9k commented Sep 25, 2021

I am not sure if #1992 (comment) was intended to be fixed, but I see in the diff that the snippet #1992 (comment) got included.
However, I still see corruption when panning left and when zooming.
Note: I had no expectation that there was anything to fix for that script.

huh, weird! I just reran this example on the qt 5.15 and qt 6.1 bindings (both pyqt and pyside) and things looked good on zoom/pan (left) on macOS, I'll test out on Windows later.

import numpy as np
import pyqtgraph as pg
app = pg.mkQApp()
pw = pg.PlotWidget()
pw.show()

pw.setLogMode(x=False, y=True)
x = np.arange(-20, 21, dtype=np.float64)
yp = 10**x
x[20] = np.inf    # nan is fine, however

pw.plot(x, yp, symbol='x')
pg.exec()

Note: I had no expectation that there was anything to fix for that script.

Yeah, this is in the "nice to have" category for me, definitely not a primary objective.

I think @outofculture said he would try and break this, so I'll give him some time to give it a go before merging.

@NilsNemitz
Copy link
Copy Markdown
Contributor Author

Sorry, I just copied and pasted the update method override, and only checked that it didn't cause any additional errors. :)

It does not actually improve the behavior on my system: If there is an np.inf value in the x-data, the scatter plot stops drawing entirely when the left-most symbol hits the left edge of the screen.

sys.platform: win32
sys.version: 3.9.7 | packaged by conda-forge | (default, Sep 14 2021, 01:11:01) [MSC v.1916 64 bit (AMD64)]
qt bindings: PyQt5 5.12.3 Qt 5.12.9

I'll open a separate bug for that. Since it already works when panning to the right, maybe there is a simple, proper fix.

But addition of prepareForPaint seems to fix this on OSX, I would propose keeping that in for now, unless there is a performance concern.

@pijyoi, thank you for the sinc function example, that's a nice example of where zeros might actually come up in a real-world log-plot. The log routine now seems to fail more gracefully than before. (...) I also have the problem of using PlotCurveItem by default; and then I am wondering why the dynamic range issue still breaks my plots when zooming in 😁

@pijyoi
Copy link
Copy Markdown
Contributor

pijyoi commented Sep 27, 2021

If you export the example in #1992 (comment) to Matplotlib, the Y-axis lists only the exponents. Probably a separate issue.
image

@NilsNemitz
Copy link
Copy Markdown
Contributor Author

Yes, that very much seems like a separate issue.
It would seem that the exporter is not aware of special labeling that happens to the axes in log mode.

How do we actually want this to work?
A) Do we special-case export in log mode to read the raw data from PlotDataItem, send that to the MPL code, and activate the log mode there?
B) Or do we send the log10 values (as seems to happen now) and adjust the MPL axis ticks so that it displays the same as in pyqtgraph?

For (A) we would get to use all the work put into MPL for log display. But (B) is easier (possibly more robust) and it would be able to handle any other mapping functions we may implement...

@pijyoi
Copy link
Copy Markdown
Contributor

pijyoi commented Sep 29, 2021

There seems to be a precedent for B.
from tests/exporters/test_matplotlib.py:

import pyqtgraph as pg
scale = 1e9
plt = pg.plot([0, 1, 2], [(i+1)*scale for i in range(3)])
# set the label so autoSIPrefix works
plt.setLabel('left', 'magnitude')
pg.exec()

image2

I didn't even know there was an autoSIPrefix feature.

@ixjlyons
Copy link
Copy Markdown
Member

Oops, see #1282 for some background on the SI prefix issue

@pijyoi
Copy link
Copy Markdown
Contributor

pijyoi commented Sep 30, 2021

Something looks broken with examples/logAxis.py
Capture

@NilsNemitz
Copy link
Copy Markdown
Contributor Author

Hmm. Is it possible that this is no longer a good example data set?
The top plot looks like the data is random over the range of +/-2. That wouldn't work really well with a log transform...

@pijyoi
Copy link
Copy Markdown
Contributor

pijyoi commented Sep 30, 2021

So the existing example relied on np.abs() being applied, which the new implementation doesn't do any more.
EDIT: The existing example was there since 2012, when the transform was just plain old np.log10(). np.abs() was added in #1476

How about changing the example to the following. Can visually see that it's the "expected" plot.

y = np.logspace(-6, 6, 13)
x = np.logspace(-6, 6, 13)
p1.plot(x, y, symbol='x')
p2.plot(x, y, symbol='x')
p3.plot(x, y, symbol='x')

Capture3

@NilsNemitz
Copy link
Copy Markdown
Contributor Author

NilsNemitz commented Oct 4, 2021

How about something completely different?

In #1992, I argued that the appeal of an interactive log-mode toggle is identifying properties of unknown data.
I have replaced the log plot example with one based on that argument, showing a number of plots that have somewhat similar character in a linear plot, but are easily identified by their shape in the log or semi-log plots.

Log axis example

Would that work?

p1.plot(x, y)
p2.plot(x, y)
p3.plot(x, y)
x = np.logspace(-1, 1, 1000) # 1000 points from 0.01 to 100
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the comment doesn't match the code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awww crap. :)
Thank you for spotting that, I hope the updated version is more correct.

@j9ac9k
Copy link
Copy Markdown
Member

j9ac9k commented Oct 7, 2021

Thanks for tackling this @NilsNemitz this was a doozy! Thanks for reviewing @pijyoi this LGTM, merging!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Pyqtgraph internal error handling

4 participants