Traditional log10 mode for PlotDataItem (by adding "mapped data" stage)#1992
Traditional log10 mode for PlotDataItem (by adding "mapped data" stage)#1992j9ac9k merged 9 commits intopyqtgraph:masterfrom
Conversation
|
This is what I have been using to test for now (Thank you @pijyoi !) import numpy as np
import pyqtgraph as pg
app = pg.mkQApp()
pw = pg.PlotWidget()
pw.show()
pi = pw.getPlotItem()
# turn log mode on and off:
pi.setLogMode(x=False, y=True)
x = range(-20, 21)
x = np.array(x)
yp = np.array([(10.**p) for p in x])
yp[1] = 0.
yp[2] = np.inf
yp[3] = np.nan
yp[4] = -np.inf
print(yp)
pdi = pg.PlotDataItem(x=x, y=yp, symbol='x', dynamicRangeLimit=1e6)
# No limiter:
# pdi = pg.PlotDataItem(x=x, y=yp, symbol='x', dynamicRangeLimit=1e6)
pi.addItem(pdi)
app.exec() |
| xmax = np.nan | ||
| return QtCore.QRectF( | ||
| QtCore.QPointF(xmin,ymin), | ||
| QtCore.QPointF(xmax,ymax) ) |
There was a problem hiding this comment.
I was just testing this in the interpreter, thinking that constructing a QRectF with QPoints that have np.nan values would blow up, but it seems to work. 👍🏻
|
The following script belongs more to the fuzzing category than to an actual use-case. import numpy as np
import pyqtgraph as pg
app = pg.mkQApp()
pw = pg.PlotWidget()
pw.show()
pw.setLogMode(x=False, y=True)
x = np.arange(-20, 21, dtype=np.float64)
yp = 10**x
x[20] = np.inf # nan is fine, however
pw.plot(x, yp, symbol='x')
app.exec()Note: |
Noticed this as I was going through the code, this is great for supporting different log-bases, sqrt, cubed-root and all sorts of transformations. I know we've talked about having this be stored in ViewBox.py before, but I think this implementation belongs in PlotDataItem, just as you have it 👍🏻
First off, I'm not sure we can even draw a line with a point at +/- inf even if we wanted to ... like, I assume that would be a vertical line from the previous point going either upward or downward, and a second vertical line to the point after... I'm curious what other plotting libraries do. I would rather not get bogged down with feature creep here, if there isn't an obvious solution for drawing lines with points at -inf or inf, I'm all with going w/ @pijyoi 's input:
and move on... |
|
Dear @j9ac9k , I was going to agree with you on not overcomplicating things, but somehow the code went in a different direction... @pijyoi 's fuzzing attack showed something pretty relevant: From this search, we now know if the entire (log-scale mapped) dataset is finite, or if there are NaN values. We can use this information to automatically switch from the fast connection mode "all" to "finite", which can indicate missing points by interruptions in the plotted line. This has bugged me about the previous implementation: The great appeal of having a context menu switch to log scaling is data exploration. But then it doesn't seem like a good idea to silently drop the invalid points with no indication. Keeping track of the search for non-finites required a new property Summary:
Please let me know what you think. I believe these changes should not affect performance in normal, non-log-scale operation, but I am not really set up for testing that... In log mode, there is some additional overhead, but everything should be a lot lighter than the log10 calculation itself. I am open to a less invasive implementation, but I think this adds functionality without making the code a lot messier. Tests also seem to pass :) To do list:
|
| def __init__(self, x, y): | ||
| super().__init__() | ||
| assert x is not None | ||
| assert y is not None |
There was a problem hiding this comment.
asset statements in functional code is generally a bad idea; they are completely ignored in the case of python -O so best to keep them strictly for test code. If we want checks here; probably should raise a RuntimeError.
There was a problem hiding this comment.
These were added to test if any part of the test suite manages to instantiate a PlotDataset with data set to None.
To avoid replicating unneeded code everywhere, the convention throughout PlotDataItem is that a PlotDataset will only be instantiated when there is actual data. Otherwise the reference is kept as None instead.
I think this can simply be removed. I added comments to the code explaining the convention, which should only be relevant to developers working inside PlotDataItem.
| sigPointsClicked = QtCore.Signal(object, object, object) | ||
| sigPointsHovered = QtCore.Signal(object, object, object) | ||
|
|
||
| class PlotDataset(object): |
There was a problem hiding this comment.
I think this might be the first place in the library we a re using nested classes. These are much tougher to test, so unless there is a good reason to test them, we should probably break this out into its own separate class.
Also, if there is a chance we might want this class to emit signals, we should inherit from QObject as well, but since I'm not seeing a signal being emitted, probably no need to modify that.
There was a problem hiding this comment.
--> moved out, and documented as "intended for internal use".
| logmode is a two-element list of Booleans requesting log mode for x and y axis. | ||
| """ | ||
| all_x_finite = False | ||
| if logMode[0]: |
There was a problem hiding this comment.
wonder if we should move this if block into it's own "private" function, since the code is just repeated right below.
There was a problem hiding this comment.
Oops. This moved to PlotDataset rather late, and this copy should no longer be needed.
| # self.yData = None | ||
| self.dataset = None # will hold a PlotDataset for the original data | ||
| self.datasetMapped = None # will hold a PlotDataset for data after mapping transforms (e.g. log scale) | ||
| self.datasetDisp = None # will hold a PlotDataset for data downsampled and limited for display |
There was a problem hiding this comment.
nit-pick level 1001
I'm trying to avoid introducing new variable names that are abbreviated; may I suggest renaming to datasetDisplay ?
Also, these 3 datasets are closely coupled, I keep wondering if they should be members of a dictionary... but I'm not strongly opinionated here.
There was a problem hiding this comment.
--> changed to datasetDisplay.
I was trying not to jump the degree of encapsulation from zero to max in a single step. I think the individual properties are a little easier to track (compared to stuffing them all into one dictionary) while the refactored structures are still somewhat fragile. I think these are not really user-facing, and we should still be able to clean them up later.
--> renamed to "private" namespace as _dataset, _datasetMapped and _datasetDisplay
|
Just did my first pass-through the diff. Looks like you took care to preserve API backwards compatibility 👍🏻 Other issues are mostly involve commented out code, I think there is an unused import, but that's about it. Next up would definitely be to try and break this some more by playing w/ it. |
so... I got a bit curious about this, and realized that in 0.11.0, while it still failed, the scatter plots would display fine if I didn't pan to the right (only panned to the left, or panned to the right up to the original starting point). If I panned further to the right than when I started, they would disappear on me. I git bisected when the scatter points still render when going to the left vs. not, and zero'd in on this commit: While the diff is small here, the diff involves some changes to I'm curious if what broke it was my change to Viewbox.resizeEvent, the change to padding from 0 to 0.0 or the change in how I compute the bounds in EDIT: specifically the bit needed to make the scatter plots remain visible if panning to the left is to add the following to def update(self, *args, **kwargs):
self.prepareForPaint()
GraphicsWidget.update(self, *args, **kwargs) |
|
|
||
|
|
||
|
|
||
| #class TableData: |
There was a problem hiding this comment.
If I read the git log correctly, this has been commented out since 2012.
| assert( (xtest == xgood) or (np.isnan(xtest) and np.isnan(xgood) ) ) | ||
|
|
||
| x = np.array([-np.inf, 0.0, 1.0, 2.0 , np.nan, 4.0 , np.inf]) | ||
| y = np.array([ 1.0, 0.0,-1.0, np.inf, 2.0 , np.nan, 0.0 ]) |
There was a problem hiding this comment.
This should look for crashes with combinations of infs, NaNs and regular values in normal and log mode.
|
|
||
| def setPointMode(self, mode): | ||
| if self.opts['pointMode'] == mode: | ||
| def setPointMode(self, state): |
There was a problem hiding this comment.
Any memory what this is supposed to do?
I could not find any actual function, but PlotItem seems to actually call it.
There was a problem hiding this comment.
Looks like it references a checkbox in PlotItem's ui template in the plotConfigTemplate_<binding>, something involving pointsGroup, supported values are None, True and False.
None pointsGroup.isChecked() and autoPointsCheck.isChecked()
True pointsGroup.isChecked() and not autoPointsCheck.isChecked()
False not pointsGroup.isChecked()
always hated those template files, not sure what uses that particular one.
There was a problem hiding this comment.
I'll try and take a deeper dive into the git history and figure out the source of those, and see if it's something left-behind from some past cleanup effort.
| from .. import getConfigOption | ||
|
|
||
| __all__ = ['PlotDataItem'] | ||
| __all__ = ['PlotDataItem', 'PlotDataset'] |
There was a problem hiding this comment.
If PlotDataset is not used outside of this file, we should probably remove it from __all__ (as __all__ handles the case where you input *)
There was a problem hiding this comment.
This was primarily added to make sphinx find it when coming from the TOC list.
And without the TOC list entry, a sphinx warning ("no TOC entry") failed the CI tests.
--> Flagging the PlotDataset docstring as :orphan: seems to avoid this.
| print( 'good x', x_log) | ||
| print( 'data x', dataset.x) | ||
| print( 'good y', y_log) | ||
| print( 'data y', dataset.y) |
There was a problem hiding this comment.
do we want the test suite printing stuff? (I know it does in other places), often pytest can suppress output, and I often have pytest not suppress output when debugging. Not saying this is a deal breaker; but just want to verify that the extra noise in the console here is worthwhile
There was a problem hiding this comment.
Next time, just say "Hey! Print statements!" (facepalm)
No, these are there because it took me a while to figure out why arrays are never considered equal when there are any np.nan values in there: np.nan != np.nan.
And then I became blind to them and missed stripping them out.
There was a problem hiding this comment.
haha well right now print statements is how we do our logging 😬 could be completely legit reasons they're there.
There was a problem hiding this comment.
--> stripped out
|
I think this now has all the changes I wanted to make at this time. I would now appreciate attempts to break it :) @pijyoi, do you have some time to fuzz it some more? |
|
I tried out the various snippets that had been posted before, and they are well behaved. (no multiple cycles to stabilize, infinites get clipped) I am not sure if #1992 (comment) was intended to be fixed, but I see in the diff that the snippet #1992 (comment) got included. I do consider log10(0) as a valid use case, and it does plot nicely now. import numpy as np
import pyqtgraph as pg
pg.mkQApp()
pw = pg.PlotWidget()
pw.show()
pw.setLogMode(x=False, y=True)
xxx = np.linspace(-3, 3, 601)
yyy = np.sinc(xxx)**2
# quantize
yyy = (yyy * 16384).astype(np.int16).astype(np.float32) / 16384
print(yyy[::100]) # nulls
item = pg.PlotDataItem()
# item = pg.PlotCurveItem()
item.setData(xxx, yyy)
pw.addItem(item)
pg.exec()I tend to use PlotCurveItem more often than PlotDataItem though. |
huh, weird! I just reran this example on the qt 5.15 and qt 6.1 bindings (both pyqt and pyside) and things looked good on zoom/pan (left) on macOS, I'll test out on Windows later. import numpy as np
import pyqtgraph as pg
app = pg.mkQApp()
pw = pg.PlotWidget()
pw.show()
pw.setLogMode(x=False, y=True)
x = np.arange(-20, 21, dtype=np.float64)
yp = 10**x
x[20] = np.inf # nan is fine, however
pw.plot(x, yp, symbol='x')
pg.exec()
Yeah, this is in the "nice to have" category for me, definitely not a primary objective. I think @outofculture said he would try and break this, so I'll give him some time to give it a go before merging. |
|
Sorry, I just copied and pasted the It does not actually improve the behavior on my system: If there is an np.inf value in the x-data, the scatter plot stops drawing entirely when the left-most symbol hits the left edge of the screen. I'll open a separate bug for that. Since it already works when panning to the right, maybe there is a simple, proper fix. But addition of @pijyoi, thank you for the sinc function example, that's a nice example of where zeros might actually come up in a real-world log-plot. The log routine now seems to fail more gracefully than before. (...) I also have the problem of using PlotCurveItem by default; and then I am wondering why the dynamic range issue still breaks my plots when zooming in 😁 |
|
If you export the example in #1992 (comment) to Matplotlib, the Y-axis lists only the exponents. Probably a separate issue. |
|
Yes, that very much seems like a separate issue. How do we actually want this to work? For (A) we would get to use all the work put into MPL for log display. But (B) is easier (possibly more robust) and it would be able to handle any other mapping functions we may implement... |
|
There seems to be a precedent for B. import pyqtgraph as pg
scale = 1e9
plt = pg.plot([0, 1, 2], [(i+1)*scale for i in range(3)])
# set the label so autoSIPrefix works
plt.setLabel('left', 'magnitude')
pg.exec()I didn't even know there was an |
|
Oops, see #1282 for some background on the SI prefix issue |
|
Hmm. Is it possible that this is no longer a good example data set? |
|
So the existing example relied on np.abs() being applied, which the new implementation doesn't do any more. How about changing the example to the following. Can visually see that it's the "expected" plot. y = np.logspace(-6, 6, 13)
x = np.logspace(-6, 6, 13)
p1.plot(x, y, symbol='x')
p2.plot(x, y, symbol='x')
p3.plot(x, y, symbol='x') |
|
How about something completely different? In #1992, I argued that the appeal of an interactive log-mode toggle is identifying properties of unknown data. Would that work? |
examples/logAxis.py
Outdated
| p1.plot(x, y) | ||
| p2.plot(x, y) | ||
| p3.plot(x, y) | ||
| x = np.logspace(-1, 1, 1000) # 1000 points from 0.01 to 100 |
There was a problem hiding this comment.
the comment doesn't match the code
There was a problem hiding this comment.
Awww crap. :)
Thank you for spotting that, I hope the updated version is more correct.
|
Thanks for tackling this @NilsNemitz this was a doozy! Thanks for reviewing @pijyoi this LGTM, merging! |



This attempts to address the problems ( #1988 ) with correct log-scaling mode in PlotDataItem.
The failure mode was that the
dataRect()function describes the bounds of the original, non-log-scale data set. As a result, the dynamic range limiting function of PlotDataItem was working with very wrong assumptions and this resulted in an infinite loop of clipping changes which locked up the entire code.The fix attempted here is to add additional data structures that hold information on the "mapped" data. Right now, "mapped" data just means log-scaled, but I hope this is compatible with a more modular approach in the future.
When log-scale is off, these
xMappedandyMappedarrays should just be an additional reference to the existing data.From the data held in
xMappedanyMapped, a newdataRectMappedis calculated, and the dynamic range limiter seems to work as before when using this data (but see below...).Additional changes:
np.log10(), since I think this is what we want to use for now.functions) which was triggered when working with infinities in the data. We accept Nan as silently missing points here, and I think we want the same for inf.I would appreciate it if someone could try and break this, I am a little short on time currently 😢
This needs a little more cleanup, removing of comments and print statements, and probably some tests, so I am leaving it as a draft for now.
A known "feature" is that normally, the PlotCurveItem lines skip the infinite points. But when the limiter triggers, it that clips these points to a far-away but finite value, and the lines towards "infinity" suddenly become visible. If that requires fixing, it would probably be the best to map all infinities to Nan once. Otherwise we end up searching the entire data set for such values repeatedly.