Conversation
|
Hey @NilsNemitz you're on fire with these; it's going to be a little bit before I can test it out. May I request that you add a Thanks again. Also if you want to collaborate with some of what the other folks are working on, feel free to hop on our slack workspace. |
|
Dear @j9ac9k , no worries. I just stumbled across this issue, and it seemed to require a change to the exact same code I was looking at. I might go look for your slack when there is time, but my scheduling is a bit unpredictable, so I don't know if I would be much help with ongoing projects. But if you have other issues with PlotDataItem or adjacent code, feel free to point me at something. I will look into adding a direct call tomorrow. Unfortunately I have no experience with python2 and no working installation to test with. Let's see how that goes. |
|
The py2 development requirement is a bit of a pain to setup, so don't worry if you can't replicate locally, just have a test that covers that case and the CI should blow up if it's wrong. I can assist with nailing down the appropriate syntax. |
|
Well, this seems to work, except for "test_exit_crash.py" on macOS. Any parameters passed to pdi.appendData() are passed through to setData(), setting append=True. Do we need to support that for named 'x' and 'y' arguments, too? |
| self.xData = x # set to new data | ||
| self.yData = y | ||
| elif y is not None: # append; note that x cannot be None if y is given | ||
| self.xData = np.append(self.xData, x) |
There was a problem hiding this comment.
Hmm. np.append() is traditionally known to be very slow (due to additional copying and re-allocation of mem) so I'm wondering if you've actually noticed any improvements with this on fast updates @NilsNemitz?
Here's some resources to back that claim up:
- https://stackoverflow.com/a/25649863
- https://stackoverflow.com/a/46103554
- https://numpy.org/doc/stable/reference/generated/numpy.append.html#numpy.append
I would somewhat anticipate a pre-allocated large(r) array that can be passed in and instead the "appending" is done through creating new graphics using QPainterPath.addPath() and a "diff" with whatever the old rendered path was and the new input array. This will likely require a manual call to pyqtgraph.functions.arrayToQpath() to do the updating from new data and generate the new path segments.
To me, if the data is being appended (or updated) that should be up the client code using pyqtgraph and PlotCurveItem should be in charge of just figuring out what new graphics need to be generated and displayed.
I'll follow up with a more in depth commentary and examples hopefully by tonight.
There was a problem hiding this comment.
Hi @goodboy! Nice to see somebody digging into the code to work on increased performance.
This PR just aims to implement what was attempted in earlier PR #850 , with maybe slightly less redundant code.
The envisioned use case is an extremely simple, Labview-like code path for extremely simple projects like periodically reading a voltage from a multimeter and adding it to an existing graph. Since data can be saved from the context menu later, this seemed useful enough to throw some minimally invasive lines of code at it.
I am fully aware that the solution here is slow. I agree that adding a small subset of data can in principle be faster than replacing the data altogether, but getting that to work without breaking other functionality (which expects numpy arrays of data in xData and yData) was a larger project than I felt ready to tackle.
I will be happy to see a high performance solution to replace this. Even happier if it plays nicely with the PlotDataItem data mangling (e.g. PR #1140), which seems needed to get Qt to draw some of my data sets properly. :)
I consider manual messing with QPainterPath outside my area of expertise. But I am happy to hear your suggestions on this PR, or to contribute to your more in-depth effort.
|
@NilsNemitz yeah sorry for the drive-by code critique; i just so happen to be after the same outcomes as well I think 😼
👍🏼 you might want to check out #1418 which was the first place I was able to get a massive speedup (and a lot better resource utilization) as well 🏄🏼
Gtk. I haven't read this PR (yet) but this use case seems to be the same as mine: updating a line or bar chart with incoming real-time (sampled) data.
I will try to take a look at this PR as well. I will say a lot of the internal data setting and getting before rendering to graphics (
Indeed, I spent some decent time learning the magic of this lower level stuff (particularly the strange binary path streaming protocol that doesn't seem to be documented anywhere?) which seems to give My main, quiff, with this change is I'm not sure it's really doing anything in terms of "appending" that is graphics related and at the same time it's inverting the paradigm where client code should write data for the graphics lib to read (and then generate graphics from that data). This change is instead expecting graphics code to maintain internal state of the user's input between updates. For example here's the main "appending" example showing real-time data updates using You'll notice that there's a bunch of ways demonstrating how to update plots but in each case the client code is in full control of managing the input In other words, wrt to this PR, what's stopping your code from just doing Hopefully this wasn't too long winded and and comes across clearly 🧑🎓 I do propose that a "better" and more performant way to address this need for a fast graphics update is to create a way to generate Let me know what y'all think! |
|
I've been thinking on the performance bit; @goodboy is right, if you look at how There are some alternatives to |
|
Thank you, particularly this bit was quite helpful: When appending data, the updated code now allocates a buffer that is 1.5x larger than needed. This buffer contains
yData is then assigned to be a view of the elements that contain original and appended data. (Same for xData) The next time data is appended, it can be directly placed in the buffer (a limited copy operation), and only the view needs to be updated (zero copying?). A new allocation of memory and full copy of all data is only done when the buffer is full. At that time, a new buffer is allocated, once again at 150% of the currently needed size. For the expected use of repeatedly adding a small number of elements, this should be dramatically faster than calling Occasionally, appending data will force a buffer re-allocation, but this becomes more rare as the dataset grows. |
|
I guess I'll do one last rant then head off to the woods 😄
Not sure what you're asserting. Obviously if you're going to do downsampling of
Why would you not have this?
Unless your downsampling algo requires a large look-back (which afaict none of them do) I don't see how you can claim this. Frankly, there's quite a lot of code inside these graphics types that intertwines data processing and graphics rendering machinery unnecessarily and it makes customizing/speeding up things pretty difficult (so much so you might as well just re implement the entire Yeah, so I'll just say it one last time then probably peace out from this discussion 😂 but, I really don't think sticking "append logic" inside Why:
As a final note, I think the main issue with almost all of these "graphics" apis in
wat. |
|
This is a lengthy reply, and I'm watching my kids this weekend so I can't address all the points as I'm being interrupted here every 10 seconds, so apologies if some sentencing looks broken...
...but it's not... downsample logic is here https://github.com/pyqtgraph/pyqtgraph/blob/master/pyqtgraph/graphicsItems/PlotDataItem.py#L665-L682 but the rendering of the QPainterPath is done here: https://github.com/pyqtgraph/pyqtgraph/blob/master/pyqtgraph/functions.py#L1485 The logic is indeed spread out, we have a method that generates a QPainterPath from an array that's provided.
This works right up until you zoom in, when you zoom in when you previously had downsampled, and need to re-calculate the entire QPainterPath, how would that work if the underlying data wasn't one continuous array?
Subclassing is the Qt way! Ask me about
As we're looking at some of the considerations/consequences here I admit I too am starting to question the utility; You did point out the
With respect to data processing, downsampling and drawing, I do feel like they are decently separated. pyqtgraph provides some convenience methods for common operations/transformations such as the fourier transformation. I don't really understand what you mean with the rest of the paragraph, maybe you can provide an example, but if it's not relevant to this PR, but submit a draft PR where you take a portion of PlotDataItem and highlight what you mean better; it's hard to talk about these kind of concepts regarding API design without an example, or some kind of mach-up, but an existing PR is likely not the right place to introduce that. |
I guess you didn't click on any of my links 😿 I think we're getting lost 😂 You originally said:
This is a false dilemma: you can keep a reference to your continuous array and only update parts of the path that need to be based on zoom level and the current state of the continuous array reference. You can also cache I don't understand why we will have a other stuff will break concern here. Reread my original comment.
I don't understand where you're getting this idea that you would not have one continuous array.
But not often the python way these days 😎 For example, look at the way
I have to be honest, I can't really think of a reason to need to use I was originally suggesting:
The original feature request that drove this PR seems to me to be better solved by documenting the example recipes. Anyway will jump in chat for further banter. |
|
As we continue to work out the distinction between On the In the context of quick exploratory data evaluation that might be a not-insignificant benefit; especially if we can improve integration with environments like Jupyter. Not really related, but I have also added a check for string content of the 'connect' parameter; this should close #2027. |
| if idx == 2: break # only convert the first two arguments | ||
| try: # if it converts to float, then convert to single-value ndarray | ||
| arg_list[idx] = np.full(1, float(arg_list[idx])) | ||
| except TypeError: pass # TypeError occurs for data that does not need conversion |
There was a problem hiding this comment.
We can clean this up a decent amount I think!
# at the top of the file
from contextlib import suppress
....
def appendData(self, *args, **kwargs):
arg_list = list(args)
for idx, arg in enumerate(args):
if idx == 2:
break
with suppress(TypeError):
# TypeError occurs for data that does not need conversion
arg_list[idx] = np.full(1, float(arg))
kargs['append'] = True
self.setData(*arg_list, **kargs)for _ in range(len(iterable)) is generally considered a code-smell, if you care about indexes you should use enumerate(iterable). Also whenever you have try/except: pass, the suppress is typically what you want; functionally they're the same thing, but it's just a bit more readable ...but now we're in nit-pick territory.
|
Hi @NilsNemitz Thanks for walking me through this! Only suggestion I have is to modify the |
| self.containsNonfinite = not (all_x_are_finite and all_y_are_finite) # This always yields a definite True/False answer | ||
| self._dataRect = QtCore.QRectF( QtCore.QPointF(xmin,ymin), QtCore.QPointF(xmax,ymax) ) | ||
|
|
||
| def append(self, x, y): |
There was a problem hiding this comment.
if we might be concerned regarding the performance here, I would consider adding the following bit as the first line of code in this function.
def append(self, x, y):
"""
docstring
"""
profiler = pg.debug.Profiler() # add this line
old_length = len(self.y)
.....Having a profiler in a particular function is usually a decent indication that we're concerned about the performance here, and makes it simpler to modify the line so it actually spits output on the console if we decide to investigate fruther.
Since I have PlotDataItem on my metaphorical workbench right now:
This PR implements the option to append new data requested in #1430 by calling PlotDataItem.setData() with "append=True".
Including the functionality in setData() avoids the code duplication of #850 . If we do not expect this to be heavily used, then adding a separate appendData() call might not be necessary. The syntax
pdi.setData( x1, y1 )pdi.setData( x2, y2, append=True )seems simple enough for occasional use.
The code should handle append both to and with an empty dataset. If no x data is given, the last x value is incremented by 1 for each y element. That should give the expected result in the obvious use cases, and behaves in a predictable (although probably not very useful) manner when appending to a dataset with defined x values.
I unravelled the logic a little to keep it readable after including the append functionality. This should typically introduce only a single additional check (
if x is Nonein L531). I also removed some already commented out code:# self.clear() --> would now conflict directly with "append" functionality
# self.informViewBoundsChanged() --> This code seems to have successfully moved to GraphicsItem.informViewBoundsChanged().
In both cases it seems unlikely that the functions would need to be restored.
I added a docstring for "append=True" (not sure if that works) and some tests.
Please let me know what is missing!
closes #1430