Skip to content

Append data to PlotDataItem#1453

Open
NilsNemitz wants to merge 17 commits intopyqtgraph:masterfrom
NilsNemitz:append
Open

Append data to PlotDataItem#1453
NilsNemitz wants to merge 17 commits intopyqtgraph:masterfrom
NilsNemitz:append

Conversation

@NilsNemitz
Copy link
Copy Markdown
Contributor

Since I have PlotDataItem on my metaphorical workbench right now:

This PR implements the option to append new data requested in #1430 by calling PlotDataItem.setData() with "append=True".
Including the functionality in setData() avoids the code duplication of #850 . If we do not expect this to be heavily used, then adding a separate appendData() call might not be necessary. The syntax
pdi.setData( x1, y1 )
pdi.setData( x2, y2, append=True )
seems simple enough for occasional use.

The code should handle append both to and with an empty dataset. If no x data is given, the last x value is incremented by 1 for each y element. That should give the expected result in the obvious use cases, and behaves in a predictable (although probably not very useful) manner when appending to a dataset with defined x values.

I unravelled the logic a little to keep it readable after including the append functionality. This should typically introduce only a single additional check (if x is None in L531). I also removed some already commented out code:
# self.clear() --> would now conflict directly with "append" functionality
# self.informViewBoundsChanged() --> This code seems to have successfully moved to GraphicsItem.informViewBoundsChanged().
In both cases it seems unlikely that the functions would need to be restored.

I added a docstring for "append=True" (not sure if that works) and some tests.
Please let me know what is missing!

closes #1430

@NilsNemitz NilsNemitz changed the title Append Append data to PlotDataItem Nov 21, 2020
@j9ac9k
Copy link
Copy Markdown
Member

j9ac9k commented Nov 21, 2020

Hey @NilsNemitz you're on fire with these; it's going to be a little bit before I can test it out.

May I request that you add a PlotDataItem.appendData() method that just calls PlotDataItem.setData(*args, **kwargs, append=True) ? (that exact syntax may be problematic due to python2 compatibility).

Thanks again. Also if you want to collaborate with some of what the other folks are working on, feel free to hop on our slack workspace.

@NilsNemitz
Copy link
Copy Markdown
Contributor Author

Dear @j9ac9k , no worries. I just stumbled across this issue, and it seemed to require a change to the exact same code I was looking at. I might go look for your slack when there is time, but my scheduling is a bit unpredictable, so I don't know if I would be much help with ongoing projects. But if you have other issues with PlotDataItem or adjacent code, feel free to point me at something.

I will look into adding a direct call tomorrow. Unfortunately I have no experience with python2 and no working installation to test with. Let's see how that goes.

@j9ac9k
Copy link
Copy Markdown
Member

j9ac9k commented Nov 21, 2020

The py2 development requirement is a bit of a pain to setup, so don't worry if you can't replicate locally, just have a test that covers that case and the CI should blow up if it's wrong. I can assist with nailing down the appropriate syntax.

@NilsNemitz
Copy link
Copy Markdown
Contributor Author

Well, this seems to work, except for "test_exit_crash.py" on macOS.
@j9ac9k , feel free to let me know if you see a way to make this more elegant.

Any parameters passed to pdi.appendData() are passed through to setData(), setting append=True.
I would expect that an important use case for adding data piece-by-piece is to actually add data point-py-point.
For this purpose, appendData will try to detect numerical x and y values and convert them to the ndarrays that setData() expects.

Do we need to support that for named 'x' and 'y' arguments, too?

self.xData = x # set to new data
self.yData = y
elif y is not None: # append; note that x cannot be None if y is given
self.xData = np.append(self.xData, x)
Copy link
Copy Markdown

@goodboy goodboy Dec 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. np.append() is traditionally known to be very slow (due to additional copying and re-allocation of mem) so I'm wondering if you've actually noticed any improvements with this on fast updates @NilsNemitz?

Here's some resources to back that claim up:

I would somewhat anticipate a pre-allocated large(r) array that can be passed in and instead the "appending" is done through creating new graphics using QPainterPath.addPath() and a "diff" with whatever the old rendered path was and the new input array. This will likely require a manual call to pyqtgraph.functions.arrayToQpath() to do the updating from new data and generate the new path segments.

To me, if the data is being appended (or updated) that should be up the client code using pyqtgraph and PlotCurveItem should be in charge of just figuring out what new graphics need to be generated and displayed.

I'll follow up with a more in depth commentary and examples hopefully by tonight.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @goodboy! Nice to see somebody digging into the code to work on increased performance.

This PR just aims to implement what was attempted in earlier PR #850 , with maybe slightly less redundant code.
The envisioned use case is an extremely simple, Labview-like code path for extremely simple projects like periodically reading a voltage from a multimeter and adding it to an existing graph. Since data can be saved from the context menu later, this seemed useful enough to throw some minimally invasive lines of code at it.

I am fully aware that the solution here is slow. I agree that adding a small subset of data can in principle be faster than replacing the data altogether, but getting that to work without breaking other functionality (which expects numpy arrays of data in xData and yData) was a larger project than I felt ready to tackle.

I will be happy to see a high performance solution to replace this. Even happier if it plays nicely with the PlotDataItem data mangling (e.g. PR #1140), which seems needed to get Qt to draw some of my data sets properly. :)

I consider manual messing with QPainterPath outside my area of expertise. But I am happy to hear your suggestions on this PR, or to contribute to your more in-depth effort.

@goodboy
Copy link
Copy Markdown

goodboy commented Dec 22, 2020

@NilsNemitz yeah sorry for the drive-by code critique; i just so happen to be after the same outcomes as well I think 😼

Hi @goodboy! Nice to see somebody digging into the code to work on increased performance.

👍🏼 you might want to check out #1418 which was the first place I was able to get a massive speedup (and a lot better resource utilization) as well 🏄🏼

This PR just aims to implement what was attempted in earlier PR #850 , with maybe slightly less redundant code.
The envisioned use case is an extremely simple, Labview-like code path for extremely simple projects like periodically reading a voltage from a multimeter and adding it to an existing graph.

Gtk. I haven't read this PR (yet) but this use case seems to be the same as mine: updating a line or bar chart with incoming real-time (sampled) data.

I will be happy to see a high performance solution to replace this. Even happier if it plays nicely with the PlotDataItem data mangling (e.g. PR #1140), which seems needed to get Qt to draw some of my data sets properly. :)

I will try to take a look at this PR as well. I will say a lot of the internal data setting and getting before rendering to graphics (QPainterPath) seems a bit superfluous and could be somewhat simplified if we are careful - especially if there was a more standard caching system in place.

I consider manual messing with QPainterPath outside my area of expertise. But I am happy to hear your suggestions on this PR, or to contribute to your more in-depth effort.

Indeed, I spent some decent time learning the magic of this lower level stuff (particularly the strange binary path streaming protocol that doesn't seem to be documented anywhere?) which seems to give PlotCurveItem the highest performance out of all line graphics in pyqtgraph . In particular I was able to actually get OHLC series drawn about 10x faster using this same technique, plus a little numba. I was also able to get "proper" path prepend and append working as well for OHLC series - it's extremely performant. In fact it's actually PlotCurveItem that is slowing down my app's interactivity now likely due to the lack of a "real" path append in the internals here.

My main, quiff, with this change is I'm not sure it's really doing anything in terms of "appending" that is graphics related and at the same time it's inverting the paradigm where client code should write data for the graphics lib to read (and then generate graphics from that data). This change is instead expecting graphics code to maintain internal state of the user's input between updates.

For example here's the main "appending" example showing real-time data updates using PlotDataItem:
https://github.com/pyqtgraph/pyqtgraph/blob/master/examples/scrollingPlots.py

You'll notice that there's a bunch of ways demonstrating how to update plots but in each case the client code is in full control of managing the input numpy array that will be read by pyqtgraph to generate graphics (in this case deep underneath is QPainterPath). Each append is done in the update function and there's no need to maintain data state inside .setData().

In other words, wrt to this PR, what's stopping your code from just doing .setData(np.append(old_array, new_array)) on each update step? Afaict there should be no need to stick the np.append() inside .setData(). Also, either way you end up with a result that is the larger concatenated ouput array from np.append() in memory; it might as well be in your client code where you have full control over managing that data.

Hopefully this wasn't too long winded and and comes across clearly 🧑‍🎓

I do propose that a "better" and more performant way to address this need for a fast graphics update is to create a way to generate QPainterPath objects that can actually be appended (or prepended) to existing paths in view based on data diffing with the existing path. For example QPainterPath has a .boundingRect() which can be used to determine x-axis max/min data to determine if new paths need to be generated from the input data versus what was last drawn.

Let me know what y'all think!

@j9ac9k
Copy link
Copy Markdown
Member

j9ac9k commented Feb 5, 2021

I've been thinking on the performance bit; @goodboy is right, np.append() is slow. The other factor is that the PlotDataItem instance needs to have a reference/view to the underlying data, as a single array. This need is for many reasons, but the best example that I can think of is downsampling/resampling. Because we need a continuous array of the underlying data, appendData=True can't just generate a QPainterPath and append it to the existing one, and call it good, while it would be fast, all sorts of other stuff will break.

if you look at how PlotDataItem.setData() works right now, if you pass in a numpy array, it never actually copies the data, it just adds a reference. This means, if you're ever looking to append to a plot with new data that wasn't there when you generated it, at some point you will be calling np.append() (either that, or .setData() with a numpy array you appended yourself). Point being, if you're considering having pyqtgraph append data to a plot, there is no getting around that somewhere along the way, you're going to need to call np.append; given numpy uses shallow references, pyqtgraph appending the data will have the added benefit of having the variable that was storing the original array also get updated (is that a feature or a bug?).

There are some alternatives to np.append() such as np.concatenate which may speed things up a bit, it's still an expensive operation, there's no getting around that.

@NilsNemitz
Copy link
Copy Markdown
Contributor Author

Thank you, particularly this bit was quite helpful:
"The other factor is that the PlotDataItem instance needs to have a reference/view to the underlying data, as a single array."
--> We need to merge all data together and provide a convenient handle to the full set of data. But this handle can be a numpy view!

When appending data, the updated code now allocates a buffer that is 1.5x larger than needed. This buffer contains

  • the original data
  • the appended data
  • 33.3% uninitialized space.

yData is then assigned to be a view of the elements that contain original and appended data. (Same for xData)

The next time data is appended, it can be directly placed in the buffer (a limited copy operation), and only the view needs to be updated (zero copying?).

A new allocation of memory and full copy of all data is only done when the buffer is full. At that time, a new buffer is allocated, once again at 150% of the currently needed size.

For the expected use of repeatedly adding a small number of elements, this should be dramatically faster than calling np.append() for each operation. It should also be invisible to existing code, which only interacts with a view of the initialized part of old + new data.

Occasionally, appending data will force a buffer re-allocation, but this becomes more rare as the dataset grows.

@goodboy
Copy link
Copy Markdown

goodboy commented Feb 6, 2021

I guess I'll do one last rant then head off to the woods 😄

This need is for many reasons, but the best example that I can think of is downsampling/resampling. Because we need a continuous array of the underlying data, appendData=True can't just generate a QPainterPath and append it to the existing one, and call it good, while it would be fast, all sorts of other stuff will break.

Not sure what you're asserting. Obviously if you're going to do downsampling of numpy array data then you need the data; I don't think I was ever claiming otherwise. However, why downsampling would be coupled to graphics rendering logic with QPainterPath is beyond me; there should be very distinct separation of these mechanisms imo.

Because we need a continuous array of the underlying data

Why would you not have this?

appendData=True can't just generate a QPainterPath and append it to the existing one,

Unless your downsampling algo requires a large look-back (which afaict none of them do) I don't see how you can claim this.
Pass in new data (not yet "appended" to the set), downsample it, pass new downsampled data to graphics rendering, render new path, append path to old path. There's nothing limiting this other then the code for it being embedded inside .getData(); you can still set .xDisp/.yDisp to the concatenated-downsampled output and return the full output data you just don't need to re-downsample the entire input data set every cycle.

Frankly, there's quite a lot of code inside these graphics types that intertwines data processing and graphics rendering machinery unnecessarily and it makes customizing/speeding up things pretty difficult (so much so you might as well just re implement the entire PlotDataItem yourself). If you want to downsample, have downsampling routines the user can grab when needed, use them, then pass the output data to the graphics rendering system. Even right now, if you wanted to introduce a new downsampling algo to PlotDataItem and test it, it basically requires overloading core implementation apis/types which imo is very non-ideal.

Yeah, so I'll just say it one last time then probably peace out from this discussion 😂 but, I really don't think sticking "append logic" inside PlotDataItem improves anything and in fact just confuses the user with even more implicit behavior behind flags that can't be changed easily.

Why:

  • There's already examples that show how to append data and the user code is trivial (also another faster example here) though I'm sure they could be made more clear/prominent in the docs.
  • a graphics library should not be tracking input data state where it doesn't need to, this re-alloc code can just as easily offered as an update helper routine (possibly offerred by a pyqtgraph module) and doesn't have to be implicit behavior inside this graphics type.
  • there's no special speedup sauce here since nothing in graphics rendering machinery has actually changed; this is just a very basic well known update/extend recipe for a numpy array.
  • taking this approach of slapping more and more code into core graphics types/apis is just going to result in even larger class monstrosities down the road making it even more difficult for users to override behavior for their own purposes.

As a final note, I think the main issue with almost all of these "graphics" apis in pyqtgraph is that they seem to trying to treat the Qt graphics system as though it's writeable with numpy data. This is of course not at all how things actually work underneath. Really the Qt graphics code is a consumer of input numpy array data; so the graphics rendering apis should appear as though they consume fully processed output data as input so as to read and display it, not the other way around. When you look at things from this perspective having fourier transforms and downsampling and inside a GraphicsObject doesn't really make any sense. If there's supposed to be some all-encompassing api for doing both input data processing and graphics rendering from the resulting output, it should be done in an api layer above the graphics rendering system and types. Right now all of it is mashed together into an api that doesn't really have a clear specialization.

GraphicsItem for displaying plot curves, scatter plots, or both.

wat.

@j9ac9k
Copy link
Copy Markdown
Member

j9ac9k commented Feb 6, 2021

This is a lengthy reply, and I'm watching my kids this weekend so I can't address all the points as I'm being interrupted here every 10 seconds, so apologies if some sentencing looks broken...

However, why downsampling would be coupled to graphics rendering logic with QPainterPath is beyond me; there should be very distinct separation of these mechanisms imo.

...but it's not... downsample logic is here

https://github.com/pyqtgraph/pyqtgraph/blob/master/pyqtgraph/graphicsItems/PlotDataItem.py#L665-L682

but the rendering of the QPainterPath is done here:

https://github.com/pyqtgraph/pyqtgraph/blob/master/pyqtgraph/functions.py#L1485

The logic is indeed spread out, we have a method that generates a QPainterPath from an array that's provided.

Pass in new data (not yet "appended" to the set), downsample it, pass new downsampled data to graphics rendering, render new path, append path to old path.

This works right up until you zoom in, when you zoom in when you previously had downsampled, and need to re-calculate the entire QPainterPath, how would that work if the underlying data wasn't one continuous array?

Frankly, there's quite a lot of code inside these graphics types that intertwines data processing and graphics rendering machinery unnecessarily and it makes customizing/speeding up things pretty difficult (so much so you might as well just re implement the entire PlotDataItem yourself).

Subclassing is the Qt way! Ask me about QSortFilterProxyModels sometime ...😢

Yeah, so I'll just say it one last time then probably peace out from this discussion 😂 but, I really don't think sticking "append logic" inside PlotDataItem improves anything and in fact just confuses the user with even more implicit behavior behind flags that can't be changed easily.

As we're looking at some of the considerations/consequences here I admit I too am starting to question the utility; You did point out the scrollingPlots.py demo, but I would argue the way plot 4 is done (where PlotDataItem.append() would be most useful) is not particularly clear. It requires a bit more knowledge of how numpy works with pre-allocating a larger array, leaving a portion of it empty, and only plotting a component of it. While that clearly works, it requires a bit more knowledge of numpy. I think we can do better.

Right now all of it is mashed together into an api that doesn't really have a clear specialization.

With respect to data processing, downsampling and drawing, I do feel like they are decently separated. pyqtgraph provides some convenience methods for common operations/transformations such as the fourier transformation.

I don't really understand what you mean with the rest of the paragraph, maybe you can provide an example, but if it's not relevant to this PR, but submit a draft PR where you take a portion of PlotDataItem and highlight what you mean better; it's hard to talk about these kind of concepts regarding API design without an example, or some kind of mach-up, but an existing PR is likely not the right place to introduce that.

@goodboy
Copy link
Copy Markdown

goodboy commented Feb 6, 2021

The logic is indeed spread out,
but the rendering of the QPainterPath is done here:

I guess you didn't click on any of my links 😿
I wasn't saying it should be coupled or is coupled I said why you would want it to be coupled, doesn't make sense to me.

I think we're getting lost 😂

You originally said:

The other factor is that the PlotDataItem instance needs to have a reference/view to the underlying data, as a single array. This need is for many reasons, but the best example that I can think of is downsampling/resampling. Because we need a continuous array of the underlying data, appendData=True can't just generate a QPainterPath and append it to the existing one, and call it good, while it would be fast, all sorts of other stuff will break.

This is a false dilemma: you can keep a reference to your continuous array and only update parts of the path that need to be based on zoom level and the current state of the continuous array reference. You can also cache QPainterPath outputs for different zoom levels, input data sets, etc.

I don't understand why we will have a other stuff will break concern here.

Reread my original comment.
Maybe you misread something I suggested originally?

This works right up until you zoom in, when you zoom in when you previously had downsampled, and need to re-calculate the entire QPainterPath, how would that work if the underlying data wasn't one continuous array?

I don't understand where you're getting this idea that you would not have one continuous array.
Where has it been suggested that input data will ever not be one continuous array?
I really am confused where this notion even came from.

Subclassing is the Qt way!

But not often the python way these days 😎

For example, look at the way vispy has built its apis - much much more pythonic and uses object composition to make internals very elegant.
Imo pyqtgraph should be moving in this direction.

but submit a draft PR where you take a portion of PlotDataItem and highlight what you mean better

I have to be honest, I can't really think of a reason to need to use PlotDataItem if you're doing real-time plot updating.
Just subclass a numpy array and add some .append() method to it if you want this "appending feature" and pass that array to .setData().
I still don't see any reason why this PlotDataItem thing is now supporting appending to numpy arrays for the user.

I was originally suggesting:

I do propose that a "better" and more performant way to address this need for a fast graphics update is to create a way to generate QPainterPath objects that can actually be appended (or prepended) to existing paths in view based on data diffing with the existing path. For example QPainterPath has a .boundingRect() which can be used to determine x-axis max/min data to determine if new paths need to be generated from the input data versus what was last drawn.

The original feature request that drove this PR seems to me to be better solved by documenting the example recipes.
For actual appending to graphics this is not at all the right place in the code to accomplish it.

Anyway will jump in chat for further banter.

@j9ac9k j9ac9k mentioned this pull request Feb 7, 2021
@NilsNemitz
Copy link
Copy Markdown
Contributor Author

As we continue to work out the distinction between PlotCurveItem handling the fast plotting of well-conditioned data, and PlotDataItem mapping and conditioning the raw user data, I think this might be worth considering again.

On the PlotDataItem level I do not really see an obstacle to buffering the data: Buffers are already used to handle e.g. logarithmic mapping and dynamic range reduction, so the addition of an optional append buffer is not a fundamental change. Although the functionality isn't particularly sophisticated, it saves the user a few lines of boilerplate code.

In the context of quick exploratory data evaluation that might be a not-insignificant benefit; especially if we can improve integration with environments like Jupyter.

Not really related, but I have also added a check for string content of the 'connect' parameter; this should close #2027.

@j9ac9k j9ac9k added the sprint Tag relating to issue/PR that is a good candidate for addressing during the sprints label Nov 8, 2021
if idx == 2: break # only convert the first two arguments
try: # if it converts to float, then convert to single-value ndarray
arg_list[idx] = np.full(1, float(arg_list[idx]))
except TypeError: pass # TypeError occurs for data that does not need conversion
Copy link
Copy Markdown
Member

@j9ac9k j9ac9k Nov 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can clean this up a decent amount I think!

# at the top of the file
from contextlib import suppress


....

def appendData(self, *args, **kwargs):

    arg_list = list(args)
    for idx, arg in enumerate(args):
        if idx == 2:
            break
        with suppress(TypeError):
            # TypeError occurs for data that does not need conversion
            arg_list[idx] = np.full(1, float(arg))
     kargs['append'] = True
     self.setData(*arg_list, **kargs)

for _ in range(len(iterable)) is generally considered a code-smell, if you care about indexes you should use enumerate(iterable). Also whenever you have try/except: pass, the suppress is typically what you want; functionally they're the same thing, but it's just a bit more readable ...but now we're in nit-pick territory.

@j9ac9k
Copy link
Copy Markdown
Member

j9ac9k commented Nov 14, 2021

Hi @NilsNemitz Thanks for walking me through this!

Only suggestion I have is to modify the scrollingPlots.py example to make use of this new API. If we want to track the performance here, we could add a profiler, I'll put an inline comment with how to add one...

self.containsNonfinite = not (all_x_are_finite and all_y_are_finite) # This always yields a definite True/False answer
self._dataRect = QtCore.QRectF( QtCore.QPointF(xmin,ymin), QtCore.QPointF(xmax,ymax) )

def append(self, x, y):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we might be concerned regarding the performance here, I would consider adding the following bit as the first line of code in this function.

def append(self, x, y):
    """
    docstring
    """
    profiler = pg.debug.Profiler()  # add this line
    old_length = len(self.y)
    .....

Having a profiler in a particular function is usually a decent indication that we're concerned about the performance here, and makes it simpler to modify the line so it actually spits output on the console if we decide to investigate fruther.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

sprint Tag relating to issue/PR that is a good candidate for addressing during the sprints

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature Request] Implement PlotDataItem.appendData()

3 participants