Skip to content

Revamping (yes!) the plotting interface: matplotlib and plotly#425

Merged
ntamas merged 101 commits intoigraph:developfrom
iosonofabio:mpl_rehaul
Sep 14, 2021
Merged

Revamping (yes!) the plotting interface: matplotlib and plotly#425
ntamas merged 101 commits intoigraph:developfrom
iosonofabio:mpl_rehaul

Conversation

@iosonofabio
Copy link
Copy Markdown
Member

@iosonofabio iosonofabio commented Jul 1, 2021

While matplotlib is supported for plotting now, the actual draw function is a hack.

By comparison, the Cairo drawing infrastructure is quite extensive - perhaps too much so.

Here I'm trying to clone the good parts of the Cairo interface into a revamped matplotlib interface, to stop the hacking from proliferating. I generally keep the inheritance a little shallower than it used to be, for sake of readability.

There is one issue on the table: matplotlib usually groups objects of the same kind for efficiency reasons into "collections". For instance, ATM all vertices are a single "collection" under ax.get_children. That grouping, which is reminescent of how SVG files work, is honestly convenient and enables changing color/sizes of all vertices at once. Ideally I'll try to have one collection for the vertices and one for the edges (not sure you can have one for the labels?), but that might fail.

I'll keep you posted.

edit: for folks who are new to this thread: I've redesigned the matplotlib plotting interface to integrate properly with the rest of the library, including:

  1. switch to preexisting Cairo functions to compute plotting options: colors, convex hulls, edge curvature, loops, vertex size, etc.
  2. add a configuration parameter: plotting.backend to switch btw cairo and matplotlib
  3. add an optional legend to clustering plots
  4. adapt and expand documentation on matplotlib plotting
  5. add unit tests for several kinds of plots, including an infrastructure to run image comparisons on CI (adapted from matplotlib)

NOTE: I could expand point 5. to include tests for Cairo. The testing infrastructure would rely on matplotlib anyway though.

The main entry point for plotting is now ig.plot. For Cairo, that creates a CairoPlot class as before (was called Plot, that name is still available for API compatibility). For matplotlib, that uses or creates a matplotlib Axes, just like any other library on the planet that relies on matplotlib. Axes are abstractions that fulfill the role of CairoPlot in a generic way.

There's currently a few #FIXME tags across the codebase for small things, mostly deprecation warnings. After that's fixed, it'd be great to have one or two of you testing this and then merging.

@szhorvat
Copy link
Copy Markdown
Member

szhorvat commented Jul 1, 2021

Question (really, just a question):

What will be the performance effects of switching to matplotlib? How much performance do we lose or gain? Is plotting gigantic graphs (which are becoming more and more common) still possible, as e.g. on Tamás's website? http://sixdegrees.hu/last.fm/

@ntamas
Copy link
Copy Markdown
Member

ntamas commented Jul 1, 2021

@szhorvat That graph on my website was created with a custom plotting function, not with igraph's built-in plot subsystem; it would have been too large for that, plus it did not offer the opportunity to put images as backgrounds in the nodes. (If you zoom in close enough on the interactive version of the artist map, you can actually see the profile pictures of the artists in the nodes).

I think that the current plotting backend is not really performant either, and I agree with @iosonofabio in the sense that it is probably a bit overengineered (this was my first attempt at a plotting backend). Also, I was in love with metaclasses at that time and I tried to solve everything with metaclasses :) So, making the Matplotlib backend more structured while keeping only the good bits of the current plotting backend is a good direction to go in.

@iosonofabio
Copy link
Copy Markdown
Member Author

A quick update in case you are interested.

Turns out many objects across the library have a __plot__ method that contains some Cairo code. I'm trying to move that code into the drawing folder, using appropriately named modules. Instead, the actual __plot__ function is just a passthrough that decides the backend (Cairo/Matplotlib), in analogy with the global igraph.plot function. A little bit of care is required to avoid cyclic deps.

This design might or might not stay depending on everyone's opinion, but at least it'd be great to move all floating chunks of Cairo code into drawing, so I think it's worth anyway and if we don't like it we can discuss better designs in this PR.

@ntamas
Copy link
Copy Markdown
Member

ntamas commented Jul 5, 2021

Thanks for the update @iosonofabio ; I'm keeping an eye on this PR.

Yes, ideally all Cairo-specific stuff should be hidden behind the Cairo backend in igraph.drawing; we should not leak these out to the __plot__ methods.

@iosonofabio
Copy link
Copy Markdown
Member Author

Alright, three things obviously missing and then it's ready for discussion:

  1. reimplementing Dendrogram plots in matplotlib. Not a biggie, turns out I have committed into matplotlib's dendrogram plots myself years ago ;-)
  2. transferring the radius smoothener for polygons
  3. edge labels and vertex/edge label positioning

Will take another day or two I think.

@iosonofabio
Copy link
Copy Markdown
Member Author

Alright, everything is implemented except for label positioning for vertices and edges. The main reason is that I'm a little unsure how matplotlib handles calculations for text box sizes across its own backends.

@ntamas if you could take an initial look at the many changes and advise on improvements, that'd be great. I'll try to figure out the text box thing in the meanwhile.

This should close a bunch of open bugs because it aligns the matplotlib infra with the existing Cairo, i.e. anything Cairo does right or wrong, mpl just follows.

Thanks!

@iosonofabio
Copy link
Copy Markdown
Member Author

There's one more thing I could do here, but it'd be good to hear opinions on it.

ATM in the single Configuration instance there are a few plotting options. We could add a "backend" option that sets the defaults to either Cairo or matplotlib. That would let people default their machine to mpl if they wished to. The default config value would still be Cairo for now.

If we do this (I'd be in favour) then it'd be a good idea to rename all those DefaultXXXDrawer into CairoXXXDrawer for clarity, and PolygonDrawer into CairoPolygonDrawer. That would break API big time.

What do you think?

@ntamas
Copy link
Copy Markdown
Member

ntamas commented Jul 12, 2021

I'll go through the commits in this PR shortly, but a few responses first:

  • Adding a backend option to Configuration is a great idea, I totally agree with it.
  • Renaming the DefaultXXXDrawer and the PolygonDrawer classes are also okay for me. We can keep temporary aliases to them until 0.10.x if somehow this gets released in 0.9.x. (I mean, just set DefaultGraphDrawer = CairoGraphDrawer and PolygonDrawer = CairoPolygonDrawer and we are good).

@ntamas
Copy link
Copy Markdown
Member

ntamas commented Jul 12, 2021

Okay, it seems like a common pattern that you need is "if the context seems to imply that the user wants the Matplotlib backend, then construct a SomethingDrawer object for the Matplotlib backend, otherwise construct an object for the Cairo backend. Plus, you need to pass the bbox and maybe the palette to the Cairo drawer but not to the Matplotlib backend. This should be factored out somehow, especially if we also allow the user to select a drawing backend in the configuration.

I'm starting to think that maybe the __plot__ functions should already be given information about the backend they must work with, and the top-level plot() function should be the one responsible for picking a plotting backend (if the user did not specify one in the configuration; otherwise it should just use whatever the user wants). Then the only responsibility of the __plot__ function would be to construct the appropriate SomethingDrawer based on the backend it has been given; we should not repeat the logic for identifying the backend that the user desires in each __plot__ function.


I've also seen that you have created a MatplotlibPlot class that duplicates some of the functionality already in Plot. Plot should then probably be renamed to CairoPlot, but I'm not sure about this direction. The original idea was that the plot() function in igraph is a shortcut to creating a Plot object, adding a single object to it, and then calling show() or save() on it, depending on whether a filename was given to plot() or not. Users who want to do trickier things like placing multiple igraph objects on the same plot could then construct a Plot instance on their own, call .add() on it multiple times with the appropriate bounding boxes, and then call .save() or .show() at the end. It would be great if we could keep Plot as a single top-level class that the user can construct and use for advanced plotting requirements, but this might conflict with what Matplotlib is capable of doing. If it is not possible for some reason, then we should create a create_plot() factory function (aliased to Plot for sake of backward compatibility) that creates either a CairoPlot or a MatplotlibPlot depending on the current plotting backend so the user could easily switch plotting backends without changing his code.

@iosonofabio
Copy link
Copy Markdown
Member Author

Thanks @ntamas

Re the Plot class: I could keep a single class. The questionable thing is that each method would start with:

if self._backend == 'matplotlib':
   (function body)
else:
   (function body)

which is quite an antipattern I think. Slightly better would be to have pairs of methods:

class Plot:
    def _redraw_matplotlib(self):
        ...
    def _redraw_cairo(self)
        ...
    def redraw(self):
        '''(docstring)'''
       return getattr(self, '_redraw_'+self._backend)()

Next version would be to have two classes, CairoPlot and MatplotlibPlot, each with its own secret methods, and have Plot inheriting from both and implementing the generic passthrough methods (e.g. redraw).

This infrastructure, as well as bounding boxes, can be all done by hand in matplotlib, but most people wouldn't. plt.Axes is basically an abstraction layer that lets people do that (think of an "inset panel") without bothering with canvas-level coordinates (e.g. pixels). There is an object called a Transform that basically translates data coordinates or axes coordinates into canvas coordinates and back. So folks would opt for a dedicated Axes (perhaps with transparent background) instead of manually implementing the transforms as done here.

That works especially well in interactive backends, where you can dynamically resize the figure with the mouse, because then the absolute pixel coordinates of the plotted objects change (I do that all the time): matplotlib registers a callback function into the interactive backend that updates the Transform based on the figure size and triggers a redraw. Reimplementing all of that seems inconvenient.

Alright, in conclusion let me have a look and see if I can come up with something that makes sense.

@szhorvat
Copy link
Copy Markdown
Member

@iosonofabio Maybe this is of interest to you? Does matplotlib have built-in legending which can easily be made use of with the new matplotlib backend, but not with cairo?

@iosonofabio
Copy link
Copy Markdown
Member Author

Thanks @szhorvat

yes matplotlib has infra for legends. That could be an easy option for clusters, happy to implement it if you think it'd be useful.

@szhorvat
Copy link
Copy Markdown
Member

I only have minimal experience with matplotlib so I can't judge if this should be built-in to igraph, or if matplotlib support has already made it much easier to add legends than it was with cairo (if yes, you might write an answer on StackOverflow?) Just wanted to bring it to your attention that there is demand for legends. You'll know better than me how to best satisfy that demand, or if anything needs to be done on igraph's side.

@iosonofabio
Copy link
Copy Markdown
Member Author

Thanks, if there's a demand I'll try to include that in this PR.

@iosonofabio
Copy link
Copy Markdown
Member Author

plot_with_legend

@iosonofabio
Copy link
Copy Markdown
Member Author

I now tried to restructure the Plot infra via composition instead of inheritance. @ntamas would be great to hear your thoughts on that - well, hear as in read I s'pose ;-)

I also injected backend info into the __plot__ methods, it cleans up a bit I must say, thanks for the idea.

I also implemented the legend @szhorvat mentioned, only in matplotlib, and without documentation for now (see above picture)

@iosonofabio
Copy link
Copy Markdown
Member Author

There is progress, now the import error for plotly doesn't happen anymore. We are left with two things to fix (I'll take care of both ASAP, hopefully):

  1. small rounding/formatting errors in the plotly JSON
  2. matplotlib figures on 32bit linux have a large difference from the expected figures...?

I'll try a few more commits

@iosonofabio
Copy link
Copy Markdown
Member Author

iosonofabio commented Sep 11, 2021

Alright, I'm getting painfully aware of the constraints here. Seems like matplotlib conversion PDF->PNG requires ghostscript 9.0. My laptop has 9.54, but of course cibuildwheel images have 8.70 so mpl refuses to acknowledge that installation. After all, gs 9.0 was released quite recently, just short of 12 years ago... 😭

But turns out cibuildwheels has options to specify the manylinux image to use, and manylinux2014, although a youngling at 8 years of age, might have ghostscrpt 9.0 which was already going to kindergarden at the time. I'm checking if that avenue is available, otherwise I'd be in favour of just using PNGs and debugging why the linux i686 platform produces large RMS differences compared to the expected outputs.

edit: manylinux2014 indeed works... except for Python 3.6?? anyway, ensuring the right version of ghostscript is present across architechtures, oses, and Python versions seems like an expensive maintenence move compared to a few more kbs of images in the repo, so I'm going back to using PNG images as default now.

@iosonofabio
Copy link
Copy Markdown
Member Author

Ok, it now works except for an installation issue on Pypy 3.6/3.7 across OSes which to me looks unrelated to matplotlib. @ntamas sorry for the verbose PR: do you have any idea what the pypy thing could be about?

@iosonofabio
Copy link
Copy Markdown
Member Author

Ok, finally skipping matplotlib tests on Pypy since they require numpy and that doesn't seem to install well on some pypy versions.

@iosonofabio
Copy link
Copy Markdown
Member Author

I think this is ready for (squash and?) merge. I'll open a separate PR later on to improve docs and tutorials, esp. with the new backends.

@ntamas
Copy link
Copy Markdown
Member

ntamas commented Sep 13, 2021

@iosonofabio PyPy is mostly a mystery to me as well; I have never used it seriously. I wouldn't be surprised if someone told me that NumPy and/or Matplotlib is not compatible with PyPy at all, so I think it's okay to ignore the failure.

@ntamas ntamas changed the base branch from master to develop September 13, 2021 17:47
@ntamas
Copy link
Copy Markdown
Member

ntamas commented Sep 13, 2021

Re-targeting to develop as these changes are too extensive to be included in the next patch release.

@ntamas
Copy link
Copy Markdown
Member

ntamas commented Sep 13, 2021

Okay, all tests passed even after rebasing; let me go through the PR once again tomorrow and then I'll merge.

@iosonofabio
Copy link
Copy Markdown
Member Author

Fantastic than you

@ntamas ntamas merged commit 7b14aaa into igraph:develop Sep 14, 2021
@ntamas
Copy link
Copy Markdown
Member

ntamas commented Sep 14, 2021

🎉 !

@iosonofabio
Copy link
Copy Markdown
Member Author

Yaaay, revamp successful!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants