[MRG] Matplotlib tree plotting#9251
Conversation
|
Graphviz uses a custom heuristic to solve an integer linear program as described here: |
trying to get rid of scalex, scaley
|
I changed the tests to work on the new color definition. They are visually indistinguishable and one might argue that |
render everything once to get the bbox sizes, then again to actually plot it with known extents.
|
check out rendered example here: https://12061-843222-gh.circle-artifacts.com/0/home/ubuntu/scikit-learn/doc/_build/html/stable/auto_examples/tree/plot_iris.html |
re-introduce scalex, scaley add max_extents to tree to get tree size before plotting
rth
left a comment
There was a problem hiding this comment.
Maybe my first review was overly negative. I think it's a nice improvement, and the ability to draw trees without graphviz is definitely important. The amount of added code is also relatively small with respect to what it allows to do. A few comments below (some of which you ready just addressed)
My main concern is that we are not testing this to the same extent as regular estimators. For GUI it's quite difficult anyway, just because we run the examples, doesn't mean that someone cannot make a PR that would make the output wrong without throwing an exception (or that next matplotlib version wont). Unless someone looks at the figure we won't be able to tell. It's different from regular examples, where it matter less, as users are expected to run this algorithm on any kind of tree. Similarly it's hard to know if the output will be always satisfactory for any kind of tree configuration. Running this in CircleCI or Travis is not really going to change that, I agree. The vendored sklearn/tree/_reingold_tilford.py also comes from a repo without unit tests, that hasn't been updated in 5-7 years.
Overall I think it's a nice improvement, and I'm +1 to merge it, but someone would need to answer future issues about this :) Maybe it could be worth marking plot_tree experimental in the docstring?
| # is about the same as the distance between boxes | ||
| max_x, max_y = draw_tree.max_extents() + 1 | ||
| ax_width = ax.get_window_extent().width | ||
| ax_height = ax.get_window_extent().height |
There was a problem hiding this comment.
The issue with this that it does not allow to change the size of the plot. For instance, I can't read the text with the default size (it's too tiny when it takes 1/4 of my screen by default, the below version is a bit zoomed, depending on your screen resolution)

while if I put it to full screen I get the following (with all the white space around it),

which is the area it occupied on my screen initially. No way to zoom in either.
Tested with matplotlib 2.2. I can confirm the results don't change for 3.0 either.
There was a problem hiding this comment.
you need to change figsize or dpi I think. Happy to hear about a better way to do that. I guess we could redraw on resize?
There was a problem hiding this comment.
Maybe we should add something to the docstring that you can control the size of the tree with figsize and dpi?
| >>> tree.export_graphviz(clf, | ||
| ... out_file='tree.dot') # doctest: +SKIP | ||
| >>> tree.plot_tree(clf) # doctest: +SKIP | ||
| [Text(251.5,345.217,'X[3] <= 0.8... |
There was a problem hiding this comment.
That will really change between runs? Can't remove the SKIP? Though, I guess it can also be impacted by matplotlib versions?
There was a problem hiding this comment.
I'm pretty sure I had a reason for this. I don't remember whether it was non-determinism or something about requiring the presence of matplotlib. I can try to remove it.
doc/whats_new/v0.21.rst
Outdated
|
|
||
| - |Fix| Fixed a bug in :class:`cluster.DBSCAN` with precomputed sparse neighbors | ||
| - |Fix| Fixed a bug in :class:`cluster.DBSCAN` with precomputed sparse neighbors |
There was a problem hiding this comment.
Extra spaces here (possibly issue with with merge).
doc/whats_new/v0.20.rst
Outdated
| ``n_samples`` parameter to indicate the number of samples to generate per | ||
| cluster. :issue:`8617` by :user:`Maskani Filali Mohamed <maskani-moh>` and | ||
| :user:`Konstantinos Katrioplas <kkatrio>`. | ||
| >>>>>>> master |
|
Also not running at this in Travis produces 92.72% (-0.24%) code coverage, which is a bit a shame. Maybe we could a short test for |
|
I would consider the code vendored but I would also consider the upstream unmaintained (I have not tried to contact the author so they might reply, but I wouldn't count on it). So adding some tests on |
|
The skip is required because otherwise the matplotlib import fails... |
jnothman
left a comment
There was a problem hiding this comment.
I wish I had the patience and time to go through the algorithm in detail, but I don't atm. I'm happy with the changes in export.py, and happy with the output.
| @@ -0,0 +1,187 @@ | |||
| # taken from https://github.com/llimllib/pymag-trees/blob/master/buchheim.py | |||
There was a problem hiding this comment.
should this file not include a more extensive license?
| @@ -0,0 +1,54 @@ | |||
| import numpy as np | |||
There was a problem hiding this comment.
can you please rename this to test_reingold_tilford.py?
# Conflicts: # doc/modules/tree.rst
…-learn into matplotlib_tree_plotting # Conflicts: # doc/modules/tree.rst
|
ping @rth ;) |
|
@amueller Will try to review at the end of the week. Thanks for making these improvements! |
|
Thanks! And sorry for being pushy. I'm afraid some other grant or something will come my way and it'll lie around for another year :-/ |
rth
left a comment
There was a problem hiding this comment.
No worries. It's a bit difficult to review every detail of this PR, but I can confirm that the expected figure is indeed produced and renders nicely. Overall the code LGTM, we can always fix some edge-cases as they arrive in the future.
There is one duplicate file that needs removing though (cf below).
sklearn/tree/tests/test_reingold.py
Outdated
| # reached all leafs | ||
| break | ||
| assert len(np.unique(x_at_this_depth)) == len(x_at_this_depth) | ||
| depth += 1 |
There was a problem hiding this comment.
This file is identical with sklearn/tree/tests/test_reingold_tilford.py and should be removed
|
Teehee good to have this merged! Now the text export too??
|
|
Is there a way to increase the size of the tree image? I can't make any sense of the one that has been plotted for my purposes and I can't just download graphviz because it's a company computer. |
|
Please ask such questions on stack overflow
|

Fixes #8508
This PR does three things:
For now this lives in the tree submodule to avoid conflicting with work on the plot submodule.
Todo:
We could try to avoid the intermediate Tree data structure but not sure if that's worth the hassle.
Example:
compared to graphviz:

The Reingold-Tilford algorithm makes sure a parent is centered above its children. Graphviz uses a different algorithm that results in a somewhat more compact (if possibly uglier?) layout.
Currently the box sizes of the nodes are ignored, i.e. assumed constant, where the constant is expressed using the
scalexandscaleyvariables.I think it's ok to assume all boxes are the same size. I don't aim for perfection, I aim for a reasonable plot.
Disclaimer: this is an afternoon hack, and I'm not sure if I'll be able to fix it.
I might use it at scipy though. (cc @agramfort )