DOC Update/clarify plot_unveil_tree_structure.py by lucyleeow · Pull Request #16942 · scikit-learn/scikit-learn

lucyleeow · 2020-04-16T21:17:25Z

Reference Issues/PRs

None

What does this implement/fix? Explain your changes.

Change example to notebook style with alternating text and code blocks
Clarify explanations and comment code more
Consistently use the term 'split node', instead of alternating between 'split node' and 'test node' (happy to change back)

Any other comments?

glemaitre · 2020-05-18T15:17:31Z

examples/tree/plot_unveil_tree_structure.py

+# Among these arrays, we have:
+#   - ``children_left[i]`` - id of the left child of node i or -1 if leaf node
+#   - ``children_right[i]`` - id of the right child of node i or -1 if leaf
+#     node
+#   - ``feature[i]`` - feature used for splitting node i
+#   - ``threshold[i]`` - threshold value at node i
+#   - ``n_node_samples[i]`` - the number of of training samples reaching node i
+#   - ``impurity[i]`` - the impurity at node i


Suggested change

-# Among these arrays, we have:
-#   - ``children_left[i]`` - id of the left child of node i or -1 if leaf node
-#   - ``children_right[i]`` - id of the right child of node i or -1 if leaf
-#     node
-#   - ``feature[i]`` - feature used for splitting node i
-#   - ``threshold[i]`` - threshold value at node i
-#   - ``n_node_samples[i]`` - the number of of training samples reaching node i
-#   - ``impurity[i]`` - the impurity at node i
+# Among these arrays, we have:
+#   - ``children_left[i]``: id of the left child of node `i` or -1 if leaf node
+#   - ``children_right[i]``: id of the right child of node i or -1 if leaf node
+#   - ``feature[i]``: feature used for splitting node `i`
+#   - ``threshold[i]``: threshold value at node `i`
+#   - ``n_node_samples[i]``: the number of of training samples reaching
+#     node `i`
+#   - ``impurity[i]`` - the impurity at node `i`

glemaitre

LGTM. @lucyleeow Could you replace the string formatting with f-strings where relevant? It could make the code more readable here.
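
To make the f-string request concrete, here is a minimal sketch of how a node description might be printed from the parallel arrays listed in the hunk above. The arrays are hand-written stand-ins for a fitted tree's attributes (hypothetical values, not taken from the PR), and `describe_node` is an illustrative helper, not part of scikit-learn.

```python
# Hand-built stand-ins for the parallel arrays of a fitted tree
# (hypothetical values, not output from the PR's example).
children_left = [1, -1, 3, -1, -1]
children_right = [2, -1, 4, -1, -1]
feature = [3, -2, 2, -2, -2]               # placeholder value at leaves
threshold = [0.8, -2.0, 4.95, -2.0, -2.0]  # placeholder value at leaves


def describe_node(i):
    """Return a one-line description of node ``i`` (hypothetical helper)."""
    if children_left[i] == children_right[i]:
        # In this layout a leaf has -1 for both children.
        return f"node={i} is a leaf node."
    return (
        f"node={i} is a split node: go to node {children_left[i]} "
        f"if X[:, {feature[i]}] <= {threshold[i]} "
        f"else to node {children_right[i]}."
    )


descriptions = [describe_node(i) for i in range(len(children_left))]
print("\n".join(descriptions))
```

Compared with `%`-style or `str.format` calls, the f-string keeps each value next to the text it belongs to, which seems to be the readability gain the comment is after.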

glemaitre · 2020-05-18T15:17:47Z

examples/tree/plot_unveil_tree_structure.py

+# to low level attributes such as ``node_count``, the total number of nodes,
+# and ``max_depth``, the maximal depth of the tree. It also stores the
+# entire binary tree structure, represented as a number of parallel arrays. The
+# i-th element of each array holds information about the node i. Node 0 is the


Suggested change

-# i-th element of each array holds information about the node i. Node 0 is the
+# i-th element of each array holds information about the node `i`. Node 0 is the

glemaitre · 2020-05-18T15:36:55Z

examples/tree/plot_unveil_tree_structure.py

+# We can also retrieve the decision path of samples of interest. The
+# ``decision_path`` method outputs an indicator matrix that allows us to
+# retrieve the nodes the samples of interest traverse through. A non zero
+# element in the indicator matrix at position (i, j) indicates that


Suggested change

-# element in the indicator matrix at position (i, j) indicates that
+# element in the indicator matrix at position `(i, j)` indicates that

glemaitre · 2020-05-18T15:37:07Z

examples/tree/plot_unveil_tree_structure.py

+# ``decision_path`` method outputs an indicator matrix that allows us to
+# retrieve the nodes the samples of interest traverse through. A non zero
+# element in the indicator matrix at position (i, j) indicates that
+# the sample i goes through the node j. Or, for one sample, i, the positions of


Suggested change

-# the sample i goes through the node j. Or, for one sample, i, the positions of
+# the sample `i` goes through the node j. Or, for one sample, `i`, the positions of

glemaitre · 2020-05-18T15:37:15Z

examples/tree/plot_unveil_tree_structure.py

+# retrieve the nodes the samples of interest traverse through. A non zero
+# element in the indicator matrix at position (i, j) indicates that
+# the sample i goes through the node j. Or, for one sample, i, the positions of
+# the non zero elements in row i of the indicator matrix designate the ids


Suggested change

-# the non zero elements in row i of the indicator matrix designate the ids
+# the non zero elements in row `i` of the indicator matrix designate the ids
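
The row-wise reading described in these comments can be sketched without fitting a tree. The snippet below assumes the indicator matrix is stored in CSR form, which is what `decision_path` returns, and uses hand-written `indptr` and `indices` lists as hypothetical stand-ins. `nodes_on_path` is an illustrative helper, not a scikit-learn function.

```python
# Hypothetical CSR components of an indicator matrix for 3 samples and 5 nodes.
# Row i holds the ids of the nodes that sample i goes through.
indptr = [0, 2, 5, 8]
indices = [0, 1, 0, 2, 3, 0, 2, 4]


def nodes_on_path(sample_id):
    """Return the column positions of the non-zero elements in one row."""
    start, stop = indptr[sample_id], indptr[sample_id + 1]
    return indices[start:stop]


for sample_id in range(len(indptr) - 1):
    print(f"sample {sample_id} goes through nodes {nodes_on_path(sample_id)}")
```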

glemaitre · 2020-05-18T15:38:03Z

examples/tree/plot_unveil_tree_structure.py

+# ``apply`` method. This returns an array of the node ids of the leaves
+# reached by each sample of interest. Using the leaf ids and the
+# ``decision_path`` we can obtain the tests that were used to predict a sample
+# or a group of samples. First, let's do it for one sample. Note:


Suggested change

-# or a group of samples. First, let's do it for one sample. Note:
+# or a group of samples. First, let's do it for one sample. Note that

NicolasHug

Thanks @lucyleeow, looks good.

I haven't checked the rendered docs though.

NicolasHug · 2020-05-18T16:22:29Z

examples/tree/plot_unveil_tree_structure.py

+# ``decision_path`` method outputs an indicator matrix that allows us to
+# retrieve the nodes the samples of interest traverse through. A non zero
+# element in the indicator matrix at position ``(i, j)`` indicates that
+# the sample ``i`` goes through the node ``j``. Or, for one sample, ``i``, the


Suggested change

-# the sample ``i`` goes through the node ``j``. Or, for one sample, ``i``, the
+# the sample ``i`` goes through the node ``j``. Or, for one sample ``i``, the

NicolasHug · 2020-05-18T16:23:58Z

examples/tree/plot_unveil_tree_structure.py

 node_depth = np.zeros(shape=n_nodes, dtype=np.int64)
 is_leaves = np.zeros(shape=n_nodes, dtype=bool)
-stack = [(0, -1)]  # seed is the root node id and its parent depth
+stack = [(0, -1)]  # start with the root node id (0) and its parent depth (-1)


Kind of a nit, but this could just be [(0, 0)] and we could have

node_id, current_depth = stack.pop()

below. It seems unnecessarily contrived to deal with the parent's depth.

You're right, I don't know why I didn't question this.
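
A minimal sketch of the traversal with the stack seeded as `[(0, 0)]`, along the lines suggested above. It uses plain Python lists as hypothetical stand-ins for `children_left` and `children_right` rather than a fitted tree, so the code in the merged example may differ in its details.

```python
# Hypothetical child arrays for a 5-node tree; -1 marks "no child".
children_left = [1, -1, 3, -1, -1]
children_right = [2, -1, 4, -1, -1]
n_nodes = len(children_left)

node_depth = [0] * n_nodes
is_leaves = [False] * n_nodes
stack = [(0, 0)]  # start with the root node id (0) and its own depth (0)

while stack:
    # Each node is pushed once by its parent, so it is popped exactly once.
    node_id, depth = stack.pop()
    node_depth[node_id] = depth

    # A split node has two different children; a leaf has -1 for both.
    is_split_node = children_left[node_id] != children_right[node_id]
    if is_split_node:
        # Children sit one level below the current node.
        stack.append((children_left[node_id], depth + 1))
        stack.append((children_right[node_id], depth + 1))
    else:
        is_leaves[node_id] = True

print(node_depth)
print(is_leaves)
```

Storing each node's own depth on the stack removes the need to add 1 after popping, which is the simplification the comment asks for.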

NicolasHug · 2020-05-18T16:25:36Z

examples/tree/plot_unveil_tree_structure.py

 common_nodes = (node_indicator.toarray()[sample_ids].sum(axis=0) ==
                len(sample_ids))
-
+# obstain node ids using position in array


Suggested change

-# obstain node ids using position in array
+# obtain node ids using position in array
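
The idea behind `common_nodes` in this hunk can be sketched in plain Python: a node is shared by a group of samples when its column sum over the selected rows equals the group size. The dense matrix and `sample_ids` below are hypothetical values chosen for illustration, and list comprehensions stand in for the NumPy expression used in the example.

```python
# Hypothetical dense indicator matrix: rows are samples, columns are nodes.
node_indicator = [
    [1, 1, 0, 0, 0],
    [1, 0, 1, 1, 0],
    [1, 0, 1, 0, 1],
]
sample_ids = [1, 2]

# A node is common to the group when every selected sample passes through it,
# i.e. when its column sum over the selected rows equals the group size.
n_columns = len(node_indicator[0])
column_sums = [
    sum(node_indicator[s][j] for s in sample_ids) for j in range(n_columns)
]
common_nodes = [total == len(sample_ids) for total in column_sums]

# obtain node ids using position in array
common_node_ids = [j for j, is_common in enumerate(common_nodes) if is_common]
print(common_node_ids)
```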

NicolasHug · 2020-05-18T16:26:45Z

examples/tree/plot_unveil_tree_structure.py

+# The leaf ids reached by samples of interest can be obtained with the
+# ``apply`` method. This returns an array of the node ids of the leaves
+# reached by each sample of interest. Using the leaf ids and the
+# ``decision_path`` we can obtain the tests that were used to predict a sample


Maybe "split tests" or "splitting conditions" instead of "tests"?
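
As an illustration of what "splitting conditions" refers to, the sketch below walks one sample from the root to its leaf and records the condition checked at each split node. It follows the tree directly instead of calling `apply` and `decision_path`, and all arrays and the sample are hypothetical values, so it is a simplified stand-in for the example's code, not a copy of it.

```python
# Hypothetical stand-ins for a fitted tree's arrays and one input sample.
children_left = [1, -1, 3, -1, -1]
children_right = [2, -1, 4, -1, -1]
feature = [3, -2, 2, -2, -2]
threshold = [0.8, -2.0, 4.95, -2.0, -2.0]
sample = [6.1, 2.8, 4.7, 1.2]

conditions = []
node_id = 0  # start at the root
while children_left[node_id] != children_right[node_id]:
    value = sample[feature[node_id]]
    # Go left when the feature value is at or below the threshold.
    if value <= threshold[node_id]:
        sign, next_node = "<=", children_left[node_id]
    else:
        sign, next_node = ">", children_right[node_id]
    conditions.append(
        f"node {node_id}: X[{feature[node_id]}] = {value} "
        f"{sign} {threshold[node_id]}"
    )
    node_id = next_node

leaf_id = node_id  # the leaf this sample ends up in
print("\n".join(conditions))
print(f"leaf reached: {leaf_id}")
```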

lucyleeow added 3 commits April 16, 2020 23:12

update example

60f6f76

move n_nodes

abbad50

minor wording

28d781a

lucyleeow changed the title ~~WIP Update/clarify plot_unveil_tree_structure.py~~ DOC Update/clarify plot_unveil_tree_structure.py Apr 17, 2020

lucyleeow added 4 commits April 17, 2020 11:41

lint

586c566

explain leaf id

79db5db

fix rst links

e4d3a1a

add comma

8c44f48

glemaitre reviewed May 18, 2020


lucyleeow added 2 commits May 18, 2020 18:16

suggestions

bbc0470

lint

f1a87a9

NicolasHug approved these changes May 18, 2020


suggestions

19e939a

glemaitre merged commit d46663c into scikit-learn:master May 19, 2020

lucyleeow deleted the DOC_treestruct branch May 19, 2020 10:59

viclafargue pushed a commit to viclafargue/scikit-learn that referenced this pull request Jun 26, 2020

DOC Update/clarify plot_unveil_tree_structure.py (scikit-learn#16942)

97866a9

jayzed82 pushed a commit to jayzed82/scikit-learn that referenced this pull request Oct 22, 2020

DOC Update/clarify plot_unveil_tree_structure.py (scikit-learn#16942)

d89f3ae

