DOC Update/clarify plot_unveil_tree_structure.py #16942
glemaitre merged 10 commits into scikit-learn:master from
| # Among these arrays, we have: | ||
| # - ``children_left[i]`` - id of the left child of node i or -1 if leaf node | ||
| # - ``children_right[i]`` - id of the right child of node i or -1 if leaf | ||
| # node | ||
| # - ``feature[i]`` - feature used for splitting node i | ||
| # - ``threshold[i]`` - threshold value at node i | ||
| # - ``n_node_samples[i]`` - the number of of training samples reaching node i | ||
| # - ``impurity[i]`` - the impurity at node i |
| # Among these arrays, we have: | |
| # - ``children_left[i]`` - id of the left child of node i or -1 if leaf node | |
| # - ``children_right[i]`` - id of the right child of node i or -1 if leaf | |
| # node | |
| # - ``feature[i]`` - feature used for splitting node i | |
| # - ``threshold[i]`` - threshold value at node i | |
| # - ``n_node_samples[i]`` - the number of of training samples reaching node i | |
| # - ``impurity[i]`` - the impurity at node i | |
| # Among these arrays, we have: | |
| # - ``children_left[i]``: id of the left child of node `i` or -1 if leaf node | |
| # - ``children_right[i]``: id of the right child of node i or -1 if leaf node | |
| # - ``feature[i]``: feature used for splitting node `i` | |
| # - ``threshold[i]``: threshold value at node `i` | |
| # - ``n_node_samples[i]``: the number of of training samples reaching | |
| # node `i` | |
| # - ``impurity[i]`` - the impurity at node `i` |
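To make the parallel-array layout described above concrete, here is a minimal sketch using a hand-written five-node tree. All values are hypothetical and are not produced by scikit-learn; with a fitted estimator the same arrays would be read from its `tree_` attribute. The helper `describe` is an illustrative name, not part of any library.

```python
# Hand-written toy tree (hypothetical values, not produced by scikit-learn).
# Index i in every list describes node i; node 0 is the root.
children_left = [1, -1, 3, -1, -1]    # -1 marks a leaf
children_right = [2, -1, 4, -1, -1]   # -1 marks a leaf
feature = [3, -2, 2, -2, -2]          # -2 is a placeholder on leaves
threshold = [0.8, -2.0, 4.95, -2.0, -2.0]
n_node_samples = [112, 37, 75, 39, 36]
impurity = [0.66, 0.0, 0.5, 0.05, 0.1]  # made-up impurity values


def describe(i):
    """Return a one-line description of node i (illustrative helper)."""
    if children_left[i] == -1:
        return f"node {i}: leaf, {n_node_samples[i]} samples"
    return (
        f"node {i}: split on feature {feature[i]} <= {threshold[i]}, "
        f"children {children_left[i]} and {children_right[i]}"
    )


for i in range(len(children_left)):
    print(describe(i))
```

Because the arrays are parallel, every property of a node is found at the same index, so no node objects are needed.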
glemaitre left a comment
LGTM. @lucyleeow Could you replace the string formatting with f-strings where relevant? It could make the code more readable here.
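As a rough illustration of the kind of change requested, the snippet below contrasts `%`-style formatting with an f-string. The message text and variable names are hypothetical and are not taken from the example file.

```python
# Hypothetical values, used only to compare the two formatting styles.
node_id, n_samples = 2, 75

# %-style formatting: the values are listed separately from the template.
old_style = "node %s has %s samples" % (node_id, n_samples)

# f-string: each value appears inline where it is used.
new_style = f"node {node_id} has {n_samples} samples"

print(new_style)
```

Both expressions build the same string; the f-string simply keeps each value next to the text it belongs to.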
| # to low level attributes such as ``node_count``, the total number of nodes, | ||
| # and ``max_depth``, the maximal depth of the tree. It also stores the | ||
| # entire binary tree structure, represented as a number of parallel arrays. The | ||
| # i-th element of each array holds information about the node i. Node 0 is the |
| # i-th element of each array holds information about the node i. Node 0 is the | |
| # i-th element of each array holds information about the node `i`. Node 0 is the |
| # We can also retrieve the decision path of samples of interest. The | ||
| # ``decision_path`` method outputs an indicator matrix that allows us to | ||
| # retrieve the nodes the samples of interest traverse through. A non zero | ||
| # element in the indicator matrix at position (i, j) indicates that |
| # element in the indicator matrix at position (i, j) indicates that | |
| # element in the indicator matrix at position `(i, j)` indicates that |
| # ``decision_path`` method outputs an indicator matrix that allows us to | ||
| # retrieve the nodes the samples of interest traverse through. A non zero | ||
| # element in the indicator matrix at position (i, j) indicates that | ||
| # the sample i goes through the node j. Or, for one sample, i, the positions of |
| # the sample i goes through the node j. Or, for one sample, i, the positions of | |
| # the sample `i` goes through the node j. Or, for one sample, `i`, the positions of |
| # retrieve the nodes the samples of interest traverse through. A non zero | ||
| # element in the indicator matrix at position (i, j) indicates that | ||
| # the sample i goes through the node j. Or, for one sample, i, the positions of | ||
| # the non zero elements in row i of the indicator matrix designate the ids |
| # the non zero elements in row i of the indicator matrix designate the ids | |
| # the non zero elements in row `i` of the indicator matrix designate the ids |
| # ``apply`` method. This returns an array of the node ids of the leaves | ||
| # reached by each sample of interest. Using the leaf ids and the | ||
| # ``decision_path`` we can obtain the tests that were used to predict a sample | ||
| # or a group of samples. First, let's do it for one sample. Note: |
| # or a group of samples. First, let's do it for one sample. Note: | |
| # or a group of samples. First, let's do it for one sample. Note that |
NicolasHug left a comment
Thanks @lucyleeow, looks good.
I haven't checked the rendered docs, though.
| # ``decision_path`` method outputs an indicator matrix that allows us to | ||
| # retrieve the nodes the samples of interest traverse through. A non zero | ||
| # element in the indicator matrix at position ``(i, j)`` indicates that | ||
| # the sample ``i`` goes through the node ``j``. Or, for one sample, ``i``, the |
| # the sample ``i`` goes through the node ``j``. Or, for one sample, ``i``, the | |
| # the sample ``i`` goes through the node ``j``. Or, for one sample ``i``, the |
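The indicator-matrix reading described in this comment can be sketched with a small dense matrix. The values below are hypothetical; scikit-learn's `decision_path` returns a sparse matrix, and plain nested lists are used here only as an assumption to keep the sketch dependency-free. `path_of` is an illustrative helper name.

```python
# Rows are samples, columns are nodes (hypothetical 3-sample, 5-node case).
# A non-zero entry at (i, j) means sample i goes through node j.
node_indicator = [
    [1, 1, 0, 0, 0],  # sample 0: root -> node 1
    [1, 0, 1, 1, 0],  # sample 1: root -> node 2 -> node 3
    [1, 0, 1, 0, 1],  # sample 2: root -> node 2 -> node 4
]


def path_of(i):
    """Return the ids of the nodes that sample i goes through."""
    return [j for j, value in enumerate(node_indicator[i]) if value != 0]


print(path_of(1))
```

The column positions of the non-zero entries in row `i` are exactly the node ids on that sample's path.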
| node_depth = np.zeros(shape=n_nodes, dtype=np.int64) | ||
| is_leaves = np.zeros(shape=n_nodes, dtype=bool) | ||
| stack = [(0, -1)] # seed is the root node id and its parent depth | ||
| stack = [(0, -1)] # start with the root node id (0) and its parent depth (-1) |
Kind of a nit, but this could just be `[(0, 0)]`, and we could have
`node_id, current_depth = stack.pop()`
below. It seems unnecessarily contrived to deal with the parent's depth.
You're right, I don't know why I didn't question this.
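A minimal sketch of the simplification discussed here, where the stack holds each node's own depth rather than its parent's. It runs on a hand-written toy tree (hypothetical values) and uses plain lists instead of NumPy arrays, so it is not the exact code from the example file.

```python
# Hand-written toy tree: -1 marks a leaf in both child arrays.
children_left = [1, -1, 3, -1, -1]
children_right = [2, -1, 4, -1, -1]
n_nodes = len(children_left)

node_depth = [0] * n_nodes
is_leaves = [False] * n_nodes
stack = [(0, 0)]  # start with the root node id (0) and its own depth (0)

while stack:
    # pop() guarantees each node is visited exactly once
    node_id, depth = stack.pop()
    node_depth[node_id] = depth

    left, right = children_left[node_id], children_right[node_id]
    if left != right:
        # A split node has two distinct children; they sit one level deeper.
        stack.append((left, depth + 1))
        stack.append((right, depth + 1))
    else:
        # Both children are -1, so this node is a leaf.
        is_leaves[node_id] = True

print(node_depth)
print(is_leaves)
```

Storing the node's own depth removes the `+ 1` bookkeeping at pop time and lets the root start at depth 0 directly.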
| common_nodes = (node_indicator.toarray()[sample_ids].sum(axis=0) == | ||
| len(sample_ids)) | ||
| # obstain node ids using position in array |
| # obstain node ids using position in array | |
| # obtain node ids using position in array |
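The common-node computation in this hunk can be sketched without NumPy. A node is common to a group of samples when its column sum over those rows equals the group size. The indicator values and `sample_ids` below are hypothetical.

```python
# Rows are samples, columns are nodes (hypothetical dense indicator matrix).
node_indicator = [
    [1, 1, 0, 0, 0],
    [1, 0, 1, 1, 0],
    [1, 0, 1, 0, 1],
]
n_nodes = len(node_indicator[0])
sample_ids = [1, 2]

# Sum each column over the selected rows only.
column_sums = [
    sum(node_indicator[i][j] for i in sample_ids) for j in range(n_nodes)
]

# A node is shared when every selected sample passes through it.
common_nodes = [total == len(sample_ids) for total in column_sums]

# Obtain node ids using position in the boolean list.
common_node_ids = [j for j, shared in enumerate(common_nodes) if shared]

print(common_node_ids)
```

Here samples 1 and 2 share the root and node 2, then diverge into different leaves.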
| # The leaf ids reached by samples of interest can be obtained with the | ||
| # ``apply`` method. This returns an array of the node ids of the leaves | ||
| # reached by each sample of interest. Using the leaf ids and the | ||
| # ``decision_path`` we can obtain the tests that were used to predict a sample |
maybe "split tests" or "splitting conditions"
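To illustrate what "splitting conditions" means for a single sample, the sketch below walks a path and reports the comparison made at each split node. The arrays, the sample `x`, its path, and its leaf id are all hypothetical; with a fitted tree the path would come from `decision_path` and the leaf id from `apply`.

```python
# Hand-written toy tree (hypothetical values).
feature = [3, -2, 2, -2, -2]
threshold = [0.8, -2.0, 4.95, -2.0, -2.0]

x = [6.0, 2.7, 5.1, 1.6]  # one hypothetical sample with four features
path = [0, 2, 4]          # node ids the sample goes through
leaf_id = 4               # the leaf reached by the sample

rules = []
for node_id in path:
    if node_id == leaf_id:
        continue  # a leaf has no splitting condition

    value = x[feature[node_id]]
    # Going left means value <= threshold, going right means value > threshold.
    sign = "<=" if value <= threshold[node_id] else ">"
    rules.append(
        f"node {node_id}: x[{feature[node_id]}] = {value} "
        f"{sign} {threshold[node_id]}"
    )

for rule in rules:
    print(rule)
```

Each printed line is one splitting condition; the leaf itself is skipped because no test is applied there.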
Reference Issues/PRs
None
What does this implement/fix? Explain your changes.
Any other comments?