[MRG] Fixes tree and forest classification for non-numeric multi-target by mitar · Pull Request #11458 · scikit-learn/scikit-learn

mitar · 2018-07-08T05:32:15Z

This fixes the issue that trees and forests cannot classify (but they can fit) non-numeric targets, when there are multiple targets.

mitar · 2018-10-09T18:06:06Z

Any update on this PR?

adrinjalali · 2018-10-09T18:17:37Z

sklearn/ensemble/tests/test_forest.py

+@pytest.mark.parametrize('name', FOREST_CLASSIFIERS)
+@pytest.mark.parametrize('oob_score', (True, False))
+def test_multi_target(name, oob_score):
+    check_multi_target(name, oob_score)


any reason for not having the body of the check_multi_target function directly here?

adrinjalali · 2018-10-09T18:17:46Z

sklearn/tree/tests/test_tree.py

+
+@pytest.mark.parametrize('name', CLF_TREES)
+def test_multi_target(name):
+    check_multi_target(name)


adrinjalali · 2018-10-09T18:18:39Z

sklearn/tree/tree.py

+                        axis=0))

-                return predictions
+                return np.array(predictions).T


this would try to figure the dtype of the array, right? how much is it slower than the status quo?
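For reference, a minimal NumPy-only sketch of the two ways the predictions array could be assembled. The `classes` and `indices` values are made-up stand-ins for `self.classes_[k]` and the per-output argmax indices; this is not the code in `tree.py`, only an illustration of dtype inference versus a preallocated array with an explicit dtype.

```python
import numpy as np

# Hypothetical stand-ins for self.classes_[k] and the per-output argmax indices.
classes = [np.array(['bar', 'foo']), np.array(['x', 'y', 'z'])]
indices = [np.array([0, 1, 1, 0]), np.array([2, 0, 1, 1])]

# Variant 1: collect one column per output, then let NumPy infer the dtype.
columns = [classes[k].take(indices[k], axis=0) for k in range(len(classes))]
inferred = np.array(columns).T

# Variant 2: preallocate with a known dtype and fill column by column.
n_samples, n_outputs = len(indices[0]), len(classes)
prealloc = np.zeros((n_samples, n_outputs), dtype=classes[0].dtype)
for k in range(n_outputs):
    prealloc[:, k] = classes[k].take(indices[k], axis=0)
```

Both variants produce the same `(n_samples, n_outputs)` array here. Timing them with `timeit` on realistic sizes would answer the speed question. One caveat of variant 2 as sketched: taking the dtype from the first output would truncate longer strings in later outputs.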

adrinjalali · 2018-10-09T18:20:50Z

You also need to rebase/merge master, you've got conflicts.

Other than that, I'm really not sure if this is a good idea. How many other estimators do we have that support string outputs? I suppose the recommended way is to convert the values before feeding them to estimators. I may be wrong.

jnothman · 2018-10-10T09:29:32Z

We support string targets where y is 1d (i.e. single target). I'm not entirely against supporting string labels in multi output, but it should be done by making sure that all estimators with multi output multiclass support, and any metrics, support this case. Let alone the case of mixed numeric and string data. At the moment I can't see that we test multi output multiclass in common tests at all.
When you think about it in terms of how much support it takes across the board to ensure consistent interfaces, I hope you can understand why we might not want to start

jnothman

What does your implementation do with a mix of string and numeric targets?

rok · 2019-02-16T20:58:05Z

I've refactored the tests and the code a bit. Also PR is rebased to master.

@jnothman: What does your implementation do with a mix of string and numeric targets?

from sklearn.ensemble import ExtraTreesClassifier
import numpy as np

X = np.random.choice([1, 2, 3], (100, 100))

ys = np.array([
    np.random.choice(['foo', 'bar'], (100,)),
    np.random.choice([1, 2, 3], (100,))
])

clf = ExtraTreesClassifier()
clf = clf.fit(X, ys.T)
clf.predict(X)

Returns:

array([['bar', '1'],
       ['bar', '2'],
       ['foo', '1'],
...

Doing the same with a regressor would fail as targets need to be numerical.

The training target array is upcast to a single dtype, and .predict returns an array with that same dtype. I feel this is acceptable behavior.

A way to support a real mix of dtypes would be to use a structured array, but I don't know if we really want to do that?
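To illustrate the two options with plain NumPy (a small sketch with made-up values, not part of the PR): a regular 2-D array built from strings and integers ends up with one string dtype, while a structured array keeps a separate dtype per field. The field names `label` and `count` are hypothetical.

```python
import numpy as np

# A regular 2-D array has one dtype, so the integers are stored as strings.
mixed = np.array([['foo', 1], ['bar', 2], ['foo', 3]])

# A structured array keeps one dtype per named field (names are illustrative).
structured = np.array(
    [('foo', 1), ('bar', 2), ('foo', 3)],
    dtype=[('label', 'U3'), ('count', 'i8')],
)

second_column = mixed[:, 1]   # strings such as '1', '2', '3'
counts = structured['count']  # still integers
```

The structured variant preserves the numeric column, but estimators would need to handle per-field access, which is the extra support burden discussed above.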

adrinjalali · 2019-02-17T10:51:59Z

sklearn/ensemble/tests/test_forest.py

+    # Make multi-target.
+    ys = np.hstack([y, y])
+
+    # Try to fix and predict.


fix -> fit?

adrinjalali · 2019-02-17T10:52:24Z

sklearn/ensemble/tests/test_forest.py

+    y = np.array(['foo' if v else 'bar' for v in y]).reshape((y.shape[0], 1))
+
+    # Make multi-target.
+    ys = np.hstack([y, y])


try with a string and a numerical column just to be on the safe side in the test?
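A rough sketch of what such a check might look like, assuming the iris data and `DecisionTreeClassifier`; the variable names are illustrative and this is not the test that was committed. The numeric column is cast explicitly here, since stacking both columns into one ndarray yields a single string dtype either way.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# First target: string labels; second target: the original integer labels.
y_str = np.array(['foo' if v else 'bar' for v in y]).reshape(-1, 1)
y_num = y.reshape(-1, 1)

# Cast the numeric column explicitly so the stacked array has one string dtype.
ys = np.hstack([y_str, y_num.astype(str)])

# Fit and predict on the multi-target, non-numeric labels.
clf = DecisionTreeClassifier(random_state=0).fit(X, ys)
pred = clf.predict(X)
```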

adrinjalali · 2019-02-17T10:58:50Z

This looks good to me.

Since Joel mentioned it, could you please kindly try adding the same test on common tests (sklearn/tests/test_common.py) and tell us where we fail on classifiers?

rok · 2019-02-19T01:10:20Z

Thanks for the quick response @adrinjalali!
So that would mean adding multi-target checks here and here. Correct?
We would probably need to exclude certain regressors and classifiers that don't (yet) support multi-target.

jnothman

I would support doing common tests in a separate PR, ideally remembering to remove these tests as redundant.

adrinjalali · 2019-02-19T09:45:44Z

So that would mean adding multi-target checks here and here. Correct?

Yes, it'd be awesome if you could follow up on that and open a new PR as Joel suggests. This looks good for now. Merging this, and will follow up on your other [hopefully to be coming soon] PR :)

rok · 2019-02-19T10:53:39Z

Thanks @adrinjalali, @mitar and @jnothman.
I'm creating a new issue #13187 for common tests and will open a PR soon.

…scikit-learn#11458) * Fixes tree and forest classification for non-numeric multi-target. Fixes scikit-learn#11451. * Renaming test functions, adding dtype to predictions array in tree.py. * Fixing flake8 issue. * Adding ignore warning to test_forest.py. * Switching to iris data for tests.

…i-target (scikit-learn#11458)" This reverts commit f95ffe6.

mitar mentioned this pull request Jul 8, 2018

An error is thrown when using Random forest classifier multi-target non-numeric classes #11451

Closed

adrinjalali reviewed Oct 9, 2018

jnothman reviewed Oct 10, 2018

mitar and others added 2 commits February 15, 2019 15:09

Fixes tree and forest classification for non-numeric multi-target.

2ee5ddc

Fixes scikit-learn#11451.

Renaming test functions, adding dtype to predictions array in tree.py.

b26e943

rok force-pushed the fix-multi-target branch from e215890 to b26e943 Compare February 16, 2019 13:08

Fixing flake8 issue.

fcd597a

rok force-pushed the fix-multi-target branch from 5c5ecff to fcd597a Compare February 16, 2019 16:48

Adding ignore warning to test_forest.py.

b11265a

adrinjalali reviewed Feb 17, 2019

Switching to iris data for tests.

79f768a

jnothman reviewed Feb 19, 2019

jnothman approved these changes Feb 19, 2019

adrinjalali merged commit 95993a4 into scikit-learn:master Feb 19, 2019

adrinjalali mentioned this pull request Feb 19, 2019

Release 0.20.3 #13186

Merged

17 tasks

mitar deleted the fix-multi-target branch February 19, 2019 09:52

rok mentioned this pull request Feb 19, 2019

Missing multi-output checks in common tests #13187

Closed

jnothman mentioned this pull request Apr 24, 2019

DOC what's new cleaning #13706

Merged

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

Revert "FIX Fixes tree and forest classification for non-numeric mult…

4d97f41

…i-target (scikit-learn#11458)" This reverts commit f95ffe6.

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

Revert "FIX Fixes tree and forest classification for non-numeric mult…

2d51db7

…i-target (scikit-learn#11458)" This reverts commit f95ffe6.
