
[MRG] Dummy 2d fix #9169

Closed
amueller wants to merge 2 commits into scikit-learn:main from amueller:dummy_2d_fix

@amueller
Member

Another fix from the estimator tags branch: if y.shape == (n_samples, 1), master crashes.
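One way to avoid this kind of crash is to normalize y to a 2d (n_samples, n_outputs) array at the start of fit and remember whether the caller passed a 1d array. This is a minimal sketch assuming NumPy; `normalize_target` is a hypothetical helper, not the code in this PR and not scikit-learn's implementation.

```python
import numpy as np


def normalize_target(y):
    """Hypothetical helper (not scikit-learn API).

    Returns y as a 2d array of shape (n_samples, n_outputs) together with
    a flag recording whether the original y was 1d.
    """
    y = np.asarray(y)
    was_1d = y.ndim == 1
    if was_1d:
        # (n_samples,) -> (n_samples, 1)
        y = y.reshape(-1, 1)
    elif y.ndim != 2:
        raise ValueError(f"y must be 1d or 2d, got shape {y.shape}")
    # A column vector of shape (n_samples, 1) passes through unchanged.
    return y, was_1d


y_1d, flag_1d = normalize_target([0, 1, 1])
y_col, flag_col = normalize_target([[0], [1], [1]])
print(y_1d.shape, flag_1d)    # (3, 1) True
print(y_col.shape, flag_col)  # (3, 1) False
```

Keeping the flag separate from the array lets the rest of fit work on a single 2d layout while predict can still decide which shape to hand back.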

from .validation import check_array



Member


My life wasn't complete without reviewing this part of the PR.

Member Author


I don't even ?! ;)

clf = DummyClassifier(strategy="constant", random_state=0,
                      constant=[1])
clf.fit(X, y)
assert_array_equal(clf.predict(X), np.ones((n_samples, 1)))
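The test above expects predict to return shape (n_samples, 1) when y was a column vector, i.e. to mirror the dimensionality of the training target. Below is a minimal sketch of that behaviour assuming NumPy; `ToyConstantClassifier` is a hypothetical stand-in, not scikit-learn's DummyClassifier. Note that `np.testing.assert_array_equal` also compares shapes, so results of shape (n_samples,) and (n_samples, 1) are not interchangeable in this assertion.

```python
import numpy as np


class ToyConstantClassifier:
    """Hypothetical stand-in for DummyClassifier(strategy="constant").

    Not scikit-learn's implementation: it only illustrates predicting a
    constant while mirroring the dimensionality of the y seen in fit.
    """

    def __init__(self, constant):
        self.constant = constant

    def fit(self, X, y):
        y = np.asarray(y)
        # Remember whether y was 1d or 2d so predict can mirror it.
        self.y_ndim_ = y.ndim
        self.n_outputs_ = 1 if y.ndim == 1 else y.shape[1]
        # One constant per output, stored as a (1, n_outputs) row.
        self.constant_ = np.reshape(self.constant, (1, -1))
        if self.constant_.shape[1] != self.n_outputs_:
            raise ValueError("constant must have one value per output")
        return self

    def predict(self, X):
        n_samples = len(X)
        # Repeat the constant row once per sample: (n_samples, n_outputs).
        y_pred = np.tile(self.constant_, (n_samples, 1))
        # 1d y in fit -> 1d predictions; 2d y (even a single column) -> 2d.
        return y_pred.ravel() if self.y_ndim_ == 1 else y_pred


n_samples = 4
X = np.zeros((n_samples, 2))
y_col = np.ones((n_samples, 1))
clf = ToyConstantClassifier(constant=[1]).fit(X, y_col)
np.testing.assert_array_equal(clf.predict(X), np.ones((n_samples, 1)))
```

This "mirror the input" design is exactly what the review below questions: other classifiers return 1d predictions even after being fit on a column vector.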
Member


It seems the convention elsewhere is that even when trained on a column vector, predict should return a 1d array:


In [6]: %paste
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
for y in [[0, 1], [[0], [1]]]:
    for clf in [LogisticRegression(), RandomForestClassifier(), SVC(probability=True)]:
        print(clf.__class__.__name__, y)
        clf.fit([[0], [0]], y)
        for m in ['predict', 'predict_proba', 'decision_function']:
            try:
                print(m, getattr(clf, m)([[0], [0]]).shape)
            except AttributeError:
                pass
## -- End pasted text --
LogisticRegression [0, 1]
predict (2,)
predict_proba (2, 2)
decision_function (2,)
RandomForestClassifier [0, 1]
predict (2,)
predict_proba (2, 2)
SVC [0, 1]
predict (2,)
predict_proba (2, 2)
decision_function (2,)
LogisticRegression [[0], [1]]
/Users/joel/repos/scikit-learn/sklearn/utils/validation.py:547: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  y = column_or_1d(y, warn=True)
predict (2,)
predict_proba (2, 2)
decision_function (2,)
RandomForestClassifier [[0], [1]]
/Users/joel/anaconda3/envs/scipy3k/bin/ipython:7: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().
  from IPython import start_ipython
predict (2,)
predict_proba (2, 2)
SVC [[0], [1]]
/Users/joel/repos/scikit-learn/sklearn/utils/validation.py:547: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  y = column_or_1d(y, warn=True)
predict (2,)
predict_proba (2, 2)
decision_function (2,)
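For LogisticRegression and SVC, the traceback lines above show the DataConversionWarning coming from `column_or_1d(y, warn=True)`: a column-vector y is flattened to 1d before fitting, which is consistent with predict returning (2,) in every case. The sketch below imitates that behaviour with NumPy and the standard `warnings` module; `ravel_column_vector` and `ColumnVectorWarning` are hypothetical names, not scikit-learn's implementation.

```python
import warnings

import numpy as np


class ColumnVectorWarning(UserWarning):
    """Hypothetical stand-in for sklearn's DataConversionWarning."""


def ravel_column_vector(y, warn=True):
    """Simplified imitation of the column-vector handling shown above.

    1d input is returned as-is; an (n_samples, 1) column is flattened to
    (n_samples,), optionally with a warning; any other shape is rejected.
    """
    y = np.asarray(y)
    if y.ndim == 1:
        return y
    if y.ndim == 2 and y.shape[1] == 1:
        if warn:
            warnings.warn(
                "A column-vector y was passed when a 1d array was expected. "
                "Please change the shape of y to (n_samples,), "
                "for example using ravel().",
                ColumnVectorWarning,
                stacklevel=2,
            )
        return np.ravel(y)
    raise ValueError(f"y should be a 1d array, got an array of shape {y.shape} instead.")


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    y = ravel_column_vector([[0], [1]])
print(y.shape, len(caught))  # (2,) 1
```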

Member Author


Hm, I thought it was different for multi-output estimators, but RandomForestClassifier is multi-output. But apparently there's no common test for that? Gah!
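A common test for this convention could fit each estimator once with a 1d y and once with the equivalent column vector, then check that predict returns shape (n_samples,) both times. This is a minimal sketch assuming NumPy and estimators exposing fit/predict; `check_column_vector_predict_shape` and `MajorityClassifier` are hypothetical and do not reflect scikit-learn's actual common-test machinery.

```python
import numpy as np


class MajorityClassifier:
    """Toy single-output estimator predicting the most frequent class.

    keep_2d=True mimics the behaviour under discussion: a column-vector y
    yields (n_samples, 1) predictions instead of (n_samples,).
    """

    def __init__(self, keep_2d=False):
        self.keep_2d = keep_2d

    def fit(self, X, y):
        y = np.asarray(y)
        self.column_y_ = y.ndim == 2
        values, counts = np.unique(y.ravel(), return_counts=True)
        self.majority_ = values[np.argmax(counts)]
        return self

    def predict(self, X):
        y_pred = np.full(len(X), self.majority_)
        if self.keep_2d and self.column_y_:
            return y_pred.reshape(-1, 1)
        return y_pred


def check_column_vector_predict_shape(estimator):
    """Hypothetical common check: 1d y and column y both give 1d predictions."""
    X = np.zeros((6, 2))
    y = np.array([0, 1, 1, 0, 1, 1])
    for y_fit in (y, y.reshape(-1, 1)):
        y_pred = estimator.fit(X, y_fit).predict(X)
        if y_pred.shape != (X.shape[0],):
            raise AssertionError(
                f"{type(estimator).__name__} returned shape {y_pred.shape} "
                f"for y of shape {y_fit.shape}"
            )


check_column_vector_predict_shape(MajorityClassifier())  # follows the convention
try:
    check_column_vector_predict_shape(MajorityClassifier(keep_2d=True))
except AssertionError as exc:
    print("check failed:", exc)
```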

@glemaitre
Member

I'm closing this PR in favor of #20603, which intends to add a common test and find a way to address the issue consistently.

@glemaitre glemaitre closed this Mar 12, 2024