Skip to content

[MRG] Raise descriptive ValueError if number of samples equals number of classes in Linear Discriminant Analysis#12391

Merged
jnothman merged 5 commits intoscikit-learn:masterfrom
eddbrown:master
Oct 23, 2018
Merged

[MRG] Raise descriptive ValueError if number of samples equals number of classes in Linear Discriminant Analysis#12391
jnothman merged 5 commits intoscikit-learn:masterfrom
eddbrown:master

Conversation

@eddbrown
Copy link
Copy Markdown
Contributor

@eddbrown eddbrown commented Oct 15, 2018

Fixes #12374

Before: Failed with division by zero error if no_classes == no_sample in Linear Discriminant Analysis.

After: This PR raises a more descriptive ValueError, explaining the problem.

@eddbrown eddbrown changed the title Fixes #12374 [MRG] Fixes #12374 Oct 15, 2018
@jnothman
Copy link
Copy Markdown
Member

jnothman commented Oct 16, 2018 via email

@jnothman
Copy link
Copy Markdown
Member

jnothman commented Oct 16, 2018 via email

@adrinjalali
Copy link
Copy Markdown
Member

Could you please also add a test? You also have a PEP8 issue, hence travis failing. flake8 /path/to/file.py would tell you where to fix as well.

@eddbrown eddbrown changed the title [MRG] Fixes #12374 [MRG] Raise descriptive ValueError if number of sample equals number of classes in Linear Discriminant Analysis Oct 16, 2018
@eddbrown eddbrown changed the title [MRG] Raise descriptive ValueError if number of sample equals number of classes in Linear Discriminant Analysis [MRG] Raise descriptive ValueError if number of samples equals number of classes in Linear Discriminant Analysis Oct 16, 2018
@eddbrown eddbrown force-pushed the master branch 2 times, most recently from 6a09189 to 3c1f7a1 Compare October 16, 2018 20:32
X = np.zeros((no_classes, no_classes))
y = np.zeros((no_classes, 1))
clf = LinearDiscriminantAnalysis(solver="svd")
assert_raises(ValueError, clf.fit, X, y)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert_raises(ValueError, clf.fit, X, y)
with pytest.raises(ValueError, match="The number of samples must be more"):
clf.fit(X, y)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah nice, tests the message too to make sure that its actually THAT ValueError being raised

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, and since the adoption of pytest, pytest.... are generally preferred to custom assert_... ones.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for your advice!

@eddbrown eddbrown force-pushed the master branch 3 times, most recently from 706a3df to d0f62c8 Compare October 16, 2018 22:09
@jnothman
Copy link
Copy Markdown
Member

Tests are failing

@jnothman
Copy link
Copy Markdown
Member

Please append commits rather than amending and force-pushing. It makes it hard for us to review when we aren't notified that a commit has been added and we can't see the diff

Copy link
Copy Markdown
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since classes_ is set as unique values in y, it seems a little strange to have the message state the number of samples and classes when they must be equal

"Currently, you have {} samples and {} "
"classes.".format(n_samples, n_classes))


Copy link
Copy Markdown
Member

@adrinjalali adrinjalali Oct 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please have only one empty new line. Why wouldn't flake8 not fail on this on travis?

@jnothman
Copy link
Copy Markdown
Member

This case should also be handled for the other solvers. Currently eigen raises an error, while lsqr gives a warning.

Edward Brown added 2 commits October 17, 2018 23:23
The lsqr and eigen solvers also will fail if the number of samples equals the number of classes, so this commit moves the error raise to cover all cases.
@eddbrown
Copy link
Copy Markdown
Contributor Author

@jnothman @adrinjalali thanks for your patience. How is it looking?

Copy link
Copy Markdown
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise LGTM. I don't know if it's helpful to users to put this in our change log...?

assert_almost_equal(c_s, c_s.T)


def test_raises_value_error_on_same_number_of_classes_and_samples():
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use pytest.mark.parameterize to test this for each solver I suppose

Copy link
Copy Markdown
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @eddbrown , LGTM.

@eddbrown
Copy link
Copy Markdown
Contributor Author

@adrinjalali what is the next step with getting this merged then? I think I've misunderstood the process a bit...

@adrinjalali
Copy link
Copy Markdown
Member

@eddbrown an unwritten part of the process is the casual (read usual, normal) patience required for a core developer to do the next action ;) let it be a review, response to a question, merge, etc.

Your PR now is in a state to be merged, and I guess it will be soon (I'm not somebody with permissions to do a merge). Now you can just sit back and relax until it happens, or go on and find another issue to investigate ;)

@jnothman
Copy link
Copy Markdown
Member

an unwritten part of the process is the casual (read usual, normal) patience

I'm pretty sure I mentioned patience somewhere in the contributor docs in the last couple of years... ?

Thanks @eddbrown

@jnothman jnothman merged commit b498ac7 into scikit-learn:master Oct 23, 2018
thoo pushed a commit to thoo/scikit-learn that referenced this pull request Nov 14, 2018
@amueller amueller mentioned this pull request Nov 20, 2018
jnothman pushed a commit to jnothman/scikit-learn that referenced this pull request Nov 20, 2018
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
…number of classes in Linear Discriminant Analysis (scikit-learn#12391)"

This reverts commit 6cae448.
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
…number of classes in Linear Discriminant Analysis (scikit-learn#12391)"

This reverts commit 6cae448.
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

LinearDiscriminantAnalysis.fit() fails if #samples == #labels.

4 participants