Skip to content

Warning resolve convergence warning by reducing the default penalty#14143

Merged
NicolasHug merged 6 commits intoscikit-learn:masterfrom
rwaithera:my-feature
Jul 18, 2019
Merged

Warning resolve convergence warning by reducing the default penalty#14143
NicolasHug merged 6 commits intoscikit-learn:masterfrom
rwaithera:my-feature

Conversation

@rwaithera
Copy link
Copy Markdown
Contributor

Added a lower penalty to LinearSVC(). Made C=0.01 instead of the default 1. This resolves the Convergence Warning.

#WiMLDS_Nairobi
@Darthvaderkenya

@adrinjalali
Copy link
Copy Markdown
Member

Please also say something like (related to #14117), when working on the issue :)

Copy link
Copy Markdown
Contributor Author

@rwaithera rwaithera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

related to #14117

@adrinjalali
Copy link
Copy Markdown
Member

two points which may be useful:

  • Fixes #(number of the issue) will cause the issue to be closed after the PR is merged. Since this only fixes one of the issues in that main issue, it should be said "related to #(issue number)
  • The "fixes" or "closes" magic words work if you put them in the description of the PR, not as a comment :) you can edit the description and add it there.

@adrinjalali
Copy link
Copy Markdown
Member

Could you also paste the output of the example before and after the change?

@rwaithera
Copy link
Copy Markdown
Contributor Author

rwaithera commented Jun 22, 2019

Could you also paste the output of the example before and after the change?

Before
The best_index_ is 3
The n_components selected is 8
The corresponding accuracy score is 0.84
image

After
The best_index_ is 3
The n_components selected is 8
The corresponding accuracy score is 0.85
image

@rwaithera rwaithera closed this Jun 22, 2019
@rwaithera rwaithera reopened this Jun 22, 2019
@adrinjalali
Copy link
Copy Markdown
Member

thanks, the output looks good, could you also say how long it takes to run before and after the change?

@rwaithera
Copy link
Copy Markdown
Contributor Author

thanks, the output looks good, could you also say how long it takes to run before and after the change?

Before, the example took 20.4s to run.
After including a penalty of 0.01 the code run in 6.9s.

Thanks Adrin for your views!

Copy link
Copy Markdown
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, LGTM, thanks @rwaithera :)

@thomasjpfan
Copy link
Copy Markdown
Member

On master the example output was:

The best_index_ is 2
The n_components selected is 6
The corresponding accuracy score is 0.80

The docstring says:

The balanced case is when n_components=6 and accuracy=0.80,
which falls into the range within 1 standard deviation of the best accuracy
score.

@NicolasHug
Copy link
Copy Markdown
Member

Please try n_components=7 and see if the docstring is respected?

@glemaitre glemaitre self-requested a review July 2, 2019 07:48
@glemaitre
Copy link
Copy Markdown
Member

If it is fine with everyone, I propose the following:

param_grid = {
    'reduce_dim__n_components': [6, 8, 10, 12, 14]
}

It leads to the following figure and we can update the description accordingly. It enforces what is discuss in the description of the example.

Figure_1

@rwaithera can you update the example or to you want me to quickly address those and merge your PR?

@rwaithera
Copy link
Copy Markdown
Contributor Author

Please try n_components=7 and see if the docstring is respected?

I get the following after adding 7 into the reduce_dim__n_components list:

The best_index_ is 3
The n_components selected is 7
The corresponding accuracy score is 0.82

@rwaithera
Copy link
Copy Markdown
Contributor Author

If it is fine with everyone, I propose the following:

param_grid = {
    'reduce_dim__n_components': [6, 8, 10, 12, 14]
}

It leads to the following figure and we can update the description accordingly. It enforces what is discuss in the description of the example.

Figure_1

@rwaithera can you update the example or to you want me to quickly address those and merge your PR?

That increases the accuracy. Nice @adrinjalali .

I will update the cahnges.

@glemaitre
Copy link
Copy Markdown
Member

The important message here is that n_components=10 is good enough because we are in the best_score + 1 std. dev.

@rwaithera
Copy link
Copy Markdown
Contributor Author

The important message here is that n_components=10 is good enough because we are in the best_score + 1 std. dev.

Ah! I see now. It is all clear what everyone meant by within 1 std from the best accuracy score.

Thanks @glemaitre!

@reshamas
Copy link
Copy Markdown
Member

reshamas commented Jul 12, 2019

@rwaithera
Checking in on PR. Do you have to do something for this, or are we waiting on a reviewer?

@adrinjalali
If we are waiting on a reviewer, are you able to find someone? It would be great to have PR merged in by July 22, which is one month post-sprint. Thanks.

cc: @Mariam-ke

@adrinjalali
Copy link
Copy Markdown
Member

I think there are still changes which need to be applied by @rwaithera , and then @glemaitre can accept the PR

@reshamas
Copy link
Copy Markdown
Member

@Mariam-ke
Can you email Ruth (@rwaithera) and ask her to please follow-up? It has been 10 days since the last activity, and after 14 days, the PR can go back in the pool.

To decide whether an inactive PR is stalled, ask the contributor if she/he plans to continue working on the PR in the near future. Failure to respond within 2 weeks with an activity that moves the PR forward suggests that the PR is stalled and will result in tagging that PR with “help wanted”.

As an FYI: there are sprints going on in Austin, TX this weekend, and then the NYC sprint coming up in August, so this can go to someone else.

Thanks.

@NicolasHug
Copy link
Copy Markdown
Member

Marking this available for the Austin sprint (hope that's OK. I think the 14 days rule is too strict when there are upcoming sprints).

As far as I can tell all this PR needs is to update the docstring description above

@reshamas
Copy link
Copy Markdown
Member

@NicolasHug That is fine. Would be good to see this completed.

@wendyhhu
Copy link
Copy Markdown
Contributor

Hi folks, I'm working on this issue.

@rwaithera
Copy link
Copy Markdown
Contributor Author

@rwaithera
Checking in on PR. Do you have to do something for this, or are we waiting on a reviewer?

@adrinjalali
If we are waiting on a reviewer, are you able to find someone? It would be great to have PR merged in by July 22, which is one month post-sprint. Thanks.

cc: @Mariam-ke

@reshamas I submitted the suggestion @glemaitre gave. Please @glemaitre , review the changes.

Thanks!

@NicolasHug
Copy link
Copy Markdown
Member

NicolasHug commented Jul 13, 2019

@rwaithera the docstring at the beginning of the example needs an update to be consistent with the current changes (see the changes proposed in #14329 by @wendyhhu).

Would you be able to submit them soon? Thanks!

@rwaithera
Copy link
Copy Markdown
Contributor Author

@rwaithera the docstring at the beginning of the example needs an update to be consistent with the current changes (see the changes proposed in #14329 by @wendyhhu).

Would you be able to submit them soon? Thanks!

@NicolasHug I see the changes were updated in the documentation.

@NicolasHug
Copy link
Copy Markdown
Member

@rwaithera, what I mean is that we need you to apply the same changes as in #14329 for your PR to get merged.

@rwaithera
Copy link
Copy Markdown
Contributor Author

@rwaithera, what I mean is that we need you to apply the same changes as in #14329 for your PR to get merged.

I have applied the changes @NicolasHug


The figure shows the trade-off between cross-validated score and the number
of PCA components. The balanced case is when n_components=6 and accuracy=0.80,
of PCA components. The balanced case is when n_components=12 and accuracy=0.90,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rwaithera thanks, we're almost there. Just need to update the numbers:

Suggested change
of PCA components. The balanced case is when n_components=12 and accuracy=0.90,
of PCA components. The balanced case is when n_components=10 and accuracy=0.88,

@adrinjalali
Copy link
Copy Markdown
Member

@rwaithera some final changes you need to make, should be then ready for a merge :)

Copy link
Copy Markdown
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your work @rwaithera

@NicolasHug NicolasHug merged commit b4f2bf4 into scikit-learn:master Jul 18, 2019
@rwaithera
Copy link
Copy Markdown
Contributor Author

Thank you for your work @rwaithera

Thank you for your guidance @NicolasHug

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants