
[MRG] Improve performance of the plot_rbm_logistic_classification.py example (#13383)#13648

Merged
thomasjpfan merged 3 commits into scikit-learn:master from Framartin:iss13383
Apr 17, 2019

Conversation

@Framartin
Contributor

@Framartin Framartin commented Apr 15, 2019

Reference Issues/PRs

Partial fix to #13383

What does this implement/fix? Explain your changes.

The example examples/neural_networks/plot_rbm_logistic_classification.py mentioned in #13383 takes a total of 32.5 seconds to build to HTML (on my computer). Replacing the solver used in the logistic regression from lbfgs with newton-cg brings the build time down to 22.0 seconds.

The LogisticRegression is trained twice, once on RBM features in a pipeline and once on raw pixel features, but only the first fit is time-consuming, due to slow convergence: lbfgs takes 4474 iterations to converge, whereas newton-cg converges in 43.

The performance metrics are identical (at the reported precision) for the logistic regression using RBM features. Only very slight differences appear in the metrics of the logistic regression trained on raw pixel features (highlighted below).
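For reference, the solver swap can be reproduced on its own with a minimal sketch like the one below. This is a simplified standalone comparison on the digits data: it omits the RBM pipeline and the pixel-shifting data augmentation of the actual example, and the C value and train/test split here are illustrative, not the example's settings.

```python
from time import time

from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Digits data as used by the example, scaled to [0, 1].
X, y = load_digits(return_X_y=True)
X = X / 16.0
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Compare the two solvers; n_iter_ exposes the iteration count.
for solver in ("lbfgs", "newton-cg"):
    clf = LogisticRegression(solver=solver, C=100.0, max_iter=10000)
    start = time()
    clf.fit(X_train, y_train)
    print(
        f"{solver}: {clf.n_iter_[0]} iterations, "
        f"{time() - start:.2f}s, test accuracy {clf.score(X_test, y_test):.2f}"
    )
```

The iteration gap on raw pixels is smaller than on RBM features, but the direction of the comparison is the same.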

Solver: newton-cg
Logistic regression using RBM features:
              precision    recall  f1-score   support

           0       0.98      0.98      0.98       174
           1       0.93      0.92      0.93       184
           2       0.93      0.97      0.95       166
           3       0.95      0.92      0.93       194
           4       0.95      0.94      0.95       186
           5       0.93      0.94      0.93       181
           6       0.98      0.97      0.97       207
           7       0.93      0.98      0.95       154
           8       0.92      0.88      0.90       182
           9       0.91      0.93      0.92       169

    accuracy                           0.94      1797
   macro avg       0.94      0.94      0.94      1797
weighted avg       0.94      0.94      0.94      1797

Solver: lbfgs
Logistic regression using RBM features:
              precision    recall  f1-score   support

           0       0.98      0.98      0.98       174
           1       0.93      0.92      0.93       184
           2       0.93      0.97      0.95       166
           3       0.95      0.92      0.93       194
           4       0.95      0.94      0.95       186
           5       0.93      0.94      0.93       181
           6       0.98      0.97      0.97       207
           7       0.93      0.98      0.95       154
           8       0.92      0.88      0.90       182
           9       0.91      0.93      0.92       169

    accuracy                           0.94      1797
   macro avg       0.94      0.94      0.94      1797
weighted avg       0.94      0.94      0.94      1797
Solver: newton-cg
Logistic regression using raw pixel features:
              precision    recall  f1-score   support

           0       0.90      0.93      0.91       174
           1       0.60      0.59      0.59       184
           2       0.75      0.85      0.80       166
           3      *0.77*     0.79      0.78       194
           4       0.81      0.84      0.82       186
           5       0.77      0.75      0.76       181
           6       0.90      0.87      0.89       207
           7       0.86      0.88      0.87       154
           8       0.67     *0.58*   *0.62*       182
           9       0.75      0.76      0.76       169

    accuracy                           0.78      1797
   macro avg       0.78      0.78      0.78      1797
weighted avg       0.78      0.78      0.78      1797

Solver: lbfgs
Logistic regression using raw pixel features:
              precision    recall  f1-score   support

           0       0.90      0.93      0.91       174
           1       0.60      0.59      0.59       184
           2       0.75      0.85      0.80       166
           3      *0.78*     0.79      0.78       194
           4       0.81      0.84      0.82       186
           5       0.77      0.75      0.76       181
           6       0.90      0.87      0.89       207
           7       0.86      0.88      0.87       154
           8       0.67     *0.59*    *0.63*     182
           9       0.75      0.76      0.76       169

    accuracy                           0.78      1797
   macro avg       0.78      0.78      0.78      1797
weighted avg       0.78      0.78      0.78      1797

Any other comments?

newton-cg was the fastest compatible solver for training the logistic regression on RBM features. Below are the training times:

  • newton-cg: 00:14.096810
  • lbfgs: 00:22.904742
  • sag: 01:38.606006
  • saga: 02:13.611261

I welcome any other ideas that may save more time on this example.
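For anyone who wants to reproduce these timings, here is a rough benchmarking sketch of the RBM + logistic pipeline. The hyperparameters are illustrative and smaller than in the real example (which also augments the data with shifted images), so absolute times will differ.

```python
from time import time

from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline

# BernoulliRBM expects values in [0, 1].
X, y = load_digits(return_X_y=True)
X = X / 16.0

# Time the pipeline fit for each candidate solver.
for solver in ("newton-cg", "lbfgs"):
    rbm = BernoulliRBM(
        n_components=100, learning_rate=0.06, n_iter=10, random_state=0
    )
    logistic = LogisticRegression(solver=solver, C=100.0, max_iter=1000)
    pipeline = Pipeline([("rbm", rbm), ("logistic", logistic)])
    start = time()
    pipeline.fit(X, y)
    print(
        f"{solver}: {time() - start:.2f}s, "
        f"train accuracy {pipeline.score(X, y):.2f}"
    )
```

The sag and saga timings above come from the same kind of loop; they are omitted here because they are much slower.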

Member

@jnothman jnothman left a comment


I'd be happy to merge this and consider other changes separately...

@Framartin
Contributor Author

Thanks @jnothman! I didn't want to spam the PR list, but I agree that it will be clearer to have one PR for each time-consuming example. I'm updating the title of this PR.

@Framartin Framartin changed the title [WIP] Simplify some time-comsuming examples (#13383) [MRG] Simplify some time-comsuming examples (#13383) Apr 16, 2019
@Framartin Framartin changed the title [MRG] Simplify some time-comsuming examples (#13383) [MRG] Improve performance of the plot_rbm_logistic_classification.py example (#13383) Apr 16, 2019
@thomasjpfan
Member

thomasjpfan commented Apr 16, 2019

Out of curiosity, what are your results for the following parameters?

logistic = linear_model.LogisticRegression(solver='lbfgs', tol=1,
                                           max_iter=1000,
                                           multi_class='multinomial')

@Framartin
Contributor Author

Thanks a lot @thomasjpfan for your very good point.

Using lbfgs with a unit tolerance, convergence is much faster: 568 iterations and 00:05.259457 to train the Pipeline, for a total of 9.844 seconds to build the example to HTML.

Using newton-cg with a unit tolerance as well, the total build time drops further, to 8.711 seconds.

Metrics computed on the whole test set stay the same across the three sets of arguments, but there are slight differences at the class level. solver='lbfgs', tol=1 produces results a bit closer to solver='lbfgs', tol=1e-4 than solver='newton-cg', tol=1 does. The largest difference is the recall of the digit-3 class, which drops from 0.92 to 0.88. IMHO, this is acceptable.

Original example:

Arg: solver='lbfgs', tol=1e-4 (default)

Logistic regression using RBM features:
              precision    recall  f1-score   support

           0       0.98      0.98      0.98       174
           1       0.93      0.92      0.93       184
           2       0.93      0.97      0.95       166
           3       0.95      0.92      0.93       194
           4       0.95      0.94      0.95       186
           5       0.93      0.94      0.93       181
           6       0.98      0.97      0.97       207
           7       0.93      0.98      0.95       154
           8       0.92      0.88      0.90       182
           9       0.91      0.93      0.92       169

    accuracy                           0.94      1797
   macro avg       0.94      0.94      0.94      1797
weighted avg       0.94      0.94      0.94      1797


Logistic regression using raw pixel features:
              precision    recall  f1-score   support

           0       0.90      0.93      0.91       174
           1       0.60      0.59      0.59       184
           2       0.75      0.85      0.80       166
           3       0.78      0.79      0.78       194
           4       0.81      0.84      0.82       186
           5       0.77      0.75      0.76       181
           6       0.90      0.87      0.89       207
           7       0.86      0.88      0.87       154
           8       0.67      0.59      0.63       182
           9       0.75      0.76      0.76       169

    accuracy                           0.78      1797
   macro avg       0.78      0.78      0.78      1797
weighted avg       0.78      0.78      0.78      1797

Your suggestion:

Arg: solver='lbfgs', tol=1

Logistic regression using RBM features:
              precision    recall  f1-score   support

           0       0.98      0.98      0.98       174
           1       0.90      0.92      0.91       184
           2       0.93      0.95      0.94       166
           3       0.95      0.92      0.93       194
           4       0.96      0.95      0.95       186
           5       0.94      0.91      0.92       181
           6       0.99      0.97      0.98       207
           7       0.91      0.99      0.95       154
           8       0.90      0.87      0.88       182
           9       0.91      0.93      0.92       169

    accuracy                           0.94      1797
   macro avg       0.94      0.94      0.94      1797
weighted avg       0.94      0.94      0.94      1797


Logistic regression using raw pixel features:
              precision    recall  f1-score   support

           0       0.91      0.92      0.91       174
           1       0.59      0.58      0.59       184
           2       0.75      0.85      0.80       166
           3       0.78      0.79      0.78       194
           4       0.81      0.83      0.82       186
           5       0.76      0.75      0.76       181
           6       0.90      0.87      0.89       207
           7       0.86      0.88      0.87       154
           8       0.67      0.59      0.63       182
           9       0.75      0.76      0.75       169

    accuracy                           0.78      1797
   macro avg       0.78      0.78      0.78      1797
weighted avg       0.78      0.78      0.78      1797

My suggestion:

Arg: solver='newton-cg', tol=1

Logistic regression using RBM features:
              precision    recall  f1-score   support

           0       0.99      0.99      0.99       174
           1       0.90      0.91      0.91       184
           2       0.91      0.95      0.93       166
           3       0.97      0.88      0.92       194
           4       0.97      0.93      0.95       186
           5       0.93      0.93      0.93       181
           6       0.98      0.97      0.98       207
           7       0.92      0.99      0.95       154
           8       0.90      0.90      0.90       182
           9       0.91      0.93      0.92       169

    accuracy                           0.94      1797
   macro avg       0.94      0.94      0.94      1797
weighted avg       0.94      0.94      0.94      1797


Logistic regression using raw pixel features:
              precision    recall  f1-score   support

           0       0.90      0.93      0.91       174
           1       0.61      0.58      0.59       184
           2       0.75      0.85      0.80       166
           3       0.78      0.78      0.78       194
           4       0.81      0.84      0.83       186
           5       0.76      0.77      0.76       181
           6       0.91      0.87      0.89       207
           7       0.86      0.88      0.87       154
           8       0.67      0.58      0.62       182
           9       0.75      0.76      0.75       169

    accuracy                           0.78      1797
   macro avg       0.78      0.78      0.78      1797
weighted avg       0.78      0.78      0.78      1797

Do you prefer to stick to lbfgs? Should we add a comment to warn users about their choice of tol?
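The effect of tol discussed above can be checked in isolation with a short sketch. This uses plain digits data with the default C, so the iteration counts will differ from the example (which fits on RBM features), but the trend is the same.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

X, y = load_digits(return_X_y=True)
X = X / 16.0

# Loosening tol stops the solver after far fewer iterations.
iters = {}
for tol in (1e-4, 1.0):
    clf = LogisticRegression(solver="newton-cg", tol=tol, max_iter=1000)
    clf.fit(X, y)
    iters[tol] = int(clf.n_iter_[0])
    print(
        f"tol={tol}: {iters[tol]} iterations, "
        f"train accuracy {clf.score(X, y):.2f}"
    )
```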

@Framartin
Contributor Author

Framartin commented Apr 16, 2019

b94d71d sets tol = 1 keeping solver = newton-cg. But I can change it if you prefer.
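In other words, the configuration adopted by the commit boils down to the following sketch (the example's other arguments, such as C, are unchanged and omitted here):

```python
from sklearn.linear_model import LogisticRegression

# Adopted settings: newton-cg solver with a loose tolerance of 1.
logistic = LogisticRegression(solver="newton-cg", tol=1)
```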

@thomasjpfan
Member

Should we add a comment to warn users about their choice of tol?

Changing the tol depending on solver is actively being discussed :)

sets tol = 1 keeping solver = newton-cg

I am okay with this.

@thomasjpfan thomasjpfan merged commit 934dfb0 into scikit-learn:master Apr 17, 2019
jeremiedbb pushed a commit to jeremiedbb/scikit-learn that referenced this pull request Apr 25, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019
