[MRG+1] Improve benchmark on NMF#5779
Conversation
benchmarks/bench_plot_nmf.py
Outdated
ogrisel
left a comment
There was a problem hiding this comment.
This need a rebase + see my comments, both other than that +1 for merge.
benchmarks/bench_plot_nmf.py
Outdated
| report(norm(X - np.dot(W, H)), tend) | ||
|
|
||
| return timeset, err | ||
| @ignore_warnings |
There was a problem hiding this comment.
Please add a comment to explain which kind of warnings do you expect to ignore here.
There was a problem hiding this comment.
It was a panda's indexing warning, I solved it and removed the ignore_warnings.
|
|
||
|
|
||
| # use joblib to cache results. | ||
| # X_shape is specified in arguments for avoiding hashing X |
There was a problem hiding this comment.
Out of curiosity, how much overhead does the hashing of X adds?
There was a problem hiding this comment.
A few seconds on each dataset, i.e. for ~200 calls to bench_one.
If I recall correctly, I added it mostly for larger datasets like MNIST or RCV1.
57cdf03 to
38561e1
Compare
|
Rebased and comments addressed. |
Question: should we keep the projected-gradient solver in this benchmark, once it is removed from the package (in 0.19) ? |
|
Depends on how much trouble it is to keep them around, I'd say. It's informative to have it in the benchmark. |
|
lgtm |
|
Well, I think you should drop ProjectedGradient now that we're in 0.19; and ideally we'd get #5295 merged, but otherwise this does have +2 already. |
|
@TomDLT Can you rebase and drop ProjectedGradient ? |
|
Soon 😸 |
|
Update:
|
tguillemot
left a comment
There was a problem hiding this comment.
I can't tell you about _PGNMF.
LGTM anyway
|
|
||
| ################### | ||
| # Start of _PGNMF # | ||
| ################### |
There was a problem hiding this comment.
If you keep it maybe you can add a comment to say where comes from that codes and when it was removed of sklearn ?
|
Keep it there I think.
Sent from phone. Please excuse spelling and brevity.
…On Dec 19, 2016 5:34 AM, "Thierry Guillemot" ***@***.***> wrote:
***@***.**** approved this pull request.
I can't tell you about _PGNMF.
LGTM anyway
------------------------------
In benchmarks/bench_plot_nmf.py
<#5779 (review)>
:
>
+###################
+# Start of _PGNMF #
+###################
If you keep it maybe you can add a comment to say where comes from that
codes et when it was removed of sklearn ?
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#5779 (review)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAbcFi_kvF7jg94u16pUUu2EaN1DrIEFks5rJl2ggaJpZM4GfTpQ>
.
|
|
Thanks for the clarity @TomDLT ! |
* ENH improve benchmark on nmf * add projected gradient solver inside the benchmark file * add comments and authors for _PGNMF
* ENH improve benchmark on nmf * add projected gradient solver inside the benchmark file * add comments and authors for _PGNMF
* ENH improve benchmark on nmf * add projected gradient solver inside the benchmark file * add comments and authors for _PGNMF
* ENH improve benchmark on nmf * add projected gradient solver inside the benchmark file * add comments and authors for _PGNMF
* ENH improve benchmark on nmf * add projected gradient solver inside the benchmark file * add comments and authors for _PGNMF
Previous benchmark used simulated data, did not use the new coordinate descent solver (#4852), and I found the plot very uninformative:


This new benchmark tests NMF on two datasets:
It uses three different solvers:
and three different initialization schemes:
The total running time is about 2 minutes for 20 Newsgroups dataset, and 1 minute for Olivetti faces dataset.
On the plots, each point corresponds to one more iteration.


This change is