[MRG] Feature: calculate normed stress (Stress-1) in sklearn.manifold.MDS#13042
matthieu-pa wants to merge 12 commits into scikit-learn:main
Is there anything still missing for the merge?
I need to rebase on the latest version of sklearn. If anything else is
needed, please let me know.
On Sun, Apr 19, 2020, 23:57 Antonio Escobar wrote:
Is there anything still missing for the merge?
The Stress-1 feature is actually quite fundamental to understand if the
fit is meaningless or not.
I merged the latest master version into this branch and resolved the merge conflict.
Is it required to compute the Stress-1 in every iteration, or can it be computed only for the final (returned) stress value? Maybe it could simply be a new returned value instead of a new option: keep stress and add stress_one or stress_normalized.
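A minimal sketch of the "compute it only at the end" option raised here, assuming the metric case where the disparities equal the input dissimilarities. The helper names (pairwise_distances, raw_stress, stress_one) are hypothetical and are not part of scikit-learn or of this PR. Normalizing by the sum of squared dissimilarities is one common convention and is an assumption as well.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix of the rows of X (hypothetical helper)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def raw_stress(dissimilarities, X):
    """Sum of squared residuals over the upper triangle (each pair once)."""
    d = pairwise_distances(X)
    iu = np.triu_indices_from(d, k=1)
    return ((d[iu] - dissimilarities[iu]) ** 2).sum()


def stress_one(dissimilarities, X):
    """Normalized stress, computed once from a final embedding X.

    Assumption: the normalizer is the sum of squared dissimilarities.
    """
    iu = np.triu_indices_from(dissimilarities, k=1)
    norm = (dissimilarities[iu] ** 2).sum()
    return np.sqrt(raw_stress(dissimilarities, X) / norm)


# Toy data: dissimilarities from 5-D points, and an imperfect 2-D "embedding".
rng = np.random.default_rng(0)
points = rng.normal(size=(10, 5))
dissimilarities = pairwise_distances(points)
embedding = points[:, :2]  # simply drops three coordinates

print("raw stress:", raw_stress(dissimilarities, embedding))
print("Stress-1  :", stress_one(dissimilarities, embedding))
```

Because both quantities depend only on the dissimilarities and the final embedding, they can be returned side by side without touching the iteration loop.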
This is a very good question. Does normalizing in every iteration affect the result too?
The stop condition (eps) is checked using the normalized stress, so it might stop prematurely and perform fewer iterations, since differences in the normalized stress are comparatively smaller. Not a big deal, one could just decrease eps when using the normalized option, but I think it would be more efficient anyway to do the normalization just at the end.
Thank you for raising a very good point. I also agree that checking the normalized stress at every iteration is not very likely to cause MDS to stop early, whereas it is considerably more compute-intensive. Next week, to be thorough, I could benchmark two versions over a few randomly generated distance matrices: one calculating the normalized stress at every iteration and one calculating it just at the end. I would mainly focus on comparing execution time and the number of iterations required to converge.
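A rough sketch of how the iteration-count part of such a comparison could look, using a toy unweighted SMACOF loop written in NumPy rather than scikit-learn's smacof(). Everything here is an assumption made for illustration: the helper names are hypothetical, and the stopping rule (stop when the decrease of the monitored value falls below eps) is a simplification, not necessarily the rule used in scikit-learn or in this PR.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix of the rows of X (hypothetical helper)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def guttman_step(delta, X):
    """One unweighted Guttman transform: X <- B(X) X / n."""
    d = pairwise_distances(X)
    ratio = np.divide(delta, d, out=np.zeros_like(d), where=d > 0)
    B = -ratio
    B[np.diag_indices_from(B)] = ratio.sum(axis=1)
    return B @ X / X.shape[0]


def raw_stress(delta, X):
    """Sum of squared residuals over the upper triangle."""
    d = pairwise_distances(X)
    iu = np.triu_indices_from(d, k=1)
    return ((d[iu] - delta[iu]) ** 2).sum()


def run(delta, X0, eps, normalized, max_iter=300):
    """Iterate until the monitored stress decreases by less than eps.

    If `normalized` is True, the monitored value is
    sqrt(raw stress / sum of squared dissimilarities).
    """
    norm = (delta[np.triu_indices_from(delta, k=1)] ** 2).sum()
    X, old = X0.copy(), None
    for n_iter in range(1, max_iter + 1):
        X = guttman_step(delta, X)
        value = raw_stress(delta, X)
        if normalized:
            value = np.sqrt(value / norm)
        if old is not None and old - value < eps:
            break
        old = value
    return n_iter, value


rng = np.random.default_rng(0)
delta = pairwise_distances(rng.normal(size=(30, 5)))  # random distance matrix
X0 = rng.normal(size=(30, 2))                         # shared initial configuration

iters_raw, _ = run(delta, X0, eps=1e-3, normalized=False)
iters_norm, _ = run(delta, X0, eps=1e-3, normalized=True)
print("iterations (raw stress monitored):       ", iters_raw)
print("iterations (normalized stress monitored):", iters_norm)
```

Both runs follow the same trajectory because they share the initial configuration; only the monitored quantity differs, which isolates the effect of eps. Timing could be added around run() with time.perf_counter().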
Closing as superseded by #22562.
Reference Issues/PRs
Fixes #10168 #12285
What does this implement/fix? Explain your changes.
This is a follow-up on the stale PRs referenced above. The main diff is the fix for the previously failing unit test:
https://travis-ci.org/scikit-learn/scikit-learn/jobs/437566342#L2818
To my understanding, even using normalized stress, smacof() needs to be initialized at the same configuration for the property "Normed stress should be the same for values multiplied by some factor k" to be true, so I set random_state of smacof() to a fixed value. The dissimilarity matrix also needs to be large enough.
Any other comments?
The previous reviewer was @glemaitre. To my understanding, the review comments have been addressed, but if something is missing, I'll do my best to fix it.
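The scale-invariance property mentioned in the description can be illustrated with a toy unweighted SMACOF loop in NumPy. This is only a sketch under stated assumptions: the helper names are hypothetical, a fixed number of Guttman transforms replaces scikit-learn's smacof(), and the normalizer is the sum of squared dissimilarities. Starting both runs from the same initial configuration plays the role of fixing random_state.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix of the rows of X (hypothetical helper)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def guttman_step(delta, X):
    """One unweighted Guttman transform: X <- B(X) X / n."""
    d = pairwise_distances(X)
    ratio = np.divide(delta, d, out=np.zeros_like(d), where=d > 0)
    B = -ratio
    B[np.diag_indices_from(B)] = ratio.sum(axis=1)
    return B @ X / X.shape[0]


def stress_one(delta, X):
    """Normalized stress (assumed normalizer: sum of squared dissimilarities)."""
    d = pairwise_distances(X)
    iu = np.triu_indices_from(d, k=1)
    return np.sqrt(((d[iu] - delta[iu]) ** 2).sum() / (delta[iu] ** 2).sum())


def toy_smacof(delta, X0, n_iter=50):
    """Run a fixed number of Guttman transforms from the given start X0."""
    X = X0.copy()
    for _ in range(n_iter):
        X = guttman_step(delta, X)
    return X, stress_one(delta, X)


rng = np.random.default_rng(0)
delta = pairwise_distances(rng.normal(size=(20, 5)))
X0 = rng.normal(size=(20, 2))  # same initial configuration for both runs
k = 10.0

X_a, s_a = toy_smacof(delta, X0)
X_b, s_b = toy_smacof(k * delta, X0)

print("Stress-1, original dissimilarities:", s_a)
print("Stress-1, dissimilarities times k: ", s_b)
```

With a shared start, the second embedding is the first one scaled by k, so the two normalized stress values agree up to floating-point error. With different initial configurations the runs can end in different local minima, and the values need not match.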