ENH Change the default n_init and eps for MDS #31117
ogrisel merged 37 commits into scikit-learn:main
@antoinebaker IMHO it would be great to have this in 1.7 so that the deprecation cycle rolls out by 1.9... Would be amazing if you could take a look. Thanks!

Will try! But not sure we can make it to the 1.7 release, which is coming very soon :)
antoinebaker left a comment
Thanks a lot @dkobak for the follow-up PR!
The stopping criterion makes much more sense now. The plot_mds.py example is improved with the tutorial-like presentation.
Even if our goal is not to reproduce the R code exactly, could you provide a comparison of the R/sklearn results on some examples, to see whether the stress values found are in the same ballpark? I guess comparing the embeddings is more difficult because of rotational invariance.
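One way around the rotational invariance when comparing the embeddings directly would be an orthogonal Procrustes alignment. A minimal sketch (the arrays and names here are illustrative, not from the PR):

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

rng = np.random.default_rng(0)
X_a = rng.standard_normal((10, 2))           # e.g. centered sklearn embedding
rot = np.array([[0.0, -1.0], [1.0, 0.0]])    # a 90-degree rotation
X_b = X_a @ rot                              # e.g. R embedding, a rotated copy

# Best orthogonal map aligning X_b onto X_a, then the residual misfit
R, _ = orthogonal_procrustes(X_b, X_a)
residual = np.linalg.norm(X_b @ R - X_a)
```

For a pure rotation/reflection of the same configuration, the residual is (numerically) zero; for two independently fitted embeddings it measures the disagreement left after removing the rotational ambiguity.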
Below are a few suggestions to ease maintenance.
@antoinebaker Thanks for the review, I added all your suggestions.

This makes sense, however I am not an R user and so cannot do this very easily... If you think this is important, I can try to set it up.
Okay, I am trying to set up the comparison. I'm taking the data from here: https://cran.r-project.org/web/packages/smacof/vignettes/mdsnutshell.html

```python
import numpy as np

S = np.array([[0.  , 0.86, 0.42, 0.42, 0.18, 0.06, 0.07, 0.04, 0.02, 0.07, 0.09, 0.12, 0.13, 0.16],
              [0.86, 0.  , 0.5 , 0.44, 0.22, 0.09, 0.07, 0.07, 0.02, 0.04, 0.07, 0.11, 0.13, 0.14],
              [0.42, 0.5 , 0.  , 0.81, 0.47, 0.17, 0.1 , 0.08, 0.02, 0.01, 0.02, 0.01, 0.05, 0.03],
              [0.42, 0.44, 0.81, 0.  , 0.54, 0.25, 0.1 , 0.09, 0.02, 0.01, 0.  , 0.01, 0.02, 0.04],
              [0.18, 0.22, 0.47, 0.54, 0.  , 0.61, 0.31, 0.26, 0.07, 0.02, 0.02, 0.01, 0.02, 0.  ],
              [0.06, 0.09, 0.17, 0.25, 0.61, 0.  , 0.62, 0.45, 0.14, 0.08, 0.02, 0.02, 0.02, 0.01],
              [0.07, 0.07, 0.1 , 0.1 , 0.31, 0.62, 0.  , 0.73, 0.22, 0.14, 0.05, 0.02, 0.02, 0.  ],
              [0.04, 0.07, 0.08, 0.09, 0.26, 0.45, 0.73, 0.  , 0.33, 0.19, 0.04, 0.03, 0.02, 0.02],
              [0.02, 0.02, 0.02, 0.02, 0.07, 0.14, 0.22, 0.33, 0.  , 0.58, 0.37, 0.27, 0.2 , 0.23],
              [0.07, 0.04, 0.01, 0.01, 0.02, 0.08, 0.14, 0.19, 0.58, 0.  , 0.74, 0.5 , 0.41, 0.28],
              [0.09, 0.07, 0.02, 0.  , 0.02, 0.02, 0.05, 0.04, 0.37, 0.74, 0.  , 0.76, 0.62, 0.55],
              [0.12, 0.11, 0.01, 0.01, 0.01, 0.02, 0.02, 0.03, 0.27, 0.5 , 0.76, 0.  , 0.85, 0.68],
              [0.13, 0.13, 0.05, 0.02, 0.02, 0.02, 0.02, 0.02, 0.2 , 0.41, 0.62, 0.85, 0.  , 0.76],
              [0.16, 0.14, 0.03, 0.04, 0.  , 0.01, 0.  , 0.02, 0.23, 0.28, 0.55, 0.68, 0.76, 0.  ]])
D = 1 - S
np.fill_diagonal(D, 0)
```

I'm running R here: https://webr.r-wasm.org/latest/

```r
library(smacof)
D = 1 - ekman
mds(D)
mds(D, type="ordinal")
```

Results in Python (this branch):

```python
from sklearn.manifold import MDS

np.random.seed(42)
MDS(dissimilarity="precomputed", metric=True, normalized_stress=True, n_init=1).fit(D).stress_            # 0.137
MDS(dissimilarity="precomputed", metric=True, normalized_stress=True, n_init=1, eps=1e-6).fit(D).stress_  # 0.132
MDS(dissimilarity="precomputed", metric=False, normalized_stress=True, n_init=1).fit(D).stress_           # 0.372
MDS(dissimilarity="precomputed", metric=False, normalized_stress=True, n_init=1, eps=1e-6).fit(D).stress_ # 0.031
```

I think it fits well enough! Note: I cannot get down to 0.023 stress-1 with the Python implementation, and I am not sure why, but this is out of scope of this PR. For reference, here is the R code: https://github.com/cran/smacof/blob/master/R/smacofSym.R
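For reference, the stress-1 values being compared are Kruskal's normalized stress. A standalone sketch of the standard formula (my own helper, not the sklearn or smacof code):

```python
import numpy as np

def stress_1(dhat, d):
    """Kruskal's stress-1: sqrt(sum((dhat - d)**2) / sum(d**2)),
    where dhat are (transformed) dissimilarities and d are the
    corresponding embedding distances."""
    dhat = np.asarray(dhat, dtype=float).ravel()
    d = np.asarray(d, dtype=float).ravel()
    return np.sqrt(((dhat - d) ** 2).sum() / (d ** 2).sum())
```

A perfect fit gives 0; the values above (0.13-0.37) indicate how far the embedding distances are from the dissimilarities, relative to the overall scale of the distances.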
@antoinebaker I fixed the issue in my comparison code (had to add …).
Thanks for the comparison examples! To share and reproduce, could you provide them as gists for the Python and R code? The Python one can be a notebook, the R code just a plain script with the results commented as you have done. Another possibility is to have a small public repository with the two code snippets.
This PR is already a much-needed improvement for MDS, so +1 to merge it as is.
Here is the gist, I put the R code into the comments on top: https://gist.github.com/dkobak/3daf73495b3da3b23dfbfd6b0b441030
Any chance to get it into 1.7? I am not sure what the timeline for that is.
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Thanks @ogrisel. I have edited the PR to make the change of …
ogrisel left a comment
Thanks very much for the quick follow-up. Here is a final comment but otherwise LGTM.
antoinebaker left a comment
A few nitpicks (mainly removing superfluous `eps=1e-6`, as it is the new default), otherwise LGTM!
Co-authored-by: antoinebaker <antoinebaker@users.noreply.github.com>
I enabled auto-merge so that the PR will be merged if CI is green after this commit. @dkobak instead of committing individual review suggestions, you can press the "Add suggestion to batch" button in the diff view of the PR and then do a single commit for a group of suggestions, to avoid unnecessary CI usage.
Thanks. I noticed that the what's-new entries had wrong markup formatting (mea culpa) and also found many other existing what's-new entries for 1.7 that have wrong markup and render incorrectly. I fixed all of them in #31272. |
This is a follow-up to #30514 and has been discussed in there to some extent. It fixes two issues:
1. The current default in MDS is `n_init=4`, which runs MDS four times. Other sklearn classes that offer this functionality use `n_init=1` by default, e.g. `sklearn.mixture.GaussianMixture`. This appears much more sensible to me, so I am changing the default to `n_init=1`.
2. The convergence criterion was really strange and unmotivated, and the default `eps` led to really bad underconvergence on some simple datasets. I am changing it to a convergence criterion that (i) roughly follows the R implementation, (ii) makes sense for both metric and non-metric MDS, and (iii) is not affected by any rescaling of the input matrix `X`. The new convergence criterion is `((old_stress - stress) / ((distances.ravel() ** 2).sum() / 2)) < eps`, with the default `eps=1e-6` as in the R implementation.

Apart from that, I fixed the formula for the "normalized stress" aka "stress-1" (as discussed in the previous PR), and added several tests.
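The new stopping rule can be written as a small helper. This is a sketch with assumed variable names, not the actual sklearn internals: the stress decrease, normalized by half the sum of squared embedding distances, must fall below `eps`.

```python
import numpy as np

def has_converged(old_stress, stress, distances, eps=1e-6):
    """New SMACOF stopping rule: normalized stress decrease below eps.

    distances: array of pairwise distances between embedding points.
    The normalization makes the criterion invariant to rescaling of
    the input, since stress and squared distances scale together.
    """
    norm = (np.asarray(distances, dtype=float).ravel() ** 2).sum() / 2
    return (old_stress - stress) / norm < eps
```

Because both the numerator and the denominator are quadratic in the embedding scale, multiplying the input dissimilarities by a constant leaves the criterion unchanged, which the old absolute-`eps` criterion did not guarantee.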
I implemented FutureWarnings until v1.9 and corresponding tests.
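The deprecation mechanics presumably follow the usual sklearn pattern of a "warn" sentinel default. A hypothetical illustration of that pattern (not the actual PR code; `resolve_n_init` is a made-up name):

```python
import warnings

def resolve_n_init(n_init="warn"):
    # Hypothetical sketch: warn when the user relies on the old default,
    # but keep the old behavior until the new default takes effect.
    if n_init == "warn":
        warnings.warn(
            "The default value of `n_init` will change from 4 to 1 in 1.9. "
            "Set n_init explicitly to silence this warning.",
            FutureWarning,
        )
        n_init = 4  # old default, kept during the deprecation period
    return n_init
```

Users who pass `n_init` explicitly see no warning, which is what the corresponding tests would check.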
Here is the result of running this code with the new default parameters on a small subset of the Digits dataset.
Note that both embeddings converge within ~200 iterations, and that non-metric MDS has lower normalized stress than metric MDS, as expected.
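A minimal version of that experiment might look like this (the exact subset size, the random seed, and making the new defaults explicit are my assumptions):

```python
from sklearn.datasets import load_digits
from sklearn.manifold import MDS

X, _ = load_digits(return_X_y=True)
X = X[:100]  # small subset of Digits, as in the description above

# Metric and non-metric MDS with the new defaults passed explicitly
mds_metric = MDS(n_components=2, metric=True, n_init=1, eps=1e-6, random_state=0)
emb_metric = mds_metric.fit_transform(X)

mds_nonmetric = MDS(n_components=2, metric=False, n_init=1, eps=1e-6, random_state=0)
emb_nonmetric = mds_nonmetric.fit_transform(X)
```

With these settings both runs iterate until the normalized stress decrease is tiny, instead of stopping after a handful of iterations as with the old `eps`.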
Running this with the current defaults (removing `n_init=1, eps=1e-6` from the MDS calls) produces awful results, as the convergence criterion triggers way too early. Almost the same thing happens on `main`. So in my opinion the current `eps` value is dysfunctional, especially for non-metric MDS, and the current `n_init` value is a waste of computations.