Skip to content

MNT accelerate example plot_lle_digits.py#21736

Merged
jjerphan merged 4 commits intoscikit-learn:mainfrom
siavrez:accelerate_examples3
Nov 24, 2021
Merged

MNT accelerate example plot_lle_digits.py#21736
jjerphan merged 4 commits intoscikit-learn:mainfrom
siavrez:accelerate_examples3

Conversation

@siavrez
Copy link
Copy Markdown
Contributor

@siavrez siavrez commented Nov 21, 2021

Changing matplotlib.text with matplotlib.scatter and Markers created from TeX symbols improves runtime by almost 25 seconds.
Also changed time.time with time.perf_counter.

Reference Issues/PRs

#21598

What does this implement/fix? Explain your changes.

Any other comments?

…g plt.text with plt.scatter with Markers created from TeX symbols
@glemaitre glemaitre changed the title accelerate examlpe ../examples/manifold/plot_lle_digits.py by changin… MNT accelerate example plot_lle_digits.py by Nov 22, 2021
@glemaitre glemaitre changed the title MNT accelerate example plot_lle_digits.py by MNT accelerate example plot_lle_digits.py Nov 22, 2021
Copy link
Copy Markdown
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if there is a way to accelerate even more the example by tweaking MDS, t-SNE, and NCA to accelerate them.

def plot_embedding(X, title, ax):
X = MinMaxScaler().fit_transform(X)

for t in np.unique(y):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would propose the same renaming than in the previous example that I reviewed.

# of the original data. We will store the projected data as well as the computational
# time needed to perform each projection.
from time import time
from time import perf_counter
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also keep time here.

@glemaitre glemaitre self-requested a review November 22, 2021 15:42
@siavrez
Copy link
Copy Markdown
Contributor Author

siavrez commented Nov 22, 2021

Added n_iter=500, n_iter_without_progress=150, n_jobs=-1 to TSNE , n_jobs=-1 to MDS and changed init="random" to init="pca" in NCA.
Runtime is now 42 p/m 6 seconds.
from 72 p/m 12 seconds.

@adrinjalali adrinjalali mentioned this pull request Nov 22, 2021
41 tasks
Copy link
Copy Markdown
Member

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for tackling this, @siavrez.

Here are two comments.

Comment on lines +54 to +61
for digit in digits.target_names:
ax.scatter(
*X[y == digit].T,
marker=f"${digit}$",
s=60,
color=plt.cm.Dark2(digit),
alpha=0.7,
)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this addition (and the deletion bellow) change the rendering of this example?

Copy link
Copy Markdown
Contributor Author

@siavrez siavrez Nov 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default plot
previous

after this changes
newlook

changing zorder:
changedZorder

changed zorder with reduced alpha for scatterplot:
alpha0 45

n_neighbors=n_neighbors, n_components=2, method="ltsa"
),
"MDS embedding": MDS(n_components=2, n_init=1, max_iter=100),
"MDS embedding": MDS(n_components=2, n_init=1, max_iter=100, n_jobs=-1),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tried changing max_iter, here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

max_iter bellow 100 is not converging and there's room to go to 125.
n_jobs is helping reduce runtime for MDS. for n_jobs=2 it is about 0.25 percent.
with max_iter =125 and n_jobs=2 it takes 3.9 +/- 0.2 seconds.

@glemaitre
Copy link
Copy Markdown
Member

Let's use only n_jobs=2 instead of n_jobs=-1. It would be nicer with your user.

@siavrez
Copy link
Copy Markdown
Contributor Author

siavrez commented Nov 23, 2021

Previous commit failed because of a signature test in matplotlib. So removed the param (zorder) from the initial call and set it later.

Copy link
Copy Markdown
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @siavrez

@siavrez siavrez requested a review from jjerphan November 23, 2021 17:47
Copy link
Copy Markdown
Member

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @siavrez, for this PR. Having plots being reported is helpful for reviewers. +1

@jjerphan jjerphan merged commit f57a5f9 into scikit-learn:main Nov 24, 2021
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Nov 29, 2021
samronsin pushed a commit to samronsin/scikit-learn that referenced this pull request Nov 30, 2021
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Dec 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants