
DOC Accelerate plot_johnson_lindenstrauss_bound.py example #21795

Merged
glemaitre merged 6 commits into scikit-learn:main from lisacsn:accelerate_johnson_lindenstrauss_bound
May 4, 2022
Contributor

@lisacsn lisacsn commented Nov 26, 2021

Reference Issues/PRs

References #21598

What does this implement/fix? Explain your changes.

Speed up ../examples/miscellaneous/plot_johnson_lindenstrauss_bound.py by reducing the largest number of components (in n_components_range) from 10000 to 6000. The execution time is fast with n_components=300 and n_components=1000, but slow (more than 10 seconds) with n_components=10000.

Output before the changes:

And after:

Any other comments?

The other figures are exactly the same, no changes.

Member

ogrisel commented Nov 26, 2021

What speed-up do you observe locally? Weirdly enough, in the image report the new runtime is 50s instead of 20s on main... I am not sure why.

Anyway, the text of the example would need to be adjusted if we change the number of components, but I am not sure it's worth it.

Maybe you could try to reduce the number of documents from 500 to 300 instead. The number of pairwise distances would decrease from 250,000 to 90,000, which should yield approximately a 3x speed-up on this example (assuming that the data fetching step is negligible...).
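The estimate above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the cost of the distance step scales with the number of entries in the full pairwise distance matrix; the helper name `pairwise_distance_count` is hypothetical and not part of the example script.

```python
def pairwise_distance_count(n_samples: int) -> int:
    """Entries in the full n_samples x n_samples distance matrix (hypothetical helper)."""
    return n_samples * n_samples


def unique_pair_count(n_samples: int) -> int:
    """Distinct unordered pairs, excluding each sample paired with itself."""
    return n_samples * (n_samples - 1) // 2


before = pairwise_distance_count(500)  # 250000
after = pairwise_distance_count(300)   # 90000
speedup = before / after               # ratio of work, assuming cost ~ number of distances

# Counting only unique pairs gives nearly the same ratio.
unique_speedup = unique_pair_count(500) / unique_pair_count(300)

print(before, after, round(speedup, 2), round(unique_speedup, 2))
# prints: 250000 90000 2.78 2.78
```

Under that assumption the expected gain is about 2.8x for the distance computation alone; any step whose cost does not depend on the squared number of samples would lower the overall figure.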

@lisacsn lisacsn force-pushed the accelerate_johnson_lindenstrauss_bound branch from 57eaefb to f8fa21a Compare November 28, 2021 10:39
Contributor Author

lisacsn commented Nov 28, 2021

On main I have:

Embedding 500 samples with dim 130107 using various random projections
Projected 500 samples from 130107 to 300 in 0.579s
Random matrix with size: 1.294MB
Mean distances rate: 0.92 (0.16)
Projected 500 samples from 130107 to 1000 in 1.980s
Random matrix with size: 4.334MB
Mean distances rate: 0.94 (0.10)
Projected 500 samples from 130107 to 10000 in 19.886s
Random matrix with size: 43.305MB
Mean distances rate: 0.97 (0.03)

And on my branch:

Embedding 500 samples with dim 130107 using various random projections
Projected 500 samples from 130107 to 300 in 0.677s
Random matrix with size: 1.295MB
Mean distances rate: 0.95 (0.17)
Projected 500 samples from 130107 to 1000 in 2.024s
Random matrix with size: 4.327MB
Mean distances rate: 1.00 (0.10)
Projected 500 samples from 130107 to 6000 in 12.037s
Random matrix with size: 25.963MB
Mean distances rate: 0.99 (0.04)

I don't know why the new runtime is 50s: on my computer, the computation time of the third projection (10000 reduced to 6000 components) drops from 19s to 12s.

If we reduce the number of documents from 500 to 300 and keep 10k components, we have:

Embedding 300 samples with dim 130107 using various random projections
Projected 300 samples from 130107 to 300 in 0.571s
Random matrix with size: 1.299MB
Mean distances rate: 0.91 (0.17)
Projected 300 samples from 130107 to 1000 in 1.920s
Random matrix with size: 4.316MB
Mean distances rate: 0.97 (0.09)
Projected 300 samples from 130107 to 10000 in 16.150s
Random matrix with size: 43.282MB
Mean distances rate: 1.02 (0.03)
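The three logs above can be compared directly. The following is a small sketch that copies the reported times of the largest projection and computes the speed-up relative to main; the dictionary labels are made up for illustration, and the figures appear to time only the projection step, so they say nothing about the pairwise distance computation.

```python
# Wall-clock times in seconds for the largest projection, copied from the logs above.
# The labels are illustrative only.
timings = {
    "main (500 samples, 10000 components)": 19.886,
    "branch (500 samples, 6000 components)": 12.037,
    "300 samples, 10000 components": 16.150,
}

baseline = timings["main (500 samples, 10000 components)"]

# Speed-up of each configuration relative to main.
speedups = {name: baseline / seconds for name, seconds in timings.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x")
# prints:
# main (500 samples, 10000 components): 1.00x
# branch (500 samples, 6000 components): 1.65x
# 300 samples, 10000 components: 1.23x
```

On these local numbers, lowering the number of components shortens the largest projection more than lowering the number of documents does.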

And the outputs are:

@adrinjalali adrinjalali mentioned this pull request Nov 29, 2021
41 tasks
@adrinjalali adrinjalali changed the title [MRG] Accelerate example plot_johnson_lindenstrauss_bound [MRG] Accelerate example plot_johnson_lindenstrauss_bound.py Nov 29, 2021
@adrinjalali
Member

Somehow running this is slower (33s) than what we have already in main. It's odd.

@glemaitre glemaitre self-assigned this May 3, 2022
@glemaitre glemaitre changed the title [MRG] Accelerate example plot_johnson_lindenstrauss_bound.py DOC Accelerate plot_johnson_lindenstrauss_bound.py example May 3, 2022
@glemaitre glemaitre merged commit 5d58d9d into scikit-learn:main May 4, 2022
glemaitre added a commit to glemaitre/scikit-learn that referenced this pull request May 19, 2022
…arn#21795)

Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
glemaitre added a commit that referenced this pull request May 19, 2022
Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>