ENH add pairwise distances backend for LocalOutlierFactor #26316
OmarManzoor wants to merge 7 commits into scikit-learn:main
Conversation
Micky774
left a comment
Hey there @OmarManzoor, always wonderful seeing a PR from you :)
I know you are still early in the process of finalizing this PR and it is only marked as a draft, but I figured some (potentially unsolicited) advice may help smooth out the road in front of you.
I also wanted to mention that this PR will require some thorough benchmarking, both to confirm that there is indeed a performance gain and to evaluate the best deployment strategies (i.e. `parallel_on_{X, Y}`), since the asymmetry of the algorithm may require us to adopt a different heuristic than the default for `ArgKmin`.
Please do not hesitate to ping if you have any questions or concerns. Thanks!
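For readers following along: the quantity this backend would fuse with the k-NN search is LOF's local reachability density (lrd). A minimal NumPy sketch of the standard definition (function and variable names here are illustrative, not the PR's actual Cython backend):

```python
import numpy as np

def local_reachability_density(dist, neigh_ind, k_dist):
    """Standard LOF lrd: inverse of the mean reachability distance.

    dist:      (n_samples, k) distances to each point's k nearest neighbors
    neigh_ind: (n_samples, k) indices of those neighbors
    k_dist:    (n_samples,) distance from each point to its k-th neighbor
    """
    # reach_dist(a, b) = max(k_dist(b), d(a, b))
    reach_dist = np.maximum(dist, k_dist[neigh_ind])
    # A small epsilon guards against division by zero for duplicated points.
    return 1.0 / (np.mean(reach_dist, axis=1) + 1e-10)
```

A fused backend would compute `reach_dist` chunk by chunk inside the argkmin pass instead of materializing the full `(n_samples, k)` arrays afterwards.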
sklearn/metrics/_pairwise_distances_reduction/_argkmin_lrd.pyx.tp (6 resolved review threads, outdated)
Micky774
left a comment
Looking good so far. Aside from the missing test, please ping me once you have some benchmark results ready :)
```python
raise ValueError(
    "Only float64 or float32 datasets pairs are supported at this time, "
    f"got: X.dtype={X.dtype} and Y.dtype={Y.dtype}."
)
```
Need to add a test for this
@OmarManzoor: you can use asv to easily run benchmarks between those revisions. You can take some inspiration from the existing ASV benchmark definitions and adapt them for this PR.
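If a full asv run is too slow for a first sanity check, timings can also be taken directly with the standard library. A sketch (the toy workload is illustrative; for this PR you would plug in the `LocalOutlierFactor(...).fit(X_train)` call on each branch):

```python
import timeit

def best_of(fn, repeat=5):
    # Run fn several times once each and keep the minimum: for a
    # deterministic workload the minimum is the least noisy estimate.
    return min(timeit.repeat(fn, number=1, repeat=repeat))

# Hypothetical workload; replace with the estimator call being benchmarked,
# e.g. lambda: LocalOutlierFactor(n_neighbors=10).fit(X_train)
elapsed = best_of(lambda: sum(range(100_000)))
print(f"{elapsed:.6f}s")
```

This gives a quick min-of-N wall-clock number, not asv's statistical machinery, so treat it only as a first-pass comparison.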
Right, thank you.
@jjerphan Could you kindly have a look at whether this benchmark seems correct: argkmin_lrd_benchmark?
I think this is going to take a lot of time to run.
@jjerphan @Micky774 These are the results that I got so far. I don't think this looks good: the performance seems to have decreased.
Can you try profiling (e.g. with Linux profiling tools)?
Thank you!
@Micky774 I tried using cProfile and these are the results for the `_argkmin_lrd.pyx` file.

Code:

```python
import cProfile
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

n_train = 1000
n_features = 100
rng = np.random.RandomState(0)
X_train = rng.rand(n_train, n_features).astype(np.float32)
cProfile.run("LocalOutlierFactor(n_neighbors=10).fit(X=X_train)")
```

```
ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     1    0.000    0.000    0.000    0.000  _argkmin_lrd.pyx:134(_finalize_results)
     1    0.031    0.031    0.074    0.074  _argkmin_lrd.pyx:34(compute)
     1    0.000    0.000    0.015    0.015  _argkmin_lrd.pyx:64(__init__)
```

It seems like the `compute` method is taking the actual time.
Python and native code can be profiled with `py-spy`. I would recommend setting thread affinity so that only one thread is used; on GNU/Linux, this can be done when launching the process. Profile files can be uploaded here so that they can be inspected by readers as well.
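One way to apply the thread-affinity advice without extra tooling is from Python itself, before the profiled workload runs. A sketch using only the standard library (`os.sched_setaffinity` is a Linux-only API; the helper's name is mine):

```python
import os

def pin_to_single_core(core=0):
    """Restrict the current process to one CPU core (Linux-only API),
    so that the profile is not confounded by thread parallelism."""
    os.sched_setaffinity(0, {core})   # pid 0 means "the current process"
    return os.sched_getaffinity(0)    # return the new affinity mask
```

Calling this at the top of the benchmark script, before `fit`, keeps all work on one busy core for the profiler to attach to.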
I think I realized what the issue is with the significant drop: the metric was being set to euclidean, which has a specialized implementation when we use `kneighbors`.
Cool!
These are the latest benchmarks with float32.

Benchmark code:

```python
import numpy as np
from .common import Benchmark
from sklearn.neighbors import LocalOutlierFactor


class ArgKminLRDBenchmark(Benchmark):
    param_names = ["n_train", "n_features"]
    params = [
        [1000, 10000, 100000],
        [100],
    ]

    def setup(self, n_train, n_features):
        rng = np.random.RandomState(0)
        self.X_train = rng.rand(n_train, n_features).astype(np.float32)
        self.y_train = rng.randint(low=-1, high=1, size=(n_train,))

    def time_fit(self, n_train, n_features):
        est = LocalOutlierFactor(n_neighbors=10, metric="manhattan").fit(X=self.X_train)
        self.estimator_ = est
```

PR - time_fit
```
========= ============
--        n_features
--------- ------------
n_train   100
========= ============
1000      16.9±0.9ms
10000     926±20ms
100000    1.54±0.2m
========= ============
```

MAIN - time_fit

```
========= ============
--        n_features
--------- ------------
n_train   100
========= ============
1000      16.5±0.7ms
10000     926±9ms
100000    1.49±0.01m
========= ============
```
Benchmarks with float64.

Benchmark code:

```python
import numpy as np
from .common import Benchmark
from sklearn.neighbors import LocalOutlierFactor


class ArgKminLRDBenchmark(Benchmark):
    param_names = ["n_train", "n_features"]
    params = [
        [1000, 10000, 100000],
        [100],
    ]

    def setup(self, n_train, n_features):
        rng = np.random.RandomState(0)
        self.X_train = rng.rand(n_train, n_features).astype(np.float64)
        self.y_train = rng.randint(low=-1, high=1, size=(n_train,))

    def time_fit(self, n_train, n_features):
        LocalOutlierFactor(n_neighbors=10, metric="manhattan").fit(X=self.X_train)
```

PR - time_fit
```
========= ============
--        n_features
--------- ------------
n_train   100
========= ============
1000      18.6±0.1ms
10000     1.11±0.02s
100000    1.77±0m
========= ============
```

MAIN - time_fit

```
========= ============
--        n_features
--------- ------------
n_train   100
========= ============
1000      18.7±0.4ms
10000     1.11±0.02s
100000    1.80±0.01m
========= ============
```
It looks like no significant improvement is observed.
Even with the number of neighbors set to 1000, no improvement was observed.

Benchmark code:

```python
import numpy as np
from .common import Benchmark
from sklearn.neighbors import LocalOutlierFactor


class ArgKminLRDBenchmark(Benchmark):
    param_names = ["n_train", "n_features"]
    params = [
        [100000],
        [100],
    ]

    def setup(self, n_train, n_features):
        rng = np.random.RandomState(0)
        self.X_train = rng.rand(n_train, n_features).astype(np.float32)

    def time_fit(self, n_train, n_features):
        LocalOutlierFactor(n_neighbors=1000, metric="manhattan").fit(X=self.X_train)
```

```
[  0.00%] · For scikit-learn commit d15e8c09 <pairwise_distance_backend_for_lof> (round 1/1):
[  0.00%] ·· Benchmarking conda-py3.9-cython-joblib-numpy-pandas-scipy-threadpoolctl
[ 50.00%] ··· argkmin_lrd.ArgKminLRDBenchmark.time_fit                          ok
[ 50.00%] ··· ========= ============
              --        n_features
              --------- ------------
              n_train   100
              ========= ============
              100000    1.68±0.01m
              ========= ============
[ 50.00%] · For scikit-learn commit 55af30d9 <main> (round 1/1):
[ 50.00%] ·· Building for conda-py3.9-cython-joblib-numpy-pandas-scipy-threadpoolctl..
[ 50.00%] ·· Benchmarking conda-py3.9-cython-joblib-numpy-pandas-scipy-threadpoolctl
[100.00%] ··· argkmin_lrd.ArgKminLRDBenchmark.time_fit                          ok
[100.00%] ··· ========= =========
              --        n_features
              --------- ---------
              n_train   100
              ========= =========
              100000    1.68±0m
              ========= =========

BENCHMARKS NOT SIGNIFICANTLY CHANGED.
```
After inspecting with py-spy: @jjerphan @OmarManzoor please do sanity check me on this, but AFAIK this is just ill-suited for further optimization via a dedicated backend. This is good information though, since it allows us to focus efforts in other areas -- and it probably should also set the precedent of confirming bottlenecks with an initial profiling before the work is done, rather than after :) Once you two are satisfied with this explanation, we can close this and move on.
Thank you for inspecting with py-spy. I think what you are saying makes sense and reinforces the observed benchmarks.
@jjerphan I think we can close this PR then?
Closing since no performance improvement can be reached. Sorry, @OmarManzoor. 🤦♂️ I wish I had provided more solid evidence about the room for performance improvement beforehand, for instance with a profile of the current execution of the algorithms.
I think I will reword the issue description in this regard.
No, no, there is no need to apologize. It was a good learning experience. We can try out
❤️
I have updated #25888 to reflect the other ongoing/open work for finalizing the optimization of
Reference Issues/PRs
Towards #25888
What does this implement/fix? Explain your changes.
Any other comments?
CC: @jjerphan Could you kindly have a look and see how this looks?