
ENH Release GIL in DistanceMetric when validating data #17038

Merged
thomasjpfan merged 12 commits into scikit-learn:master from webber26232:haversineGIL
May 23, 2020

Conversation

@webber26232
Contributor

@webber26232 webber26232 commented Apr 25, 2020

Reference Issues/PRs

Fixes #16970

What does this implement/fix? Explain your changes.

  • Remove GIL acquisition in HaversineDistance (sklearn.neighbors._dist_metrics.HaversineDistance) to enable parallel computation.
  • Move input dimension validation into BinaryTree.

Any other comments?

Not sure if hard-coding the input dimension in BinaryTree is the best solution. Maybe every DistanceMetric needs a validate_input() function.

@rth
Member

rth commented Apr 25, 2020

Thanks @webber26232! Do you actually see a difference in performance with multithreading after this change? The with gil block should never be entered as long as size == 2, and otherwise it would just raise an error.

@webber26232
Contributor Author

Thanks @webber26232! Do you actually see a difference in performance with multithreading after this change? The with gil block should never be entered as long as size == 2, and otherwise it would just raise an error.

Yes, I saw a noticeable parallel performance improvement after moving the with gil: and raise ValueError statements out of the cdef functions. I'm not sure why Cython doesn't handle the GIL release appropriately here.

And it seems the validation has to be performed here, so we may need another solution.
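To make the structural change concrete, here is a minimal pure-Python sketch (function names are hypothetical; the real code is Cython) of the idea under discussion: validate the inputs once up front, so the per-pair distance function contains no branch that could need the GIL.

```python
import math

def haversine_core(x1, x2):
    # Hot-path distance computation; assumes inputs were already
    # validated to have exactly 2 coordinates. The Cython analog can
    # then be a pure nogil function with no conditional raise.
    sin_0 = math.sin(0.5 * (x1[0] - x2[0]))
    sin_1 = math.sin(0.5 * (x1[1] - x2[1]))
    return sin_0 * sin_0 + math.cos(x1[0]) * math.cos(x2[0]) * sin_1 * sin_1

def paired_haversine(X1, X2):
    # Validation hoisted out of the loop: runs once, not once per pair.
    for X in (X1, X2):
        if any(len(row) != 2 for row in X):
            raise ValueError("Haversine distance requires 2 features")
    return [haversine_core(a, b) for a, b in zip(X1, X2)]
```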

@rth
Member

rth commented Apr 25, 2020

Hmm, could you please provide an example script where you see the impact? If Cython indeed doesn't handle this well, it would have an impact elsewhere too, since we frequently use conditional with gil blocks in our Cython code.

@webber26232
Contributor Author

webber26232 commented Apr 26, 2020

Hmm, could you please provide an example script where you see the impact? If Cython indeed doesn't handle this well, it would have an impact elsewhere too, since we frequently use conditional with gil blocks in our Cython code.

You can try the scripts below.

I got the output on my machine:

Conditionally acquire GIL (n_jobs=1) takes 1.292s
Conditionally acquire GIL (n_jobs=2) takes 6.811s
Conditionally acquire GIL (n_jobs=3) takes 17.586s
Conditionally acquire GIL (n_jobs=4) takes 22.653s
No GIL (n_jobs=1) takes 0.433s
No GIL (n_jobs=2) takes 0.241s
No GIL (n_jobs=3) takes 0.201s
No GIL (n_jobs=4) takes 0.141s
# _gil_test.pyx

import numpy as np
cimport numpy as np
from libc.math cimport sqrt, cos, sin, asin

cdef np.float64_t haversine_gil(np.float64_t[:] x1, np.float64_t[:] x2,
                                np.int64_t size) nogil except -1:
    if size != 2:
        with gil:
            raise ValueError()
    cdef np.float64_t sin_0 = sin(0.5 * (x1[0] - x2[0]))
    cdef np.float64_t sin_1 = sin(0.5 * (x1[1] - x2[1]))
    return (sin_0 * sin_0 + cos(x1[0]) * cos(x2[0]) * sin_1 * sin_1)


cdef np.float64_t haversine_nogil(np.float64_t[:] x1, np.float64_t[:] x2,
                                  np.int64_t size) nogil except -1:
    if size != 2:
        pass
    cdef np.float64_t sin_0 = sin(0.5 * (x1[0] - x2[0]))
    cdef np.float64_t sin_1 = sin(0.5 * (x1[1] - x2[1]))
    return (sin_0 * sin_0 + cos(x1[0]) * cos(x2[0]) * sin_1 * sin_1)


cpdef np.float64_t test_gil(np.float64_t[:,:] x1,
                              np.float64_t[:,:] x2):
    cdef np.float64_t out_sum = 0
    cdef int i = 0
    cdef int size = x1.shape[0]
    with nogil:
        while i < size:
            out_sum += haversine_gil(x1[i], x2[i], 2)
            i += 1
    return out_sum

cpdef np.float64_t test_nogil(np.float64_t[:,:] x1,
                              np.float64_t[:,:] x2):
    cdef np.float64_t out_sum = 0
    cdef int i = 0
    cdef int size = x1.shape[0]
    with nogil:
        while i < size:
            out_sum += haversine_nogil(x1[i], x2[i], 2)
            i += 1
    return out_sum
# setup.py

from setuptools import setup
from Cython.Build import cythonize
import numpy as np
setup(
    ext_modules = cythonize("_gil_test.pyx"),
    include_dirs=[np.get_include()]
)
# gil_test.py

import time
import numpy as np
from sklearn.utils import gen_even_slices
from joblib import Parallel, delayed
from _gil_test import test_gil, test_nogil

n_samples = 10000000
n_features = 2
np.random.seed(0)
x1 = np.random.randn(n_samples, n_features).astype(np.float64)
x2 = np.random.randn(n_samples, n_features).astype(np.float64)

n_iter = 5

for n_jobs in range(1, 5):
    start = time.time()
    for i in range(n_iter):
        Parallel(n_jobs=n_jobs, backend='threading')(
           delayed(test_gil)(x1[s], x2[s])
               for s in gen_even_slices(x1.shape[0], n_jobs))
    print("Conditionally acquire GIL (n_jobs={}) takes "
          "{:.3f}s".format(n_jobs, (time.time() - start) / n_iter))

for n_jobs in range(1, 5):
    start = time.time()
    for i in range(n_iter):
        Parallel(n_jobs=n_jobs, backend='threading')(
            delayed(test_nogil)(x1[s], x2[s])
                for s in gen_even_slices(x1.shape[0], n_jobs))
    print("No GIL (n_jobs={}) takes "
          "{:.3f}s".format(n_jobs, (time.time() - start) / n_iter))

@rth
Member

rth commented Apr 26, 2020

Thanks, you are right, it's pretty bad. Using your scripts with

n_samples = 2000000
n_features = 200

I get

Conditionally acquire GIL (n_jobs=1) takes 0.171s
Conditionally acquire GIL (n_jobs=2) takes 0.775s
Conditionally acquire GIL (n_jobs=3) takes 2.186s
Conditionally acquire GIL (n_jobs=4) takes 3.021s
No GIL (n_jobs=1) takes 0.063s
No GIL (n_jobs=2) takes 0.035s
No GIL (n_jobs=3) takes 0.026s
No GIL (n_jobs=4) takes 0.021s

Opened cython/cython#3554. It would be nice to also fix the other affected metrics: SEuclideanDistance, WMinkowskiDistance, MahalanobisDistance.

@rth
Member

rth commented Apr 26, 2020

For CI, I think you also need to validate input in BinaryTree.query and query_radius, as well as in DistanceMetric.pairwise. Maybe we could add a private method DistanceMetric._validate_data, specialize it in child classes as needed, and use it wherever input validation is necessary. This would also be more consistent with SLEP 010.
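Such a hook might look roughly like this: a hypothetical pure-Python sketch of the proposed design (the real classes live in sklearn.neighbors._dist_metrics, are written in Cython, and differ in detail).

```python
import math

class DistanceMetric:
    # Hypothetical base-class hook: subclasses override _validate_data
    # with metric-specific checks.
    def _validate_data(self, X):
        pass  # default: accept any number of features

    def pairwise(self, X):
        self._validate_data(X)  # validate once, before the hot loop
        n = len(X)
        return [[self.dist(X[i], X[j]) for j in range(n)] for i in range(n)]

class HaversineDistance(DistanceMetric):
    def _validate_data(self, X):
        # Specialized check: haversine is only defined on (lat, lon) pairs.
        if any(len(row) != 2 for row in X):
            raise ValueError("Haversine distance only valid in 2 dimensions")

    def dist(self, x1, x2):
        sin_0 = math.sin(0.5 * (x1[0] - x2[0]))
        sin_1 = math.sin(0.5 * (x1[1] - x2[1]))
        return sin_0 ** 2 + math.cos(x1[0]) * math.cos(x2[0]) * sin_1 ** 2
```

With this layout, the dist inner loop carries no error-handling branch, so its Cython counterpart can run entirely without the GIL.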

@webber26232
Contributor Author

For CI, I think you also need to validate input in BinaryTree.query and query_radius, as well as in DistanceMetric.pairwise. Maybe we could add a private method DistanceMetric._validate_data, specialize it in child classes as needed, and use it wherever input validation is necessary. This would also be more consistent with SLEP 010.

Yes, I think that's a good idea.

@rth
Member

rth commented May 2, 2020

Given that this PR is unlikely to make it into v0.23 (the release candidate is due in the following days), and that this is fixed in Cython 3.0a3, maybe we can just wait for the stable Cython release and rebuild the scikit-learn wheels with it.

@rth rth mentioned this pull request May 2, 2020
@rth
Member

rth commented May 2, 2020

That said, doing input validation in a DistanceMetric._validate_data method, instead of in the rdist or dist functions that are called in a loop over samples, would be an improvement in any case. So this PR would still be welcome.

@jnothman
Member

jnothman commented May 2, 2020 via email

@rth
Member

rth commented May 2, 2020

Is n_features=200 realistic for haversine?

No, my mistake. But the initial benchmarks in #17038 (comment) are still relevant (and correct). I was just double-checking.

@webber26232
Contributor Author

That said, doing input validation in a DistanceMetric._validate_data method, instead of in the rdist or dist functions that are called in a loop over samples, would be an improvement in any case. So this PR would still be welcome.

Sounds good, I'll start to work on it.

@webber26232 webber26232 changed the title Release GIL in HaversineDistance for BallTree parallalization Release GIL in DistanceMetric for parallelization when validating data is required May 4, 2020
Member

@rth rth left a comment


Thanks @webber26232 and @scoder! LGTM.

Member

@thomasjpfan thomasjpfan left a comment


Thank you @webber26232

This is a really nice improvement!

@thomasjpfan
Member

Please add an entry to the change log at doc/whats_new/v0.24.rst with the tag |Efficiency|. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:.
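For reference, scikit-learn changelog entries follow a fixed reStructuredText pattern; a sketch of what such an entry might look like (the final wording is up to the author):

```rst
- |Efficiency| :class:`neighbors.DistanceMetric` now validates data up front
  so that distance computations can release the GIL, speeding up
  multithreaded use of :class:`neighbors.BallTree` and
  :class:`neighbors.KDTree`. :pr:`17038` by :user:`webber26232`.
```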

webber26232 and others added 4 commits May 10, 2020 22:38
Merge branch 'master' of github.com:webber26232/scikit-learn into haversineGIL
@webber26232
Contributor Author

@rth @thomasjpfan Thanks for your reviews. I updated the What's New entry. It should be good to merge.

@rth
Member

rth commented May 11, 2020

Thanks! WMinkowskiDistance etc. and _validate_data are actually private, so we can't refer to them in the changelog. Could you please reformulate it in terms of the effect this has on public classes, e.g. NearestNeighbors, KNeighborsRegressor, KNeighborsClassifier when used with metric=".." and n_jobs>1 (or in terms of BallTree/KDTree releasing the GIL with those metrics)?

@webber26232
Contributor Author

webber26232 commented May 11, 2020

Thanks! WMinkowskiDistance etc. and _validate_data are actually private, so we can't refer to them in the changelog. Could you please reformulate it in terms of the effect this has on public classes, e.g. NearestNeighbors, KNeighborsRegressor, KNeighborsClassifier when used with metric=".." and n_jobs>1 (or in terms of BallTree/KDTree releasing the GIL with those metrics)?

I also updated it to refer to neighbors.DistanceMetric.

@thomasjpfan thomasjpfan changed the title Release GIL in DistanceMetric for parallelization when validating data is required ENH Release GIL in DistanceMetric when validating data May 23, 2020
@thomasjpfan thomasjpfan merged commit fc474d6 into scikit-learn:master May 23, 2020
@thomasjpfan
Member

Thank you @webber26232!



Development

Successfully merging this pull request may close these issues.

BallTree haversine metric is not paralleled

5 participants