n_jobs for DBSCAN #16299
Closed
Labels
Documentation, Easy, Sprint, good first issue, help wanted, module:cluster
Description
#16213
#### Describe the bug
The n_jobs argument does not seem to reduce the time it takes to run DBSCAN.fit(). Runtime is about the same with and without n_jobs.
Is n_jobs actually implemented for DBSCAN.fit()?
#### Steps/Code to Reproduce
Example:
from time import time

import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics.pairwise import euclidean_distances

# generate a symmetric distance matrix
num_training_examples = 30000
num_features = 10
X = np.random.randint(5, size=(num_training_examples, num_features))
D = euclidean_distances(X, X)

# DBSCAN parameters
eps = 0.25
kmedian_thresh = 0.005  # not used below
min_samples = 5

# case 1: omit n_jobs arg from DBSCAN
start = time()
db = DBSCAN(eps=eps,
            min_samples=min_samples,
            metric='precomputed').fit(D)
end = time()
total_time = end - start
print('DBSCAN took {} seconds for {} training examples without n_jobs arg'
      .format(total_time, num_training_examples))

# case 2: add n_jobs arg to DBSCAN (-1 requests all available processors)
n_jobs = -1
start = time()
db = DBSCAN(eps=eps,
            min_samples=min_samples,
            metric='precomputed',
            n_jobs=n_jobs).fit(D)
end = time()
total_time = end - start
print('DBSCAN took {} seconds for {} training examples with n_jobs arg'
      .format(total_time, num_training_examples))
#### Expected Results
Expected runtime to decrease with more processors.
#### Actual Results
Runtime did not decrease. The run with n_jobs=-1 was actually somewhat slower:
DBSCAN took 285.76699996 seconds for 30000 training examples without n_jobs arg
DBSCAN took 363.289000034 seconds for 30000 training examples with n_jobs arg
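
A smaller variant of the benchmark might help narrow this down. The sketch below times DBSCAN on the raw feature matrix and on the precomputed distance matrix, each with and without n_jobs. It rests on an assumption that is not confirmed in this report: that n_jobs mainly affects the neighbor search, so with metric='precomputed' there may be little work left to parallelize. The helper name `time_dbscan`, the fixed seed and the sample size of 2000 are illustrative choices, not part of the original reproduction.

```python
from time import time

import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics.pairwise import euclidean_distances


def time_dbscan(data, metric, n_jobs=None, eps=0.25, min_samples=5):
    """Fit DBSCAN once and return (elapsed seconds, cluster labels).

    Hypothetical helper for this comparison only.
    """
    start = time()
    model = DBSCAN(eps=eps, min_samples=min_samples,
                   metric=metric, n_jobs=n_jobs)
    model.fit(data)
    return time() - start, model.labels_


# Smaller problem than in the report so the comparison finishes quickly.
rng = np.random.RandomState(0)
X = rng.randint(5, size=(2000, 10))
D = euclidean_distances(X, X)

labels_by_case = {}
for name, data, metric in [("features", X, "euclidean"),
                           ("precomputed", D, "precomputed")]:
    for n_jobs in (None, -1):
        elapsed, labels = time_dbscan(data, metric, n_jobs=n_jobs)
        labels_by_case[(name, n_jobs)] = labels
        print("{:<12} n_jobs={!s:<5} {:.3f} s".format(name, n_jobs, elapsed))
```

Timings will vary by machine and by scikit-learn version, so this only shows whether n_jobs makes a visible difference for each input type. It is not a definitive measurement.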
#### Versions
Cython: None
scipy: 1.2.2
setuptools: 41.6.0
pip: 19.3.1
numpy: 1.16.5
pandas: 0.24.2
sklearn: 0.20.4
import sys; print("Python", sys.version)
('Python', '2.7.17 (v2.7.17:c2f86d86e6, Oct 19 2019, 21:01:17) [MSC v.1500 64 bit (AMD64)]')