n_jobs for DBSCAN #16299

@sp7412

#16213

#### Describe the bug

The n_jobs argument doesn't seem to reduce the time it takes to run DBSCAN.fit(). Runtime is no better with n_jobs than without it.
Is n_jobs actually implemented for DBSCAN.fit()?

#### Steps/Code to Reproduce

Example:

import numpy as np
from sklearn.metrics.pairwise import euclidean_distances
from time import time
from sklearn.cluster import DBSCAN

# generate a symmetric distance matrix
num_training_examples = 30000
num_features = 10
X = np.random.randint(5, size=(num_training_examples, num_features))
D = euclidean_distances(X,X)

# DBSCAN parameters
eps = 0.25
kmedian_thresh = 0.005
min_samples = 5

# case 1: omit n_jobs arg from DBSCAN
start = time()
db = DBSCAN(eps=eps,
            min_samples = min_samples,
            metric='precomputed').fit(D)
end = time()
total_time = end - start
print('DBSCAN took {} seconds for {} training examples without n_jobs arg'\
       .format(total_time,num_training_examples))


# case 2: add n_jobs arg to DBSCAN
n_jobs = -1
start = time()
db = DBSCAN(eps=eps,
            min_samples = min_samples,
            metric='precomputed',
            n_jobs=n_jobs).fit(D)
end = time()
total_time = end - start
print('DBSCAN took {} seconds for {} training examples with n_jobs arg'\
       .format(total_time,num_training_examples))
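
For quicker iteration, the same comparison can be wrapped in a helper and run on a smaller matrix. This is only a sketch: `time_dbscan` is a hypothetical helper name, and the 2000-sample size and fixed random seed are assumptions chosen to keep the run short, not values from the report above. The eps and min_samples values match the script above.

```python
from timeit import default_timer as timer

import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics.pairwise import euclidean_distances


def time_dbscan(D, n_jobs=None, eps=0.25, min_samples=5):
    """Fit DBSCAN on a precomputed distance matrix; return (seconds, labels)."""
    start = timer()
    model = DBSCAN(eps=eps,
                   min_samples=min_samples,
                   metric='precomputed',
                   n_jobs=n_jobs).fit(D)
    return timer() - start, model.labels_


# Smaller problem than the original report (assumed size, fixed seed).
rng = np.random.RandomState(0)
X = rng.randint(5, size=(2000, 10))
D = euclidean_distances(X, X)

# Time the fit once without n_jobs and once with all processors requested.
results = {n_jobs: time_dbscan(D, n_jobs=n_jobs) for n_jobs in (None, -1)}
for n_jobs, (seconds, labels) in results.items():
    print('n_jobs={}: {:.3f} seconds'.format(n_jobs, seconds))
```

The helper also returns the labels, so the two runs can be checked for identical clustering before their timings are compared.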


#### Expected Results

Expected runtime to decrease with more processors.

#### Actual Results

Runtime did not improve. In this run it was actually longer with n_jobs=-1.

DBSCAN took 285.76699996 seconds for 30000 training examples without n_jobs arg
DBSCAN took 363.289000034 seconds for 30000 training examples with n_jobs arg

#### Versions
    Cython: None
     scipy: 1.2.2
setuptools: 41.6.0
       pip: 19.3.1
     numpy: 1.16.5
    pandas: 0.24.2
   sklearn: 0.20.4
import sys; print("Python", sys.version)
('Python', '2.7.17 (v2.7.17:c2f86d86e6, Oct 19 2019, 21:01:17) [MSC v.1500 64 bit (AMD64)]')
