dbscan uses a large amount of RAM #26726

@hirschmichael

Describe the bug

I'm using sklearn version 1.1.2. In the reproduction code below, dbscan uses about 15 GB of memory, although xy itself is only 2.88 MB. This can't be right.

Steps/Code to Reproduce

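from sklearn.cluster import dbscan
import numpy as np

nclust = 12            # number of clusters
cluster_size = 15000   # points per cluster

# Build 12 tight Gaussian clusters (std 15) with random centres in [0, 20000)^2
xy = []
for i in range(nclust):
    centre = np.random.uniform(0, 20000, (1, 2))
    cluster = np.random.randn(cluster_size, 2) * 15 + centre
    xy.append(cluster)
xy = np.vstack(xy)     # shape (180000, 2), float64, 2.88 MB

# This call is where memory usage climbs to about 15 GB
dbscan(xy, eps=40, min_samples=10, algorithm='kd_tree', leaf_size=500)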
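A scaled-down check may help show where the memory could be going without needing 15 GB. The sketch below uses NumPy only and a single cluster of 1500 points (a tenth of the size in the report) to count how many points fall within eps of each point. The fixed seed, the reduced size, and the brute-force distance computation are assumptions made for illustration; none of this is what dbscan does internally.

```python
import numpy as np

rng = np.random.default_rng(0)        # fixed seed: an assumption, for repeatability
n, sigma, eps = 1500, 15.0, 40.0      # one scaled-down cluster (the report uses 15000)

cluster = rng.standard_normal((n, 2)) * sigma

# Brute-force pairwise distances; fine at this size (about 36 MB of temporaries)
diff = cluster[:, None, :] - cluster[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))

# Number of points within eps of each point (each point counts itself)
neighbors_per_point = (dist <= eps).sum(axis=1)
mean_fraction = neighbors_per_point.mean() / n

print(f"mean neighbours per point: {neighbors_per_point.mean():.0f} of {n}")
print(f"mean fraction of the cluster within eps: {mean_fraction:.2f}")
```

If most of a cluster lies within eps of each point, as this suggests, the number of stored neighbours grows with the square of the cluster size instead of linearly with the input.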

Expected Results

Not sure, but far less memory than this for a 2.88 MB input.

Actual Results

Running dbscan uses about 15 GB of RAM.
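A back-of-envelope estimate suggests this figure is the right order of magnitude if the eps-neighborhood of every point is held in memory at once, which the scikit-learn DBSCAN documentation notes is how the implementation queries neighbors. The sketch below is only an estimate under stated assumptions: clusters do not overlap, each stored neighbor index takes 8 bytes, and the fraction of a cluster within eps follows from the distance between two independent 2-D Gaussian points with the same standard deviation.

```python
import math

nclust, cluster_size, ndim = 12, 15000, 2
sigma, eps = 15.0, 40.0
bytes_per_value = 8                    # float64 input; 8-byte indices are an assumption

# Size of the input array xy
input_bytes = nclust * cluster_size * ndim * bytes_per_value

# For two independent N(0, sigma^2 I) points in 2-D, the squared distance divided
# by 2*sigma^2 is chi-square with 2 degrees of freedom, so
# P(distance <= eps) = 1 - exp(-eps^2 / (4 * sigma^2)).
frac = 1.0 - math.exp(-eps ** 2 / (4 * sigma ** 2))

# Neighbor indices held if every point keeps its full eps-neighborhood
neighbors_per_point = frac * cluster_size
index_bytes = nclust * cluster_size * neighbors_per_point * bytes_per_value

print(f"input: {input_bytes / 1e6:.2f} MB")
print(f"fraction of a cluster within eps: {frac:.2f}")
print(f"estimated neighbor storage: {index_bytes / 1e9:.0f} GB")
```

Under these assumptions roughly 83% of each cluster lies within eps of any given point, so the neighbor lists alone come to roughly 18 GB, the same order as the 15 GB observed. A smaller eps or fewer points per cluster would shrink this quadratically.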

Versions

System:
    python: 3.8.10 (default, May 26 2023, 14:05:08)  [GCC 9.4.0]
executable: /home/vcn81216/fileserver_home/python/python3/bin/python
   machine: Linux-5.15.0-75-generic-x86_64-with-glibc2.29

Python dependencies:
      sklearn: 1.1.2
          pip: 23.1.2
   setuptools: 59.1.0
        numpy: 1.23.2
        scipy: 1.9.1
       Cython: 0.29.23
       pandas: 1.4.4
   matplotlib: 3.5.3
       joblib: 1.1.0
threadpoolctl: 3.1.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: /mnt/rclsfserv005/users/vcn81216/python/python3/lib/python3.8/site-packages/numpy.libs/libopenblas64_p-r0-742d56dc.3.20.so
        version: 0.3.20
threading_layer: pthreads
   architecture: SkylakeX
    num_threads: 12

       user_api: openmp
   internal_api: openmp
         prefix: libgomp
       filepath: /mnt/rclsfserv005/users/vcn81216/python/python3/lib/python3.8/site-packages/scikit_learn.libs/libgomp-a34b3233.so.1.0.0
        version: None
    num_threads: 12

       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: /mnt/rclsfserv005/users/vcn81216/python/python3/lib/python3.8/site-packages/scipy.libs/libopenblasp-r0-9f9f5dbc.3.18.so
        version: 0.3.18
threading_layer: pthreads
   architecture: SkylakeX
    num_threads: 12
