memory leak in k_means for 0.23.dev0 #16991

@lunebellec

Description

Running k_means leaves memory allocated after the call returns, even once the inputs and outputs have been deleted.

Steps/Code to Reproduce

Save the following snippet in a script named issue_memory.py and run it with python3 -i issue_memory.py.

from sklearn.cluster import k_means
import numpy as np

n_samples = 1000
n_features = 100000
data = np.random.normal(size=[n_samples, n_features])  # Some Gaussian random noise

_, part, _ = k_means(
    data,
    n_clusters=10,
    init="k-means++",
    max_iter=30,
    n_init=10,
    random_state=1,
)
del part
del data

Expected Results

The script should use memory only transiently (a few GB). After it completes, and before the session exits, memory usage should be back where it was prior to running the script, so exiting the session should not change the memory level. This is what happens with sklearn 0.22.

Actual Results

Using the current tip of master (version 0.23.dev0), about 500 MB of memory is still in use in the session even after deleting all variables. I am working on ensemble clustering and, after many iterations, this memory leak eventually fills all available RAM. Exiting the session releases the memory.
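The retained memory can also be quantified from inside the session instead of reading it off a system monitor. The following is a minimal sketch and was not part of the original measurements: `rss_mb` and `retained_mb` are hypothetical helper names, and it assumes Linux, since it reads the resident set size from /proc/self/statm.

```python
import gc
import os


def rss_mb():
    """Return the current resident set size of this process in MB (Linux only)."""
    with open("/proc/self/statm") as f:
        # Second field of statm is the number of resident pages.
        resident_pages = int(f.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE") / 1e6


def retained_mb(func, repeats=3):
    """Call func() several times; return RSS growth over the baseline after each call."""
    gc.collect()
    baseline = rss_mb()
    growth = []
    for _ in range(repeats):
        func()        # the return value is discarded immediately
        gc.collect()  # rule out uncollected reference cycles
        growth.append(rss_mb() - baseline)
    return growth
```

With `data` from the script above still defined, something like `retained_mb(lambda: k_means(data, n_clusters=10, random_state=1))` would report the growth after each call. The allocator does not always return freed pages to the operating system, so a small constant offset is not conclusive by itself; growth that keeps increasing with each repeat is the pattern that points to a leak.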

Versions

System:
    python: 3.7.5rc1 (default, Oct  8 2019, 16:47:45)  [GCC 9.2.1 20191008]
executable: /home/pbellec/env/dypac/bin/python3
   machine: Linux-5.3.0-46-generic-x86_64-with-Ubuntu-19.10-eoan

Python dependencies:
       pip: 20.0.2
setuptools: 45.0.0
   sklearn: 0.23.dev0
     numpy: 1.18.1
     scipy: 1.4.1
    Cython: None
    pandas: 0.25.3
matplotlib: 3.1.2
    joblib: 0.14.1

Built with OpenMP: True
Linux-5.3.0-46-generic-x86_64-with-Ubuntu-19.10-eoan
Python 3.7.5rc1 (default, Oct  8 2019, 16:47:45) 
[GCC 9.2.1 20191008]
NumPy 1.18.1
SciPy 1.4.1
Scikit-Learn 0.23.dev0
Traceback (most recent call last):
  File "version_sklearn.py", line 7, in <module>
    import imblearn; print("Imbalanced-Learn", imblearn.__version__)
ModuleNotFoundError: No module named 'imblearn'
