memory leak in k_means for 0.23.dev0 #16991
Closed
Description
Running k_means leaves memory allocated that is not released for the rest of the session.
Steps/Code to Reproduce
Save the following snippet in a script issue_memory.py and run it with python3 -i issue_memory.py.
from sklearn.cluster import k_means
import numpy as np
n_samples = 1000
n_features = 100000
data = np.random.normal(size=[n_samples, n_features])  # Some Gaussian random noise
_, part, _ = k_means(
    data,
    n_clusters=10,
    init="k-means++",
    max_iter=30,
    n_init=10,
    random_state=1,
)
del part
del data
Expected Results
The script should transiently use memory (a few GB), and after completion (before exiting the session) memory usage should return to exactly where it was prior to running the script. Exiting the session should not change the memory level. This is what happens with sklearn 0.22.
Actual Results
Using the current tip of master (version 0.23.dev0), even after deleting all variables, about 500 MB of memory remains in use in the session. I am working on ensemble clustering and, after many iterations, this memory leak eventually fills all available RAM. Exiting the session releases this memory.
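To put a number on the leftover memory, something like the following could be used. This is only a rough sketch and not part of the original report: the helper names `rss_mb`, `leaked_mb` and `run_once` are hypothetical, the measurement reads `/proc/self/statm` and therefore assumes Linux, and the smaller `n_features` in `run_once` is an assumption chosen to keep the run short.

```python
import gc
import os


def rss_mb():
    """Current resident set size of this process in MB (Linux only)."""
    # Second field of /proc/self/statm is the number of resident pages.
    with open("/proc/self/statm") as f:
        resident_pages = int(f.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE") / 1e6


def leaked_mb(func, n_repeats=3):
    """Call func() n_repeats times; return RSS growth over baseline after each call."""
    gc.collect()
    baseline = rss_mb()
    growth = []
    for _ in range(n_repeats):
        func()
        gc.collect()  # drop any Python-level garbage before measuring
        growth.append(rss_mb() - baseline)
    return growth


def run_once():
    """One k_means call on random data; all locals are freed on return."""
    # Imported lazily so the helpers above work without scikit-learn installed.
    import numpy as np
    from sklearn.cluster import k_means

    data = np.random.normal(size=[1000, 10000])  # smaller than the report, by assumption
    k_means(data, n_clusters=10, init="k-means++", max_iter=30, n_init=10, random_state=1)
```

Printing `leaked_mb(run_once)` under 0.22 and under 0.23.dev0 would show whether the resident memory keeps growing with each call or returns close to the baseline. Small fluctuations are expected, since the allocator does not always hand freed memory back to the operating system.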
Versions
System:
python: 3.7.5rc1 (default, Oct 8 2019, 16:47:45) [GCC 9.2.1 20191008]
executable: /home/pbellec/env/dypac/bin/python3
machine: Linux-5.3.0-46-generic-x86_64-with-Ubuntu-19.10-eoan
Python dependencies:
pip: 20.0.2
setuptools: 45.0.0
sklearn: 0.23.dev0
numpy: 1.18.1
scipy: 1.4.1
Cython: None
pandas: 0.25.3
matplotlib: 3.1.2
joblib: 0.14.1
Built with OpenMP: True
Linux-5.3.0-46-generic-x86_64-with-Ubuntu-19.10-eoan
Python 3.7.5rc1 (default, Oct 8 2019, 16:47:45)
[GCC 9.2.1 20191008]
NumPy 1.18.1
SciPy 1.4.1
Scikit-Learn 0.23.dev0
Traceback (most recent call last):
File "version_sklearn.py", line 7, in <module>
import imblearn; print("Imbalanced-Learn", imblearn.__version__)
ModuleNotFoundError: No module named 'imblearn'