Segmentation fault when calculating euclidean_distances for large numbers of rows #4197

@dschallis

With scikit-learn 0.15.2, numpy 1.9.1, python 2.7.8 (on OS X), the following code segfaults:

import numpy
import sklearn.cluster
import sklearn.metrics

numpy.random.seed(1)
X = numpy.random.random((50000, 100))
model = sklearn.cluster.KMeans(n_clusters=3, random_state=1)
model.fit_predict(X)
print sklearn.metrics.silhouette_score(X, model.labels_, metric='euclidean')

Results in:

Segmentation fault: 11

With the row count reduced to 30000, the script completes fine. With 40000 rows, the script takes a very long time but does not appear to segfault.
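
For a rough sense of scale, here is a back-of-the-envelope sketch of how much memory a dense pairwise distance matrix would need at each row count. It rests on an assumption not confirmed in this report: that the euclidean silhouette computation materializes a full n x n matrix of 8-byte floats. The helper name `dense_distance_matrix_bytes` is hypothetical and only for illustration.

```python
def dense_distance_matrix_bytes(n_rows, itemsize=8):
    """Bytes needed for a dense n_rows x n_rows matrix.

    Assumes itemsize bytes per entry (8 for float64); this is an
    assumption about the computation, not something measured here.
    """
    return n_rows * n_rows * itemsize


for n_rows in (30000, 40000, 50000):
    gib = dense_distance_matrix_bytes(n_rows) / float(2 ** 30)
    print("%d rows: about %.1f GiB" % (n_rows, gib))

# 30000 rows: about 6.7 GiB
# 40000 rows: about 11.9 GiB
# 50000 rows: about 18.6 GiB
```

If that assumption holds, the 50000-row case would need close to 19 GiB for the distance matrix alone, which may be consistent with the behaviour at 40000 rows (very slow, possibly swapping). Passing the `sample_size` argument to `silhouette_score` might be a way to check whether a smaller subsample avoids the crash.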
