Linear models take unreasonably long to fit at certain data sizes #10813

@LeeLeeYeah

Description

When I use Lasso to fit an artificial data set, the running time shows a strange pattern: when the data size is 15900 * 500 or 16100 * 500, fitting takes less than 2 seconds, but when the size is 16000 * 500, it takes more than 20 seconds. This makes no sense.

The experiment is repeatable. I am using sklearn 0.19.0, and I tried the same program on Windows, macOS and Linux; all of them have this problem.

This problem appears only when the input values are very large. In this example I use 1e50.

Other linear models such as Ridge also have this problem.
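
A minimal sketch of how the two claims above (the slowdown depends on the magnitude of the inputs, and Ridge is affected as well as Lasso) could be checked side by side. The helper names `time_fit` and `compare` are hypothetical and not part of scikit-learn, and the control scale of 1.0 is an assumption chosen only for contrast with 1e50.

```python
import time

import numpy as np
from sklearn import linear_model


def time_fit(estimator, n_samples, n_features=500, scale=1e50):
    """Fit `estimator` on constant data of the given scale; return seconds elapsed."""
    x = np.ones((n_samples, n_features)) * scale
    y = np.ones(n_samples) * scale
    start = time.perf_counter()
    estimator.fit(x, y)
    return time.perf_counter() - start


def compare(n_samples_list, scales=(1.0, 1e50), n_features=500):
    """Return {(model_name, scale, n_samples): seconds} for Lasso and Ridge."""
    results = {}
    models = (("Lasso", linear_model.Lasso), ("Ridge", linear_model.Ridge))
    for name, cls in models:
        for scale in scales:
            for n in n_samples_list:
                # A fresh estimator per fit avoids any state carried between runs.
                results[(name, scale, n)] = time_fit(cls(), n, n_features, scale)
    return results
```

Calling `compare([15900, 16000, 16100])` would cover the sizes mentioned in the description. Whether the control scale runs quickly at every size is something to observe, not something this sketch assumes.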

Steps/Code to Reproduce

from sklearn import linear_model
import numpy as np
import time

estimator = linear_model.Lasso()
# dimension is fixed
dimension = 500
# sampleNumber ranges from 15400 to 16400 in steps of 100
for sampleNumber in range(15400, 16500, 100):
    x = np.ones([sampleNumber, dimension]) * 1e50
    y = np.ones([sampleNumber]) * 1e50
    # measure running time
    start = time.time()
    estimator = estimator.fit(x, y)
    end = time.time()
    print("Sample Number:%d, Time: %f" % (sampleNumber, end-start))
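
The script above times each fit once with `time.time()`. To rule out one-off noise, each fit could be repeated and summarized, as in the sketch below. `time_repeated` and `summarize` are hypothetical helper names, and the use of `time.perf_counter()` and a default of five repeats are assumptions, not part of the original report.

```python
import statistics
import time


def time_repeated(fn, repeats=5):
    """Call `fn` `repeats` times; return the wall-clock durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return (min, median, max) of a list of durations."""
    return min(durations), statistics.median(durations), max(durations)
```

Inside the loop, `summarize(time_repeated(lambda: estimator.fit(x, y)))` would replace the single measurement. If the slow sizes stay slow across all repeats, the pattern is tied to the data size and not to background load.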

Actual Output

Sample Number:15400, Time: 1.460229
Sample Number:15500, Time: 26.534857
Sample Number:15600, Time: 1.256980
Sample Number:15700, Time: 1.437125
Sample Number:15800, Time: 1.290524
Sample Number:15900, Time: 1.298007
Sample Number:16000, Time: 26.865131
Sample Number:16100, Time: 1.278997
Sample Number:16200, Time: 1.320840
Sample Number:16300, Time: 1.385144
Sample Number:16400, Time: 1.512907
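
The slow runs can be picked out of the output above mechanically. The sketch below copies the reported timings into a dictionary and flags any sample count whose time exceeds a multiple of the median. The function name `slow_runs` and the factor of 5 are arbitrary illustrative choices.

```python
import statistics

# Timings copied from the output above: sample count -> seconds.
timings = {
    15400: 1.460229,
    15500: 26.534857,
    15600: 1.256980,
    15700: 1.437125,
    15800: 1.290524,
    15900: 1.298007,
    16000: 26.865131,
    16100: 1.278997,
    16200: 1.320840,
    16300: 1.385144,
    16400: 1.512907,
}


def slow_runs(timings, factor=5.0):
    """Return the sample counts whose time exceeds `factor` times the median."""
    threshold = factor * statistics.median(timings.values())
    return sorted(n for n, t in timings.items() if t > threshold)


print(slow_runs(timings))  # [15500, 16000]
```

Note that 15500 is flagged alongside 16000, so in this run the slowdown occurs at more than one sample count.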

Versions

Windows-10-10.0.16299-SP0
Python 3.6.2 |Anaconda custom (64-bit)| (default, Sep 19 2017, 08:03:39) [MSC v.1900 64 bit (AMD64)]
NumPy 1.13.1
SciPy 0.19.1
Scikit-Learn 0.19.0

Metadata

Assignees

No one assigned

    Labels

    Easy (well-defined and straightforward way to resolve), good first issue (easy with clear instructions to resolve), help wanted
