Linear models take unreasonably long on certain data sizes. #10813
Closed
Labels: Easy, good first issue, help wanted
Description
When I use Lasso to fit an artificial data set, the running time shows a strange pattern: when the data size is 15900 * 500 or 16100 * 500, fitting takes less than 2 seconds, but when the size is 16000 * 500, it takes more than 20 seconds. I cannot see any reason for this.
The experiment is repeatable. I am using sklearn 0.19.0, and I tried the same program on Windows, Mac OS and Linux; all of them have this problem.
The problem appears only when the input data contains very large values. In this example I use 1e50.
Other linear models, such as Ridge, also have this problem.
Steps/Code to Reproduce
```python
from sklearn import linear_model
import numpy as np
import time

estimator = linear_model.Lasso()
# dimension is fixed
dimension = 500
# sampleNumber ranges from 15400 to 16400 in steps of 100
for sampleNumber in range(15400, 16500, 100):
    x = np.ones([sampleNumber, dimension]) * 1e50
    y = np.ones([sampleNumber]) * 1e50
    # measure running time
    start = time.time()
    estimator = estimator.fit(x, y)
    end = time.time()
    print("Sample Number:%d, Time: %f" % (sampleNumber, end - start))
```
Actual Output
```
Sample Number:15400, Time: 1.460229
Sample Number:15500, Time: 26.534857
Sample Number:15600, Time: 1.256980
Sample Number:15700, Time: 1.437125
Sample Number:15800, Time: 1.290524
Sample Number:15900, Time: 1.298007
Sample Number:16000, Time: 26.865131
Sample Number:16100, Time: 1.278997
Sample Number:16200, Time: 1.320840
Sample Number:16300, Time: 1.385144
Sample Number:16400, Time: 1.512907
```
Versions
Windows-10-10.0.16299-SP0
Python 3.6.2 |Anaconda custom (64-bit)| (default, Sep 19 2017, 08:03:39) [MSC v.1900 64 bit (AMD64)]
NumPy 1.13.1
SciPy 0.19.1
Scikit-Learn 0.19.0
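
The reproduction script takes a single `time.time()` reading per sample size. To rule out measurement noise as the explanation, the timing could be repeated with a monotonic clock. The following is only a minimal sketch of that idea: `time_call`, `summarize` and `repeats` are hypothetical names introduced for illustration, and the workload in the `__main__` block is a stand-in rather than the actual `fit` call.

```python
import statistics
import time


def time_call(func, repeats=3):
    """Run func() `repeats` times and return the per-run durations in seconds.

    `time_call` and `repeats` are hypothetical names used only in this sketch.
    """
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        func()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return (min, median, max) of the measured durations."""
    return min(durations), statistics.median(durations), max(durations)


if __name__ == "__main__":
    # Stand-in workload; in the reproduction script this would be
    # `lambda: estimator.fit(x, y)` for each sampleNumber.
    durations = time_call(lambda: sum(range(100_000)), repeats=5)
    print("min=%.6f median=%.6f max=%.6f" % summarize(durations))
```

If the slow sample sizes stay slow across every repeat (a large minimum, not just a large maximum), the jump is unlikely to be a timing artifact.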