BUG Fix instability issue of ARDRegression (with speedup) #16849
thomasjpfan merged 5 commits into scikit-learn:master
Conversation
# With the inputs above, ARDRegression prunes both of the two coefficients
# in the first iteration. Hence, the expected shape of `sigma_` is (0, 0).
assert clf.sigma_.shape == (0, 0)
If the target is constant, it makes sense to me that both coefficients should be set to zero. So I think this is a bug fix.
By setting clf.sigma_ = np.array([]), the later call to clf.predict(X, return_std=True) would return a different result for the std when compared to master. Is this okay?
Yes, it's a bug fix. `sigma_` should be empty when `keep_lambda` is all False.
You might want to slightly delay your review @thomasjpfan, I'm going to update this for the n_features > n_samples case.
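For reference, a minimal reproduction of the case discussed above (a plausible reconstruction, not necessarily the exact test; `n_iter` was the 0.23-era parameter name, later renamed `max_iter`):

```python
import numpy as np
from sklearn.linear_model import ARDRegression

# Constant (all-zero) target: both coefficients get pruned in the first
# iteration, so sigma_ ends up empty and prediction must not crash.
X = np.array([[1.0, 0.0], [0.0, 0.0]])
y = np.array([0.0, 0.0])
clf = ARDRegression(n_iter=1).fit(X, y)
assert clf.sigma_.shape == (0, 0)
clf.predict(X, return_std=True)  # should not raise
```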
@pytest.mark.parametrize('seed', range(100))
def test_ard_accuracy_on_easy_problem(seed):
The test now runs properly on many different seeds (there was a 7% failure rate before), and with much higher precision.
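For concreteness, here is a plausible sketch of what such a seeded accuracy test can look like (the data setup and tolerance are assumptions, not necessarily the PR's exact test body):

```python
import numpy as np
import pytest
from sklearn.linear_model import ARDRegression

@pytest.mark.parametrize('seed', range(100))
def test_ard_accuracy_on_easy_problem(seed):
    # Noiseless problem with a single informative feature: ARD should
    # recover its coefficient almost exactly, for every seed.
    X = np.random.RandomState(seed).normal(size=(250, 3))
    y = X[:, 1]
    regressor = ARDRegression().fit(X, y)
    assert abs(1 - regressor.coef_[1]) < 1e-10
```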
sklearn/linear_model/_bayes.py (Outdated)
gram = np.dot(X_keep.T, X_keep)
eye = np.eye(gram.shape[0])
sigma_inv = lambda_[keep_lambda] * eye + alpha_ * gram
sigma_ = pinvh(sigma_inv)
So that's basically the main fix. Instead of relying on the Woodbury formula, we just directly invert the matrix (called S_N in the ref). This seems to be much more stable and also much faster, because here we invert a matrix that is n_features x n_features, whereas on master the inversion is of an n_samples x n_samples matrix.
Strangely enough, this somewhat reverts a commit from 10 years ago (b2d0a77), whose goal was to make things faster, but I'm not sure it did. @vmichel, if you're still around your input would be greatly appreciated!
could it be that numpy operations are now much faster?
I doubt it, the bottleneck is the matrix inversion here
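To make the comparison concrete, here is a hedged sketch of the two ways of computing the posterior covariance S_N = (diag(lambda) + alpha * X^T X)^(-1) (function names are mine, simplified from the actual code):

```python
import numpy as np
from scipy.linalg import pinvh

def sigma_direct(X, alpha_, lambda_):
    # This PR's approach: invert the (n_features, n_features) matrix directly.
    sigma_inv = np.diag(lambda_) + alpha_ * X.T @ X
    return pinvh(sigma_inv)

def sigma_woodbury(X, alpha_, lambda_):
    # Master's approach via the Woodbury identity: the inversion happens on
    # an (n_samples, n_samples) matrix, which only pays off when
    # n_samples < n_features.
    n_samples = X.shape[0]
    inv_lambda = 1.0 / lambda_
    K = np.eye(n_samples) / alpha_ + (X * inv_lambda) @ X.T
    return np.diag(inv_lambda) - (inv_lambda[:, None] * X.T) @ pinvh(K) @ (X * inv_lambda)
```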
from ..base import RegressorMixin
from ..utils.extmath import fast_logdet
- from ..utils.fixes import pinvh
+ from scipy.linalg import pinvh
We don't need the backport of scipy's pinvh anymore, so we can remove it now.
Pinging @rth @jnothman @thomasjpfan @glemaitre since you took a look at the previous PR. Thanks!
adrinjalali left a comment
It'd be nice to get this one in. It looks good to me, I think.
doc/whats_new/v0.23.rst (Outdated)
:pr:`16266` by :user:`Rushabh Vasani <rushabh-v>`.

- |Fix| |Efficiency| :class:`linear_model.ARDRegression` is more stable and
  much faster. It can now scale to hundreds of thousands of samples.
you could also mention https://github.com/scikit-learn/scikit-learn/pull/16849/files#r403760030 here maybe?
I've updated the PR so that the Woodbury formula is still used when n_samples < n_features. CC @amueller
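In other words, something like the following dispatch (a sketch reusing the hypothetical helpers from the earlier snippet; `update_sigma_dispatch` is a made-up name, the real code keeps both update paths and switches on the shape of X):

```python
def update_sigma_dispatch(X, alpha_, lambda_):
    # Cheaper and more stable to invert the smaller of the two matrices.
    n_samples, n_features = X.shape
    if n_samples >= n_features:
        return sigma_direct(X, alpha_, lambda_)
    return sigma_woodbury(X, alpha_, lambda_)
```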
Since the results of the two methods are not the same, should it be exposed as kind of an option?
They are the same, we have a test for that.
Here's where I'm confused: You changed the solver block, and added the n_samples < n_features condition. Is this effectively a new solver?
The only thing I added is the switch between the two ways of computing the inverse.
Yes (it's not strictly speaking a solver, it's more of a way of computing the inverse of a specific matrix). In the end, the same `sigma_` is computed either way.
It's not just a performance improvement, it's also a stability improvement. It's more stable to invert the smaller of the two matrices.
I'm happy with this one, another review maybe? @thomasjpfan wanna have another look?
Sorry to bother if I am wrong. However, I have noticed that the plot in the documentation of ARD differs between sklearn 0.23.X and 0.22.X, and I wonder if this commit introduced the change (maybe it is the expected behavior). Cf. #20740 or directly the last plot shown in this example in the documentation, embedded in my post below.
I opened a new issue so that this problem can be investigated.

This should fix #15186.
Closes #16102.
This comes with a significant speedup: on master the estimator can only scale to a few hundred samples, whereas this PR can comfortably fit 100k samples in about 1 second.
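A quick way to sanity-check that claim (hypothetical snippet, timings obviously hardware-dependent):

```python
import time
import numpy as np
from sklearn.linear_model import ARDRegression

rng = np.random.RandomState(0)
X = rng.randn(100_000, 20)
y = X @ rng.randn(20) + 0.1 * rng.randn(100_000)

tic = time.perf_counter()
ARDRegression().fit(X, y)
print(f"fit took {time.perf_counter() - tic:.2f}s")
```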