Conversation
fix W[ii, 0]
jeremiedbb
left a comment
Makes sense since n_tasks is usually small. BLAS libraries try to adjust their threading scheme according to the sizes of the matrices involved but I think there's still a lot of work to do in that direction.
sklearn/linear_model/_cd_fast.pyx
```diff
  # if np.sum(W[:, ii] ** 2) != 0.0: # can do better
- if _nrm2(n_tasks, W_ptr + ii * n_tasks, 1) != 0.0:
+ if (W[0, ii] != 0.): # faster than testing full col norm
```
It's not equivalent.
- before: equivalent to testing whether all of W[:, ii] == 0. If so, there is no need to compute the dot product. Makes sense.
- after: only checks whether W[0, ii] == 0.
I think you're missing some computations here.
You're right. It's something we haven't cleaned up.
It's actually on purpose: it's unlikely that w_ii[0] == 0 while some values in w_ii[1:] are nonzero, so the shortcut should do just as well and avoids computations.
What really needs to be avoided is computing the norm of W[:, ii]. I think a safe and not too heavy way to do the check is just a loop over W[:, ii] (as in https://github.com/mathurinm/celer/blob/master/celer/multitask_fast.pyx#L360).
> it's unlikely

Unlikely is not always. If it can happen that only w_ii[0] == 0, the computation would be wrong. I like @mathurinm's idea.
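A hedged sketch of the loop-based check discussed above, in pure Python/NumPy rather than the actual Cython (the function name `col_has_nonzero` is made up for illustration):

```python
import numpy as np

def col_has_nonzero(W, ii):
    # Early-exit scan of column ii: return True at the first nonzero
    # entry. The all-zero case costs one full pass, but a nonzero
    # column is typically detected after a couple of entries, and we
    # never pay for a full norm computation of W[:, ii].
    for t in range(W.shape[0]):
        if W[t, ii] != 0.0:
            return True
    return False

W = np.zeros((3, 4))
W[2, 1] = 5.0
print(col_has_nonzero(W, 0))  # False: column 0 is all zeros
print(col_has_nonzero(W, 1))  # True: W[2, 1] != 0
```

Unlike testing only `W[0, ii]`, this is exact: it agrees with `np.any(W[:, ii] != 0)` for every input while still skipping the norm.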
I changed the doctest because it was a pathological case with a duplicated feature. It failed because of numerical errors (1e-16 instead of an exact 0).
The test uses an alpha which is greater than alpha_max, so you can (and should, up to numerical errors) get a duality gap of 0 even with 0 iterations. I suggest changing tol to -1 in the test, or using a smaller alpha.
What is weird is why we did not have this issue before, as the new code should do literally the same thing up to numerical errors. Any idea?
The current test is wrong in master and passes only because of numerical errors: we fit with alpha > alpha_max, so we should get a duality gap of 0 and no convergence warning. In master we get a gap of 1e-13, so a warning is raised. My belief is that in this PR there is no numerical error and we get exactly 0.
A benchmark to confirm the speed gains: code at https://gist.github.com/agramfort/ca54cc1bc12a37d7a426a7799cc236ce. You can see that the speed gains are not minor.
Ready for review @jeremiedbb et al. :) Maybe we can even squeeze this in before the release...
sklearn/linear_model/_cd_fast.pyx
```
floating[::1, :] X,
floating[::1, :] Y,
```
We can't squeeze that into the release. We need to have cython 3.0 as min version.
```diff
  # check that the model fails to converge
  with pytest.warns(ConvergenceWarning):
-     MultiTaskElasticNet(max_iter=1, tol=0).fit(X, y)
+     MultiTaskElasticNet(alpha=0.001, max_iter=1, tol=0).fit(X, y)
```
Why is changing alpha necessary here? Does it converge with only 1 iteration with the default alpha?
The default alpha is greater than alpha_max for this pair (X, y), so a duality gap of 0 (up to numerical errors) is reached in 0 iterations.
I must be misunderstanding something, but the test does not fail on master. How can it fail with this PR?
It is tricky.
The test should fail on master; it passes only because of numerical errors (a duality gap of 1e-13 instead of 0 is returned).
I believe it fails on this PR because a duality gap of exactly 0 is returned (which is the correct value).
Run:

```python
import numpy as np
from sklearn.linear_model import MultiTaskElasticNet

random_state = np.random.RandomState(0)
X = random_state.standard_normal((1000, 500))
y = random_state.standard_normal((1000, 3))
clf = MultiTaskElasticNet(max_iter=1, tol=0).fit(X, y)
print(clf.dual_gap_)  # should be 0, is 1e-13
print(np.linalg.norm(clf.coef_))  # we have converged: the initialization is already the solution because alpha >= alpha_max
```
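For context, a hedged sketch of why the default alpha exceeds alpha_max for this data: for the multi-task (group lasso) penalty, the smallest alpha yielding an all-zero solution is the largest row-wise norm of X^T Y divided by n_samples, and for MultiTaskElasticNet only the l1 part (alpha * l1_ratio) enters, hence the extra division. The l1_ratio value below is the documented default; everything else follows the script above.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.standard_normal((1000, 500))
Y = rng.standard_normal((1000, 3))

n_samples = X.shape[0]
l1_ratio = 0.5  # MultiTaskElasticNet default

# Smallest alpha for which W = 0 is optimal: for any larger alpha the
# zero initialization is already the solution, so 0 iterations suffice.
alpha_max = np.max(np.linalg.norm(X.T @ Y, axis=1)) / (n_samples * l1_ratio)
print(alpha_max)  # well below the default alpha=1.0
```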
Thanks for the clarification. tol=0 is rarely a good idea because it's subject to numerical precision issues. We can change the tol or the alpha, but in either case could you add a comment to explain that?
IMO it is more logical to use tol=-1 (e.g., the value of alpha we used could in principle still be >= alpha_max, even though here it is not).
Where should I put the comment?
just above this line, when creating the model instance.
I'm happy with the current state of this PR, but as I said in a comment, we can't merge the switch to memoryviews until we bump our Cython min version to 3.0 (which is still an alpha). So we can either wait a little bit, which means it will be for the next release, or put back the ndarrays and merge it for this release. I'm fine with both, your call.
Putting back the ndarrays only means changing the signature of the Cython function; we could still use
sklearn/linear_model/_cd_fast.pyx
```diff
@@ -622,8 +622,8 @@ def enet_coordinate_descent_gram(floating[::1] w,

 def enet_coordinate_descent_multi_task(floating[::1, :] W, floating l1_reg,
```
@jeremiedbb I already see the syntax floating[::1, :] W in master, did I miss something?
We can use memoryviews when we are sure the array has been created inside scikit-learn, because then we know it's not read-only. When it's a user-provided array (X, y) we don't have that guarantee, and fused-typed memoryviews don't work with read-only arrays.
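A small pure-Python illustration of the read-only point (a sketch, not the Cython code: real user inputs can be read-only, e.g. memory-mapped arrays opened with mode="r"):

```python
import numpy as np

X = np.zeros((3, 2), order="F")
X.setflags(write=False)  # simulate a read-only, user-provided array

# Any attempt to get write access to the buffer fails. A fused-typed
# Cython memoryview requests exactly that kind of access, which is why
# such signatures reject read-only inputs.
try:
    X[0, 0] = 1.0
except ValueError as e:
    print("read-only buffer:", e)
```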
I'm not sure about that: if I remember correctly, it's slower to access ndarray elements, so it would mean keeping the pointers. We'd have to rerun the benchmarks to make sure.
I reverted to np.ndarray for X and Y. @agramfort, can you check that the benchmarks are still favorable?
NicolasHug
left a comment
Thanks for the PR, made a first pass
```diff
  >>> from sklearn import linear_model
  >>> clf = linear_model.MultiTaskLasso(alpha=0.1)
- >>> clf.fit([[0,0], [1, 1], [2, 2]], [[0, 0], [1, 1], [2, 2]])
+ >>> clf.fit([[0, 1], [1, 2], [2, 4]], [[0, 0], [1, 1], [2, 3]])
```
Was this change needed? Just trying to understand.
Having a duplicated feature resulted in the second coefficient being 0, but with numerical errors it was 1e-16.
sklearn/linear_model/_cd_fast.pyx
```
# _ger(RowMajor, n_samples, n_tasks, 1.0,
#      &X[0, ii], 1,
#      &w_ii[0], 1, &R[0, 0], n_tasks)
# Using Blas Level1 and for loop for avoid slower threads
```

Suggested change:

```diff
- # Using Blas Level1 and for loop for avoid slower threads
+ # Using Blas Level1 and for loop to avoid slower threads
```
sklearn/linear_model/_cd_fast.pyx
```
# if np.sum(w_ii ** 2) != 0.0: # can do better
if _nrm2(n_tasks, wii_ptr, 1) != 0.0:
# if (w_ii[0] != 0.): # faster than testing full norm for non-zeros, yet unsafe
```
I'm discovering the code so I lack context, but I'm not sure what this is supposed to mean.
In general I feel like "We don't do X because it wouldn't work" is mostly confusing because one would just wonder why we would even want to do X in the first place.
I agree. I think we can even remove the 2 comments:
- `# if np.sum(w_ii ** 2) != 0.0: # can do better` -> we do better now, so no need for that any more
- `# faster than testing full norm for non-zeros, yet unsafe` -> Nicolas's argument
sklearn/linear_model/_cd_fast.pyx
```
X_ptr + ii * n_samples, 1,
wii_ptr, 1, &R[0, 0], n_tasks)

# Using Blas Level2:
```
Should this whole comment block be indented to the left now?
sklearn/linear_model/_cd_fast.pyx
```
# Using BLAS Level 2:
# _gemv(RowMajor, Trans, n_samples, n_tasks, 1.0, &R[0, 0],
#       n_tasks, &X[0, ii], 1, 0.0, &tmp[0], 1)
# Using BLAS Level 1 (faster small vectors like here):
```

Suggested change:

```diff
- # Using BLAS Level 1 (faster small vectors like here):
+ # Using BLAS Level 1 (faster for small vectors like here):
```
sklearn/linear_model/_cd_fast.pyx
```
# if (W[0, ii] != 0.): # faster than testing full col norm, but unsafe
# Using numpy:
```
@mathurinm your latest changes LGTM, but can you rerun my bench on this version to see what the speedup is with the last version of the code?
On my machine the gain for 50 tasks is not always present.
Great. Good to go from my end; needs a 2nd approval.
@NicolasHug thank you for the review, does it look good to you now?
NicolasHug
left a comment
Thanks @mathurinm @agramfort.
Maybe we could add a TODO comment and/or open an issue about using memoryviews instead of numpy arrays in these places, since the benchmarks with the views seemed even faster.
Also, this probably needs a what's new entry.
I'm not sure what to think about the small regressions with n_samples=100 and n_tasks=50. I guess if the absolute running time is already small, that's still OK?
Done @NicolasHug
```rst
  random noise to the target. This might help with stability in some edge
  cases. :pr:`15179` by :user:`angelaambroz`.

- |Efficiency| Speed up :class:`linear_model.MultiTaskLasso`,
```
I'm not sure this can make it into 0.23.
@adrinjalali this is ready for a merge, so it's your choice basically ;)
For completeness, here are the old (master) running times on my machine:
ogrisel
left a comment
Also +1 for being part of 0.23 if it's not too late @adrinjalali. Merging and tagging #17010, but feel free to let us know if you would rather move it to 0.24 instead.
Thank you @mathurinm!
I was hoping this one would get in; I intentionally hadn't tagged yet :) #17010
This is WIP for now.
A good PR for @jeremiedbb, as it's related to findings in k-means.
Basically, making some BLAS Level 2 calls in enet_coordinate_descent_multi_task slows things down because they are issued on small vectors.
It needs more tests/benchmarks, but I have a 10x speedup on a dataset relevant to my research.
Done with @mathurinm.
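To make the shape of the change concrete, here is a NumPy sketch (not the Cython code; the sizes are made up for illustration) of the Level 2 call being replaced by a loop of Level 1 dot products. For a tall-skinny residual matrix, each of the n_tasks dot products is still long, and skipping a threaded gemv on a matrix whose trailing dimension is tiny is where the claimed speedup comes from.

```python
import numpy as np

rng = np.random.RandomState(0)
n_samples, n_tasks = 1000, 3  # many samples, few tasks
R = rng.standard_normal((n_samples, n_tasks))
x = rng.standard_normal(n_samples)

# BLAS Level 2 style: one matrix-vector product, tmp = R.T @ x
tmp_gemv = R.T @ x

# BLAS Level 1 style: n_tasks separate dot products of length n_samples
tmp_dots = np.array([R[:, t] @ x for t in range(n_tasks)])

# Both compute the same vector; only the BLAS call pattern differs.
assert np.allclose(tmp_gemv, tmp_dots)
```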