Conversation
Could you fix the interface compatibility?
Is it possible to add the other return values with little or no overhead?
Does this version return the other values?
I added the other return values. The new overhead comes from calculating the sum of squares of the residuals (the squared L2 norm):

```python
if rank != n or m <= n:
    resids = cupy.array([], dtype=a.dtype)
elif b.ndim > 1:
    # note that this should be the same as (and faster than?)
    # the next couple of lines
    # e = b - core.dot(a, x)
    # resids = cupy.diagonal(core.dot(e.T, e))
    k = b.shape[1]
    resids = cupy.zeros(k, dtype=a.dtype)
    for i in range(k):
        e = b[:, i] - core.dot(a, x[:, i])
        resids[i] = core.dot(e.T, e)
else:
    e = b - core.dot(a, x)
    resids = core.dot(e.T, e).reshape(-1)
```
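The per-column loop above can be replaced by a single elementwise subtraction followed by a sum over rows. A minimal NumPy sketch of the equivalent computation (the matrices here are illustrative; a CuPy version would use the same expressions with cupy in place of np):

```python
import numpy as np

# Hypothetical small least-squares problem for illustration.
a = np.array([[1., 2.],
              [3., 4.],
              [5., 6.]])
b = np.array([[7., 8.],
              [9., 10.],
              [11., 13.]])

x, _, rank, s = np.linalg.lstsq(a, b, rcond=None)

# Loop version, as in the snippet above: one squared-norm per column.
k = b.shape[1]
resids_loop = np.zeros(k)
for i in range(k):
    e = b[:, i] - a.dot(x[:, i])
    resids_loop[i] = e.dot(e)

# Vectorized version: squared L2 norm of each residual column.
resids_vec = ((b - a.dot(x)) ** 2).sum(axis=0)

assert np.allclose(resids_loop, resids_vec)
```

The vectorized form also avoids materializing the full `e.T @ e` matrix that the commented-out `cupy.diagonal` variant would compute.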
```python
cupy.testing.assert_allclose(rank_cpu, rank_gpu, atol=1e-3)
cupy.testing.assert_allclose(s_cpu, s_gpu, atol=1e-3)
cupy.testing.assert_array_equal(a_gpu_copy, a_gpu)
cupy.testing.assert_array_equal(b_gpu_copy, b_gpu)
```
This implementation does not ensure that the result of cupy.lstsq is equal to that of numpy.lstsq, does it? In other words, it only ensures that the ||ax-b||^2 values are equal to each other?
If so, could you fix the test to check that ||ax-b||^2 is close?
I now check that ||ax-b||^2 is close with cupy.testing.assert_allclose(resids_cpu, resids_gpu, atol=1e-3).
I'm not sure about atol=1e-3; I picked it because the tests for cupy.linalg.pinv, which also uses SVD, use the same value.
This implementation does not ensure that the result (the first element of the return values) of cupy.lstsq is equal to that of numpy.lstsq, does it?
The implementation ensures that each element of the result is close, but not equal. They won't be exactly equal in practice, because the backends are different.
Edit: To clarify, the implementation checks that all elements of the NumPy results are close to the CuPy results.
Why is @condition.retry(10) at L157 needed? Is one of the result values often far from the expected value?
Long response #2165 (comment)
Changes:
1. Raise numpy.linalg.LinAlgError on invalid dimensions.
2. Remove the for loop in the calculation of residuals over the k dimension.
3. Clarify tests with comments and new test names.
4. Modify tests to expect numpy.linalg.LinAlgError.
Sometimes the tests fail. In particular, with

```python
import numpy as np

a = np.array([[2., 5., 1., 2.],
              [2., 2., 3., 8.],
              [4., 8., 1., 8.],
              [1., 3., 5., 1.]], dtype=np.float32)
b = np.array((9., 6., 0., 3.), dtype=np.float32)
```

solving the least squares problem as

```python
x, resid, rank, s = np.linalg.lstsq(a, b)
```

produces (x_cpu == numpy.linalg.lstsq solution, x_gpu == cupy.linalg.lstsq solution):

```
AssertionError:
Not equal to tolerance rtol=1e-07, atol=0.001

Mismatch: 50%
Max absolute difference: 0.0032959
Max relative difference: 1.6645674e-05
 x_cpu: array([198.      , -64.71429 ,   6.857143, -35.142857], dtype=float32)
 x_gpu: array([198.0033  , -64.71534 ,   6.857244, -35.14343 ], dtype=float32)
```

Should I make the tests deterministic to avoid such failures? This could be done by supplying a random seed for each configuration.
Yes, it seems preferable to make the tests deterministic.
@asi1024 Thanks for your recommendations.
I removed the retry condition, and the arrays are now generated with fixed random seeds.
The tests have also been reworked to handle the case where a singular matrix is generated: in that case they no longer check whether the first return element is close to the NumPy solution, while all other return elements are still checked.
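A rank-deficient (singular) input can be detected before comparing solutions, since in that case the least-squares solution is not unique and element-wise comparison against another backend is not meaningful. A minimal NumPy sketch (the matrix is illustrative; CuPy exposes a matrix_rank function with the same shape, assuming it is available):

```python
import numpy as np

# Hypothetical rank-deficient matrix: the second row is
# twice the first, so the matrix is singular.
a = np.array([[1., 2., 3.],
              [2., 4., 6.],
              [0., 1., 1.]], dtype=np.float32)

rank = np.linalg.matrix_rank(a)
singular = rank < min(a.shape)
# When singular, skip the element-wise check of x against the
# other backend; residuals, rank, and singular values can
# still be compared.
assert singular
```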
Jenkins, test this please.

Successfully created a job for commit cc5012c:

Jenkins CI test (for commit cc5012c, target branch master) succeeded!
LGTM. Thank you for the PR! |
Adds cupy.linalg.lstsq. This solves a least squares problem using SVD, similar to numpy.linalg.lstsq. It only returns the least-squares solution, while numpy.linalg.lstsq returns the residuals, rank, and singular values in addition to the least-squares solution. If it's necessary to have a 100% match with the return values of numpy.linalg.lstsq, I can modify this PR to include residuals, rank, and singular values.
Closes #1273
Edit 04-28-2019 22:54 EDT.
I have a benchmark comparing the performance of this cupy.linalg.lstsq vs numpy.linalg.lstsq, where CuPy was about six times faster on an AMD FX-8350 with an NVIDIA Titan Xp for 6,000,000 data points. I'd be curious whether this holds on other hardware configurations. The code to run the benchmark is here, and was run in the following order:
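A minimal timing harness in that spirit (a NumPy-only sketch with illustrative sizes, not the PR's benchmark script; a CuPy variant would swap np for cupy and synchronize the device before reading the clock):

```python
import time
import numpy as np

def bench_lstsq(m, n, repeats=3):
    # Illustrative problem size; the PR's benchmark used
    # around 6,000,000 data points.
    rng = np.random.RandomState(0)
    a = rng.rand(m, n).astype(np.float32)
    b = rng.rand(m).astype(np.float32)

    # Report the best of several runs to reduce timer noise.
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        np.linalg.lstsq(a, b, rcond=None)
        best = min(best, time.perf_counter() - start)
    return best

elapsed = bench_lstsq(10000, 4)
print("best of 3: %.6f s" % elapsed)
```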