
Convergence criteria

Open dselivanov opened this issue 8 years ago • 2 comments

I work on the Soft-Impute and Soft-SVD algorithms described in "Matrix Completion and Low-Rank SVD via Fast Alternating Least Squares".

They propose tracking the change in the Frobenius norm in order to monitor convergence. I believe this could be a better criterion than just tracking the change in singular values: [screenshot of the formula from the paper]

I've created a function that calculates this delta in my reco package. If you find this approach useful for irlba, let me know and I will send a PR.
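For reference, the relative Frobenius-norm change between two low-rank models can be computed without ever forming the full m x n reconstruction, using the trace identity ||A B^T||_F^2 = tr((A^T A)(B^T B)). The sketch below (in pure Python for illustration; the actual reco implementation is in R and its function name may differ) shows the idea:

```python
def matmul(A, B):
    # naive product of A (m x k) and B (k x n)
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def trace_prod(M, N):
    # trace(M @ N) without forming the full product
    return sum(M[i][j] * N[j][i]
               for i in range(len(M)) for j in range(len(M[0])))

def frob_delta(A1, B1, A2, B2):
    """Relative squared Frobenius-norm change between X1 = A1 B1^T and
    X2 = A2 B2^T.  Only k x k Gram matrices are formed, so the cost is
    O((m + n) k^2) rather than the O(m n k) cost of multiplying out X."""
    n1 = trace_prod(matmul(transpose(A1), A1), matmul(transpose(B1), B1))
    n2 = trace_prod(matmul(transpose(A2), A2), matmul(transpose(B2), B2))
    cross = trace_prod(matmul(transpose(A1), A2), matmul(transpose(B2), B1))
    return (n1 + n2 - 2.0 * cross) / n1
```

When the factors are identical the delta is zero; doubling one factor pair gives a relative change of exactly 1, matching ||X1 - 2*X1||_F^2 / ||X1||_F^2.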

dselivanov avatar Nov 26 '17 12:11 dselivanov

Dear Dmitriy,

I apologize for the latency, lots going on.

Pull requests are always welcome. I have some notes for you to consider:

  1. I'm not sure whether you're referring to irlba or svdr or both. Indeed, svdr uses only the change in singular values, as you mention above. irlba, however, checks two convergence criteria: convergence to an invariant subspace as well as the change in the singular values, so the irlba case is a bit more subtle than described above.

  2. You may be right, and this may be a better overall criterion, but we need to investigate that first (maybe in a branch). The original Baglama/Reichel criterion of subspace convergence turned out to be a bit too weak in practice. After some deliberation we added the change in singular values to address convergence issues typified by the tprolate example in the svdr function examples. The reason the singular values were added alongside subspace convergence is that, because the largest singular values converge quickly, this approach ends up being sensitive to poor convergence in the smallest desired values -- a good feature. The problem is that the method may quickly converge to an invariant subspace, but the wrong one; the addition of the svtol check helps prevent that.
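The two-part stopping rule described above can be sketched as follows. This is an illustrative Python outline, not irlba's actual C/R internals, and the argument names (residuals, d_max, etc.) are hypothetical:

```python
def converged(d_new, d_old, residuals, d_max, tol, svtol):
    """Two-part stopping rule in the spirit of irlba's:
    - subspace test: Lanczos residual bounds small relative to the
      largest singular value estimate (invariant-subspace criterion);
    - svtol test: relative change in each desired singular value is
      small, guarding against converging to the *wrong* subspace."""
    subspace_ok = all(r < tol * d_max for r in residuals)
    sv_ok = all(abs(dn - do) / dn < svtol
                for dn, do in zip(d_new, d_old))
    return subspace_ok and sv_ok
```

Either test alone can pass prematurely; requiring both is what addresses the tprolate-style failure mode mentioned above.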

With that background, I do think your approach should also be investigated and might be better. We just need to see. I look forward to experimenting with this!

Thanks so much for helping with this.

Best,

Bryan

Addendum: note that we're also working (slowly because I'm slow) with @erichson on a kind of hybrid Lanczos/power method. This criterion might be a great fit there.

bwlewis avatar Dec 10 '17 02:12 bwlewis

Just saw this here, but I'm going to drop my $0.02 hoping it may be useful to someone.

I work on matrix factorization by alternating least squares. I've found that the Frobenius norm is actually quite expensive to compute relative to other measures that work far faster and just as well. To calculate the Frobenius norm of the reconstruction you need to multiply out a potentially very large pair of model matrices (U and V), involving a very large number of flops.

Instead, note that the Frobenius norm depends directly on the change in the model factors themselves (U and V in this case). Simply by measuring the ratio of explained to unexplained variance between consecutive iterations of U and V, we get a very good approximation of the Frobenius-norm change in the multiplied-out model. The change in the product UV across iterations is almost exactly linear in the change in U and V. Of course, other measures such as Euclidean or cosine distance can be used, but I have found R-squared to be the most stable.
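A minimal sketch of this idea, assuming a plain least-squares R^2 between the flattened entries of consecutive factor iterates (function names are illustrative, not from any package):

```python
def r_squared(prev, curr):
    """R^2 of the current factor entries regressed on the previous
    ones; values near 1 mean the factor has essentially stopped moving."""
    x = [v for row in prev for v in row]
    y = [v for row in curr for v in row]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def factors_converged(U_prev, U_curr, V_prev, V_curr, tol=1e-6):
    # stop when both factor matrices are essentially unchanged,
    # avoiding the O(m n k) cost of multiplying out U V^T
    return (1.0 - r_squared(U_prev, U_curr) < tol and
            1.0 - r_squared(V_prev, V_curr) < tol)
```

Only O((m + n) k) work per check, versus O(m n k) to reconstruct the product.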

In NMF I don't need to worry about a diagonal (unless I explicitly diagonalize), but in SVD you might account for the presence of a diagonal by weighting the correlation between columns of U (and/or V) by the singular values. That way you don't bias the convergence criterion away from the first singular values.
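One way to read the weighting suggestion above: scale each column of U by the corresponding singular value before computing the correlation, so columns associated with large singular values carry proportionally more weight. A hypothetical helper (the name and approach are an interpretation, not code from any package):

```python
def weight_by_singular_values(U, d):
    """Scale column j of U by d[j]; feeding the weighted matrices into
    a correlation-based convergence check weights each direction by its
    contribution to the reconstruction."""
    return [[U[i][j] * d[j] for j in range(len(d))]
            for i in range(len(U))]
```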

zdebruine avatar Mar 31 '21 11:03 zdebruine