Updates the documentation of xGEMV and xGBMV related to when M=0 and N=0 #843
langou merged 2 commits into Reference-LAPACK:master
Codecov Report
Patch and project coverage have no change.

Additional details and impacted files

@@            Coverage Diff            @@
##           master    #843     +/-    ##
=========================================
  Coverage    0.00%   0.00%
=========================================
  Files        1908    1918     +10
  Lines      186962  188614   +1652
=========================================
- Misses     186962  188614   +1652
Is this behaviour checked/enforced by some test? It is a subtle change that could slip through when vendors upgrade their LAPACK version and their implementation does not follow reference-LAPACK.
The most recent commit also updates We have:
Thanks for asking! Yes, we currently check this behavior in (testBLAS). In fact, multiple BLAS implementations (all libraries I tested so far in testBLAS) satisfy this behavior. We do not have such checks in the LAPACK test suite currently.
Closes #248.
Closes #788.
The documentation of GEMV (and GBMV) misses a boundary case, namely when M=0 or N=0. There are two possible ways to resolve this issue:
(1) The behavior of GEMV (and GBMV) does not seem to match the behavior of GEMM, SYRK, SYR2K, HERK and HER2K regarding matrices with zero input sizes. In GEMM, for instance, C gets updated even if the internal dimension K is zero. In GEMV (and GBMV), if the internal dimension (M or N) is zero, Y does not get updated. The first solution is, therefore, to change the condition in GEMV (and GBMV) to be compatible with the Level 3 BLAS.
(2) The behavior of GEMV (and GBMV) is aligned with its original proposal (https://dl.acm.org/doi/10.1145/42288.42291), which says:
    "Note that it is permissible to call the routines with M or N = 0, in which case the routines exit immediately without referencing their vector or matrix arguments."
Moreover, tests in (testBLAS) confirm that several BLAS implementations conform to this rule.
This PR proposes a fix for the issue using (2), which preserves the current behavior and is compatible with several BLAS implementations.
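
The difference between the two quick-return rules can be illustrated with a small model. The following is only a sketch in Python, not the reference Fortran code: the function names `gemv_ref` and `gemm_ref` are hypothetical, matrices are plain row lists, and only the non-transposed case is modelled. It assumes the quick-return conditions described above, namely that GEMV returns immediately when M or N is zero, while GEMM still applies beta to C when K is zero.

```python
def gemv_ref(m, n, alpha, a, x, beta, y):
    """Hypothetical model of y := alpha*A*x + beta*y, with A of size m-by-n."""
    # Quick return as in option (2): nothing is referenced when M or N is zero.
    if m == 0 or n == 0 or (alpha == 0 and beta == 1):
        return y
    for i in range(m):
        acc = sum(a[i][j] * x[j] for j in range(n))
        # beta == 0 overwrites y instead of multiplying it.
        y[i] = alpha * acc + (0.0 if beta == 0 else beta * y[i])
    return y


def gemm_ref(m, n, k, alpha, a, b, beta, c):
    """Hypothetical model of C := alpha*A*B + beta*C, with A m-by-k and B k-by-n."""
    # K == 0 alone does not trigger a quick return; it does so only if beta == 1.
    if m == 0 or n == 0 or ((alpha == 0 or k == 0) and beta == 1):
        return c
    for i in range(m):
        for j in range(n):
            acc = sum(a[i][l] * b[l][j] for l in range(k))
            c[i][j] = alpha * acc + (0.0 if beta == 0 else beta * c[i][j])
    return c


# GEMV with N = 0: y is left untouched even though beta = 3.
y = gemv_ref(2, 0, 1.0, [[], []], [], 3.0, [1.0, 2.0])

# GEMM with K = 0: C is still scaled by beta = 3.
c = gemm_ref(1, 2, 0, 1.0, [[]], [], 3.0, [[1.0, 2.0]])
```

In this model, `y` remains `[1.0, 2.0]` while `c` becomes `[[3.0, 6.0]]`, which is the asymmetry that option (1) would remove and option (2) documents.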