blis

BLAS-like Library Instantiation Software Framework

Results 138 blis issues

Field, Hi. Here are a few new commits for the Ampere branch. The commit log for a major new feature we added is excessively long to give you a chance...

This PR adds a valid Arm Neoverse N1 compilation target using Armv8 kernels. It creates the appropriate registry information and can autodetect an N1 CPU.

Basically it's a templated re-implementation of the current column-major d6x8's transpose. Shall I test out `gemmtrsm`?

This initial work, via an opt-in configure option, enables offloading of some sgemm, dgemm, cgemm, zgemm operations to AMD GPUs via AMD's rocBLAS. It therefore requires a working ROCm software stack...

We are using BLIS in [spaCy](https://github.com/explosion/spaCy) and have encountered access violations in CI when running on Windows, since updating to BLIS 0.9.0. I have tried to reproduce this issue with...

- testsuite-run-fast fails on POWER9 and POWER10 with error message: `4440 Segmentation fault ./test_libblis.x -g ./testsuite/input.general.fast -o ./testsuite/input.operations.fast > output.testsuite`
- POWER7 dgemm microkernel includes altivec.h, which defines type `bool`....

Let us call `macro-block` an MC-by-NC block, which is traditionally processed by a call to `macro-kernel`. Let us call `edge-macro-block` either an ME-by-NC or an MC-by-NE block, where `0 < ME <...

bug

While working with a client, we noticed that our software running with BLIS was slower on a recent processor (Intel Rocket Lake) than it was on an older processor (Haswell)....

This branch contains preliminary support for a new `.c_next` field within the `auxinfo_t` struct. It is fully implemented for `gemm`. Caveats:
- The "wrap-around" address computation for the edge cases...

enhancement

Details:
- In some multi-threading schemes, JR_NT and IR_NT may produce idle threads that perform no computation.
- This commit detects such situations and implements a collapse of the JR/IR loops....
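To illustrate how idle threads can arise, here is a minimal sketch in Python. It is not BLIS code: the function names, the contiguous-chunk partitioning, and the parameters `nr` and `mr` (register block sizes) are assumptions made for illustration, and BLIS's actual work partitioning may differ. It counts how many threads in a `jr_nt` by `ir_nt` grid receive no iterations when one of the loops has fewer iterations than threads assigned to it.

```python
from math import ceil


def idle_threads(n_iter: int, n_threads: int) -> int:
    """Count threads that get no work when n_iter iterations are split
    into equal contiguous chunks across n_threads threads (assumed scheme)."""
    if n_iter <= 0:
        return n_threads
    chunk = ceil(n_iter / n_threads)   # iterations handed to each busy thread
    busy = ceil(n_iter / chunk)        # threads that receive at least one iteration
    return n_threads - busy


def idle_in_jr_ir(n: int, m: int, nr: int, mr: int, jr_nt: int, ir_nt: int) -> int:
    """Count idle threads in a jr_nt x ir_nt grid over the JR and IR loops.

    n, m  : dimensions of the block handled by the two loops (hypothetical)
    nr, mr: step sizes of the JR and IR loops (hypothetical)
    """
    jr_iter = ceil(n / nr)
    ir_iter = ceil(m / mr)
    busy_jr = jr_nt - idle_threads(jr_iter, jr_nt)
    busy_ir = ir_nt - idle_threads(ir_iter, ir_nt)
    # A thread in the grid is busy only if both of its loop slots have work.
    return jr_nt * ir_nt - busy_jr * busy_ir


if __name__ == "__main__":
    # Only 2 JR iterations for 4 JR threads: half of the 8-thread grid is idle.
    print(idle_in_jr_ir(n=16, m=96, nr=8, mr=6, jr_nt=4, ir_nt=2))
```

Under these assumptions, merging the two loops into a single iteration space of `jr_iter * ir_iter` iterations shared by all `jr_nt * ir_nt` threads is one way to give every thread some work, which is the general idea behind collapsing the loops.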