Add KleidiAI gemm kernels#2000
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2000
Note: Links to docs will display an error until the docs builds have been completed. ❌ 2 New Failures, 5 PendingAs of commit 112c07a with merge base 83d58e3 ( NEW FAILURES - The following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
|
This pull request was exported from Phabricator. Differential Revision: D72179835 |
Summary:
This PR pulls in two new KleidiAI kernels:
* kai_matmul_clamp_f32_qai8dxp1x4_qsi4c32p8x4_1x8_neon_dotprod (GEMV)
* kai_matmul_clamp_f32_qai8dxp4x4_qsi4c32p8x4_4x8_neon_dotprod (GEMM)
and adds them for automatic mr-based kernel selection when TORCHAO_ENABLE_ARM_NEON_DOT is set. It also adds new tests for these kernels, and refactors the kleidiai testing code so that in future new kleidiai kernels can be tested with a one line addition:
```
TEST(
test_linear_8bit_act_xbit_weight,
matmul_clamp_f32_qai8dxp1x8_qsi4c32p8x8_1x8x32_neon_dotprod) {
test_linear_8bit_act_xbit_weight_kleidiai<
matmul_clamp_f32_qai8dxp1x8_qsi4c32p8x8_1x8x32_neon_dotprod>();
}
```
The exisitng testing code (still exists for more coverage) depended on code generation.
Reviewed By: Jack-Khuu
Differential Revision: D72179835
94178ae to
a7f384f
Compare
|
This pull request was exported from Phabricator. Differential Revision: D72179835 |
|
@metascroy has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
2 similar comments
|
@metascroy has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
|
@metascroy has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Summary:
This PR pulls in two new KleidiAI kernels:
* kai_matmul_clamp_f32_qai8dxp1x4_qsi4c32p8x4_1x8_neon_dotprod (GEMV)
* kai_matmul_clamp_f32_qai8dxp4x4_qsi4c32p8x4_4x8_neon_dotprod (GEMM)
and adds them for automatic mr-based kernel selection when TORCHAO_ENABLE_ARM_NEON_DOT is set. It also adds new tests for these kernels, and refactors the kleidiai testing code so that in future new kleidiai kernels can be tested with a one line addition:
```
TEST(
test_linear_8bit_act_xbit_weight,
matmul_clamp_f32_qai8dxp1x8_qsi4c32p8x8_1x8x32_neon_dotprod) {
test_linear_8bit_act_xbit_weight_kleidiai<
matmul_clamp_f32_qai8dxp1x8_qsi4c32p8x8_1x8x32_neon_dotprod>();
}
```
The exisitng testing code (still exists for more coverage) depended on code generation.
Reviewed By: Jack-Khuu
Differential Revision: D72179835
a3dcfb8 to
112c07a
Compare
|
@metascroy has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Add KleidiAI gemm kernels (#2000) Summary: This PR pulls in two new KleidiAI kernels: * kai_matmul_clamp_f32_qai8dxp1x4_qsi4c32p8x4_1x8_neon_dotprod (GEMV) * kai_matmul_clamp_f32_qai8dxp4x4_qsi4c32p8x4_4x8_neon_dotprod (GEMM) and adds them for automatic mr-based kernel selection when TORCHAO_ENABLE_ARM_NEON_DOT is set. It also adds new tests for these kernels, and refactors the kleidiai testing code so that in future new kleidiai kernels can be tested with a one line addition: ``` TEST( test_linear_8bit_act_xbit_weight, matmul_clamp_f32_qai8dxp1x8_qsi4c32p8x8_1x8x32_neon_dotprod) { test_linear_8bit_act_xbit_weight_kleidiai< matmul_clamp_f32_qai8dxp1x8_qsi4c32p8x8_1x8x32_neon_dotprod>(); } ``` The exisitng testing code (still exists for more coverage) depended on code generation. Reviewed By: Jack-Khuu Differential Revision: D72179835
Add KleidiAI gemm kernels (#2000) Summary: This PR pulls in two new KleidiAI kernels: * kai_matmul_clamp_f32_qai8dxp1x4_qsi4c32p8x4_1x8_neon_dotprod (GEMV) * kai_matmul_clamp_f32_qai8dxp4x4_qsi4c32p8x4_4x8_neon_dotprod (GEMM) and adds them for automatic mr-based kernel selection when TORCHAO_ENABLE_ARM_NEON_DOT is set. It also adds new tests for these kernels, and refactors the kleidiai testing code so that in future new kleidiai kernels can be tested with a one line addition: ``` TEST( test_linear_8bit_act_xbit_weight, matmul_clamp_f32_qai8dxp1x8_qsi4c32p8x8_1x8x32_neon_dotprod) { test_linear_8bit_act_xbit_weight_kleidiai< matmul_clamp_f32_qai8dxp1x8_qsi4c32p8x8_1x8x32_neon_dotprod>(); } ``` The exisitng testing code (still exists for more coverage) depended on code generation. Reviewed By: Jack-Khuu Differential Revision: D72179835
Summary:
This PR pulls in two new KleidiAI kernels:
and adds them for automatic mr-based kernel selection when TORCHAO_ENABLE_ARM_NEON_DOT is set. It also adds new tests for these kernels, and refactors the kleidiai testing code so that in future new kleidiai kernels can be tested with a one line addition:
The exisitng testing code (still exists for more coverage) depended on code generation.
Reviewed By: Jack-Khuu
Differential Revision: D72179835