Skip to content

Add KleidiAI gemm kernels#2000

Merged
metascroy merged 1 commit into
pytorch:mainfrom
metascroy:export-D72179835
Apr 3, 2025
Merged

Add KleidiAI gemm kernels#2000
metascroy merged 1 commit into
pytorch:mainfrom
metascroy:export-D72179835

Conversation

@metascroy

Copy link
Copy Markdown
Contributor

Summary:
This PR pulls in two new KleidiAI kernels:

  • kai_matmul_clamp_f32_qai8dxp1x4_qsi4c32p8x4_1x8_neon_dotprod (GEMV)
  • kai_matmul_clamp_f32_qai8dxp4x4_qsi4c32p8x4_4x8_neon_dotprod (GEMM)

and adds them for automatic mr-based kernel selection when TORCHAO_ENABLE_ARM_NEON_DOT is set. It also adds new tests for these kernels, and refactors the kleidiai testing code so that in future new kleidiai kernels can be tested with a one line addition:

TEST(
    test_linear_8bit_act_xbit_weight,
    matmul_clamp_f32_qai8dxp1x8_qsi4c32p8x8_1x8x32_neon_dotprod) {
  test_linear_8bit_act_xbit_weight_kleidiai<
      matmul_clamp_f32_qai8dxp1x8_qsi4c32p8x8_1x8x32_neon_dotprod>();
}

The exisitng testing code (still exists for more coverage) depended on code generation.

Reviewed By: Jack-Khuu

Differential Revision: D72179835

@pytorch-bot

pytorch-bot Bot commented Apr 1, 2025

Copy link
Copy Markdown

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2000

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 5 Pending

As of commit 112c07a with merge base 83d58e3 (image):

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported labels Apr 1, 2025
@facebook-github-bot

Copy link
Copy Markdown
Contributor

This pull request was exported from Phabricator. Differential Revision: D72179835

metascroy added a commit to metascroy/ao that referenced this pull request Apr 2, 2025
Summary:

This PR pulls in two new KleidiAI kernels:
* kai_matmul_clamp_f32_qai8dxp1x4_qsi4c32p8x4_1x8_neon_dotprod (GEMV)
* kai_matmul_clamp_f32_qai8dxp4x4_qsi4c32p8x4_4x8_neon_dotprod (GEMM)

and adds them for automatic mr-based kernel selection when TORCHAO_ENABLE_ARM_NEON_DOT is set.  It also adds new tests for these kernels, and refactors the kleidiai testing code so that in future new kleidiai kernels can be tested with a one line addition:

```
TEST(
    test_linear_8bit_act_xbit_weight,
    matmul_clamp_f32_qai8dxp1x8_qsi4c32p8x8_1x8x32_neon_dotprod) {
  test_linear_8bit_act_xbit_weight_kleidiai<
      matmul_clamp_f32_qai8dxp1x8_qsi4c32p8x8_1x8x32_neon_dotprod>();
}
```

The exisitng testing code (still exists for more coverage) depended on code generation.

Reviewed By: Jack-Khuu

Differential Revision: D72179835
@facebook-github-bot

Copy link
Copy Markdown
Contributor

This pull request was exported from Phabricator. Differential Revision: D72179835

@facebook-github-bot

Copy link
Copy Markdown
Contributor

@metascroy has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

2 similar comments
@facebook-github-bot

Copy link
Copy Markdown
Contributor

@metascroy has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot

Copy link
Copy Markdown
Contributor

@metascroy has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Summary:

This PR pulls in two new KleidiAI kernels:
* kai_matmul_clamp_f32_qai8dxp1x4_qsi4c32p8x4_1x8_neon_dotprod (GEMV)
* kai_matmul_clamp_f32_qai8dxp4x4_qsi4c32p8x4_4x8_neon_dotprod (GEMM)

and adds them for automatic mr-based kernel selection when TORCHAO_ENABLE_ARM_NEON_DOT is set.  It also adds new tests for these kernels, and refactors the kleidiai testing code so that in future new kleidiai kernels can be tested with a one line addition:

```
TEST(
    test_linear_8bit_act_xbit_weight,
    matmul_clamp_f32_qai8dxp1x8_qsi4c32p8x8_1x8x32_neon_dotprod) {
  test_linear_8bit_act_xbit_weight_kleidiai<
      matmul_clamp_f32_qai8dxp1x8_qsi4c32p8x8_1x8x32_neon_dotprod>();
}
```

The exisitng testing code (still exists for more coverage) depended on code generation.

Reviewed By: Jack-Khuu

Differential Revision: D72179835
@facebook-github-bot

Copy link
Copy Markdown
Contributor

@metascroy has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@metascroy metascroy merged commit 76ec450 into pytorch:main Apr 3, 2025
jainapurva pushed a commit that referenced this pull request Apr 8, 2025
Add KleidiAI gemm kernels (#2000)

Summary:

This PR pulls in two new KleidiAI kernels:
* kai_matmul_clamp_f32_qai8dxp1x4_qsi4c32p8x4_1x8_neon_dotprod (GEMV)
* kai_matmul_clamp_f32_qai8dxp4x4_qsi4c32p8x4_4x8_neon_dotprod (GEMM)

and adds them for automatic mr-based kernel selection when TORCHAO_ENABLE_ARM_NEON_DOT is set.  It also adds new tests for these kernels, and refactors the kleidiai testing code so that in future new kleidiai kernels can be tested with a one line addition:

```
TEST(
    test_linear_8bit_act_xbit_weight,
    matmul_clamp_f32_qai8dxp1x8_qsi4c32p8x8_1x8x32_neon_dotprod) {
  test_linear_8bit_act_xbit_weight_kleidiai<
      matmul_clamp_f32_qai8dxp1x8_qsi4c32p8x8_1x8x32_neon_dotprod>();
}
```

The exisitng testing code (still exists for more coverage) depended on code generation.

Reviewed By: Jack-Khuu

Differential Revision: D72179835
liangel-02 pushed a commit that referenced this pull request Aug 25, 2025
Add KleidiAI gemm kernels (#2000)

Summary:

This PR pulls in two new KleidiAI kernels:
* kai_matmul_clamp_f32_qai8dxp1x4_qsi4c32p8x4_1x8_neon_dotprod (GEMV)
* kai_matmul_clamp_f32_qai8dxp4x4_qsi4c32p8x4_4x8_neon_dotprod (GEMM)

and adds them for automatic mr-based kernel selection when TORCHAO_ENABLE_ARM_NEON_DOT is set.  It also adds new tests for these kernels, and refactors the kleidiai testing code so that in future new kleidiai kernels can be tested with a one line addition:

```
TEST(
    test_linear_8bit_act_xbit_weight,
    matmul_clamp_f32_qai8dxp1x8_qsi4c32p8x8_1x8x32_neon_dotprod) {
  test_linear_8bit_act_xbit_weight_kleidiai<
      matmul_clamp_f32_qai8dxp1x8_qsi4c32p8x8_1x8x32_neon_dotprod>();
}
```

The exisitng testing code (still exists for more coverage) depended on code generation.

Reviewed By: Jack-Khuu

Differential Revision: D72179835
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants