Integrate high-performance x64 gemm library to MLAS #17669
Signed-off-by: Mengni Wang <mengni.wang@intel.com>

/azp run Windows ARM64 QNN CI Pipeline, Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows x64 QNN CI Pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed

/azp run Windows x64 QNN CI Pipeline

Azure Pipelines successfully started running 1 pipeline(s).

Azure Pipelines successfully started running 8 pipeline(s).

/azp run Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,Windows x64 QNN CI Pipeline

Azure Pipelines successfully started running 7 pipeline(s).

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, Linux QNN CI Pipeline

Azure Pipelines successfully started running 9 pipeline(s).

Thanks Louyu! |
…icrosoft#19015) Allow MatMulNBits `accuracy_level` attribute (added in microsoft#17669) to be set to a particular value when the model is quantized.
### Description: Revert PR#19016 (microsoft/onnxruntime#19016) and PR#17669 (microsoft/onnxruntime#17669).
Description
Improve MLAS to support high-performance x64 INT4 kernels
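For orientation, the snippet below is a naive scalar reference of what a block-wise INT4 weight GEMM computes: each 4-bit weight is dequantized as `scale * (q - zero_point)` per block of `block_size` rows along K, then multiplied with fp32 activations. It is a minimal sketch under hypothetical assumptions (the `Int4Weights` struct, per-column packing with the low nibble first, one `uint8_t` zero point per block, default zero point 8) and is not the MLAS kernel or its packed format. The high-performance kernels exist precisely to replace this kind of scalar loop with vectorized code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout for illustration only (not the MLAS packed format):
// for each output column n, K 4-bit weights are stored contiguously,
// two per byte with the low nibble first; scales/zero points are per block.
struct Int4Weights {
    std::size_t K = 0, N = 0, block_size = 0;
    std::vector<std::uint8_t> packed;       // N * ceil(K / 2) bytes
    std::vector<float> scales;              // N * blocks_per_col entries
    std::vector<std::uint8_t> zero_points;  // same shape as scales; empty => 8
};

inline std::size_t BlocksPerCol(const Int4Weights& w) {
    return (w.K + w.block_size - 1) / w.block_size;
}

// Extract the unsigned 4-bit value for row k of column n.
inline std::uint8_t GetNibble(const Int4Weights& w, std::size_t n, std::size_t k) {
    const std::size_t bytes_per_col = (w.K + 1) / 2;
    const std::uint8_t byte = w.packed[n * bytes_per_col + k / 2];
    return (k % 2 == 0) ? static_cast<std::uint8_t>(byte & 0x0F)
                        : static_cast<std::uint8_t>(byte >> 4);
}

// Reference C = A * dequant(B); A is row-major M x K, C is row-major M x N.
std::vector<float> Int4GemmReference(const std::vector<float>& A, std::size_t M,
                                     const Int4Weights& w) {
    std::vector<float> C(M * w.N, 0.0f);
    const std::size_t bpc = BlocksPerCol(w);
    for (std::size_t n = 0; n < w.N; ++n) {
        for (std::size_t k = 0; k < w.K; ++k) {
            // Each block of block_size rows shares one scale and zero point.
            const std::size_t blk = n * bpc + k / w.block_size;
            const float scale = w.scales[blk];
            const int zp = w.zero_points.empty() ? 8 : w.zero_points[blk];
            const float b =
                scale * static_cast<float>(static_cast<int>(GetNibble(w, n, k)) - zp);
            for (std::size_t m = 0; m < M; ++m) {
                C[m * w.N + n] += A[m * w.K + k] * b;
            }
        }
    }
    return C;
}
```

Dequantizing inside the inner loop keeps the sketch short; an optimized kernel would instead unpack a whole block once into registers and reuse it across many rows of A.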
Motivation and Context
Tasks
Benchmark
Ubuntu 20.22, Intel(R) Xeon(R) Platinum 8480+ (56 cores):
Reference:
Windows 11, Intel Core i9-12900K (8 cores):