Move Linux GPU CI pipeline to A10 #23235

snnn · 2025-01-01T03:41:28Z

Move the Linux GPU CI pipeline to A10 machines, which are more advanced than the T4 machines they replace.
Retire the onnxruntime-Linux-GPU-T4 machine pool.
Disable the run_lean_attention test because the new machines do not have enough shared memory for it. The test fails on them with:

skip loading trt attention kernel fmha_mhca_fp16_128_256_sm86_kernel because no enough shared memory
[E:onnxruntime:, sequential_executor.cc:505 ExecuteKernel] Non-zero status code returned while running MultiHeadAttention node. Name:'MultiHeadAttention_0' Status Message: CUDA error cudaErrorInvalidValue:invalid argument
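
The log above suggests a kernel is skipped when it needs more shared memory than the device offers. A minimal Python sketch of that kind of guard follows. Every name in it (`KernelSpec`, `can_launch`, `select_kernels`) and the 128 KiB requirement are hypothetical and not taken from onnxruntime; the per-block limits are the values listed for each compute capability in the CUDA C++ Programming Guide and should be treated as assumptions to verify.

```python
from dataclasses import dataclass

KIB = 1024

# Assumed maximum shared memory per thread block, keyed by compute capability.
# Taken from the CUDA C++ Programming Guide; verify before relying on them.
MAX_SHARED_PER_BLOCK = {
    (7, 5): 64 * KIB,   # e.g. T4
    (8, 0): 163 * KIB,  # e.g. A100
    (8, 6): 99 * KIB,   # e.g. A10
}


@dataclass(frozen=True)
class KernelSpec:
    """Hypothetical description of a kernel and its shared memory request."""
    name: str
    shared_bytes: int


def can_launch(kernel, compute_capability):
    """Return True if the kernel's shared memory request fits the device limit."""
    limit = MAX_SHARED_PER_BLOCK.get(compute_capability)
    return limit is not None and kernel.shared_bytes <= limit


def select_kernels(kernels, compute_capability):
    """Split kernels into (runnable, skipped) lists for the given device."""
    runnable, skipped = [], []
    for kernel in kernels:
        if can_launch(kernel, compute_capability):
            runnable.append(kernel)
        else:
            skipped.append(kernel)
    return runnable, skipped


# Illustrative request size only; the real kernel's requirement is not stated in this PR.
big = KernelSpec("hypothetical_attention_kernel", 128 * KIB)
small = KernelSpec("hypothetical_small_kernel", 48 * KIB)

runnable, skipped = select_kernels([big, small], (8, 6))
print([k.name for k in runnable], [k.name for k in skipped])
```

Under these assumed numbers, a kernel sized for an A100-class device would be skipped on an A10, which is one way a test could pass on one sm8x machine and fail on another.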

snnn · 2025-01-02T23:56:32Z

/azp run Windows ARM64 QNN CI Pipeline

azure-pipelines · 2025-01-02T23:56:41Z

Azure Pipelines successfully started running 1 pipeline(s).

Copilot

Copilot reviewed 2 out of 3 changed files in this pull request and generated no comments.

Files not reviewed (1)

tools/ci_build/github/linux/build_cuda_ci.sh: Language not supported

Comments suppressed due to low confidence (1)

tools/ci_build/github/azure-pipelines/linux-gpu-ci-pipeline.yml:140

[nitpick] The pool name 'Onnxruntime-Linux-A10-24G' uses a different capitalization pattern compared to the previous 'onnxruntime-Linux-GPU-T4'. Ensure consistent naming conventions.

pool: Onnxruntime-Linux-A10-24G

Changming Sun added 2 commits January 1, 2025 03:41

Move Linux GPU CI pipeline to A10

b9682a3

Update GPU pool to onnxruntime-Linux-GPU-A10-12G

577e398

jchen351 previously approved these changes Jan 2, 2025

update

b80c913

snnn dismissed jchen351’s stale review via b80c913 January 2, 2025 20:06

Update pool name in CI pipeline

c103782

Changming Sun added 2 commits January 3, 2025 01:29

Merge remote-tracking branch 'origin/main' into snnn/retire_t4

8a5f257

update

c9e4757

snnn requested review from Copilot and tianleiwu January 3, 2025 03:25

Copilot AI reviewed Jan 3, 2025

tianleiwu approved these changes Jan 4, 2025

snnn merged commit b7ef81a into main Jan 5, 2025
94 of 96 checks passed

snnn deleted the snnn/retire_t4 branch January 5, 2025 03:11
