Replace Linux A10 pools with A100 #23547

snnn · 2025-01-31T01:09:22Z

Description

The latest driver for Linux A10 machines is 535.
The latest driver for Windows A10 machines is 550.
We have some unused A100 quota. If we can replace the A10 machines with A100 machines, we might be able to upgrade our CUDA from 12.2 to 12.4.
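The reasoning above (driver branch 535 on the Linux A10 pool blocks the move to CUDA 12.4) can be sketched as a coarse check. This is an illustrative helper, not code from this PR: the function names are hypothetical, and the minimum driver branches in the table are assumptions recalled from NVIDIA's CUDA release notes that should be verified there before use.

```python
# Hypothetical helper for illustration only; not part of this PR.

# ASSUMPTION: minimum Linux driver branch for each CUDA toolkit release,
# recalled from NVIDIA's CUDA release notes. Verify before relying on it.
MIN_LINUX_DRIVER_BRANCH = {
    "12.2": 535,
    "12.4": 550,
}


def driver_branch(version: str) -> int:
    """Return the major branch of a driver version string such as '535.54.03'."""
    return int(version.strip().split(".")[0])


def supports_toolkit(driver_version: str, cuda_version: str) -> bool:
    """Coarse check: compares only the driver's major branch to the assumed minimum."""
    return driver_branch(driver_version) >= MIN_LINUX_DRIVER_BRANCH[cuda_version]


if __name__ == "__main__":
    # Driver branches named in the description: 535 (Linux A10 pool).
    print(supports_toolkit("535", "12.2"))
    print(supports_toolkit("535", "12.4"))
```

On a real machine the installed driver version can be read with `nvidia-smi --query-gpu=driver_version --format=csv,noheader`. The check compares only the major branch, so it ignores point releases and CUDA minor-version compatibility.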

Motivation and Context

Update GQA benchmark to support bfloat16 and default to testing the first configuration (fast mode).

Note that test_sparse_attention.py was removed in #23547. It is referenced by the benchmark script, so I add it back and disable the test in pipeline mode.

Example output from H200 GPU:

```
prompt-sm90-Llama3-8B-b1-h32_8x128-float16:
   sequence_length  ORT-GQA-Dense  ORT-GQA-Dense-PackedQKV
0             16.0       0.781751                 0.571226
1             32.0       0.893813                 0.684198
2             64.0       1.434056                 1.589263
3            128.0       1.142192                 1.681969
4            256.0       1.503483                 2.225498
5            512.0       1.045732                 1.878660
6           1024.0       2.334924                 0.916745
7           2048.0       2.229924                 3.001290
8           4096.0       4.309678                 3.198855
9           8192.0       7.932211                 7.910411

token-sm90-Llama3-8B-b1-h32_8_d128-float16:
   past_sequence_length  ORT-GQA-Dense  ORT-GQA-Dense-PackedQKV
0                  16.0       1.751966                 0.780081
1                  32.0       1.302806                 0.043939
2                  64.0       2.301024                 2.207282
3                 128.0       2.294556                 3.010107
4                 256.0       2.931330                 1.781768
5                 512.0       1.210220                 2.799579
6                1024.0       2.767142                 2.660434
7                2048.0       1.420229                 0.091433
8                4096.0       0.860655                 0.801022
9                8191.0       0.749525                 0.820858

prompt-sm90-Llama3-8B-b1-h32_8x128-bfloat16:
   sequence_length  ORT-GQA-Dense  ORT-GQA-Dense-PackedQKV
0             16.0       1.085427                 0.666664
1             32.0       1.714795                 0.931262
2             64.0       1.729093                 1.438733
3            128.0       1.071263                 2.486135
4            256.0       1.957349                 1.342417
5            512.0       1.159680                 1.591321
6           1024.0       0.743702                 2.035150
7           2048.0       1.452736                 1.788801
8           4096.0       4.029917                 4.041565
9           8192.0       7.934485                 7.931600

token-sm90-Llama3-8B-b1-h32_8_d128-bfloat16:
   past_sequence_length  ORT-GQA-Dense  ORT-GQA-Dense-PackedQKV
0                  16.0       0.044354                 0.043983
1                  32.0       0.040715                 0.044061
2                  64.0       0.045586                 0.044071
3                 128.0       0.062204                 0.061418
4                 256.0       0.074764                 4.874854
5                 512.0       2.472094                 2.102259
6                1024.0       4.911269                 1.396149
7                2048.0       4.898032                 1.684034
8                4096.0       2.523432                 2.192279
9                8191.0       1.651366                 3.427370
```

Changming Sun and others added 10 commits January 22, 2025 01:02

Move Linux GPU CI pipeline to A100

e96c994

Merge remote-tracking branch 'origin/main' into snnn/replace_pool

d426634

update

b72380c

Merge remote-tracking branch 'origin/main' into snnn/replace_pool

4a24cce

stash

42b2eaa

Merge remote-tracking branch 'origin/main' into snnn/replace_pool

b5c53ab

Add debug information

92da576

Merge branch 'main' into snnn/replace_pool

3eae233

revert

0ebf8a5

update

8143277

tianleiwu approved these changes Feb 19, 2025

snnn marked this pull request as ready for review February 19, 2025 21:03

snnn merged commit 4dff07d into main Feb 19, 2025
96 of 98 checks passed

snnn deleted the snnn/replace_pool branch February 19, 2025 21:04

tianleiwu mentioned this pull request Jan 2, 2026

Update GQA benchmark to support bfloat16 #26898

Merged


3 participants
