[CI] Use cudagraph for benchmarks by hjjq · Pull Request #419 · hidet-org/hidet

hjjq · 2024-01-19T20:20:12Z

No description provided.

I found that the stride calculation for the cuBLAS batched matmul is incorrect when the parallel-k optimization is enabled, and the wrong strides lead to mismatched results in Llama inference. Converting the matmuls produced by the parallel-k optimization into a canonical cuBLAS batched matmul is fairly complex, so I just disabled it.

Specifically, after splitting the K dimension, the extent along K may differ between partitions, so we have to check whether an index along K is out of bounds. At present we do not perform that check. The unequal extents also complicate the conversion to a normal batched matmul in cuBLAS.

closes #419

---------

Co-authored-by: Ubuntu <ubuntu@ip-172-31-43-134.us-east-2.compute.internal>
Co-authored-by: xiaocenxiaocen <xiao.zhang@centml.ai>
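To illustrate the unequal-extent problem described above, here is a minimal pure-Python sketch. It is not hidet's implementation: `split_k_extents` and `matmul_split_k` are hypothetical helper names, and the ceil-divided chunking is an assumed partitioning scheme. It shows that when K is not divisible by the number of partitions, the last partition is shorter, so a batched matmul that assumes one fixed extent and stride for every partition would read past the end of K unless it is bounds-checked.

```python
def split_k_extents(k, num_parts):
    """Return (start, extent) for each K partition, using ceil-divided chunks.

    Hypothetical helper: the chunking scheme is an assumption for illustration.
    """
    chunk = (k + num_parts - 1) // num_parts  # ceil(k / num_parts)
    parts = []
    for i in range(num_parts):
        start = i * chunk
        # Clamp the extent so the partition never runs past the end of K.
        extent = max(0, min(chunk, k - start))
        parts.append((start, extent))
    return parts


def has_uniform_extents(parts):
    """True if every partition covers the same number of K elements."""
    return len({extent for _, extent in parts}) == 1


def matmul_split_k(a, b, num_parts):
    """Reference split-k matmul on nested lists, with bounds-safe partitions."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    for start, extent in split_k_extents(k, num_parts):
        for i in range(m):
            for j in range(n):
                # Accumulate this partition's partial product into the result.
                c[i][j] += sum(a[i][p] * b[p][j] for p in range(start, start + extent))
    return c


if __name__ == "__main__":
    print(split_k_extents(10, 4))  # the last partition is shorter than the rest
    print(has_uniform_extents(split_k_extents(10, 4)))
    print(has_uniform_extents(split_k_extents(12, 4)))
```

Only the uniform case (for example K = 12 split four ways) maps directly onto a batched matmul with a single extent and stride per batch entry; the non-uniform case needs either padding or an explicit bounds check.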

fix

be7001f

hjjq merged commit be68a74 into hidet-org:main Jan 19, 2024

hjjq deleted the fix branch January 19, 2024 20:21

hjjq merged 1 commit into hidet-org:main from hjjq:fix