[CI] Use cudagraph for benchmarks #419

Merged: hjjq merged 1 commit into hidet-org:main from hjjq:fix on Jan 19, 2024
Conversation

Collaborator

@hjjq hjjq commented Jan 19, 2024

No description provided.

@hjjq hjjq merged commit be68a74 into hidet-org:main Jan 19, 2024
@hjjq hjjq deleted the fix branch January 19, 2024 20:21
vadiklyutiy pushed a commit that referenced this pull request Dec 19, 2024
I found that the stride calculation for the cuBLAS batched matmul is
incorrect when the parallel-k optimization is enabled, and the wrong
strides lead to mismatched results in Llama inference. Converting the
matmuls produced by the parallel-k optimization into a canonical cuBLAS
batched matmul is nontrivial, so I simply disabled the combination.
Specifically, after splitting the K dimension, the extent of each
partition along K may be unequal, so we would have to check whether the
K index is out of bounds, which we currently do not. This also
complicates the conversion to a normal batched matmul in cuBLAS.
closes #419

---------

Co-authored-by: Ubuntu <ubuntu@ip-172-31-43-134.us-east-2.compute.internal>
Co-authored-by: xiaocenxiaocen <xiao.zhang@centml.ai>
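
The unequal-partition problem described in the commit message can be sketched as follows. This is a hypothetical illustration (the `split_k` helper and its signature are not from the hidet codebase): splitting K into equal-sized chunks leaves a smaller remainder partition, so per-partition extents differ, while cuBLAS strided batched matmul assumes one uniform stride across all batch entries.

```python
# Hypothetical sketch of parallel-k splitting: partition the K dimension
# of a matmul into `nparts` chunks. The last chunk may be smaller, which
# is why per-partition extents must be bounds-checked and why a single
# uniform batch stride (as cuBLAS strided batched matmul expects) does
# not describe the layout.
def split_k(k: int, nparts: int) -> list[tuple[int, int]]:
    """Return (start, extent) pairs for each K partition."""
    base = (k + nparts - 1) // nparts  # ceil division: chunk size
    parts = []
    for i in range(nparts):
        start = i * base
        extent = min(base, k - start)  # last partition may be smaller
        if extent > 0:
            parts.append((start, extent))
    return parts

# K = 100 split 3 ways gives extents 34, 34, 32 -- not all equal.
print(split_k(100, 3))
```

Only when `nparts` divides K evenly are all extents equal, which is the case a single-stride batched matmul can express directly.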
vadiklyutiy pushed a commit that referenced this pull request Dec 20, 2024
vadiklyutiy pushed a commit that referenced this pull request Dec 26, 2024