[Operator] Make Convolution gemms fusible by resolving to batch_matmul#279

Merged
yaoyaoding merged 2 commits into hidet-org:main from hjjq:require_prologue
Jun 16, 2023
@hjjq (Collaborator) commented Jun 15, 2023

No description provided.

@yaoyaoding (Member) commented

Thanks @hjjq !

@yaoyaoding yaoyaoding merged commit d6e431e into hidet-org:main Jun 16, 2023
@hjjq hjjq deleted the require_prologue branch June 24, 2023 02:51
vadiklyutiy added a commit that referenced this pull request Jul 22, 2024
Right now we have significant fixed overhead per model run (#279).

**no cudagraph**
Below are times, in ms, for running an empty model without cudagraph.

Before. 
Inductor overhead is 0.052 = 0.032 + 0.02, where 0.02 is the overhead before
entering the compiler and 0.032 is the overhead of Inductor itself.
Hidet overhead is 0.205 = 0.185 + 0.02

After.
Hidet overhead is 0.068 = 0.048 + 0.02

Hidet's own overhead is reduced from 0.185 ms to 0.048 ms, i.e. by about 3.85x.


**cudagraph**
Before 
0.162ms

After
0.124ms

Inductor 
0.88ms

For cudagraph there is still room for one more improvement (left as a TODO in the code).
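
The arithmetic in the commit message above can be checked with a short script. This is only a sketch: the timings are copied from the message, while the names `PRE_COMPILER_MS`, `total_overhead`, and `speedup` are hypothetical helpers introduced here for illustration and are not part of Hidet.

```python
# Timings in milliseconds, copied from the commit message.
# All names below are illustrative helpers, not Hidet APIs.
PRE_COMPILER_MS = 0.02  # overhead before entering the compiler


def total_overhead(compiler_ms: float) -> float:
    """Total fixed overhead = compiler-side overhead + pre-compiler overhead."""
    return round(compiler_ms + PRE_COMPILER_MS, 3)


def speedup(before_ms: float, after_ms: float) -> float:
    """Ratio of the old time to the new time."""
    return before_ms / after_ms


# No cudagraph: compiler-side overhead only.
inductor_total = total_overhead(0.032)
hidet_total_before = total_overhead(0.185)
hidet_total_after = total_overhead(0.048)
no_cudagraph_speedup = speedup(0.185, 0.048)

# Cudagraph: the message reports end-to-end numbers only.
cudagraph_speedup = speedup(0.162, 0.124)

print(f"Inductor total:       {inductor_total} ms")
print(f"Hidet total (before): {hidet_total_before} ms")
print(f"Hidet total (after):  {hidet_total_after} ms")
print(f"No-cudagraph speedup: {no_cudagraph_speedup:.2f}x")
print(f"Cudagraph speedup:    {cudagraph_speedup:.2f}x")
```

Note that the 3.85x figure applies to Hidet's own overhead (0.185 ms to 0.048 ms); measured on the totals that include the 0.02 ms pre-compiler cost, the ratio is smaller.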
vadiklyutiy added a commit that referenced this pull request Jul 23, 2024
vadiklyutiy added a commit that referenced this pull request Dec 26, 2024