[Torch] Steal Pytorch weights #310

Merged
hjjq merged 3 commits into hidet-org:main from hjjq:steal on Jul 13, 2023
@hjjq (Collaborator) commented Jul 12, 2023

No description provided.

yaoyaoding and others added 3 commits July 12, 2023 16:33
@hjjq hjjq merged commit 692192c into hidet-org:main Jul 13, 2023
@hjjq hjjq deleted the steal branch July 13, 2023 00:28
vadiklyutiy added a commit that referenced this pull request Jul 22, 2024
Right now we have significant fixed overhead per model run (#279).

**no cudagraph**
Below: timings for a run of an empty model without cudagraph, in ms.

Before.
Inductor overhead is 0.052 = 0.032 + 0.02, where 0.02 is the overhead before
entering the compiler and 0.032 is the overhead of Inductor itself.
Hidet overhead is 0.205 = 0.185 + 0.02

After.
Hidet overhead is 0.068 = 0.048 + 0.02

Hidet's own overhead is reduced from 0.185 ms to 0.048 ms, or by 3.85x
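
The arithmetic behind these figures can be checked with a short sketch. The timings are copied from the commit message; the helper names (`total_overhead_ms`, `speedup`) are hypothetical and exist only for this illustration.

```python
# Timings in milliseconds, copied from the commit message.
# All names below are illustrative, not part of Hidet or PyTorch.
PRE_COMPILER_MS = 0.02      # overhead before entering the compiler
INDUCTOR_MS = 0.032         # Inductor's own overhead
HIDET_BEFORE_MS = 0.185     # Hidet's own overhead before the change
HIDET_AFTER_MS = 0.048      # Hidet's own overhead after the change


def total_overhead_ms(backend_ms: float) -> float:
    """Backend overhead plus the shared pre-compiler overhead."""
    return round(backend_ms + PRE_COMPILER_MS, 3)


def speedup(before_ms: float, after_ms: float) -> float:
    """Ratio by which the overhead shrank."""
    return before_ms / after_ms


if __name__ == "__main__":
    print("Inductor total:    ", total_overhead_ms(INDUCTOR_MS))
    print("Hidet total before:", total_overhead_ms(HIDET_BEFORE_MS))
    print("Hidet total after: ", total_overhead_ms(HIDET_AFTER_MS))
    print("Hidet speedup:     ", round(speedup(HIDET_BEFORE_MS, HIDET_AFTER_MS), 2))
```

Note that the 3.85x ratio compares Hidet's own overhead (0.185 vs 0.048); the totals that include the shared 0.02 ms shrink by a smaller factor.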


**cudagraph**
Before 
0.162ms

After
0.124ms

Inductor 
0.88ms

For cudagraph there is room for one more improvement (left as a TODO in the code).
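
The commit message does not show how the empty-model timings were collected. As a rough sketch only, per-call fixed overhead of any callable can be estimated with the standard library as below; `mean_call_ms` and `empty_model` are hypothetical names, and this is an assumption about the general approach, not the author's harness.

```python
import time
from typing import Callable


def mean_call_ms(fn: Callable[[], object], warmup: int = 100, iters: int = 10_000) -> float:
    """Average wall-clock time of one call to `fn`, in milliseconds.

    Assumption: `fn` is synchronous. A GPU workload would also need a
    device synchronization before each clock read, which is omitted here.
    """
    for _ in range(warmup):  # warm up caches before timing
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    elapsed_s = time.perf_counter() - start
    return elapsed_s * 1000.0 / iters


def empty_model() -> None:
    """Stand-in for a compiled model that does no work."""
    return None


if __name__ == "__main__":
    print(f"fixed overhead per call: {mean_call_ms(empty_model):.6f} ms")
```

Because the callable does nothing, whatever time remains is the fixed cost of the call path itself, which is the quantity the figures above compare.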
vadiklyutiy added a commit that referenced this pull request Jul 23, 2024
vadiklyutiy added a commit that referenced this pull request Dec 26, 2024