
[Operator] Optimize normalize op with vectorized load, dynamic shape and more #316

Merged
xinli-git merged 3 commits into hidet-org:main from xinli-git:norm_optimize on Jul 16, 2023


@xinli-git (Contributor) commented Jul 16, 2023

This change introduces several enhancements to the current norm operator:

  • vectorized loads for fp16 types
  • epilogue fusion is now allowed
  • dynamic shapes on the normalized dimension
  • a tuning option that chooses between two warp shuffle routines and a single one
  • cleaner code and implementation

As a result, norm_fp16.py can be safely deleted.
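To make the warp shuffle bullet more concrete, here is a minimal pure-Python simulation of a shuffle-down tree reduction, plus a two-stage variant in which every warp reduces its own lanes and a second shuffle pass then reduces the per-warp partial sums. This is an illustrative sketch, not the kernel from this PR: the names `WARP_SIZE`, `warp_reduce_sum` and `block_reduce_sum` are hypothetical, and reading "2 warp shuffle routines" as a two-stage block reduction is an assumption.

```python
WARP_SIZE = 32  # lanes per warp on NVIDIA GPUs


def warp_reduce_sum(lanes):
    """Simulate a shuffle-down tree reduction over one warp.

    `lanes` holds one value per lane; the full sum ends up in lane 0.
    """
    assert len(lanes) == WARP_SIZE
    vals = list(lanes)
    offset = WARP_SIZE // 2
    while offset > 0:
        # Lane i reads the value held by lane i + offset. Lanes without a
        # partner add 0 in this simulation; that never affects lane 0.
        shifted = [vals[i + offset] if i + offset < WARP_SIZE else 0.0
                   for i in range(WARP_SIZE)]
        vals = [v + s for v, s in zip(vals, shifted)]
        offset //= 2
    return vals[0]


def block_reduce_sum(values, num_warps):
    """Sum `values` with one shuffle pass (1 warp) or two (several warps)."""
    assert 1 <= num_warps <= WARP_SIZE
    num_threads = num_warps * WARP_SIZE
    # Each simulated thread first accumulates a strided slice of the input.
    per_thread = [sum(values[t::num_threads]) for t in range(num_threads)]
    # First shuffle pass: every warp reduces its own 32 lanes.
    partials = [warp_reduce_sum(per_thread[w * WARP_SIZE:(w + 1) * WARP_SIZE])
                for w in range(num_warps)]
    if num_warps == 1:
        return partials[0]  # a single shuffle routine is enough
    # Second shuffle pass: one warp reduces the per-warp partial sums.
    return warp_reduce_sum(partials + [0.0] * (WARP_SIZE - num_warps))


values = [float(i) for i in range(1000)]
single = block_reduce_sum(values, num_warps=1)
double = block_reduce_sum(values, num_warps=4)
```

With one warp, only the first pass runs. With several warps, the second pass is needed, which is presumably the trade-off that the tuning option explores for different sizes of the normalized dimension.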

For the Stable Diffusion shape [2, 32, 60, 16, 16] with norm dims [60, 16, 16], fp32:

  • torch: 0.023 ms
  • main: 0.036 ms
  • this change: 0.027 ms

For the bert-base shape [1, 128, 768] with norm dims [768], fp16:

  • torch: 0.006 ms
  • main: 0.008 ms
  • this change: 0.007 ms

We are still not faster than torch, but we are very close.
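The timings above can be turned into ratios with a few lines of Python. The numbers are copied from this comment; the dictionary layout and the helper name `ratios` are only for illustration.

```python
# Latencies in milliseconds, copied from the measurements above.
timings_ms = {
    "stable diffusion, fp32": {"torch": 0.023, "main": 0.036, "this change": 0.027},
    "bert-base, fp16": {"torch": 0.006, "main": 0.008, "this change": 0.007},
}


def ratios(t):
    """Return (speedup over main, latency relative to torch)."""
    return t["main"] / t["this change"], t["this change"] / t["torch"]


for name, t in timings_ms.items():
    speedup, gap = ratios(t)
    print(f"{name}: {speedup:.2f}x faster than main, {gap:.2f}x torch's latency")
```

With these numbers the change is about 1.33x faster than main on the Stable Diffusion shape and about 1.14x faster on the bert-base shape, while taking roughly 1.17x torch's latency in both cases. The bert-base timings have only one significant digit, so those ratios are rough.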

@xinli-git xinli-git merged commit 15426c8 into hidet-org:main Jul 16, 2023
@xinli-git (Contributor, Author) commented:

The tests are passing. This change is isolated to the norm operator, so I will merge it without a review.

@@ -104,39 +108,69 @@ def allow_prologue(self) -> bool:
        return False

    def allow_epilogue(self) -> bool:

A beginner question: how are "prologue" and the "epilogue" here usually translated into Chinese, and what do the two do? Thanks.

Member


Please refer to Sections 4.2 and 5.2 of our paper [1] to learn more about prologue and epilogue fusion (in the paper, it is called post-scheduling fusion). There seems to be no obvious Chinese translation; maybe "前驱算子" (predecessor operator) and "后继算子" (successor operator).
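As a rough illustration of what epilogue fusion means for an operator like this one, the sketch below normalizes a row and applies a following elementwise operator either as a separate pass or inside the loop that writes the output. It is plain Python, not hidet code: `normalize`, `normalize_with_epilogue` and the ReLU epilogue are hypothetical names chosen for the example, and the real fusion is performed on scheduled kernels as described in the paper.

```python
import math


def normalize(row, eps=1e-5):
    """Normalize one row to zero mean and unit variance."""
    mean = sum(row) / len(row)
    var = sum((x - mean) ** 2 for x in row) / len(row)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(x - mean) * inv_std for x in row]


def normalize_with_epilogue(row, epilogue, eps=1e-5):
    """Same computation, with the successor operator applied at store time.

    No intermediate normalized row is materialized.
    """
    mean = sum(row) / len(row)
    var = sum((x - mean) ** 2 for x in row) / len(row)
    inv_std = 1.0 / math.sqrt(var + eps)
    out = [0.0] * len(row)
    for i, x in enumerate(row):
        # The epilogue is fused into the write of each output element.
        out[i] = epilogue((x - mean) * inv_std)
    return out


def relu(x):
    return max(x, 0.0)


row = [1.0, 2.0, 3.0, 4.0]
unfused = [relu(y) for y in normalize(row)]   # two passes over the data
fused = normalize_with_epilogue(row, relu)    # one pass over the data
assert fused == unfused
```

A prologue is the mirror image: a predecessor operator applied where the inputs are read. In the hunk under review, `allow_prologue` returns `False`, so only the output side is opened up for fusion.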

[1] https://dl.acm.org/doi/pdf/10.1145/3575693.3575702

@xinli-git xinli-git deleted the norm_optimize branch August 21, 2023 02:48
vadiklyutiy pushed a commit that referenced this pull request Jul 22, 2024
Previously I added hidet.ones_like without handling dtype and device.
Now made it properly work with all these arguments

---------

Co-authored-by: Zhumakhan <nazirzhumakhan@gmail,.com>
vadiklyutiy pushed a commit that referenced this pull request Jul 23, 2024
vadiklyutiy pushed a commit that referenced this pull request Dec 26, 2024