
[Operator] Enhancements to Reduce #366

Merged
hjjq merged 25 commits into hidet-org:main from hjjq:conv-reg
Dec 20, 2023


@hjjq
Collaborator

@hjjq hjjq commented Oct 17, 2023

For some input shapes, the current reduce schedule underutilizes the GPU.
For example, reducing a tensor of shape [1, 128, 128, 3] over dims=[1, 2] spawns one thread block with 3 threads, each of which iterates over 128*128 elements.
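
To make the numbers in this example concrete, here is a small sketch in plain Python (not Hidet code). It assumes, as the description above states, that the current schedule assigns one thread per output element; the variable names are illustrative only.

```python
import math

shape = (1, 128, 128, 3)
reduce_dims = (1, 2)

# Number of elements that each output value has to accumulate.
reduce_extent = math.prod(shape[d] for d in reduce_dims)

# Number of independent outputs, i.e. the product of the non-reduced dims.
# Assumption: the current schedule uses one thread per output element.
num_outputs = math.prod(s for d, s in enumerate(shape) if d not in reduce_dims)

print(num_outputs, reduce_extent)  # 3 16384
```

With only 3 independent outputs, almost all of the work is serialized inside three long loops of 16384 iterations each.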
This PR makes two changes to optimize these cases:

  1. Add resolve_decompose to the resolve logic of Reduce. This forces a separate kernel launch for each reduce dimension, increasing concurrency.
  2. In the default reduce schedule template, spawn multiple warps along the reduce dimensions; the warps then combine their partial results through shared memory or atomics.
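
The two changes can be illustrated with a rough plain-Python sketch. This is not the Hidet implementation: each list comprehension merely stands in for a kernel or a group of warps, `NUM_GROUPS` is a hypothetical value chosen for illustration, and integer data is used so that all three results compare exactly.

```python
import random

H, W, C = 128, 128, 3  # the [1, 128, 128, 3] example, batch dimension dropped
random.seed(0)
x = [[[random.randint(-9, 9) for _ in range(C)] for _ in range(W)] for _ in range(H)]

# Baseline: one worker per output channel walks all H*W elements.
direct = [sum(x[h][w][c] for h in range(H) for w in range(W)) for c in range(C)]

# Change 1 (decompose): reduce over W first, which yields H*C = 384
# independent outputs, then reduce over H. Each step stands in for a
# separate kernel launch.
step = [[sum(x[h][w][c] for w in range(W)) for c in range(C)] for h in range(H)]
decomposed = [sum(step[h][c] for h in range(H)) for c in range(C)]

# Change 2 (cooperative reduce): split the H*W extent across several groups
# (standing in for warps). Each group produces a partial sum; the partials
# are then combined, which on a GPU would go through shared memory or atomics.
NUM_GROUPS = 8  # hypothetical; must divide H*W evenly in this sketch
flat = [[x[h][w][c] for h in range(H) for w in range(W)] for c in range(C)]
chunk = (H * W) // NUM_GROUPS
partials = [
    [sum(flat[c][g * chunk:(g + 1) * chunk]) for g in range(NUM_GROUPS)]
    for c in range(C)
]
two_stage = [sum(p) for p in partials]

print(direct == decomposed == two_stage)  # True
```

Both variants compute the same result as the baseline, but they expose more independent work: 384 outputs in the first decomposed step, and `NUM_GROUPS` partial sums per output in the cooperative variant.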

Also added a resolve rule for AdaptivePoolChannelLast.

@yaoyaoding
Member

Hi @hjjq,

Feel free to merge this PR if it is ready.

@hjjq
Collaborator Author

hjjq commented Nov 15, 2023

I will merge once I have confirmed that it passes the performance regression tests. It will probably also need a rebase.

@yaoyaoding
Member

Sounds good!

@hjjq
Collaborator Author

hjjq commented Dec 20, 2023

$hidet-ci launch

@hjjq hjjq merged commit 2040a7c into hidet-org:main Dec 20, 2023
@hjjq hjjq deleted the conv-reg branch December 20, 2023 20:12