[cuda] Limit grid size for torch.cat kernel on aligned16 contig tensors #103233

Closed
valentinandrei wants to merge 1 commit into pytorch:main from valentinandrei:main

@valentinandrei
Contributor

@valentinandrei valentinandrei commented Jun 8, 2023

When torch.cat is called on a list of contiguous tensors that are aligned on a 16B boundary in memory, the number of thread blocks launched is directly proportional to the size of the largest tensor in the list. If one or more tensors are very large while the others are small, the high number of thread blocks results in redundant loads of the input metadata, because many blocks read the metadata but have no elements to copy. This PR limits the grid size and improves the performance of cat when used on lists of tensors with large variations in size.
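To make the effect concrete, here is a minimal Python model of the launch arithmetic described above. It is a sketch under stated assumptions, not the kernel's actual code: the constants `THREADS_PER_BLOCK` and `ELEMS_PER_THREAD`, the cap value, and all function names are hypothetical and chosen only for illustration. It assumes each input gets the same number of blocks, sized for the largest tensor, and that capped blocks loop over the remaining elements.

```python
import math

# Hypothetical launch parameters, chosen only for illustration.
THREADS_PER_BLOCK = 128
ELEMS_PER_THREAD = 4
ELEMS_PER_BLOCK = THREADS_PER_BLOCK * ELEMS_PER_THREAD


def blocks_needed(num_elems):
    """Blocks required to cover one tensor in a single pass."""
    return math.ceil(num_elems / ELEMS_PER_BLOCK)


def grid_x(sizes, max_blocks=None):
    """Blocks launched per input: driven by the largest tensor, optionally capped."""
    blocks = blocks_needed(max(sizes))
    return blocks if max_blocks is None else min(blocks, max_blocks)


def idle_blocks(sizes, max_blocks=None):
    """Blocks that are launched for an input but find no elements to copy."""
    gx = grid_x(sizes, max_blocks)
    return sum(max(gx - blocks_needed(n), 0) for n in sizes)


if __name__ == "__main__":
    sizes = [1 << 24, 64, 64]  # one very large tensor, two tiny ones
    print(grid_x(sizes), idle_blocks(sizes))              # 32768 65534
    print(grid_x(sizes, 1024), idle_blocks(sizes, 1024))  # 1024 2046
```

In this model, capping the grid cuts the number of blocks that load metadata without doing useful work from 65534 to 2046, at the cost of each block for the large tensor iterating over more elements.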

Used the same test program from #102815, adding new cases with lists of tensors of varying sizes.
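For readers who want to try a case like this themselves, the following is a rough timing sketch. It is not the test program from #102815; the helper names `make_mixed_sizes` and `time_cat`, the tensor sizes, and the iteration count are all assumptions made for illustration. It only uses standard PyTorch calls (`torch.randn`, `torch.cat`, `torch.cuda.synchronize`) and returns `None` when PyTorch or a CUDA device is not available.

```python
import time

try:
    import torch
except ImportError:  # the size helper below still works without PyTorch
    torch = None


def make_mixed_sizes(num_small, small_elems, large_elems):
    """Element counts for one large tensor followed by num_small small ones."""
    return [large_elems] + [small_elems] * num_small


def time_cat(sizes, iters=100):
    """Average milliseconds per torch.cat call on CUDA, or None without CUDA."""
    if torch is None or not torch.cuda.is_available():
        return None
    tensors = [torch.randn(n, device="cuda") for n in sizes]
    torch.cat(tensors)  # warm-up call so one-time setup is not timed
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        torch.cat(tensors)
    torch.cuda.synchronize()  # wait for all queued kernels before stopping the clock
    return (time.perf_counter() - start) * 1000 / iters


if __name__ == "__main__":
    # Hypothetical case: one 16M-element tensor plus 63 tensors of 256 elements.
    sizes = make_mixed_sizes(num_small=63, small_elems=256, large_elems=1 << 24)
    print(time_cat(sizes))
```

Running the same script on builds with and without the change is one way to compare the two, though absolute numbers will depend on the GPU and the sizes chosen.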

[Image: Screenshot 2023-06-07 at 10 14 18 PM]

@pytorch-bot pytorch-bot Bot added the release notes: cuda release notes category label Jun 8, 2023
@pytorch-bot

pytorch-bot Bot commented Jun 8, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/103233

Note: Links to docs will display an error until the docs builds have been completed.

✅ 1 Unrelated Failure

As of commit dcac7e1:

UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@valentinandrei
Contributor Author

cc: @malfet

@malfet
Contributor

malfet commented Jun 8, 2023

@pytorchbot merge -ic

@pytorch-bot

pytorch-bot Bot commented Jun 8, 2023

-ic flag is deprecated, please use -i instead for the same effect.

@malfet
Contributor

malfet commented Jun 8, 2023

@pytorchbot merge -i

@pytorch-bot pytorch-bot Bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jun 8, 2023
@pytorchmergebot
Collaborator

Merge started

Your change will be merged while ignoring the following 1 checks: pull / linux-focal-py3.8-gcc7 / test (distributed, 1, 2, linux.2xlarge, unstable)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

Labels

ciflow/trunk (Trigger trunk jobs on your pull request)
Merged
release notes: cuda (release notes category)

3 participants