Expose an API to query the CUDA compute stream to launch a custom kernel #9141
Merged
hariharans29 merged 10 commits into master on Nov 9, 2021
hariharans29 commented Sep 21, 2021
Amazing feature, I am just looking for it!
Working as expected on my end. Thanks Hari!
wangyems previously approved these changes Oct 8, 2021
pranavsharma reviewed Nov 3, 2021
pranavsharma approved these changes Nov 9, 2021
Description:
Description as title.
This is particularly useful when custom ops compiled into shared libraries need to achieve implicit synchronization with ORT's CUDA kernels.
Currently, there is only one compute stream per session, so this could have been a session-level API. It is instead kept at the `OrtKernelContext` level to keep the design flexible enough for a future in which sessions could have multiple streams (one per host thread). Once sessions start maintaining one stream per host thread, the API will return the stream corresponding to that thread.
Motivation and Context
#7068 (comment)
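
The implicit synchronization mentioned in the description comes from in-order execution on a single stream: if a custom op enqueues its work on the same stream the runtime uses, that work runs after everything already queued, with no explicit sync call. The following is a minimal, CPU-only sketch of that idea. It does not use the ONNX Runtime or CUDA headers; `FakeStream`, `FakeKernelContext`, and `QueryComputeStream` are hypothetical stand-ins invented for illustration, not the API added by this PR.

```cpp
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a CUDA stream: work items run strictly in
// the order they were enqueued.
struct FakeStream {
  std::queue<std::function<void()>> work;

  void Enqueue(std::function<void()> fn) { work.push(std::move(fn)); }

  void Drain() {
    while (!work.empty()) {
      work.front()();
      work.pop();
    }
  }
};

// Hypothetical stand-in for a kernel context. It only carries an opaque
// handle to the compute stream that the session would use for this call.
struct FakeKernelContext {
  void* compute_stream;
};

// Hypothetical context-level query. Because the lookup goes through the
// context rather than the session, it could later return a per-thread
// stream without changing the caller.
void* QueryComputeStream(const FakeKernelContext* ctx) {
  return ctx->compute_stream;
}

// Simulates one built-in kernel, one custom-op kernel, and another built-in
// kernel sharing a single stream, and returns the order in which they ran.
std::vector<std::string> RunDemo() {
  std::vector<std::string> log;
  FakeStream session_stream;
  FakeKernelContext ctx{&session_stream};

  // Work queued by the runtime before the custom op runs.
  session_stream.Enqueue([&log] { log.push_back("ort_kernel"); });

  // The custom op queries the stream from its context and enqueues on it.
  auto* stream = static_cast<FakeStream*>(QueryComputeStream(&ctx));
  stream->Enqueue([&log] { log.push_back("custom_kernel"); });

  // Work queued by the runtime after the custom op.
  session_stream.Enqueue([&log] { log.push_back("next_ort_kernel"); });

  session_stream.Drain();
  return log;
}
```

Calling `RunDemo()` yields the three entries in submission order. In a real custom op the opaque handle would presumably be cast to `cudaStream_t` and passed as the stream argument of the kernel launch; that step is an assumption here and is not shown in the PR text above.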