Added Philox-based RNG context for HPU device in DTensor scenarios #156581
pralay-das wants to merge 4 commits into pytorch:main
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/156581
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 Unrelated Failure) As of commit 6fcbadf with merge base d1b4e0f. UNSTABLE: the following job is marked as unstable, possibly due to flakiness on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "topic: not user facing"
Hi @zhangxiaoli73, @wconstab, could you review this PR?
wanchaol left a comment
Please see the inlined comments. I think:
- It would be better for different devices to align on the usage of random seeds.
- We should be very careful about any modification to `_dispatch.py`, as it would increase runtime CPU overhead for each operator. If you really want to do this, it should be done in the random module instead.
wanchaol left a comment
SGTM. I wonder if there's a way for you to add some tests for HPU.
@wanchaol, in addition to internal validation, we are currently collaborating with "accelerator-integration-wg" to enable out-of-tree accelerators, which could help validate these sorts of scenarios.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours).
In this PR, we are enabling `HPU` device-specific function calls for random operations. These calls will manage the setting and unsetting of the `context of Random Number Generator`.

While HPU devices typically utilize a `Mersenne-based RNG`, DTensor-specific random operations employ an `offset-based (Philox) RNG tracker`, which is specifically integrated with `CUDA` in scope.

To integrate a similar offset-based RNG tracker within the `HPU backend`, a backend-specific device handle function is necessary to identify the execution context of these random operations.

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k
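To make the set/unset idea concrete, here is a minimal pure-Python sketch, not the code from this PR. It assumes a device handle that exposes two hooks, here called `set_rng_ctx` and `unset_rng_ctx`; those names, the `FakeHpuHandle` class, the `OffsetRngTracker` class, and the `"mersenne"` default are all hypothetical and chosen only for illustration. The tracker keeps a `(seed, offset)` pair, switches the backend to a Philox context for the duration of a random op, and restores the previous context afterwards. The hook lookup lives in the tracker (the random-op path) rather than in the per-operator dispatch path, in line with the review feedback.

```python
from contextlib import contextmanager


class FakeHpuHandle:
    """Hypothetical stand-in for a backend device handle (not a real torch module)."""

    def __init__(self):
        self.rng_ctx = "mersenne"  # assumed default generator kind
        self.calls = []            # records hook invocations for inspection

    def set_rng_ctx(self, name):
        self.calls.append(("set", name))
        self.rng_ctx = name

    def unset_rng_ctx(self, name):
        self.calls.append(("unset", name))
        self.rng_ctx = "mersenne"


class OffsetRngTracker:
    """Toy offset-based tracker whose state is a (seed, offset) pair."""

    def __init__(self, device_handle, seed=0):
        self.device_handle = device_handle
        self.seed = seed
        self.offset = 0

    @contextmanager
    def distribute_region(self, numel):
        # Only backends that expose the hooks are switched; others are untouched.
        set_ctx = getattr(self.device_handle, "set_rng_ctx", None)
        unset_ctx = getattr(self.device_handle, "unset_rng_ctx", None)
        if set_ctx is not None:
            set_ctx("philox")
        try:
            yield self.seed, self.offset
            # Advance the offset only if the random op completed, so the
            # next op does not reuse the same section of the random stream.
            self.offset += numel
        finally:
            # Always restore the backend's previous RNG context.
            if unset_ctx is not None:
                unset_ctx("philox")


handle = FakeHpuHandle()
tracker = OffsetRngTracker(handle, seed=1234)
with tracker.distribute_region(numel=8) as (seed, offset):
    ctx_inside = handle.rng_ctx  # the backend is in the Philox context here

# A handle without the hooks (standing in for a backend that needs no switch).
plain_tracker = OffsetRngTracker(object(), seed=1234)
with plain_tracker.distribute_region(numel=4):
    pass
```

Using `getattr` with a default keeps the check cheap and leaves backends without these hooks unaffected. Whether the real tracker advances its offset in exactly this way is not stated in the PR description, so treat the offset arithmetic as illustrative only.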