🚀 The feature, motivation and pitch
Description
Currently, alloc_id is required to be unique per allocation site within a process lifetime, so that:
- All ranks use consistent memory addresses for a given collective
- Expensive P2P memory registration (e.g. ncclCommWindowRegister) can be skipped on subsequent uses via cached pointers
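To make the second point concrete, here is a minimal sketch of a registration cache keyed by alloc_id. It is an illustration only, not the actual PyTorch or NCCL code path: `RegistrationCache` and `fake_register` are hypothetical names, and `fake_register` merely stands in for an expensive call such as ncclCommWindowRegister.

```python
from typing import Callable, Dict


class RegistrationCache:
    """Hypothetical cache mapping alloc_id -> registered buffer handle."""

    def __init__(self, register: Callable[[int], object]) -> None:
        self._register = register
        self._handles: Dict[int, object] = {}
        self.register_calls = 0  # counts how often the expensive path runs

    def get(self, alloc_id: int, nbytes: int) -> object:
        # Register only the first time this alloc_id is seen.
        if alloc_id not in self._handles:
            self.register_calls += 1
            self._handles[alloc_id] = self._register(nbytes)
        return self._handles[alloc_id]


def fake_register(nbytes: int) -> object:
    # Stand-in for an expensive P2P registration (assumption, not a real API).
    return bytearray(nbytes)


cache = RegistrationCache(fake_register)
first = cache.get(alloc_id=7, nbytes=16)
second = cache.get(alloc_id=7, nbytes=16)  # served from the cache
```

The cache only pays off if the same allocation site presents the same alloc_id on every use, which is why the ID has to be stable for the lifetime of the process.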
In the current implementation, alloc_id is generated with random.randint (pytorch/torch/_inductor/codegen/wrapper.py, line 913 at commit 9bbc5b2):
f"alloc_id={random.randint(0, 2**64 - 1)})"
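The sketch below illustrates why a randomly drawn ID can be a concern. Two processes whose RNG state differs will draw different values for the same allocation site, whereas an ID derived from a stable key is identical everywhere. `deterministic_alloc_id` and the `site_key` format are assumptions made for illustration; this is not necessarily the approach proposed in the linked comment.

```python
import hashlib
import random


def random_alloc_id(rng: random.Random) -> int:
    # Mirrors the current expression: a random 64-bit integer.
    return rng.randint(0, 2**64 - 1)


def deterministic_alloc_id(site_key: str) -> int:
    # Hypothetical alternative: hash a stable description of the
    # allocation site and keep the first 8 bytes as a 64-bit ID.
    digest = hashlib.sha256(site_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


# Two "ranks" with different RNG state disagree on the random ID.
rank0_random = random_alloc_id(random.Random(0))
rank1_random = random_alloc_id(random.Random(1))

# The hashed ID depends only on the key, so every rank agrees.
# "graph0:buf3" is a made-up key format.
rank0_hashed = deterministic_alloc_id("graph0:buf3")
rank1_hashed = deterministic_alloc_id("graph0:buf3")
```

A hash-based ID trades guaranteed uniqueness for reproducibility: distinct keys can collide in principle, although with 64 bits this is unlikely for the number of allocation sites in a typical graph.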
Proposed Direction
#171909 (comment)
Related Context
Alternatives
No response
Additional context
No response
cc @mruberry @kurtamohler @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @aakhundov @coconutruben @jataylo