[DDP] Custom buffer reduction#64513
[DDP] Custom buffer reduction#64513rohan-varma wants to merge 8 commits intogh/rohan-varma/404/basefrom
Conversation
Proposal: #63041 Support custom buffer reduction in DDP via hook Differential Revision: [D30751152](https://our.internmc.facebook.com/intern/diff/D30751152/) [ghstack-poisoned]
🔗 Helpful links
💊 CI failures summary and remediationsAs of commit fe5d59b (more details on the Dr. CI page):
🕵️ 1 new failure recognized by patternsThe following CI failures do not appear to be due to upstream breakages:
|
Proposal: #63041 Support custom buffer reduction in DDP via hook. This is required for an internal training use case and could be more broadly applicable to OSS scenarios. Differential Revision: [D30751152](https://our.internmc.facebook.com/intern/diff/D30751152/) cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski @SciPioneer @H-Huang mrzzd cbalioglu gcramer23 [ghstack-poisoned]
| state, | ||
| hook: callable, |
There was a problem hiding this comment.
nit: make state after hook? following up (func, args) convention?
There was a problem hiding this comment.
Thanks for pointing this out!
I followed the same order as comm hook (state, hook): https://github.com/pytorch/pytorch/blob/master/torch/nn/parallel/distributed.py#L1224.
Should we try to keep it the same or change the order here?
|
|
||
| @dataclass | ||
| class _BufferCommHook: | ||
| buffer_comm_hook: Callable |
There was a problem hiding this comment.
nit: document how the signature of a buffer_comm_hook looks like:
def buffer_comm_hook(state, self.name_buffers)
There was a problem hiding this comment.
I added the documentation to the function in the PR at the top of this stack: https://github.com/pytorch/pytorch/pull/64515/files, will just leave it there if you're okay (plan is to land the whole stack together).
Proposal: #63041 Support custom buffer reduction in DDP via hook. This is required for an internal training use case and could be more broadly applicable to OSS scenarios. Differential Revision: [D30751152](https://our.internmc.facebook.com/intern/diff/D30751152/) cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski @SciPioneer @H-Huang mrzzd cbalioglu gcramer23 [ghstack-poisoned]
Proposal: #63041 Support custom buffer reduction in DDP via hook. This is required for an internal training use case and could be more broadly applicable to OSS scenarios. Differential Revision: [D30751152](https://our.internmc.facebook.com/intern/diff/D30751152/) cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski @SciPioneer @H-Huang mrzzd cbalioglu gcramer23 [ghstack-poisoned]
Proposal: #63041 Support custom buffer reduction in DDP via hook. This is required for an internal training use case and could be more broadly applicable to OSS scenarios. Differential Revision: [D30751152](https://our.internmc.facebook.com/intern/diff/D30751152/) cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski @SciPioneer @H-Huang mrzzd cbalioglu gcramer23 [ghstack-poisoned]
|
|
||
| # sync params according to location (before/after forward) user | ||
| # specified as part of hook, if hook was specified. | ||
| if ( |
There was a problem hiding this comment.
moved to helper fn in the next diff
Proposal: #63041 Support custom buffer reduction in DDP via hook. This is required for an internal training use case and could be more broadly applicable to OSS scenarios. Differential Revision: [D30751152](https://our.internmc.facebook.com/intern/diff/D30751152/) cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski @SciPioneer @H-Huang mrzzd cbalioglu gcramer23 [ghstack-poisoned]
Pull Request resolved: #64513 Proposal: #63041 Support custom buffer reduction in DDP via hook ghstack-source-id: 138419449 Differential Revision: [D30751152](https://our.internmc.facebook.com/intern/diff/D30751152/)
Proposal: #63041 Support custom buffer reduction in DDP via hook. This is required for an internal training use case and could be more broadly applicable to OSS scenarios. Differential Revision: [D30751152](https://our.internmc.facebook.com/intern/diff/D30751152/) cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski @SciPioneer @H-Huang mrzzd cbalioglu gcramer23 [ghstack-poisoned]
Proposal: #63041 Support custom buffer reduction in DDP via hook. This is required for an internal training use case and could be more broadly applicable to OSS scenarios. Differential Revision: [D30751152](https://our.internmc.facebook.com/intern/diff/D30751152/) cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski @SciPioneer @H-Huang mrzzd cbalioglu gcramer23 [ghstack-poisoned]
|
This pull request has been merged in ce5981e. |
Stack from ghstack:
Proposal: #63041
Support custom buffer reduction in DDP via hook. This is required for an internal training use case and could be more broadly applicable to OSS scenarios.
Differential Revision: D30751152
cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @agolynski @SciPioneer @H-Huang @mrzzd @cbalioglu @gcramer23