🚀 Feature
Same as the new DDP feature: skip the copies (at least optionally) and use "virtual" buckets (slices or views, depending on the language) instead.
Motivation
For big models and jobs with many ranks, the buckets multiply and end up taking a non-trivial amount of memory, which goes against the whole point of ShardedOptimizer.
Pitch
Have a look at the view/slice approach. It may not map directly, but it is worth investigating.
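To make the idea concrete, here is a minimal sketch of what "virtual" buckets could look like in PyTorch. It is not the DDP or ShardedOptimizer implementation: `flatten_params_into_bucket` is a hypothetical helper, and it assumes all parameters in a bucket share the same dtype and device. Each parameter is re-pointed at a slice of one flat buffer, so no per-step copy into the bucket is needed.

```python
import torch


def flatten_params_into_bucket(params):
    """Hypothetical helper: pack params into one flat buffer and make each
    param a view (slice) of that buffer. Assumes same dtype and device."""
    total = sum(p.numel() for p in params)
    bucket = torch.empty(total, dtype=params[0].dtype, device=params[0].device)

    offset = 0
    for p in params:
        n = p.numel()
        view = bucket[offset:offset + n].view_as(p)
        view.copy_(p.data)  # one-time copy at setup
        p.data = view       # from now on the param storage is the bucket slice
        offset += n
    return bucket


model = torch.nn.Linear(4, 3)  # weight: 3x4 (12 elements), bias: 3 elements
params = list(model.parameters())
bucket = flatten_params_into_bucket(params)

# An in-place update of a parameter (as an optimizer step would do)
# shows up in the bucket directly, with no copy.
with torch.no_grad():
    params[0].add_(1.0)
```

With this layout, a rank could in principle pass `bucket` straight to a collective such as `torch.distributed.broadcast` after its optimizer step, instead of first copying the updated parameters into a separate buffer. Whether this maps cleanly onto the existing bucketing logic is exactly what needs investigating.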
Alternatives
Keep the status quo, which works.
Additional context