[ShardedOptimizer] Use views in the buckets / save memory

## 🚀 Feature
As with the new DDP feature, skip the copies (at least optionally) and use "virtual" buckets made of views (or slices, depending on the terminology) instead.

## Motivation
For big models and jobs with many ranks, the buckets multiply and end up taking a significant amount of memory, which defeats the whole point of ShardedOptimizer (saving memory).

## Pitch
Look into the view/slice approach. It may not map directly onto ShardedOptimizer, but it is worth investigating.
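A minimal PyTorch sketch of the idea, assuming all parameters placed in one bucket share the same dtype and device. `bucket_params_as_views` is a hypothetical helper, not part of fairscale's or PyTorch's API, and the DDP feature referred to above is presumably the `gradient_as_bucket_view` option. Instead of keeping each parameter *and* a separate packed copy in the bucket, the bucket becomes the storage and every parameter's `.data` is rebound to a view into it, so the memory is paid for only once.

```python
import torch


def bucket_params_as_views(params):
    """Pack params into one flat buffer and make each param a view of it.

    Hypothetical helper for illustration only. Assumes every param has the
    same dtype and device as the first one.
    """
    params = list(params)
    total = sum(p.numel() for p in params)
    bucket = torch.zeros(total, dtype=params[0].dtype, device=params[0].device)
    offset = 0
    for p in params:
        n = p.numel()
        # Contiguous slice of the flat buffer, reshaped to the param's shape
        view = bucket[offset:offset + n].view_as(p)
        view.copy_(p.data)  # one-time copy of the current values
        p.data = view       # the param now aliases the bucket memory
        offset += n
    return bucket
```

Because the parameters alias the bucket, an in-place update of the bucket (for instance, a `torch.distributed.broadcast` of the flat buffer after the sharded step) is immediately visible through the parameters, with no copy back out of the bucket.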

## Alternatives
Keep the status quo, which works.

## Additional context
