
[ShardedOptimizer] Use views in the buckets / save memory #187

@blefaudeux

🚀 Feature

Same as the new DDP feature: skip the copies (at least optionally) and use "virtual" buckets (slices or views, depending on the language) instead.
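Below is a minimal, framework-free sketch of the difference between copied buckets and view buckets. It uses Python's standard `array` and `memoryview` as stand-ins for tensors and tensor views. All names (`make_params`, `copy_bucket`, `view_bucket`) are hypothetical, and this is not the ShardedOptimizer code. In PyTorch the analogous idea would be to back each parameter with a slice of one flat tensor. In the copy version, the bucket is a second allocation that lives alongside the parameters. In the view version, each "parameter" is a view into the bucket, so only one allocation remains and writes land in the bucket directly.

```python
from array import array


def make_params(sizes):
    """Hypothetical 'parameters': flat float32 arrays of the given sizes."""
    return [array("f", [float(i)] * n) for i, n in enumerate(sizes)]


def copy_bucket(params):
    """Copy style: pack the parameters into a bucket by copying them.

    The bucket is a second, independent allocation, so memory for these
    values is held twice (parameters + bucket).
    """
    bucket = array("f")
    for p in params:
        bucket.extend(p)
    return bucket


def view_bucket(params):
    """View style: one flat buffer, with per-parameter views into it.

    Values are copied in once at setup. Afterwards the views would replace
    the original per-parameter storage, so only the buffer stays alive and
    any write through a view is immediately visible in the bucket.
    """
    total = sum(len(p) for p in params)
    buffer = array("f", [0.0]) * total  # single float32 allocation
    mv = memoryview(buffer)
    views, offset = [], 0
    for p in params:
        n = len(p)
        mv[offset:offset + n] = p  # one-time copy of the initial values
        views.append(mv[offset:offset + n])  # view, no extra storage
        offset += n
    return buffer, views
```

With the copy version, updating the bucket leaves the parameters untouched, so the two must be kept in sync by copying back and forth. With the view version, there is nothing to sync. The trade-off is that the buffer must keep a fixed layout while views into it exist. For example, `array.append` on the buffer raises `BufferError` while any view is alive.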

Motivation

For big models and jobs with many ranks, the buckets multiply and end up taking a noticeable amount of memory, which runs counter to the whole point of ShardedOptimizer.

Pitch

Look into the view/slice approach. It may not map directly, but it is worth investigating.

Alternatives

Keep the status quo, which works.

Additional context

Metadata

Assignees: none
Labels: enhancement (New feature or request)
Type: none
Projects: none
Milestone: none
Relationships: none yet
Development: no branches or pull requests
