OSS: Make the checkpoints partition-agnostic #164

## 🚀 Feature
Change the consolidated state dict so that it no longer depends on how the optimizer state is partitioned across ranks.

## Motivation
This would make it possible to change the number of hosts when restarting a job.

## Pitch
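`state_dict()` should flatten the sharded state into a single, rank-independent layout instead of storing the data per rank, and `load_state_dict()` should re-shard that layout according to the current partition.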
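A minimal pure-Python sketch of the idea, assuming optimizer state can be keyed by each parameter's global index in the model. The helper names `partition`, `consolidate`, and `shard`, the greedy size-based assignment, and the string placeholders standing in for real state tensors are all hypothetical and are not the FairScale OSS API. Saving maps each rank's local indices back to global ones, so the checkpoint carries no rank layout; loading recomputes the partition for the new world size and selects each rank's share.

```python
from typing import Any, Dict, List

# Hypothetical sketch: parameters are identified by their global index in the
# model and described only by their size; optimizer state is an opaque object.


def partition(param_sizes: List[int], world_size: int) -> List[List[int]]:
    """Assign each global parameter index to the currently least-loaded rank."""
    loads = [0] * world_size
    owners: List[List[int]] = [[] for _ in range(world_size)]
    for global_idx, size in enumerate(param_sizes):
        rank = loads.index(min(loads))  # ties go to the lowest rank
        owners[rank].append(global_idx)
        loads[rank] += size
    return owners


def consolidate(per_rank_states: List[Dict[int, Any]],
                owners: List[List[int]]) -> Dict[int, Any]:
    """Flatten per-rank states (keyed by local index) into one dict keyed by global index.

    The result no longer records which rank held which parameter.
    """
    flat: Dict[int, Any] = {}
    for rank, local_states in enumerate(per_rank_states):
        for local_idx, state in local_states.items():
            flat[owners[rank][local_idx]] = state
    return flat


def shard(flat: Dict[int, Any], owners: List[List[int]], rank: int) -> Dict[int, Any]:
    """Select the states this rank owns under the current partition, re-keyed locally."""
    return {
        local_idx: flat[global_idx]
        for local_idx, global_idx in enumerate(owners[rank])
        if global_idx in flat  # parameters without saved state are skipped
    }


if __name__ == "__main__":
    sizes = [4, 3, 2, 1]

    # Checkpoint written by 2 ranks: owners == [[0, 3], [1, 2]]
    old_owners = partition(sizes, world_size=2)
    per_rank = [{0: "s0", 1: "s3"}, {0: "s1", 1: "s2"}]
    flat = consolidate(per_rank, old_owners)
    print(flat)

    # Restart with 3 ranks: owners == [[0], [1], [2, 3]]
    new_owners = partition(sizes, world_size=3)
    for rank in range(3):
        print(rank, shard(flat, new_owners, rank))
```

Because the flat dict is keyed only by global parameter index, a checkpoint written by 2 ranks can be loaded by 3. A real implementation would also need to handle `param_groups` and tensor device placement, which this sketch ignores.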

## Alternatives
Keep the current behavior, which requires the same number of ranks before and after a restart.

## Additional context
This captures elements of a discussion with the DeepSpeed MSFT team.