Is your feature request related to a problem? Please describe.
The issue is related to #5620 and #6011. When a DeepSpeed model is initialised for ZeRO-3 inference, with a DeepSpeedZeRoOffload optimizer for example, the model cannot be moved to the CPU, either via the torch.nn.Module.to() functionality or with the new offload_states API.
Describe the solution you'd like
Either extend #6011 to support offloading a model configured for ZeRO-3 inference, or add a new API that supports this.
Thanks