System Info
I was using FSDP with the setting "full_shard auto_wrap" on a single A100 GPU. Training went well but was interrupted when saving the checkpoint. The error was: NotImplementedError: offload_to_cpu=True and NO_SHARD is not supported yet. I understand that because I am using a single GPU, FSDP defaults to NO_SHARD. However, I don't understand why offload_to_cpu was set to True. Is there anywhere I can reset it to False?
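For reference, here is a minimal sketch of the setting I would like to control. It assumes the checkpoint is gathered through PyTorch's FullStateDictConfig, which I have not confirmed for this code path. The helper name gather_full_state_dict is hypothetical and is not part of FastChat or transformers.

```python
from torch.distributed.fsdp import (
    FullStateDictConfig,
    FullyShardedDataParallel as FSDP,
    StateDictType,
)

# Full-state-dict config with CPU offload disabled.
# rank0_only=True materializes the full state dict on rank 0 only.
full_cfg = FullStateDictConfig(offload_to_cpu=False, rank0_only=True)


def gather_full_state_dict(fsdp_model):
    """Hypothetical helper: return the unsharded state dict of an FSDP-wrapped model.

    Assumes torch.distributed is already initialized and that `fsdp_model`
    is wrapped in FullyShardedDataParallel.
    """
    with FSDP.state_dict_type(fsdp_model, StateDictType.FULL_STATE_DICT, full_cfg):
        return fsdp_model.state_dict()
```

With offload_to_cpu=False the gathered weights stay on the GPU, so this uses more GPU memory at save time than the offloaded variant.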
Who can help?
No response
Information
Tasks
Reproduction
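Follow https://github.com/lm-sys/FastChat to fine-tune an LLM with FSDP set to "full_shard auto_wrap" on a single A100 GPU. The error is raised when the checkpoint is saved.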
Expected behavior
Checkpoints should be saved without raising the error stated above.