
Conversation

hibagus commented Aug 7, 2022

When enabling CPU offloading for the optimizer as follows:

offload_optimizer": {
      "offload_optimizer": {
            "device": "cpu",            # "[cpu|nvme]"
            "nvme_path": "/local_nvme",
            "pin_memory": false,         # [true|false]
            "buffer_count": 4,
            "fast_init": false
        },

the following error occurs:

Traceback (most recent call last):
  File "/home/bagus/DeepSpeed/Debug/Megatron-DeepSpeed/pretrain_gpt.py", line 276, in <module>
    pretrain(train_valid_test_datasets_provider, model_provider, forward_step,
  File "/home/bagus/DeepSpeed/Debug/Megatron-DeepSpeed/megatron/training.py", line 130, in pretrain
    model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider, teacher=False)
  File "/home/bagus/DeepSpeed/Debug/Megatron-DeepSpeed/megatron/training.py", line 420, in setup_model_and_optimizer
    model, optimizer, _, lr_scheduler = deepspeed.initialize(
  File "/home/bagus/anaconda3/envs/MegatronDeepSpeed/lib/python3.8/site-packages/deepspeed/__init__.py", line 121, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/home/bagus/anaconda3/envs/MegatronDeepSpeed/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 310, in __init__
    self._configure_optimizer(optimizer, model_parameters)
  File "/home/bagus/anaconda3/envs/MegatronDeepSpeed/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1096, in _configure_optimizer
    self.optimizer = self._configure_zero_optimizer(basic_optimizer)
  File "/home/bagus/anaconda3/envs/MegatronDeepSpeed/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1348, in _configure_zero_optimizer
    optimizer = DeepSpeedZeroOptimizer(
  File "/home/bagus/anaconda3/envs/MegatronDeepSpeed/lib/python3.8/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 510, in __init__
    self.initialize_optimizer_states()
  File "/home/bagus/anaconda3/envs/MegatronDeepSpeed/lib/python3.8/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 600, in initialize_optimizer_states
    self.optimizer.step()
  File "/home/bagus/anaconda3/envs/MegatronDeepSpeed/lib/python3.8/site-packages/torch/optim/optimizer.py", line 109, in wrapper
    return func(*args, **kwargs)
  File "/home/bagus/anaconda3/envs/MegatronDeepSpeed/lib/python3.8/site-packages/apex/optimizers/fused_adam.py", line 180, in step
    multi_tensor_applier(self.multi_tensor_adam,
  File "/home/bagus/anaconda3/envs/MegatronDeepSpeed/lib/python3.8/site-packages/apex/multi_tensor_apply/multi_tensor_apply.py", line 27, in __call__
    return op(self.chunk_size,
RuntimeError: expected input to be on cuda

It seems like args.cpu_optimizer is always set to False even though CPU optimizer offloading is enabled, so the FusedAdam optimizer from apex.optimizers is always used. That optimizer expects its inputs in GPU memory, not CPU memory.

As a temporary fix, I would suggest falling back to torch.optim.AdamW, which works on both CPU and GPU tensors, at least until we can find out why args.cpu_optimizer is always set to False.
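For illustration, here is a minimal sketch of the kind of fallback I mean; the helper name and its arguments are illustrative only, not the actual Megatron-DeepSpeed optimizer setup:

import torch

try:
    from apex.optimizers import FusedAdam
except ImportError:
    FusedAdam = None


def build_optimizer(param_groups, lr, weight_decay, betas, eps, cpu_offload):
    # Hypothetical helper, not the real code path. When ZeRO offloads
    # optimizer states to CPU, apex FusedAdam cannot be used because it
    # only accepts CUDA tensors ("expected input to be on cuda");
    # torch.optim.AdamW handles both CPU and GPU tensors.
    if cpu_offload or FusedAdam is None:
        return torch.optim.AdamW(param_groups, lr=lr,
                                 weight_decay=weight_decay,
                                 betas=betas, eps=eps)
    return FusedAdam(param_groups, lr=lr, weight_decay=weight_decay,
                     betas=betas, eps=eps)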

Let me know what you think :)

My environment:
OS: Ubuntu 20.04
GPU: 4xA100 40GB SXM4
ds_report:

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
async_io ............... [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/bagus/anaconda3/envs/MegatronDeepSpeed/lib/python3.8/site-packages/torch']
torch version .................... 1.12.0+cu113
torch cuda version ............... 11.3
torch hip version ................ None
nvcc version ..................... 11.6
deepspeed install path ........... ['/home/bagus/anaconda3/envs/MegatronDeepSpeed/lib/python3.8/site-packages/deepspeed']
deepspeed info ................... 0.6.7, unknown, unknown
deepspeed wheel compiled w. ...... torch 1.12, cuda 11.3

…emporary solution for CPU optimizer offloading
ghost commented Aug 7, 2022

CLA assistant check
All CLA requirements met.

hibagus changed the title from "Offloading optimizer to CPU causes expected input to be on cuda; Suggest to fallback to torch.optim.AdamW" to "Offloading optimizer to CPU causes "expected input to be on cuda" error; Suggest to fallback to torch.optim.AdamW" on Aug 7, 2022
awan-10 commented Aug 12, 2022

@hibagus - this has been fixed in the latest deepspeed. Can you kindly test and close this PR?

hibagus (Author) commented Aug 13, 2022

@awan-10 Yes, that fix solves the problem. I will close this PR. Thanks, Awan!

hibagus closed this Aug 13, 2022
saforem2 added a commit to saforem2/Megatron-DeepSpeed that referenced this pull request Dec 25, 2024