
Conversation

hibagus commented Aug 7, 2022

When enabling CPU offloading for the optimizer as follows:

offload_optimizer": {
      "offload_optimizer": {
            "device": "cpu",            # "[cpu|nvme]"
            "nvme_path": "/local_nvme",
            "pin_memory": false,         # [true|false]
            "buffer_count": 4,
            "fast_init": false
        },

the following error occurs:

Traceback (most recent call last):
  File "/home/bagus/DeepSpeed/Debug/Megatron-DeepSpeed/pretrain_gpt.py", line 276, in <module>
    pretrain(train_valid_test_datasets_provider, model_provider, forward_step,
  File "/home/bagus/DeepSpeed/Debug/Megatron-DeepSpeed/megatron/training.py", line 130, in pretrain
    model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider, teacher=False)
  File "/home/bagus/DeepSpeed/Debug/Megatron-DeepSpeed/megatron/training.py", line 420, in setup_model_and_optimizer
    model, optimizer, _, lr_scheduler = deepspeed.initialize(
  File "/home/bagus/anaconda3/envs/MegatronDeepSpeed/lib/python3.8/site-packages/deepspeed/__init__.py", line 121, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/home/bagus/anaconda3/envs/MegatronDeepSpeed/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 310, in __init__
    self._configure_optimizer(optimizer, model_parameters)
  File "/home/bagus/anaconda3/envs/MegatronDeepSpeed/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1096, in _configure_optimizer
    self.optimizer = self._configure_zero_optimizer(basic_optimizer)
  File "/home/bagus/anaconda3/envs/MegatronDeepSpeed/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1348, in _configure_zero_optimizer
    optimizer = DeepSpeedZeroOptimizer(
  File "/home/bagus/anaconda3/envs/MegatronDeepSpeed/lib/python3.8/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 510, in __init__
    self.initialize_optimizer_states()
  File "/home/bagus/anaconda3/envs/MegatronDeepSpeed/lib/python3.8/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 600, in initialize_optimizer_states
    self.optimizer.step()
  File "/home/bagus/anaconda3/envs/MegatronDeepSpeed/lib/python3.8/site-packages/torch/optim/optimizer.py", line 109, in wrapper
    return func(*args, **kwargs)
  File "/home/bagus/anaconda3/envs/MegatronDeepSpeed/lib/python3.8/site-packages/apex/optimizers/fused_adam.py", line 180, in step
    multi_tensor_applier(self.multi_tensor_adam,
  File "/home/bagus/anaconda3/envs/MegatronDeepSpeed/lib/python3.8/site-packages/apex/multi_tensor_apply/multi_tensor_apply.py", line 27, in __call__
    return op(self.chunk_size,
RuntimeError: expected input to be on cuda

It seems like args.cpu_optimizer is always set to False even though CPU optimizer offloading is enabled, so the FusedAdam optimizer from apex.optimizers is always used. That optimizer expects its inputs in GPU memory, not CPU memory.

As a temporary fix, I would suggest falling back to torch.optim.AdamW, which works on both CPU and GPU tensors, at least until we can find out why args.cpu_optimizer is always set to False.
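For illustration, here is a minimal sketch of the kind of fallback I mean; the helper name and its arguments are illustrative only, not the actual Megatron-DeepSpeed optimizer setup:

import torch

try:
    from apex.optimizers import FusedAdam
except ImportError:
    FusedAdam = None


def build_optimizer(param_groups, lr, weight_decay, betas, eps, cpu_offload):
    # Hypothetical helper, not the real code path. When ZeRO offloads
    # optimizer states to CPU, apex FusedAdam cannot be used because it
    # only accepts CUDA tensors ("expected input to be on cuda");
    # torch.optim.AdamW handles both CPU and GPU tensors.
    if cpu_offload or FusedAdam is None:
        return torch.optim.AdamW(param_groups, lr=lr,
                                 weight_decay=weight_decay,
                                 betas=betas, eps=eps)
    return FusedAdam(param_groups, lr=lr, weight_decay=weight_decay,
                     betas=betas, eps=eps)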

Let me know what you think :)

My environment:
OS: Ubuntu 20.04
GPU: 4xA100 40GB SXM4
ds_report:

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
async_io ............... [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/bagus/anaconda3/envs/MegatronDeepSpeed/lib/python3.8/site-packages/torch']
torch version .................... 1.12.0+cu113
torch cuda version ............... 11.3
torch hip version ................ None
nvcc version ..................... 11.6
deepspeed install path ........... ['/home/bagus/anaconda3/envs/MegatronDeepSpeed/lib/python3.8/site-packages/deepspeed']
deepspeed info ................... 0.6.7, unknown, unknown
deepspeed wheel compiled w. ...... torch 1.12, cuda 11.3

…emporary solution for CPU optimizer offloading
ghost commented Aug 7, 2022

CLA assistant check
All CLA requirements met.

hibagus changed the title from "Offloading optimizer to CPU causes expected input to be on cuda; Suggest to fallback to torch.optim.AdamW" to "Offloading optimizer to CPU causes "expected input to be on cuda" error; Suggest to fallback to torch.optim.AdamW" on Aug 7, 2022
awan-10 commented Aug 12, 2022

@hibagus - this has been fixed in the latest deepspeed. Can you kindly test and close this PR?

hibagus (Author) commented Aug 13, 2022

@awan-10 Yes, that fix solves the problem. I will close this PR. Thanks, Awan!

hibagus closed this Aug 13, 2022
saforem2 added a commit to saforem2/Megatron-DeepSpeed that referenced this pull request Dec 25, 2024