Error while initializing multiple models #65

@1Konny

Hi.

I'm trying to use DeepSpeed in my code with multiple models, but the second call to initialize fails with the error below. Do you have any idea how to solve this? Thanks in advance.

  File "train_ds.py", line 98, in <module>
    solver = Solver(opt)
  File "/data2/1konny/svg/solver_ds.py", line 40, in __init__
    self.init_models_and_optimizers()
  File "/data2/1konny/svg/solver_ds.py", line 117, in init_models_and_optimizers
    self.decoder, self.decoder_optimizer, _, _ = ds.initialize(opt, model=decoder, model_parameters=decoder_params)
  File "/usr/local/lib/python3.6/dist-packages/deepspeed/__init__.py", line 87, in initialize
    collate_fn=collate_fn)
  File "/usr/local/lib/python3.6/dist-packages/deepspeed/pt/deepspeed_light.py", line 123, in __init__
    dist.init_process_group(backend="nccl")
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/distributed_c10d.py", line 372, in init_process_group
    raise RuntimeError("trying to initialize the default process group "
RuntimeError: trying to initialize the default process group twice!

ds_config.json

{
  "train_batch_size": 4,
  "gradient_accumulation_steps": 1,
  "steps_per_print": 1,
  "optimizer": {
    "type": "Adam",
    "params": {
      "lr": 0.0001,
      "max_grad_norm": 1.0,
      "betas": [
         0.9,
         0.999
       ]
    }
  }
}

command-line

deepspeed train_ds.py --deepspeed --deepspeed_config deepspeed_util/ds_config.json ...

code

training_data = load_dataset()
encoder_params = filter(lambda p: p.requires_grad, encoder.parameters())
decoder_params = filter(lambda p: p.requires_grad, decoder.parameters())
self.encoder, self.encoder_optim, train_loader, _ = deepspeed.initialize(opt, model=encoder, model_parameters=encoder_params, training_data=training_data)
self.decoder, self.decoder_optim, _, _ = deepspeed.initialize(opt, model=decoder, model_parameters=decoder_params)
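
Reading the traceback, the second initialize call appears to reach dist.init_process_group(backend="nccl") in deepspeed_light.py again, and PyTorch refuses to create the default process group twice. The sketch below reproduces that failure mode with a pure-Python stand-in and shows how an initialization guard would avoid it. FakeDist, unguarded_engine_init, and guarded_engine_init are hypothetical names for illustration, not DeepSpeed or PyTorch APIs. In real PyTorch the corresponding check is torch.distributed.is_initialized(); whether DeepSpeed can be patched this way is an assumption.

```python
class FakeDist:
    """Hypothetical stand-in for the default process group state in torch.distributed."""

    def __init__(self):
        self._initialized = False
        self.backend = None

    def is_initialized(self):
        return self._initialized

    def init_process_group(self, backend):
        # Mirrors the error seen in the traceback when called a second time.
        if self._initialized:
            raise RuntimeError(
                "trying to initialize the default process group twice!"
            )
        self._initialized = True
        self.backend = backend


def unguarded_engine_init(dist):
    # What the traceback suggests happens on every initialize call.
    dist.init_process_group(backend="nccl")


def guarded_engine_init(dist):
    # Only create the process group if it does not exist yet.
    if not dist.is_initialized():
        dist.init_process_group(backend="nccl")


# Two unguarded calls (encoder, then decoder): the second one fails.
dist = FakeDist()
unguarded_engine_init(dist)
try:
    unguarded_engine_init(dist)
    second_call_error = None
except RuntimeError as exc:
    second_call_error = str(exc)

# Two guarded calls: the second one is a no-op.
guarded_dist = FakeDist()
guarded_engine_init(guarded_dist)
guarded_engine_init(guarded_dist)
```

With the guard, the encoder and decoder could each go through their own initialize call while sharing one process group.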
