Fractional GPU training

Hi guys, I just stumbled across this package a few days ago and love it!

However, I get an error when trying to train with fractional GPUs. The most recent related PR is https://github.com/ray-project/ray_lightning/pull/121.
It makes perfect sense that this error is thrown, but I do not know enough about Ray or ray_lightning to understand `local_rank`.

Do you have an idea how to fix this?

```
File "/home/gugl/miniconda3/envs/mwe_ray/lib/python3.8/site-packages/ray_lightning/ray_ddp.py", line 62, in execute
  return fn(*args, **kwargs)
File "/home/gugl/miniconda3/envs/mwe_ray/lib/python3.8/site-packages/ray_lightning/ray_ddp.py", line 449, in execute_remote
  self._worker_setup(process_idx=global_rank)
File "/home/gugl/miniconda3/envs/mwe_ray/lib/python3.8/site-packages/ray_lightning/ray_ddp.py", line 409, in _worker_setup
  self.torch_distributed_backend,
File "/home/gugl/miniconda3/envs/mwe_ray/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/parallel.py", line 103, in torch_distributed_backend
  torch_backend = "nccl" if self.on_gpu else "gloo"
File "/home/gugl/miniconda3/envs/mwe_ray/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/parallel.py", line 51, in on_gpu
  return self.root_device.type == "cuda" and torch.cuda.is_available()
File "/home/gugl/miniconda3/envs/mwe_ray/lib/python3.8/site-packages/ray_lightning/ray_ddp.py", line 526, in root_device
  return torch.device("cuda", device_id)
(train_network pid=725884) TypeError: Device(): argument 'index' (position 2) must be int, not float
```
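The last frame shows that `torch.device("cuda", device_id)` receives a float where it needs an int. Here is a minimal sketch of one possible workaround. It assumes the device index is derived from the local rank and the (fractional) number of GPUs per worker; the traceback does not show how `ray_lightning` actually computes `device_id`, so that is only a guess. `device_index` is a hypothetical helper, not part of `ray_lightning`.

```python
import math


def device_index(local_rank: int, gpus_per_worker: float) -> int:
    """Hypothetical helper: map a worker's local rank to an integer CUDA index.

    With fractional GPUs, several workers share one physical device, so the
    product local_rank * gpus_per_worker is floored to a whole device index.
    """
    if local_rank < 0 or gpus_per_worker <= 0:
        raise ValueError("local_rank must be >= 0 and gpus_per_worker > 0")
    # math.floor returns an int, which is what torch.device expects as index.
    return math.floor(local_rank * gpus_per_worker)


# Assumed setup: 0.5 GPU per worker, i.e. two workers per physical GPU.
# Ranks 0 and 1 share device 0, ranks 2 and 3 share device 1.
indices = [device_index(rank, 0.5) for rank in range(4)]
print(indices)  # [0, 0, 1, 1]
```

The resulting integer could then be passed as the second argument to `torch.device("cuda", ...)`. Whether flooring is the right mapping depends on how Ray assigns GPUs to the workers, which this sketch does not check.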

Possibly relevant pieces of code:

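```python
import pytorch_lightning as pl
from ray import tune
from ray_lightning import RayPlugin


def train_network(config):
    ...
    trainer = pl.Trainer(
        ...
        # One Ray worker that requests 2 CPUs and half a GPU.
        strategy=RayPlugin(
            num_workers=1,
            find_unused_parameters=False,
            resources_per_worker={"CPU": 2, "GPU": 0.5},
        ),
    )
    trainer.fit(model, datamodule=mySimDataModule)


analysis = tune.run(
    train_network,
    ...
    # First bundle is for the trial driver, second for the single worker.
    resources_per_trial=tune.PlacementGroupFactory([{"CPU": 1}, {"CPU": 2, "GPU": 0.5}]),
)
```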
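In the snippet above, the placement group has one bundle for the trial driver plus one bundle per worker, and the worker bundle matches `resources_per_worker`. Here is a small sketch of how that list could be built so the two stay in sync when `num_workers` changes. `build_bundles` and its `head_cpus` default are hypothetical, chosen only to reproduce the values used above.

```python
from typing import Dict, List


def build_bundles(
    num_workers: int,
    cpus_per_worker: int,
    gpus_per_worker: float,
    head_cpus: int = 1,
) -> List[Dict[str, float]]:
    """Hypothetical helper: one head bundle followed by one bundle per worker."""
    head = {"CPU": head_cpus}
    worker = {"CPU": cpus_per_worker, "GPU": gpus_per_worker}
    # Copy the worker dict so each bundle is an independent object.
    return [head] + [dict(worker) for _ in range(num_workers)]


bundles = build_bundles(num_workers=1, cpus_per_worker=2, gpus_per_worker=0.5)
print(bundles)  # [{'CPU': 1}, {'CPU': 2, 'GPU': 0.5}]
```

The resulting list has the same shape as the one passed to `tune.PlacementGroupFactory` above.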

Fractional GPU training #124
