The training results can be pulled to the main process (#162)

This is the spawn output from `ddp-spawn`:


```python
ic| spawn_output: _SpawnOutput(best_model_path='./lightning_logs/version_53/checkpoints/epoch=0-step=10.ckpt', weights_path='./.temp.ckpt', trainer_state=TrainerState(status=<TrainerStatus.FINISHED: 'finished'>, fn=<TrainerFn.FITTING: 'fit'>, stage=None, _fault_tolerant_mode=<_FaultTolerantMode.DISABLED: 'disabled'>), trainer_results=None, extra=[{'val_loss': array(1., dtype=float32)}])
```

This is the spawn output from `ray_ddp`:


```python
None
```


This is because `ddp-spawn` collects the rank-zero results from the trainer right after the wrapped function returns:
https://github.com/Lightning-AI/lightning/blob/master/src/pytorch_lightning/strategies/launchers/spawn.py#L104-L105

```python
        results = function(*args, **kwargs)

        if trainer is not None:
            results = self._collect_rank_zero_results(trainer, results)

```

The output is:

```python
ic| results: None, 'raw'
ic| results: _SpawnOutput(best_model_path='./lightning_logs/version_57/checkpoints/epoch=0-step=10.ckpt', weights_path='./.temp.ckpt', trainer_state=TrainerState(status=<TrainerStatus.FINISHED: 'finished'>, fn=<TrainerFn.FITTING: 'fit'>, stage=None, _fault_tolerant_mode=<_FaultTolerantMode.DISABLED: 'disabled'>), trainer_results=None, extra=[{'val_loss': array(1., dtype=float32)}])
    '2nd handed': '2nd handed'
ic| spawn_output: _SpawnOutput(best_model_path='./lightning_logs/version_57/checkpoints/epoch=0-step=10.ckpt', weights_path='./.temp.ckpt', trainer_state=TrainerState(status=<TrainerStatus.FINISHED: 'finished'>, fn=<TrainerFn.FITTING: 'fit'>, stage=None, _fault_tolerant_mode=<_FaultTolerantMode.DISABLED: 'disabled'>), trainer_results=None, extra=[{'val_loss': array(1., dtype=float32)}])
```
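The log above shows the raw return value (`None`) being replaced by a populated `_SpawnOutput` once the results are collected from the trainer. A minimal sketch of that pattern follows. `MiniTrainer`, `MiniSpawnOutput`, `collect_rank_zero_results`, and `wrapping_function` are hypothetical, simplified stand-ins written for illustration; they are not the Lightning implementation.

```python
from typing import Any, NamedTuple, Optional


class MiniSpawnOutput(NamedTuple):
    """Hypothetical, simplified stand-in for Lightning's _SpawnOutput."""
    best_model_path: str
    status: str
    trainer_results: Any


class MiniTrainer:
    """Hypothetical trainer: fit() returns None but mutates its own state."""

    def __init__(self) -> None:
        self.status = "initializing"
        self.best_model_path = ""

    def fit(self) -> None:
        self.status = "finished"
        self.best_model_path = "./checkpoints/last.ckpt"


def collect_rank_zero_results(
    trainer: MiniTrainer, results: Any, rank: int
) -> Optional[MiniSpawnOutput]:
    # Assumption for this sketch: only rank 0 reports results.
    if rank != 0:
        return None
    # The useful information lives on the trainer, not in the raw result.
    return MiniSpawnOutput(trainer.best_model_path, trainer.status, results)


def wrapping_function(trainer: MiniTrainer, rank: int = 0) -> Optional[MiniSpawnOutput]:
    results = trainer.fit()  # the raw result is None
    return collect_rank_zero_results(trainer, results, rank)


trainer = MiniTrainer()
output = wrapping_function(trainer, rank=0)
print(output)
```

The point of the sketch is that the collected output is only meaningful if it is built from the trainer instance that actually ran `fit()`.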


On the other hand, for `ray_ddp`, the output is:

https://github.com/JiahaoYao/ray_lightning/blob/2727fd441a62e0e6763fd1f25ed97575dc5a6733/ray_lightning/ray_ddp.py#L252-L255

```python 
(RayExecutor pid=7048)     socket.gethostbyname(socket.gethostname()): '10.0.2.160'
(RayExecutor pid=7048) ic| results: None, '1st import'
(RayExecutor pid=7048) _SpawnOutput(best_model_path='', weights_path=None, trainer_state=TrainerState(status=<TrainerStatus.INITIALIZING: 'initializing'>, fn=None, stage=None, _fault_tolerant_mode=<_FaultTolerantMode.DISABLED: 'disabled'>), trainer_results=None, extra=[{}])
(RayExecutor pid=7048)     '2nd import': '2nd import'
```

This is still because the trainer here is only a copy, so its state is still `INITIALIZING` and its `best_model_path` and `weights_path` are empty.
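The copy problem can be reproduced without Ray or Lightning. In the sketch below, `copy.deepcopy` stands in for the serialization that ships the trainer to a worker (an assumption made for illustration), and `ToyTrainer` and `run_on_worker` are hypothetical names. Training the copy leaves the original untouched, so the results have to be returned explicitly and applied in the main process.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ToyTrainer:
    """Hypothetical stand-in for a trainer; not the Lightning Trainer."""
    status: str = "initializing"
    best_model_path: str = ""
    metrics: dict = field(default_factory=dict)

    def fit(self) -> None:
        self.status = "finished"
        self.best_model_path = "./checkpoints/epoch=0-step=10.ckpt"
        self.metrics = {"val_loss": 1.0}


def run_on_worker(trainer: ToyTrainer) -> dict:
    """Simulates a remote worker, which only ever sees a copy of the trainer."""
    worker_trainer = copy.deepcopy(trainer)  # stand-in for serialization
    worker_trainer.fit()
    # Results must be returned explicitly to reach the main process.
    return {
        "status": worker_trainer.status,
        "best_model_path": worker_trainer.best_model_path,
        "metrics": worker_trainer.metrics,
    }


main_trainer = ToyTrainer()
results = run_on_worker(main_trainer)

# The original trainer never trained: only the copy did.
status_before_sync = main_trainer.status
print(status_before_sync)  # initializing

# Pull the results back into the main-process trainer.
main_trainer.status = results["status"]
main_trainer.best_model_path = results["best_model_path"]
main_trainer.metrics = results["metrics"]
print(main_trainer.status)  # finished
```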


#143 
