Using LightningCLI to parse plugin options from the config file fails when using the RayPlugin. #151
Description
Using LightningCLI to parse plugin options from the config file fails when you try to use the RayPlugin.
Here's how I am specifying the plugin option in the config file:
trainer:
  plugins:
    - class_path: ray_lightning.RayPlugin
      init_args:
        num_workers: 2
        use_gpu: false
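For context, each class_path/init_args entry roughly tells the parser to import the named class and call it with init_args as keyword arguments. The sketch below is a simplified, hypothetical model of that step (instantiate_from_config is not a jsonargparse function, and the real library also validates types and fills in signature defaults). It uses datetime.timedelta in place of ray_lightning.RayPlugin so it runs without Ray installed.

```python
import importlib
from typing import Any, Dict


def instantiate_from_config(spec: Dict[str, Any]) -> Any:
    """Import spec["class_path"] and call it with spec["init_args"] as kwargs.

    Hypothetical, simplified model of class_path/init_args handling; it does
    no type validation and does not fill in defaults like jsonargparse does.
    """
    module_name, _, class_name = spec["class_path"].rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls(**spec.get("init_args", {}))


# Stand-in for the plugin entry in the YAML config, using a stdlib class.
plugin_spec = {"class_path": "datetime.timedelta", "init_args": {"days": 2}}
obj = instantiate_from_config(plugin_spec)
print(obj)  # 2 days, 0:00:00
```

In the real failure, the init_args Namespace shown in the error contains every parameter the parser discovered, including ones not set in the config (parallel_devices=None, cluster_environment=None), and all of them are passed to the constructor.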
The error stack I get is below:
Value "[Namespace(class_path='ray_lightning.RayPlugin', init_args=Namespace(checkpoint_io=None, cluster_environment=None, ddp_comm_state=None, init_hook=None, num_cpus_per_worker=1, num_nodes=None, num_workers=2, parallel_devices=None, sync_batchnorm=None, use_gpu=False))]" does not validate against any of the types in typing.Union[pytorch_lightning.plugins.training_type.training_type_plugin.TrainingTypePlugin, pytorch_lightning.plugins.precision.precision_plugin.PrecisionPlugin, pytorch_lightning.plugins.environments.cluster_environment.ClusterEnvironment, pytorch_lightning.plugins.io.checkpoint_plugin.CheckpointIO, str, typing.List[typing.Union[pytorch_lightning.plugins.training_type.training_type_plugin.TrainingTypePlugin, pytorch_lightning.plugins.precision.precision_plugin.PrecisionPlugin, pytorch_lightning.plugins.environments.cluster_environment.ClusterEnvironment, pytorch_lightning.plugins.io.checkpoint_plugin.CheckpointIO, str]], NoneType]:
- Type <class 'pytorch_lightning.plugins.training_type.training_type_plugin.TrainingTypePlugin'> expects an str or a Dict/Namespace with a class_path entry but got "[Namespace(class_path='ray_lightning.RayPlugin', init_args=Namespace(checkpoint_io=None, cluster_environment=None, ddp_comm_state=None, init_hook=None, num_cpus_per_worker=1, num_nodes=None, num_workers=2, parallel_devices=None, sync_batchnorm=None, use_gpu=False))]"
- Type <class 'pytorch_lightning.plugins.precision.precision_plugin.PrecisionPlugin'> expects an str or a Dict/Namespace with a class_path entry but got "[Namespace(class_path='ray_lightning.RayPlugin', init_args=Namespace(checkpoint_io=None, cluster_environment=None, ddp_comm_state=None, init_hook=None, num_cpus_per_worker=1, num_nodes=None, num_workers=2, parallel_devices=None, sync_batchnorm=None, use_gpu=False))]"
- Type <class 'pytorch_lightning.plugins.environments.cluster_environment.ClusterEnvironment'> expects an str or a Dict/Namespace with a class_path entry but got "[Namespace(class_path='ray_lightning.RayPlugin', init_args=Namespace(checkpoint_io=None, cluster_environment=None, ddp_comm_state=None, init_hook=None, num_cpus_per_worker=1, num_nodes=None, num_workers=2, parallel_devices=None, sync_batchnorm=None, use_gpu=False))]"
- Type <class 'pytorch_lightning.plugins.io.checkpoint_plugin.CheckpointIO'> expects an str or a Dict/Namespace with a class_path entry but got "[Namespace(class_path='ray_lightning.RayPlugin', init_args=Namespace(checkpoint_io=None, cluster_environment=None, ddp_comm_state=None, init_hook=None, num_cpus_per_worker=1, num_nodes=None, num_workers=2, parallel_devices=None, sync_batchnorm=None, use_gpu=False))]"
- Expected a <class 'str'> but got "[Namespace(class_path='ray_lightning.RayPlugin', init_args=Namespace(checkpoint_io=None, cluster_environment=None, ddp_comm_state=None, init_hook=None, num_cpus_per_worker=1, num_nodes=None, num_workers=2, parallel_devices=None, sync_batchnorm=None, use_gpu=False))]"
- Value "Namespace(class_path='ray_lightning.RayPlugin', init_args=Namespace(checkpoint_io=None, cluster_environment=None, ddp_comm_state=None, init_hook=None, num_cpus_per_worker=1, num_nodes=None, num_workers=2, parallel_devices=None, sync_batchnorm=None, use_gpu=False))" does not validate against any of the types in typing.Union[pytorch_lightning.plugins.training_type.training_type_plugin.TrainingTypePlugin, pytorch_lightning.plugins.precision.precision_plugin.PrecisionPlugin, pytorch_lightning.plugins.environments.cluster_environment.ClusterEnvironment, pytorch_lightning.plugins.io.checkpoint_plugin.CheckpointIO, str]:
  - __init__() got multiple values for keyword argument 'parallel_devices'
  - "ray_lightning.RayPlugin" is not a subclass of PrecisionPlugin
  - "ray_lightning.RayPlugin" is not a subclass of ClusterEnvironment
  - "ray_lightning.RayPlugin" is not a subclass of CheckpointIO
  - Expected a <class 'str'> but got "Namespace(class_path='ray_lightning.RayPlugin', init_args=Namespace(checkpoint_io=None, cluster_environment=None, ddp_comm_state=None, init_hook=None, num_cpus_per_worker=1, num_nodes=None, num_workers=2, parallel_devices=None, sync_batchnorm=None, use_gpu=False))"
- Expected a <class 'NoneType'> but got "[Namespace(class_path='ray_lightning.RayPlugin', init_args=Namespace(checkpoint_io=None, cluster_environment=None, ddp_comm_state=None, init_hook=None, num_cpus_per_worker=1, num_nodes=None, num_workers=2, parallel_devices=None, sync_batchnorm=None, use_gpu=False))]"
File "/Users/sbylaiah/miniconda3/envs/cviz/lib/python3.8/site-packages/jsonargparse/typehints.py", line 462, in adapt_typehints
    raise ValueError(f'Value "{val}" does not validate against any of the types in {typehint}:{e}')
File "/Users/sbylaiah/miniconda3/envs/cviz/lib/python3.8/site-packages/jsonargparse/typehints.py", line 344, in instantiate_classes
    value[num] = adapt_typehints(val, self._typehint, instantiate_classes=True, sub_add_kwargs=sub_add_kwargs)
File "/Users/sbylaiah/miniconda3/envs/cviz/lib/python3.8/site-packages/jsonargparse/core.py", line 1054, in instantiate_classes
    parent[key] = component.instantiate_classes(value)
File "/Users/sbylaiah/miniconda3/envs/cviz/lib/python3.8/site-packages/jsonargparse/deprecated.py", line 127, in patched_instantiate_classes
    cfg = self._unpatched_instantiate_classes(cfg, **kwargs)
File "/Users/sbylaiah/miniconda3/envs/cviz/lib/python3.8/site-packages/pytorch_lightning/utilities/cli.py", line 820, in instantiate_classes
    self.config_init = self.parser.instantiate_classes(self.config)
File "/Users/sbylaiah/miniconda3/envs/cviz/lib/python3.8/site-packages/pytorch_lightning/utilities/cli.py", line 625, in __init__
    self.instantiate_classes()
File "/Users/sbylaiah/Development/cibo/rs-inference/python/deepcdl/deepcdl/scripts/ray/train_deepcdl.py", line 70, in <module>
    cli = LightningCLI(UNetModule, CDLDataModule, run=False)
File "/Users/sbylaiah/miniconda3/envs/cviz/lib/python3.8/runpy.py", line 86, in _run_code
    exec(code, run_globals)
File "/Users/sbylaiah/miniconda3/envs/cviz/lib/python3.8/runpy.py", line 96, in _run_module_code
    _run_code(code, mod_globals, init_globals,
File "/Users/sbylaiah/miniconda3/envs/cviz/lib/python3.8/runpy.py", line 263, in run_path
    return _run_module_code(code, init_globals, run_name,
File "/Users/sbylaiah/miniconda3/envs/cviz/lib/python3.8/runpy.py", line 86, in _run_code
    exec(code, run_globals)
File "/Users/sbylaiah/miniconda3/envs/cviz/lib/python3.8/runpy.py", line 193, in _run_module_as_main (Current frame)
    return _run_code(code, main_globals, None,
The relevant error in the stack trace is: __init__() got multiple values for keyword argument 'parallel_devices'
This appears to happen because jsonargparse includes parallel_devices=None and cluster_environment=None in the init_args Namespace. When RayPlugin then calls super().__init__(), it passes parallel_devices and cluster_environment explicitly while the same keys also arrive through the forwarded keyword arguments, so each is supplied twice and Python raises the "multiple values" error above. It looks like RayPlugin does not need to pass these arguments to super().__init__() at all, since they default to None anyway.
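The failure mode can be reproduced without Ray or Lightning. In this sketch, ParentPlugin, BrokenChild and FixedChild are hypothetical stand-ins, not the real pytorch_lightning or ray_lightning classes; they assume the child forwards its extra keyword arguments to the parent while also passing parallel_devices and cluster_environment explicitly, as described in the report. FixedChild models the suggested change of relying on the parent's None defaults.

```python
from typing import Any, Optional


class ParentPlugin:
    """Hypothetical stand-in for the DDP-style base class."""

    def __init__(self, parallel_devices: Optional[list] = None,
                 cluster_environment: Optional[Any] = None, **kwargs: Any) -> None:
        self.parallel_devices = parallel_devices
        self.cluster_environment = cluster_environment
        self.extra = kwargs


class BrokenChild(ParentPlugin):
    """Mimics the reported pattern: explicit keywords plus forwarded kwargs."""

    def __init__(self, num_workers: int = 1, **ddp_kwargs: Any) -> None:
        self.num_workers = num_workers
        # If ddp_kwargs already holds parallel_devices (as it does when a config
        # parser fills in every default), this call supplies it twice.
        super().__init__(parallel_devices=[], cluster_environment=None, **ddp_kwargs)


class FixedChild(ParentPlugin):
    """Suggested change: do not pass the keys explicitly; parent defaults are None."""

    def __init__(self, num_workers: int = 1, **ddp_kwargs: Any) -> None:
        self.num_workers = num_workers
        super().__init__(**ddp_kwargs)


# Keyword arguments shaped like the init_args Namespace in the error message.
init_args = {"num_workers": 2, "parallel_devices": None, "cluster_environment": None}

try:
    BrokenChild(**init_args)
    error_message = ""
except TypeError as err:
    error_message = str(err)
print(error_message)  # contains "multiple values for keyword argument 'parallel_devices'"

fixed = FixedChild(**init_args)
print(fixed.num_workers, fixed.parallel_devices)  # 2 None
```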
Version info:
ray-lightning: 0.2.0
pytorch-lightning: 1.5.10
jsonargparse: 4.2.0