[rllib] TorchDiagGaussian doesn’t handle multiple actions correctly.

This is not a contribution.

Ray version: 0.8.2
Python version: 3.6.8
PyTorch version: 1.4
OS: Ubuntu 18.04 Docker

TorchDiagGaussian doesn’t handle multiple actions correctly. As a result, training PPO with PyTorch crashes when the action space has more than one action dimension. Here’s a minimal reproduction script:

```python
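import gym
from gym.spaces import Box
from ray import tune

class ContinuousEnv(gym.Env):
   """Dummy env with a two-dimensional continuous action space."""

   def __init__(self, config):
       # Two action dimensions: the crash occurs when there is more than one.
       self.action_space = Box(0.0, 1.0, shape=(2,))
       self.observation_space = Box(0.0, 1.0, shape=(1, ))

   def reset(self):
       # Constant observation; the env dynamics are irrelevant to the bug.
       return [0.0]

   def step(self, action):
       # Returns (observation, reward, done, info); the episode never ends.
       return [0.0], 1.0, False, {}

# "use_pytorch": True selects the PyTorch version of PPO.
tune.run(
   "PPO",
   config={"env": ContinuousEnv, "use_pytorch": True, "num_workers": 1})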
```
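For reference, here is what I would expect from a diagonal Gaussian over a multi-dimensional action: one scalar log-probability per sample, obtained by summing the per-dimension terms. This is a plain-Python sketch of that expectation only. It is not RLlib's code, `diag_gaussian_logp` is a hypothetical name, and I have not confirmed that this is where TorchDiagGaussian goes wrong.

```python
import math

def diag_gaussian_logp(action, mean, log_std):
    """Log-density of `action` under independent Gaussians, one per dimension.

    Hypothetical helper for illustration; not part of RLlib.
    """
    total = 0.0
    for a, m, ls in zip(action, mean, log_std):
        std = math.exp(ls)
        # Per-dimension Gaussian log-density, accumulated into one scalar.
        total += -0.5 * ((a - m) / std) ** 2 - ls - 0.5 * math.log(2.0 * math.pi)
    return total

# Two-dimensional action, matching Box(0.0, 1.0, shape=(2,)) in the script above.
logp = diag_gaussian_logp([0.3, 0.7], mean=[0.0, 0.0], log_std=[0.0, 0.0])
print(logp)
```

The point is that the result is a single float per sample regardless of how many action dimensions there are.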

