[rllib] TorchDiagGaussian doesn’t handle multiple actions correctly. #7397

@soundway

This is not a contribution.

Ray version: 0.8.2
Python version: 3.6.8
PyTorch version: 1.4
OS: Ubuntu 18.04 Docker

TorchDiagGaussian doesn’t handle multiple actions correctly. As a result, training PPO with PyTorch crashes when the action space has more than one action dimension, for example a Box with shape=(2,). Here’s a minimal reproduction script:

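import gym
from gym.spaces import Box
from ray import tune


class ContinuousEnv(gym.Env):
    """Trivial env whose only notable feature is a 2-dimensional action space."""

    def __init__(self, config):
        # Two continuous action dimensions; this is what triggers the crash.
        self.action_space = Box(0.0, 1.0, shape=(2,))
        self.observation_space = Box(0.0, 1.0, shape=(1,))

    def reset(self):
        return [0.0]

    def step(self, action):
        # Constant observation and reward; the episode never terminates on its own.
        return [0.0], 1.0, False, {}


# PPO with the PyTorch backend enabled.
tune.run(
    "PPO",
    config={"env": ContinuousEnv, "use_pytorch": True, "num_workers": 1})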
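
For context, here is a minimal, framework-independent sketch of what a diagonal Gaussian is expected to return for a multi-dimensional action: one joint log-probability per action vector, obtained by summing the per-dimension log-densities. It uses only the Python standard library, and the helper names (`normal_logpdf`, `diag_gaussian_logp`) and the sample values are hypothetical. It is not RLlib's code, and this report does not confirm that a missing sum is the actual cause of the crash.

```python
import math


def normal_logpdf(x, mean, std):
    """Log density of a 1-D normal distribution evaluated at x."""
    var = std * std
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def diag_gaussian_logp(action, mean, log_std):
    """Joint log-probability of one action vector under a diagonal Gaussian.

    The dimensions are independent, so the joint log-density is the sum
    of the per-dimension log-densities: a single scalar per action vector.
    """
    if not (len(action) == len(mean) == len(log_std)):
        raise ValueError("action, mean and log_std must have the same length")
    return sum(
        normal_logpdf(a, m, math.exp(s))
        for a, m, s in zip(action, mean, log_std)
    )


# Hypothetical 2-D action, matching Box(0.0, 1.0, shape=(2,)) in the script above.
action = [0.3, 0.7]
mean = [0.0, 0.0]
log_std = [0.0, 0.0]  # std = 1.0 in both dimensions

# Per-dimension values: one entry per action dimension.
per_dim = [
    normal_logpdf(a, m, math.exp(s)) for a, m, s in zip(action, mean, log_std)
]
# Joint value: a single scalar for the whole action vector.
joint = diag_gaussian_logp(action, mean, log_std)

print(len(per_dim), joint)
```

A loss that expects one log-probability per sample would receive the wrong shape if it were handed the per-dimension list instead of the summed scalar, which is one plausible way a multi-dimensional action space could break training.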

Labels: bug
