[RLlib] PPO not learning in complex continuous-action environments. #8889

PPO currently fails to learn on, e.g., HalfCheetah-v2 using the `tuned_examples/ppo/halfcheetah-ppo.yaml` config. This happens with both the tf and the torch framework.

*Ray version and other system information (Python version, TensorFlow version, OS):*

- ray 0.9.0dev
- tf 2.2.0
- torch 1.5.0

The problem does not occur on ray/rllib 0.8.5.

### Reproduction (REQUIRED)
Please provide a script that can be run to reproduce the issue. The script should have **no external library dependencies** (i.e., use fake or mock data / environments):

If we cannot run your script, we cannot fix your issue.
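
A minimal sketch of how the report above might be reproduced, assuming the `rllib train` CLI is installed, that it accepts `-f` for a config file and `--torch` to switch frameworks, and that the script is started from the `rllib/` directory of a Ray checkout. Note that HalfCheetah-v2 needs MuJoCo, so this sketch does not satisfy the "no external library dependencies" requirement. The helper name `build_command` and the `--run` switch are hypothetical and exist only for illustration.

```python
import shlex
import subprocess
import sys

# Config path as given in the report, relative to the rllib/ directory.
CONFIG = "tuned_examples/ppo/halfcheetah-ppo.yaml"


def build_command(use_torch):
    """Return the `rllib train` invocation for one framework (hypothetical helper)."""
    cmd = ["rllib", "train", "-f", CONFIG]
    if use_torch:
        # Assumption: `--torch` selects PyTorch; without it, tf is used.
        cmd.append("--torch")
    return cmd


# Only launch training when explicitly asked to, e.g. `python repro.py --run`.
if __name__ == "__main__" and "--run" in sys.argv:
    for use_torch in (False, True):
        cmd = build_command(use_torch)
        print("Running:", " ".join(shlex.quote(part) for part in cmd))
        subprocess.run(cmd, check=True)
```

Running the same script under ray 0.8.5 and under 0.9.0dev and comparing the reported episode rewards should show whether the regression described above appears.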

- [ ] I have verified my script runs in a clean environment and reproduces the issue.
- [x] I have verified the issue also occurs with the [latest wheels](https://docs.ray.io/en/latest/installation.html).
