Inconsistent weight assignment operations in DQNPolicyGraph

### System information
- **OS Platform and Distribution (e.g., macOS 10.14.3)**:
- **Ray installed from (source or binary)**: source
- **Ray version**: 0.7.0.dev2 ab55a1f9
- **Python version**: 3.6.8



### Describe the problem


`DQNPolicyGraph` creates tensorflow assign operations by looping through lists of `self.q_func_vars` and `self.target_q_func_vars` sorted by variable name on lines 390:393. The default sorting is not consistent between the two lists of variable names and as a result the operations can mix up the assignments. The attached code snippet produces the error message below.

```
2019-03-28 18:34:31,402 WARNING worker.py:1397 -- WARNING: Not updating worker name since `setproctitle` is not installed. Install this with `pip install setproctitle` (or ray[debug]) to enable monitoring of worker processes.                                                                                                                                                                                                           
2019-03-28 18:34:31.415440: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
WARNING:tensorflow:From /Users/ristovuorio/miniconda3/envs/ray_fiddle/lib/python3.6/site-packages/tensorflow/python/util/decorator_utils.py:127: GraphKeys.VARIABLES (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.                                                                                                                                                                          
Instructions for updating:
Use `tf.GraphKeys.GLOBAL_VARIABLES` instead.
Traceback (most recent call last):
  File "dqn_fail_demonstrator.py", line 37, in <module>
    trainer = DQNAgent(env="CartPole-v0", config=config)
  File "/Users/ristovuorio/projects/ray_doodle/ray/python/ray/rllib/agents/agent.py", line 280, in __init__
    Trainable.__init__(self, config, logger_creator)
  File "/Users/ristovuorio/projects/ray_doodle/ray/python/ray/tune/trainable.py", line 88, in __init__
    self._setup(copy.deepcopy(self.config))
  File "/Users/ristovuorio/projects/ray_doodle/ray/python/ray/rllib/agents/agent.py", line 377, in _setup
    self._init()
  File "/Users/ristovuorio/projects/ray_doodle/ray/python/ray/rllib/agents/dqn/dqn.py", line 207, in _init
    self.env_creator, self._policy_graph)
  File "/Users/ristovuorio/projects/ray_doodle/ray/python/ray/rllib/agents/agent.py", line 510, in make_local_evaluator
    extra_config or {}))
  File "/Users/ristovuorio/projects/ray_doodle/ray/python/ray/rllib/agents/agent.py", line 727, in _make_evaluator
    async_remote_worker_envs=config["async_remote_worker_envs"])
  File "/Users/ristovuorio/projects/ray_doodle/ray/python/ray/rllib/evaluation/policy_evaluator.py", line 296, in __init__
    self._build_policy_map(policy_dict, policy_config)  File "/Users/ristovuorio/projects/ray_doodle/ray/python/ray/rllib/evaluation/policy_evaluator.py", line 692, in _build_policy_map
    policy_map[name] = cls(obs_space, act_space, merged_conf)
  File "/Users/ristovuorio/projects/ray_doodle/ray/python/ray/rllib/agents/dqn/dqn_policy_graph.py", line 394, in __init__
    update_target_expr.append(var_target.assign(var))
  File "/Users/ristovuorio/miniconda3/envs/ray_fiddle/lib/python3.6/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 951, in assign
    self._shape.assert_is_compatible_with(value_tensor.shape)
  File "/Users/ristovuorio/miniconda3/envs/ray_fiddle/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 848, in assert_is_compatible_with
    raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (3,) and (11,) are incompatible
```

### Source code / logs


```
from tensorflow.keras.layers import Dense

import ray
from ray.rllib.models import ModelCatalog, Model
from ray.rllib.agents.dqn import DQNAgent


class DemoNN(Model):
    def _build_layers_v2(self, input_dict, num_outputs, options):
        x = input_dict["obs"]
        x = Dense(1)(x)
        x = Dense(1)(x)
        x = Dense(3)(x)
        x = Dense(1)(x)
        x = Dense(1)(x)
        x = Dense(1)(x)
        x = Dense(1)(x)
        x = Dense(1)(x)
        x = Dense(1)(x)
        x = Dense(1)(x)
        x = Dense(11)(x)
        x = Dense(2)(x)

        return x, x


ray.init(local_mode=True)
ModelCatalog.register_custom_model("demo_nn", DemoNN)
config = {
    "model": {
        "custom_model": "demo_nn",
    },
    "hiddens": [],
    "num_workers": 0,
}
trainer = DQNAgent(env="CartPole-v0", config=config)
```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inconsistent weight assignment operations in DQNPolicyGraph #4502

System information

Describe the problem

Source code / logs

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Inconsistent weight assignment operations in DQNPolicyGraph #4502

Description

System information

Describe the problem

Source code / logs

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions