[rllib] bug in rllib.bc.policy.py #1972

@Emily0219

Description

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04
  • Ray installed from (source or binary): pip
  • Ray version: 0.4.0
  • Python version: 2.7.12
  • Exact command to reproduce:

Describe the problem

When I use my custom env with the BC algorithm, I get an error about the action space. This is my action space:

self.action_space = Box(np.array([0.0, -1.0]), np.array([1.0, 1.0]), dtype=np.float32)

Running the Python API raises the following error:

Traceback (most recent call last):
File "/home/ran/PycharmProjects/untitled/bctest_P.py", line 302, in
agent = BCAgent(config, 'gazebocar')
File "/home/ran/.virtualenvs/gym_gazebo/local/lib/python2.7/site-packages/ray/rllib/agent.py", line 93, in init
Trainable.init(self, config, registry, logger_creator)
File "/home/ran/.virtualenvs/gym_gazebo/local/lib/python2.7/site-packages/ray/tune/trainable.py", line 90, in init
self._setup()
File "/home/ran/.virtualenvs/gym_gazebo/local/lib/python2.7/site-packages/ray/rllib/agent.py", line 116, in _setup
self._init()
File "/home/ran/.virtualenvs/gym_gazebo/local/lib/python2.7/site-packages/ray/rllib/bc/bc.py", line 66, in _init
self.registry, self.env_creator, self.config, self.logdir)
File "/home/ran/.virtualenvs/gym_gazebo/local/lib/python2.7/site-packages/ray/rllib/bc/bc_evaluator.py", line 22, in init
env.action_space, config)
File "/home/ran/.virtualenvs/gym_gazebo/local/lib/python2.7/site-packages/ray/rllib/bc/policy.py", line 25, in init
self.setup_loss(action_space)
File "/home/ran/.virtualenvs/gym_gazebo/local/lib/python2.7/site-packages/ray/rllib/bc/policy.py", line 43, in setup_loss
log_prob = self.curr_dist.logp(self.ac)
File "/home/ran/.virtualenvs/gym_gazebo/local/lib/python2.7/site-packages/ray/rllib/models/action_dist.py", line 86, in logp
0.5 * np.log(2.0 * np.pi) * tf.to_float(tf.shape(x)[1]) -
File "/home/ran/.virtualenvs/gym_gazebo/local/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 962, in binary_op_wrapper
y = ops.convert_to_tensor(y, dtype=x.dtype.base_dtype, name="y")
File "/home/ran/.virtualenvs/gym_gazebo/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 950, in convert_to_tensor
as_ref=False)
File "/home/ran/.virtualenvs/gym_gazebo/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1040, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/home/ran/.virtualenvs/gym_gazebo/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 883, in _TensorTensorConversionFunction
(dtype.name, t.dtype.name, str(t)))
ValueError: Tensor conversion requested dtype int64 for Tensor with dtype float32: 'Tensor("local/split:0", shape=(?, 2), dtype=float32, device=/job:localhost/replica:0/task:0/device:CPU:0)'

Source code / logs

agent = BCAgent(config, 'gazebocar')
for i in range(10):
    result = agent.train()

I think this code should be changed from:

def setup_loss(self, action_space):
    self.ac = tf.placeholder(tf.int64, [None], name="ac")

to:

def setup_loss(self, action_space):
    self.ac = tf.placeholder(tf.float32, [None] + list(action_space.shape), name="ac")
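The shape arithmetic behind the proposed fix can be checked without TensorFlow. A `Box` whose `low`/`high` arrays have length 2 has shape `(2,)`, so a batch of its actions needs a `float32` placeholder of shape `[None, 2]`, not an `int64` scalar per step as the current discrete-action code assumes. A minimal sketch using only numpy (the `low`/`high` values mirror the action space above; everything else is illustrative):

```python
import numpy as np

# Mimic the custom env's continuous action space:
# Box(low=[0.0, -1.0], high=[1.0, 1.0]) has shape (2,).
low = np.array([0.0, -1.0], dtype=np.float32)
high = np.array([1.0, 1.0], dtype=np.float32)
action_shape = low.shape  # (2,)

# The placeholder shape the proposed fix would build:
# [None] + list(action_space.shape) -> [None, 2]
placeholder_shape = [None] + list(action_shape)

# A batch of sampled continuous actions is float32 and 2-D,
# which is why feeding them into an int64 [None] placeholder
# (meant for discrete Categorical actions) fails.
batch = np.random.uniform(low, high, size=(4,) + action_shape).astype(np.float32)
print(placeholder_shape)        # [None, 2]
print(batch.dtype, batch.shape)
```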
