[rllib] incorrect model output for DQN with torch and dueling=false 

### What is the problem?

The output of the DQN model does not match the action space.

Something goes wrong when the torch model is constructed with dueling turned off. The model's output dimension equals whatever is passed in "fcnet_hiddens" (32 in the reproduction below) instead of the size of the action space.
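To make the symptom concrete, here is a minimal, framework-free sketch of why an output of the wrong width is a problem. It does not use RLlib; `greedy_action` and `check_q_output` are hypothetical helpers written for illustration, and the 32-wide vector is only a stand-in for the model output described above. It assumes CartPole-v1's two discrete actions.

```python
NUM_ACTIONS = 2    # CartPole-v1 has a Discrete(2) action space
HIDDEN_SIZE = 32   # the value passed in "fcnet_hiddens" in the reproduction


def greedy_action(q_values):
    """Return the index of the largest Q-value (a greedy action choice)."""
    return max(range(len(q_values)), key=lambda i: q_values[i])


def check_q_output(q_values, num_actions):
    """Raise ValueError unless there is exactly one Q-value per action."""
    if len(q_values) != num_actions:
        raise ValueError(
            f"expected {num_actions} Q-values, got {len(q_values)}"
        )


# Stand-in for the reported output: one value per hidden unit, not per action.
bad_q = [0.0] * HIDDEN_SIZE
bad_q[17] = 1.0

# The greedy choice over 32 values can land outside the valid actions {0, 1}.
action = greedy_action(bad_q)
action_is_valid = action < NUM_ACTIONS  # False for this vector
```

A check like `check_q_output` on the model output is one possible way to surface the mismatch early, rather than letting an out-of-range action reach the environment.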

*Ray version and other system information (Python version, TensorFlow version, OS):*
- ray==0.9.0.dev0
- python 3.6.10
- macOS

### Reproduction (REQUIRED)

```python
import ray
from ray import tune

ray.init()

config = {
    "env": "CartPole-v1",
    "num_workers": 1,
    "train_batch_size": 128,
    "learning_starts": 128,
    "model": {"fcnet_hiddens": [32]},
    "dueling": False,
    "framework": "torch"
}

tune.run("DQN", name="MWE", config=config, stop={"training_iteration": 100})
```

- [x] I have verified my script runs in a clean environment and reproduces the issue.
- [x] I have verified the issue also occurs with the [latest wheels](https://docs.ray.io/en/latest/installation.html).

[rllib] incorrect model output for DQN with torch and dueling=false #9366
