[RLlib] TorchPolicies: Accessing "infos" dict in train_batch causes `TypeError`.

cwerner described this bug on the discuss.ray.io channel:

------


Hello,

I recently updated from rllib 0.8.6 to rllib 1.0.1 and also converted from TF to Torch. Previously, I was using the info dict from my custom environment to pass along values that I used in the postprocessing callback. However, now I don’t see the info key when I check the keys on the original batch (using original_batches[agent_id]) in my on_postprocess_trajectory callback. I only see:

(pid=17353) dict_keys(['obs', 'actions', 'rewards', 'dones', 'eps_id', 'agent_index', 'new_obs', 'action_dist_inputs', 'action_logp', 'vf_preds', 'unroll_id', 'advantages', 'value_targets'])

I tried changing the view_requirements function to include SampleBatch.INFOS but this did not work. How can I go about seeing the info dicts in this callback?

To reproduce:

- Run the `rllib/agents/ppo/tests/test_ppo.py::test_ppo_compilation_and_lr_schedule` test case with `ray.init(local_mode=True)`.
- On the initial test call to the loss, you should see "infos" in train_batch properly initialized with 0s.
- Make sure you access that column (e.g. add a simple `print(train_batch["infos"])` to your (or PPO's) loss function).
- Then, in subsequent loss calls, the "infos" column will actually be populated with Env info dicts. If you keep accessing this column, you'll see another error in torch, caused by the TrackingDict attempting to convert the info dicts (`{}`) to a torch tensor, which fails:
`TypeError: can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.`
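
For illustration only, here is a minimal sketch of why the conversion fails and one possible way to guard against it. It uses plain NumPy and a hand-built dict standing in for train_batch; `split_batch` is a hypothetical helper, not part of RLlib, and the column contents are made up. An array of info dicts gets `dtype=object`, which is the type the error message above says cannot be converted:

```python
import numpy as np


def split_batch(train_batch):
    """Hypothetical helper: separate columns that can become tensors
    from object-dtype columns (such as per-step info dicts)."""
    numeric, passthrough = {}, {}
    for key, value in train_batch.items():
        arr = np.asarray(value)
        if arr.dtype == np.object_:
            # A tensor conversion would raise a TypeError for this column,
            # so leave it untouched.
            passthrough[key] = value
        else:
            numeric[key] = arr
    return numeric, passthrough


# Stand-in for a train_batch; values are invented for this example.
batch = {
    "obs": np.zeros((2, 4), dtype=np.float32),
    "rewards": np.array([1.0, 0.5], dtype=np.float32),
    "infos": np.array([{"step": 0}, {"step": 1}], dtype=object),
}

numeric, passthrough = split_batch(batch)
print(sorted(numeric))      # columns that are safe to convert
print(list(passthrough))    # columns left as plain Python objects
```

Whether the actual fix should skip such columns, or keep "infos" out of the tensor-conversion path in some other way, is up to the maintainers; the sketch only shows where the object dtype comes from.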

*Ray version and other system information (Python version, TensorFlow version, OS):*

### Reproduction (REQUIRED)
Please provide a short code snippet (less than 50 lines if possible) that can be copy-pasted to reproduce the issue. The snippet should have **no external library dependencies** (i.e., use fake or mock data / environments):

If the code snippet cannot be run by itself, the issue will be closed with "needs-repro-script".

- [x] I have verified my script runs in a clean environment and reproduces the issue.
- [x] I have verified the issue also occurs with the [latest wheels](https://docs.ray.io/en/master/installation.html).

[RLlib] TorchPolicies: Accessing "infos" dict in train_batch causes `TypeError`. #13038
