Inaccurate samples from on_sample_end() #4809

### System information
- **OS Platform and Distribution**: macOS Mojave 10.14.3
- **Ray installed from (source or binary)**: source
- **Ray version**: 0.6.2
- **Python version**: 3.6.6

### Describe the problem

Algorithm: APEX_DDPG 
Environment: Pendulum-v0 

I'm trying to use `on_sample_end()` to retrieve all the transition data: `[obs, action, reward, obs_next, done]`. Within each episode, every `done` should be False except for the last transition, so each episode should contain exactly one True. However, I've noticed that the number of Trues actually depends on `n_step`. Specifically, when `batch_mode` is "complete_episodes", the number of Trues at the end of each episode equals the value of `n_step`. When `batch_mode` is "truncate_episodes", the number of Trues varies, seemingly at random, between 0 and the value of `n_step`.
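A guess at what might produce this pattern, not confirmed against the RLlib source: if n-step postprocessing runs before `on_sample_end()` and copies each `done` flag back from up to `n_step - 1` steps ahead, the last `n_step` transitions of a finished episode would all end up True. The sketch below only simulates that assumed behavior on plain lists; `nstep_relabel_dones` is a hypothetical helper, not an RLlib function.

```python
def nstep_relabel_dones(dones, n_step):
    """Simulate the ASSUMED n-step relabeling of done flags.

    Each step i takes the done flag of step i + n_step - 1, truncated
    to the last step of the trajectory fragment. Hypothetical helper,
    not part of RLlib.
    """
    dones = list(dones)  # work on a copy
    traj_length = len(dones)
    for i in range(traj_length):
        for j in range(1, n_step):
            if i + j < traj_length:
                dones[i] = dones[i + j]
    return dones


# A complete 10-step episode: only the last transition is terminal.
episode = [False] * 9 + [True]
print(sum(nstep_relabel_dones(episode, n_step=3)))  # 3

# A fragment cut off mid-episode has no terminal step at all.
fragment = [False] * 5
print(sum(nstep_relabel_dones(fragment, n_step=3)))  # 0

# A short fragment that ends the episode has fewer than n_step steps.
tail = [False, True]
print(sum(nstep_relabel_dones(tail, n_step=3)))  # 2
```

Under this assumption, complete episodes always show `n_step` Trues, while truncated fragments show anything from 0 up to `n_step`, which matches what I observe in both batch modes.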

### Source code / logs
The code to see the number of Trues:

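    def on_sample_end(info):
        # info["samples"] is the batch of transitions just collected.
        samples = info["samples"]
        # columns() returns a list with one array per requested key.
        dones = samples.columns(["dones"])
        count_true = 0
        for done in dones[0]:
            if done:
                count_true += 1
        print(count_true)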
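
The callback above prints one total per batch, so a batch that spans several episodes mixes their counts. A minimal sketch of a per-episode count is shown here on plain lists; `count_dones_per_episode` is a hypothetical helper, and it assumes the batch also exposes an episode-id column (I believe this is `"eps_id"`, but I have not verified it for this Ray version).

```python
from collections import Counter


def count_dones_per_episode(dones, eps_ids):
    """Return {episode id: number of True done flags}.

    Hypothetical helper: `dones` and `eps_ids` are equal-length
    sequences, one entry per transition.
    """
    counts = Counter()
    for done, eps_id in zip(dones, eps_ids):
        # Adding 0 still creates the key, so episodes with no
        # terminal step show up with a count of 0.
        counts[eps_id] += int(bool(done))
    return dict(counts)


# Two episodes in one batch: the first ends, the second is cut off.
dones = [False, False, True, False, False]
eps_ids = [7, 7, 7, 8, 8]
print(count_dones_per_episode(dones, eps_ids))  # {7: 1, 8: 0}
```

Inside `on_sample_end()` the two lists would come from something like `samples.columns(["dones", "eps_id"])`, assuming that column name is right.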
   



