Inaccurate samples from on_sample_end() #4809

@rl-2

System information

  • OS Platform and Distribution: macOS Mojave 10.14.3
  • Ray installed from (source or binary): source
  • Ray version: 0.6.2
  • Python version: 3.6.6

Describe the problem

Algorithm: APEX_DDPG
Environment: Pendulum-v0

I'm trying to use on_sample_end() to retrieve all the transition data: [obs, action, reward, obs_next, done]. Within each episode, every done flag should be False except for the last transition. That is, each episode should contain exactly one True. However, I've noticed that the number of Trues actually depends on n_step. Specifically, when batch_mode is "complete_episodes", the number of Trues at the end of each episode equals the value of n_step. When batch_mode is "truncate_episodes", the number of Trues varies unpredictably between 0 and the value of n_step.

Source code / logs

The code to see the number of Trues:

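def on_sample_end(info):
    # info["samples"] is the batch of transitions passed to the callback.
    samples = info["samples"]
    # columns() returns a list with one array per requested key.
    dones = samples.columns(["dones"])[0]
    # Count the transitions flagged as terminal.
    count_true = sum(1 for done in dones if done)
    print(count_true)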
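
One possible explanation for the pattern described above, offered as an assumption rather than confirmed behavior of Ray 0.6.2: if n-step postprocessing folds each transition together with the following n_step - 1 transitions and copies the done flag from the furthest one it reaches, then the last n_step transitions of a complete episode would all end up with done = True. The sketch below simulates this on plain Python lists. The function name fold_n_step and its signature are hypothetical and are not part of RLlib.

```python
def fold_n_step(rewards, dones, n_step, gamma):
    """Hypothetical n-step folding on one episode fragment (not RLlib code).

    Each transition i accumulates the discounted rewards of the next
    n_step - 1 transitions and takes the done flag of the furthest
    transition it can reach inside the fragment.
    """
    rewards = list(rewards)
    dones = list(dones)
    length = len(rewards)
    for i in range(length):
        for j in range(1, n_step):
            if i + j < length:
                # Entries at i + j have not been modified yet, because i + j > i.
                rewards[i] += gamma ** j * rewards[i + j]
                dones[i] = dones[i + j]
    return rewards, dones


# A complete 5-step episode: only the last transition is terminal.
_, full = fold_n_step([1.0] * 5, [False, False, False, False, True], 3, 0.99)
print(sum(full))  # → 3, which equals n_step

# A truncated fragment that holds only the last 2 steps of an episode.
_, tail = fold_n_step([1.0] * 2, [False, True], 3, 0.99)
print(sum(tail))  # → 2

# A truncated fragment that stops before the episode ends.
_, middle = fold_n_step([1.0] * 4, [False] * 4, 3, 0.99)
print(sum(middle))  # → 0
```

Under this assumption, a complete episode always yields n_step Trues, while a truncated fragment yields anywhere from 0 to n_step depending on where the fragment boundary falls, which matches both observations.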

Labels: question (Just a question :)
