Closed
Labels
question: Just a question :)
Description
System information
- OS Platform and Distribution: macOS Mojave 10.14.3
- Ray installed from (source or binary): source
- Ray version: 0.6.2
- Python version: 3.6.6
Describe the problem
Algorithm: APEX_DDPG
Environment: Pendulum-v0
I'm trying to use on_sample_end() to retrieve all the transition data: [obs, action, reward, obs_next, done]. Within each episode, every done flag should be False except for the last transition, so each episode should contain exactly one True. However, I've noticed that the number of Trues actually depends on n_step. Specifically, when batch_mode is "complete_episodes", the number of Trues at the end of each episode equals the value of n_step. When batch_mode is "truncate_episodes", the number of Trues varies, seemingly at random, between 0 and the value of n_step.
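For illustration, here is one way this pattern could arise. The sketch assumes that n-step returns are folded into each transition in place, and that each folded transition copies the done flag of the last step it covers. `fold_n_step` is a hypothetical helper written for this example, not RLlib's actual code, and the default `gamma` is arbitrary.

```python
def fold_n_step(rewards, dones, n_step, gamma=0.99):
    """Fold n-step returns over one trajectory (illustrative sketch only)."""
    rewards = list(rewards)
    dones = list(dones)
    length = len(rewards)
    for i in range(length):
        for j in range(1, n_step):
            if i + j < length:
                # The folded transition now spans steps i..i+j, so it
                # inherits the done flag of the last step it covers.
                dones[i] = dones[i + j]
                rewards[i] += gamma ** j * rewards[i + j]
    return rewards, dones


# A complete 10-step episode: only the final transition is terminal.
episode_dones = [False] * 9 + [True]
_, folded_dones = fold_n_step([1.0] * 10, episode_dones, n_step=3)
print(sum(folded_dones))  # 3, i.e. the value of n_step
```

Under the same assumption, a truncated fragment that does not contain the terminal step would show 0 Trues after folding, which would be consistent with the counts varying under "truncate_episodes".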
Source code / logs
The code to see the number of Trues:
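def on_sample_end(info):
    # Count how many transitions in the sampled batch are flagged as done.
    samples = info["samples"]
    dones = samples.columns(["dones"])  # list holding the single "dones" column
    count_true = 0
    for done in dones[0]:
        if done:
            count_true += 1
    print(count_true)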
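Since a batch may contain more than one episode, it can help to count the Trues per episode rather than per batch. The sketch below is a plain-Python helper that assumes the batch also exposes an episode-id column alongside "dones" (for example one named "eps_id"; that column name is an assumption here). `count_dones_per_episode` is a hypothetical name for this example.

```python
from collections import Counter


def count_dones_per_episode(eps_ids, dones):
    """Return {episode_id: number of done=True transitions}."""
    counts = Counter()
    for eps_id, done in zip(eps_ids, dones):
        # Adding 0 still registers the episode, so episodes with no
        # terminal transition in this batch show up with a count of 0.
        counts[eps_id] += int(bool(done))
    return dict(counts)


# Example: episode 1 ends with two Trues, episode 2 is cut off before its end.
print(count_dones_per_episode([1, 1, 1, 2, 2],
                              [False, True, True, False, False]))
```

Inside the callback, the two arguments would be the corresponding columns of info["samples"], assuming both columns are present.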