Conversation
make sure observations + new_obs treatment is the same
Test PASSed.
Is it possible to remove the replay handling from DDPG, and use SyncLocalReplayOptimizer / ApexOptimizer instead? Recently we refactored DQN to have replay be handled by the optimizer classes.
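For context, the division of labor this comment asks for — the optimizer, not the agent, owning the replay buffer — can be sketched roughly like this. This is a hypothetical toy illustration of the pattern; `SyncReplayOptimizerSketch` and `FakeEvaluator` are made-up names, not the actual RLlib classes or API.

```python
import random
from collections import deque

class SyncReplayOptimizerSketch:
    """Toy sketch: the optimizer owns the replay buffer. Each step it
    pulls fresh samples from an evaluator, stores them, and trains the
    evaluator on a uniformly replayed batch."""

    def __init__(self, evaluator, buffer_size=10000, train_batch_size=4):
        self.evaluator = evaluator
        self.buffer = deque(maxlen=buffer_size)
        self.train_batch_size = train_batch_size

    def step(self):
        # Collect new experience and add it to replay.
        self.buffer.extend(self.evaluator.sample())
        # Train on a replayed batch once we have enough samples.
        if len(self.buffer) >= self.train_batch_size:
            batch = random.sample(list(self.buffer), self.train_batch_size)
            self.evaluator.compute_apply(batch)

class FakeEvaluator:
    """Stand-in evaluator producing one transition per sample() call."""
    def __init__(self):
        self.updates = 0
    def sample(self):
        return [{"obs": 0, "actions": 0, "rewards": 1.0,
                 "new_obs": 0, "dones": False}]
    def compute_apply(self, batch):
        self.updates += 1

ev = FakeEvaluator()
opt = SyncReplayOptimizerSketch(ev)
for _ in range(10):
    opt.step()
```

The point of the refactor is that the agent-side evaluator then only needs `sample()` and `compute_apply()`, and any replay strategy (sync, Ape-X-style) can be swapped in at the optimizer level.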
Note: a copy of these classes is already in ray.rllib.optimizers
python/ray/rllib/agent.py (Outdated)
Nice. It would be good to also:
- make an issue to update the docs; a couple of new algorithms are not documented yet
- add a DDPG example to the regression tests folder (tuned_examples/regression_tests)
- add a DDPG sanity check to multi_node_tests.sh
@alvkao58 can you take care of these comments by Eric? (see PR)
any particular sanity checks you want to see added to multi_node_tests.sh?
Test FAILed.
don't you track the actor samples in samples["actions"] or something?
yeah, but looking at https://arxiv.org/pdf/1509.02971.pdf, aren't the actions supposed to be whatever the actor currently outputs?
python/ray/rllib/ddpg/ddpg.py (Outdated)
I think update_target is actually supposed to happen very often
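For reference, the DDPG paper's target update is indeed applied at every training step, as a soft (Polyak) update with a small tau, rather than DQN's periodic hard copy. A minimal toy sketch of that rule (plain Python, not the RLlib code; `soft_update` is a hypothetical helper name):

```python
def soft_update(target_weights, weights, tau):
    """Polyak-average each target weight toward the online network:
    theta_target <- tau * theta + (1 - tau) * theta_target,
    applied every training step (tau ~ 1e-3 in the DDPG paper)."""
    return [tau * w + (1.0 - tau) * tw
            for tw, w in zip(target_weights, weights)]

# Toy check: repeated soft updates converge toward the online weights.
online = [1.0, 2.0]
target = [0.0, 0.0]
for _ in range(10000):
    target = soft_update(target, online, tau=0.001)
```

With tau=1.0 this degenerates to a hard copy, which is why exposing tau in the config covers both behaviors.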
python/ray/rllib/ddpg/models.py (Outdated)
Maybe let's try being a bit more explicit for now, just getting a specific action_gradient.
See self.action_grads in http://pemami4911.github.io/blog/2016/08/21/ddpg-rl.html
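The "explicit action_gradient" being suggested is the deterministic policy gradient chain rule: grad_theta J = grad_a Q(s, a)|a=mu(s) * grad_theta mu(s). A toy numeric check with a linear actor and a quadratic critic (scalar Python, purely illustrative; in TF this is what `tf.gradients(Q, action)` would compute):

```python
# Toy actor mu(s) = w * s and critic Q(s, a) = -(a - 3)^2, whose
# maximizing action is a = 3 regardless of state.

def actor(w, s):
    return w * s

def dq_da(a):
    # Analytic gradient of Q with respect to the action.
    return -2.0 * (a - 3.0)

w, s, lr = 0.0, 1.0, 0.05
for _ in range(200):
    a = actor(w, s)
    action_grad = dq_da(a)       # the explicit "action gradient"
    w += lr * action_grad * s    # chain rule through the actor weights
```

After gradient ascent the actor's output w * s converges to the critic's maximizer (3.0), which is exactly the behavior the `self.action_grads` construction in the linked blog post implements in TensorFlow.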
Let's move all the TensorFlow-specific things out of this class.
python/ray/rllib/ddpg/models.py (Outdated)
Let's move this back to slim and preferably as a Model we can import from ModelCatalog
python/ray/rllib/utils/sampler.py (Outdated)
remind me again why we are changing terminal to dones?
I did it for convenience because LocalSyncReplayOptimizer uses "obs", "new_obs", and "dones". I can change it back if necessary.
I made this consistent across all algorithms that use this sampler.
this ends up being a little verbose; consider

    self.model = DDPGModel(
        self.registry, self.env, ...)
I think that if I don't do this, the model and target_model end up sharing weights?
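The weight-sharing concern here is real: if the online and target networks are built over the same underlying variables, every update to one silently moves the other. A plain-Python illustration of the aliasing (toy `TinyNet` class, not the RLlib model; in TF the fix is constructing the target under its own `tf.variable_scope`):

```python
class TinyNet:
    """Toy stand-in for a model: just holds a list of weights."""
    def __init__(self, weights):
        self.weights = weights

# Sharing the same storage: the "target" tracks every online update.
shared = [0.0, 0.0]
model = TinyNet(shared)
target_shared = TinyNet(shared)         # same list object -> shared weights
model.weights[0] = 1.0                  # target_shared sees this too

# Giving the target its own storage keeps the two independent, which
# is what separate variable scopes achieve for TF variables.
model2 = TinyNet([0.0, 0.0])
target2 = TinyNet(list(model2.weights)) # explicit copy
model2.weights[0] = 1.0
```

So constructing two separate model instances (rather than reusing one) is the right call, even if it reads as verbose.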
python/ray/rllib/ddpg/ddpg.py (Outdated)
would you want to keep this as a running average?
Is episode_reward_mean in TrainingResult meant to give an average from the start of time?
It will give an average over the number of episodes finished since the last TrainingResult.
I think here we should just do a lot of optimizer steps and return a TrainingResult only after a while.
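The shape of the suggested `_train()` — many optimizer steps per call, then one result summarizing episodes completed since the last report — can be sketched as follows. This is a hypothetical illustration; `train_sketch` and its return keys other than `episode_reward_mean` are made-up names, not the RLlib `TrainingResult` API.

```python
def train_sketch(num_steps, step_fn):
    """Run many optimizer/sampling steps, then report the mean reward
    over episodes that finished since the last report (mirroring how
    episode_reward_mean is defined above)."""
    completed = []
    for _ in range(num_steps):
        done, episode_reward = step_fn()
        if done:
            completed.append(episode_reward)
    mean = sum(completed) / len(completed) if completed else float("nan")
    return {"episode_reward_mean": mean, "episodes_total": len(completed)}

# Toy step function: every 10th step finishes an episode with reward 1.0.
state = {"t": 0}
def fake_step():
    state["t"] += 1
    return state["t"] % 10 == 0, 1.0

result = train_sketch(100, fake_step)
```

Batching many steps per report keeps the reported mean meaningful (it averages over several completed episodes) and amortizes the reporting overhead.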
I think we should move all of the TensorFlow stuff out of this class; it would make it a lot easier for us to support PyTorch.
retest this please
python/ray/rllib/ddpg/models.py (Outdated)
** a note throughout: can we not do this style of indentation? Change it to:

    net = slim.fully_connected(
        obs, 400, activation_fn=tf.nn.relu, weights_initializer=w_normal)
python/ray/rllib/ddpg/models.py (Outdated)
can we move this into the models directory?
python/ray/rllib/ddpg/models.py (Outdated)
The formatting for this is also a little odd; it would be great to fix.
You can reduce verbosity by just setting tau = self.config["tau"]
python/ray/rllib/ddpg/models.py (Outdated)
see ** a note throughout
The graph is without batchnorm?
richardliaw left a comment:
OK this is looking really good! We have one last thing, which is to add this to our test suite in multi_node_tests.sh and add an example to the regression tests folder.
Otherwise, this is ready to merge (and will probably be done after #1868)
    samples = process_rollout(
        rollout, NoFilter(),
        gamma=1.0, use_gae=False)
nit: we should add a comment explaining why this gamma is used here (or whether it should instead be the gamma from the config)
python/ray/rllib/ddpg/ddpg.py (Outdated)
    episode_lengths.append(episode.episode_length)
    episode_rewards.append(episode.episode_reward)
    avg_reward = (
        np.mean(episode_rewards) if episode_rewards else float('nan'))
nit: I believe you can just do np.mean(episode_rewards) and np.sum (below) to the same effect, without the if-else cases
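A quick check of that claim: `np.mean` of an empty sequence already yields nan (it emits a RuntimeWarning but not an error), and `np.sum` of an empty sequence is exactly 0.0, so the explicit fallback changes the warning behavior but not the value. Illustrative snippet:

```python
import math
import warnings

import numpy as np

# np.mean([]) -> nan, with a RuntimeWarning (suppressed here).
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    empty_mean = float(np.mean([]))

empty_sum = float(np.sum([]))                 # 0.0, no warning
guarded = np.mean([]) if [] else float("nan") # the if-else version
```

So dropping the guard is value-equivalent for both mean and sum; keeping it mainly avoids the warning for the empty-episode case.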
python/ray/rllib/ddpg/ddpg.py (Outdated)
    # Number of local steps taken for each call to sample
    "num_local_steps": 1,
    # Number of workers (excluding master)
    "num_workers": 1,
What happens when num_workers is 0? Does the algorithm still output anything?
As of now, it doesn't output anything when num_workers is 0, but I could update the stats so it takes metrics from the local evaluator when there are no remote evaluators.
    def get_weights(self):
        """Returns critic weights, actor weights."""
        return self.critic_vars.get_weights(), self.actor_vars.get_weights()
nit: it would be great if we could just have this as one return dict
Correct me if I'm wrong, but I think that would make setting weights more annoying?
That's true, but only slightly; the benefit of doing this is that different optimizers can use the DDPG evaluator (which is one of the main points of RLlib)
OK, do I also need to put the model and target_model weights in the same dictionary?
OK let's leave this as is and fix it in a later PR if necessary.
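For the record, the single-dict interface being requested could look roughly like this. This is a hypothetical sketch with made-up names (`DDPGEvaluatorSketch`, `critic_w`, `actor_w`), not the actual RLlib evaluator; the point is only that one dict keyed by component lets generic optimizers move weights around without knowing about the actor/critic split.

```python
class DDPGEvaluatorSketch:
    """Toy evaluator exposing all weights as one dict."""

    def __init__(self):
        self.critic_w = [0.0, 0.0]
        self.actor_w = [1.0, 1.0]

    def get_weights(self):
        # One flat dict keyed by component name; the target-network
        # weights could be added under their own keys the same way.
        return {"critic": self.critic_w, "actor": self.actor_w}

    def set_weights(self, weights):
        self.critic_w = weights["critic"]
        self.actor_w = weights["actor"]

# An optimizer can now sync weights without caring what is inside.
src, dst = DDPGEvaluatorSketch(), DDPGEvaluatorSketch()
src.critic_w = [5.0, 5.0]
dst.set_weights(src.get_weights())
```

Setting weights stays symmetric with getting them, which answers the "more annoying" concern above: the dict round-trips unchanged.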
Test PASSed.
Merged master (56 commits), including:
- [xray] Turn on flushing to the GCS for the lineage cache (ray-project#1907)
- Single Big Object Parallel Transfer. (ray-project#1827)
- Remove num_threads as a parameter. (ray-project#1891)
- Adds Valgrind tests for multi-threaded object manager. (ray-project#1890)
- Pin cython version in docker base dependencies file. (ray-project#1898)
- Update arrow to efficiently serialize more types of numpy arrays. (ray-project#1889)
- updates (ray-project#1896)
- [DataFrame] Inherit documentation from Pandas (ray-project#1727)
- Update arrow and parquet-cpp. (ray-project#1875)
- raylet command line resource configuration plumbing (ray-project#1882)
- use raylet for remote ray nodes (ray-project#1880)
- [rllib] Propagate dim option to deepmind wrappers (ray-project#1876)
- [RLLib] DDPG (ray-project#1685)
- Lint Python files with Yapf (ray-project#1872)
- [DataFrame] Fixed repr, info, and memory_usage (ray-project#1874)
- Fix getattr compat (ray-project#1871)
- check if arrow build dir exists (ray-project#1863)
- [DataFrame] Encapsulate index and lengths into separate class (ray-project#1849)
- [DataFrame] Implemented __getattr__ (ray-project#1753)
- Add better analytics to docs (ray-project#1854)
- ...

Conflicts:
- python/ray/rllib/__init__.py
- python/setup.py


Implemented DDPG Algorithm.