[tune/autoscaler] _LogSyncer cannot rsync with Docker #4403

@AdamGleave

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
  • Ray installed from (source or binary): binary
  • Ray version: 0.6.4
  • Python version: 3.6.7

Describe the problem

In _LogSyncer, sync_to_worker_if_possible and sync_now use rsync to transfer logs between the local node and the worker. This breaks when using Docker, since:

  • If the local node is in Docker, the user will typically be root, so this is what get_ssh_user will return. But we typically cannot log in to the worker node as root.
  • The local_dir on the worker is inside the Docker container, and may not even be visible from the host. If it is bind-mounted, it will typically be at a different path on the host.

An unrelated issue: if self.sync_func is non-None, it gets executed before the worker_to_local_sync_cmd, which I think is the wrong order.

I'd be happy to take a stab at a PR, but I'd appreciate some suggestions on the right way of fixing this, as it's been a while since I've looked at Ray internals. This also feels like a problem that is likely to recur with slight variations, e.g. this bug is similar to #4183.

Perhaps we can make the autoscaler provide an abstract sync interface that Tune and other consumers can use. This could map to plain rsync in the standard case, and to something more complex in the Docker case (e.g. docker cp followed by rsync). ray.autoscaler.commands.rsync is already something along these lines; would this be an appropriate place to modify?
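To make the proposal concrete, here is a rough sketch of what such an interface might look like. All class and method names (NodeSyncer, RsyncSyncer, DockerSyncer, pull_commands) are hypothetical and do not exist in Ray, and the SSH user, host, key path, container name and staging directory are placeholder values. The sketch only builds command strings; it does not run them.

```python
import shlex
from abc import ABC, abstractmethod
from typing import List


class NodeSyncer(ABC):
    """Hypothetical interface; not an existing Ray class."""

    @abstractmethod
    def pull_commands(self, remote_dir: str, local_dir: str) -> List[str]:
        """Return shell commands, run on the local node, that copy
        remote_dir on the worker into local_dir."""


class RsyncSyncer(NodeSyncer):
    """Standard case: the worker path is directly reachable over SSH."""

    def __init__(self, ssh_user: str, host: str, ssh_key: str):
        # ssh_user is passed in explicitly rather than derived from the
        # local username, which would be root inside a container.
        self.ssh_user = ssh_user
        self.host = host
        self.ssh_key = ssh_key

    def ssh_prefix(self) -> str:
        return f"ssh -i {shlex.quote(self.ssh_key)} -o StrictHostKeyChecking=no"

    def pull_commands(self, remote_dir: str, local_dir: str) -> List[str]:
        src = f"{self.ssh_user}@{self.host}:{shlex.quote(remote_dir)}/"
        dst = f"{shlex.quote(local_dir)}/"
        return [f"rsync -avz -e {shlex.quote(self.ssh_prefix())} {src} {dst}"]


class DockerSyncer(NodeSyncer):
    """Docker case: docker cp out of the container on the worker host,
    then rsync the staged copy back to the local node."""

    def __init__(self, rsync: RsyncSyncer, container: str, staging_dir: str):
        self.rsync = rsync
        self.container = container
        self.staging_dir = staging_dir

    def pull_commands(self, remote_dir: str, local_dir: str) -> List[str]:
        # remote_dir is a path inside the container; the trailing "/."
        # makes docker cp copy the directory's contents.
        container_src = shlex.quote(f"{self.container}:{remote_dir}/.")
        staging = shlex.quote(self.staging_dir)
        on_worker = f"mkdir -p {staging} && docker cp {container_src} {staging}"
        ssh = f"{self.rsync.ssh_prefix()} {self.rsync.ssh_user}@{self.rsync.host}"
        stage_cmd = f"{ssh} {shlex.quote(on_worker)}"
        return [stage_cmd] + self.rsync.pull_commands(self.staging_dir, local_dir)


rsync = RsyncSyncer("ubuntu", "10.0.0.5", "/root/ray_key.pem")
docker = DockerSyncer(rsync, container="ray_docker",
                      staging_dir="/tmp/ray_results_staging")
for cmd in docker.pull_commands("/root/ray_results", "/root/ray_results"):
    print(cmd)
```

With this shape, _LogSyncer would only need to ask its syncer for commands and would not have to know whether the worker runs in Docker.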

A more hacky solution would be to make get_ssh_user return the right value and make the Docker volume-binding line up so that we can just ignore the difference between Docker and non-Docker instances.
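For the path half of that workaround, a minimal sketch of translating a container path to the corresponding host path is below. The function name container_to_host_path and the binds mapping (host directory to container directory, in the spirit of docker run -v host:container) are assumptions for illustration, not existing Ray or autoscaler APIs.

```python
import posixpath
from typing import Dict, Optional


def container_to_host_path(container_path: str,
                           binds: Dict[str, str]) -> Optional[str]:
    """Map a path inside the container to the host path it is bound to.

    binds maps host directory -> container directory. Returns None when
    the path is not under any bind mount, i.e. not visible from the host.
    """
    container_path = posixpath.normpath(container_path)
    best = None  # (host_dir, container_dir) of the longest matching mount
    for host_dir, container_dir in binds.items():
        container_dir = posixpath.normpath(container_dir)
        prefix = container_dir.rstrip("/") + "/"
        if container_path == container_dir or container_path.startswith(prefix):
            if best is None or len(container_dir) > len(best[1]):
                best = (host_dir, container_dir)
    if best is None:
        return None
    relative = posixpath.relpath(container_path, best[1])
    return posixpath.normpath(posixpath.join(best[0], relative))


binds = {"/home/ubuntu/ray_results": "/root/ray_results"}
print(container_to_host_path("/root/ray_results/exp1/trial_0", binds))
print(container_to_host_path("/tmp/not_bound", binds))
```

If local_dir is not bound at all, the function returns None, which is the case where plain rsync against the host cannot work and something like docker cp would be needed.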

Source code / logs

A MWE for this is hard to provide, but if the above description is insufficient I can try to come up with one.

Labels

tune: Tune-related issues
