I'm trying to start a dask cluster on my local machine. Using LocalCluster works just fine, but when I start the scheduler and worker from the command line I run into issues:
$ dask-scheduler
distributed.scheduler - INFO - -----------------------------------------------
distributed.scheduler - INFO - Clear task state
distributed.scheduler - INFO - Scheduler at: tcp://145.90.225.10:8786
distributed.scheduler - INFO - bokeh at: :8787
distributed.scheduler - INFO - Local Directory: /var/folders/h6/ck24x_854wd94jzwy9r7gvl40000gn/T/scheduler-gjtgzvz2
distributed.scheduler - INFO - -----------------------------------------------
$ dask-worker 145.90.225.10:8786
distributed.nanny - INFO - Start Nanny at: 'tcp://145.90.225.10:51273'
distributed.worker - INFO - Start worker at: tcp://145.90.225.10:51274
distributed.worker - INFO - Listening to: tcp://145.90.225.10:51274
distributed.worker - INFO - nanny at: 145.90.225.10:51273
distributed.worker - INFO - bokeh at: 145.90.225.10:51275
distributed.worker - INFO - Waiting to connect to: tcp://145.90.225.10:8786
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO - Threads: 8
distributed.worker - INFO - Memory: 17.18 GB
distributed.worker - INFO - Local Directory: /Users/bweel/worker-73osop8n
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO - Waiting to connect to: tcp://145.90.225.10:8786
distributed.worker - INFO - Waiting to connect to: tcp://145.90.225.10:8786
distributed.worker - INFO - Waiting to connect to: tcp://145.90.225.10:8786
distributed.worker - INFO - Waiting to connect to: tcp://145.90.225.10:8786
distributed.worker - INFO - Waiting to connect to: tcp://145.90.225.10:8786
distributed.worker - INFO - Waiting to connect to: tcp://145.90.225.10:8786
distributed.worker - INFO - Waiting to connect to: tcp://145.90.225.10:8786
...etc
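As a diagnostic, it can help to check which local address the OS actually routes through by default, since that is the address the scheduler is likely advertising. Below is just a stdlib sketch of the well-known UDP-connect trick; the probe address is arbitrary and no packets are actually sent:

```python
import socket

def primary_ip(probe_addr="8.8.8.8"):
    """Return the local IP the OS would use to reach probe_addr.

    Connecting a UDP socket sends no packets; it only asks the kernel
    to pick an outgoing interface, whose address getsockname() reveals.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.connect((probe_addr, 80))
        return s.getsockname()[0]
    finally:
        s.close()
```

If this prints a different address than the one the scheduler logs, the two processes are binding to different interfaces.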
I have the feeling this has to do with having multiple interfaces and the nanny process. The following combination works on the command line:
$ dask-scheduler --interface en0
distributed.scheduler - INFO - -----------------------------------------------
distributed.scheduler - INFO - Clear task state
distributed.scheduler - INFO - Scheduler at: tcp://145.100.116.139:8786
distributed.scheduler - INFO - bokeh at: 145.100.116.139:8787
distributed.scheduler - INFO - Local Directory: /var/folders/h6/ck24x_854wd94jzwy9r7gvl40000gn/T/scheduler-3evaz473
distributed.scheduler - INFO - -----------------------------------------------
distributed.scheduler - INFO - Register tcp://145.100.116.139:62873
distributed.scheduler - INFO - Starting worker compute stream, tcp://145.100.116.139:62873
distributed.core - INFO - Starting established connection
$ dask-worker 145.100.116.139:8786 --interface en0 --no-nanny
distributed.diskutils - INFO - Found stale lock file and directory '/Users/bweel/worker-o4mon_8t', purging
distributed.worker - INFO - Start worker at: tcp://145.100.116.139:62873
distributed.worker - INFO - Listening to: tcp://145.100.116.139:62873
distributed.worker - INFO - bokeh at: 145.100.116.139:62874
distributed.worker - INFO - Waiting to connect to: tcp://145.100.116.139:8786
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO - Threads: 8
distributed.worker - INFO - Memory: 17.18 GB
distributed.worker - INFO - Local Directory: /Users/bweel/worker-br866x7c
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO - Registered to: tcp://145.100.116.139:8786
distributed.worker - INFO - -------------------------------------------------
distributed.core - INFO - Starting established connection
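As an aside, the interface names that `--interface` accepts can be listed from the standard library (on macOS the primary interface is typically `en0`, on Linux usually `eth0` or similar) — a minimal sketch:

```python
import socket

# Enumerate the kernel's network interfaces; any of these names is a
# candidate value for the --interface option of dask-scheduler/dask-worker.
for index, name in socket.if_nameindex():
    print(index, name)
```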
However, now I cannot connect to the scheduler from IPython:
from dask.distributed import Client
client = Client('tcp://145.100.116.139:8786')
---------------------------------------------------------------------------
TimeoutError Traceback (most recent call last)
/usr/local/lib/python3.6/site-packages/distributed/comm/core.py in connect(addr, timeout, deserialize, connection_args)
203 future,
--> 204 quiet_exceptions=EnvironmentError)
205 except FatalCommClosedError:
/usr/local/lib/python3.6/site-packages/tornado/gen.py in run(self)
1054 try:
-> 1055 value = future.result()
1056 except Exception:
/usr/local/lib/python3.6/site-packages/tornado/concurrent.py in result(self, timeout)
237 try:
--> 238 raise_exc_info(self._exc_info)
239 finally:
/usr/local/lib/python3.6/site-packages/tornado/util.py in raise_exc_info(exc_info)
TimeoutError: Timeout
During handling of the above exception, another exception occurred:
OSError Traceback (most recent call last)
<ipython-input-7-2c6e48226add> in <module>()
----> 1 client = Client('tcp://145.100.116.139:8786')
/usr/local/lib/python3.6/site-packages/distributed/client.py in __init__(self, address, loop, timeout, set_as_default, scheduler_file, security, asynchronous, name, heartbeat_interval, serializers, deserializers, extensions, direct_to_workers, **kwargs)
636 ext(self)
637
--> 638 self.start(timeout=timeout)
639
640 from distributed.recreate_exceptions import ReplayExceptionClient
/usr/local/lib/python3.6/site-packages/distributed/client.py in start(self, **kwargs)
759 self._started = self._start(**kwargs)
760 else:
--> 761 sync(self.loop, self._start, **kwargs)
762
763 def __await__(self):
/usr/local/lib/python3.6/site-packages/distributed/utils.py in sync(loop, func, *args, **kwargs)
275 e.wait(10)
276 if error[0]:
--> 277 six.reraise(*error[0])
278 else:
279 return result[0]
/usr/local/lib/python3.6/site-packages/six.py in reraise(tp, value, tb)
684 if value.__traceback__ is not tb:
685 raise value.with_traceback(tb)
--> 686 raise value
687
688 else:
/usr/local/lib/python3.6/site-packages/distributed/utils.py in f()
260 if timeout is not None:
261 future = gen.with_timeout(timedelta(seconds=timeout), future)
--> 262 result[0] = yield future
263 except Exception as exc:
264 error[0] = sys.exc_info()
/usr/local/lib/python3.6/site-packages/tornado/gen.py in run(self)
1053
1054 try:
-> 1055 value = future.result()
1056 except Exception:
1057 self.had_exception = True
/usr/local/lib/python3.6/site-packages/tornado/concurrent.py in result(self, timeout)
236 if self._exc_info is not None:
237 try:
--> 238 raise_exc_info(self._exc_info)
239 finally:
240 self = None
/usr/local/lib/python3.6/site-packages/tornado/util.py in raise_exc_info(exc_info)
/usr/local/lib/python3.6/site-packages/tornado/gen.py in run(self)
1061 if exc_info is not None:
1062 try:
-> 1063 yielded = self.gen.throw(*exc_info)
1064 finally:
1065 # Break up a reference to itself
/usr/local/lib/python3.6/site-packages/distributed/client.py in _start(self, timeout, **kwargs)
847 self.scheduler_comm = None
848
--> 849 yield self._ensure_connected(timeout=timeout)
850
851 for pc in self._periodic_callbacks.values():
/usr/local/lib/python3.6/site-packages/tornado/gen.py in run(self)
1053
1054 try:
-> 1055 value = future.result()
1056 except Exception:
1057 self.had_exception = True
/usr/local/lib/python3.6/site-packages/tornado/concurrent.py in result(self, timeout)
236 if self._exc_info is not None:
237 try:
--> 238 raise_exc_info(self._exc_info)
239 finally:
240 self = None
/usr/local/lib/python3.6/site-packages/tornado/util.py in raise_exc_info(exc_info)
/usr/local/lib/python3.6/site-packages/tornado/gen.py in run(self)
1061 if exc_info is not None:
1062 try:
-> 1063 yielded = self.gen.throw(*exc_info)
1064 finally:
1065 # Break up a reference to itself
/usr/local/lib/python3.6/site-packages/distributed/client.py in _ensure_connected(self, timeout)
885 try:
886 comm = yield connect(self.scheduler.address, timeout=timeout,
--> 887 connection_args=self.connection_args)
888 if timeout is not None:
889 yield gen.with_timeout(timedelta(seconds=timeout),
/usr/local/lib/python3.6/site-packages/tornado/gen.py in run(self)
1053
1054 try:
-> 1055 value = future.result()
1056 except Exception:
1057 self.had_exception = True
/usr/local/lib/python3.6/site-packages/tornado/concurrent.py in result(self, timeout)
236 if self._exc_info is not None:
237 try:
--> 238 raise_exc_info(self._exc_info)
239 finally:
240 self = None
/usr/local/lib/python3.6/site-packages/tornado/util.py in raise_exc_info(exc_info)
/usr/local/lib/python3.6/site-packages/tornado/gen.py in run(self)
1061 if exc_info is not None:
1062 try:
-> 1063 yielded = self.gen.throw(*exc_info)
1064 finally:
1065 # Break up a reference to itself
/usr/local/lib/python3.6/site-packages/distributed/comm/core.py in connect(addr, timeout, deserialize, connection_args)
213 _raise(error)
214 except gen.TimeoutError:
--> 215 _raise(error)
216 else:
217 break
/usr/local/lib/python3.6/site-packages/distributed/comm/core.py in _raise(error)
193 msg = ("Timed out trying to connect to %r after %s s: %s"
194 % (addr, timeout, error))
--> 195 raise IOError(msg)
196
197 # This starts a thread
OSError: Timed out trying to connect to 'tcp://145.100.116.139:8786' after 10 s: connect() didn't finish in time
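Before debugging at the dask level, a raw TCP probe can distinguish "the port is unreachable from this machine" from a dask-level handshake problem. This is only a stdlib sketch; the host and port would be the ones from the log above:

```python
import socket

def can_reach(host, port, timeout=3.0):
    """Return True if a plain TCP connection to (host, port) succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. can_reach("145.100.116.139", 8786)
```

If this returns False, the timeout is a plain networking/binding issue rather than anything specific to the distributed Client.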