Skip to content

Fea/add proxying#2715

Merged
mrocklin merged 22 commits intodask:masterfrom
quasiben:fea/add_proxying
May 22, 2019
Merged

Fea/add proxying#2715
mrocklin merged 22 commits intodask:masterfrom
quasiben:fea/add_proxying

Conversation

@quasiben
Copy link
Copy Markdown
Member

PR resolves #2160

This PR is largely done but I left some code in because I was somewhat undecided on where proxying should occur. At first, I built a new proxying service but soon realized that the full service was unnecessary and could just run as another route in on the scheduler's dashboard. If reviewers are ok with running the proxying as another route I can remove all of distributed/proxy/core.py and the Proxy object in proxy_html.py.

How it Works:

  • uses jupyter_server_proxy (still needs to be added as an optional dependency and build proper try/except on import)
  • encodes host address of the worker as a URL param
  • proxy checks hosts is valid then proxies to the host and port of the worker dashboard

@yuvipanda if you have time, this would work would greatly benefit from a review from you

Copy link
Copy Markdown
Member

@mrocklin mrocklin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but I left some code in because I was somewhat undecided on where proxying should occur. At first, I built a new proxying service but soon realized that the full service was unnecessary and could just run as another route in on the scheduler's dashboard. If reviewers are ok with running the proxying as another route I can remove all of distributed/proxy/core.py and the Proxy object in proxy_html.py.

+1 on running this as an additional route inside of our existing Tornado HTTP server

@mrocklin
Copy link
Copy Markdown
Member

Also thanks for doing this! I'm excited about getting access to these dashboards. I also think that once we have them more publicly available we'll probably start investing more time in adding to them.

@mrocklin
Copy link
Copy Markdown
Member

cc @jcrist who might find this interesting as well

<td> {{ len(ws.has_what) }} </td>
{% if 'bokeh' in ws.services %}
<td> <a href="http://{{ ws.host }}:{{ ws.services['bokeh'] }}">bokeh</a> </td>
<td> <a href="http://{{ ws.host }}:{{ scheduler.services['bokeh'].port }}/proxy/{{ ws.services['bokeh'] }}/{{ ws.host }}/status">bokeh</a> </td>
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mrocklin This is the new url: /proxy/{port}/{host}/{proxied_path}.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow, how did you manage that?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mrocklin I realized that I could manually overwrite the request uri (something I always had trouble with using ngxin) and a bit of regex in the handler made things smooth for having nicer URLs

Ah. Great!

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is cool!

Is ws.host coming from a trusted source? If it is user controlled, it could be used maliciously to bypass CORS / Origin checking.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the address that a Dask worker advertises as its contact address. A user that has access to launch a Dask worker and connect it to this scheduler can manipulate this value, yes.

However, such a user already has the ability to connec to the scheduler and run arbitrary Python code there, or on any other worker-connected machine, so I suspect that at this point we're ok with this.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the introduction of scheduler here we need to include that all uses of this template also include the scheduler= keyword,

class Worker(RequestHandler):
    def get(self, worker):
        worker = escape.url_unescape(worker)
        with log_errors():
            self.render(
                "worker.html",
                title="Worker: " + worker,
                Worker=worker,
+               scheduler=self.server,
                **toolz.merge(self.server.__dict__, ns, self.extra)
            )

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something that should have shown up on tests ideally. If you felt the desire, it might be nice to add this to some test that checks the info pages (assuming that something like this already exists)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my last commit, I fixed the worker and task handlers. The test were more or less already written in test_scheduler_bokeh_html.py . The tests previously passed because the workers did not setup bokeh dashboards. Now they do and tests pass again with the suggested addition above

@quasiben
Copy link
Copy Markdown
Member Author

@mrocklin I realized that I could manually overwrite the request uri (something I always had trouble with using ngxin) and a bit of regex in the handler made things smooth for having nicer URLs

@quasiben quasiben changed the title [WIP] Fea/add proxying Fea/add proxying May 21, 2019
assert response_proxy.code == 200
assert b"Crossfilter" in response_proxy.body
assert response_direct.code == 200
assert b"Crossfilter" in response_direct.body
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice test!

@quasiben
Copy link
Copy Markdown
Member Author

Thanks for the review @mrocklin I think I've address all of your comments. Another thought occurring to me as I've just added jupyter-server-proxy as a dependency is that it increases the size of distributed a bit. jupyter-server-proxy has a hard dependency on notebook among others: https://github.com/jupyterhub/jupyter-server-proxy/blob/2726bca233af8f30e4b230b3ffce895fb1fa9b2e/setup.py#L13

For context, notebook is ~7MBs

@mrocklin
Copy link
Copy Markdown
Member

Another thought occurring to me as I've just added jupyter-server-proxy as a dependency is that it increases the size of distributed a bit.

Then maybe we should make this optional. Presumably we would try-except around the internal import of our own module when adding the route.

@mrocklin
Copy link
Copy Markdown
Member

Then maybe we should make this optional. Presumably we would try-except around the internal import of our own module when adding the route.

Or maybe our proxy handler sends back a "Oops, couldn't proxy, please pip install jupyter-server-proxy" message (or something nicer)

quasiben and others added 2 commits May 21, 2019 11:15
@yuvipanda
Copy link
Copy Markdown

ProxyHandler also inherits from IPythonHandler, which probably has implications around authentication - IIRC, we currently require / depend on notebook authentication to allow access.

@quasiben
Copy link
Copy Markdown
Member Author

@yuvipanda that's a good point! I think, though could be wrong, that as long as dask internals don't implement a LoginHandler/self.login_handler we should be ok. tornado.web.authenticated is going to eventually ask for self.current_user:

https://github.com/tornadoweb/tornado/blob/f65b1802a549bab73b4a6721bbfc47f82954d156/tornado/web.py#L3157-L3175

Fortunately for us the, the AuthenticatedHandler which is the parent of IPythonHandler passes anonymous when login_handler is None:

https://github.com/jupyter/notebook/blob/9425250005d46c9fb0057b0c322e16d07125b003/notebook/base/handlers.py#L136-L139

quasiben and others added 7 commits May 21, 2019 22:20
… to include bokeh for workers to properly test routes
We would have needed the scheduler's bokeh address instead, but
relative is probably safer regardless?  It's also the convention used
within this template, so lets stick to that for conformity and future
reviewers..
@mrocklin
Copy link
Copy Markdown
Member

@quasiben is it ok to rename proxy_html.py to proxy.py ?

@quasiben
Copy link
Copy Markdown
Member Author

👍

<td> {{ len(ws.has_what) }} </td>
{% if 'bokeh' in ws.services %}
<td> <a href="http://{{ ws.host }}:{{ scheduler.services['bokeh'].port }}/proxy/{{ ws.services['bokeh'] }}/{{ ws.host }}/status">bokeh</a> </td>
<td> <a href="../../proxy/{{ ws.services['bokeh'] }}/{{ ws.host }}/status">bokeh</a> </td>
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea!

@mrocklin
Copy link
Copy Markdown
Member

OK, done. I plan to merge this after tests run. I'll rename bokeh to dashboard shortly after

@quasiben
Copy link
Copy Markdown
Member Author

thanks for the edits to scheduler.py

@mrocklin
Copy link
Copy Markdown
Member

My pleasure. Dashboard work is particularly satisfying.

@mrocklin mrocklin merged commit 28ce1ed into dask:master May 22, 2019
@mrocklin
Copy link
Copy Markdown
Member

This is in. Thanks @quasiben !

@yuvipanda
Copy link
Copy Markdown

<3 great to see jupyter-server-proxy get used this way! \o/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Reuse nbserverproxy for Dask's dashboards

3 participants