Description
#6270 exposed a new HTTP API, enabled by default. Copied from #6270 (comment):
I'm concerned about a security regression here. By default, this is opening up an API that allows anyone to change cluster state (via retire_workers currently, but I imagine other things might be added someday too).
Prior to this, the only way to do things that affected cluster state was through the client. All the HTTP routes were effectively read-only. (Whether there is a vulnerability in the bokeh dashboard is another topic; it's quite possible there is, but I'm just talking here in principle.)
I think it's rather common to expose the HTTP routes to the public internet. For example, I believe dask-cloudprovider does this:
> By default a Dask security group will be created with ports 8786 and 8787 exposed to the internet

https://cloudprovider.dask.org/en/latest/aws.html#dask_cloudprovider.aws.EC2Cluster
You want those ports exposed for convenience, so you can connect to them. But you don't want anyone to be able to do stuff to the cluster, so you set up TLS using temporary credentials. dask-cloudprovider does this for you as well:
> When a cluster is launched with any of these cluster managers a set of temporary keys will be generated and distributed to the cluster nodes via their startup script. All communication between the client, scheduler and workers will then be encrypted and only clients and workers with valid certificates will be able to connect to the scheduler.

https://cloudprovider.dask.org/en/latest/security.html#authentication-and-encryption
Currently, if you set up TLS for your cluster, this is mTLS, meaning the scheduler verifies the client's certificate (docs, code). This serves as a form of authentication and authorization: if you've set up cluster security, you can only tell the scheduler to do things if you hold a valid certificate.
However, the HTTP routes have no authentication (they use standard TLS, not mTLS, because mTLS would be very inconvenient when you want to look at the dashboard with a web browser).
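To make the distinction concrete, here is a minimal sketch using only Python's standard `ssl` module. It is not how distributed builds its contexts; the helper name `make_server_context` is made up for illustration, and certificate loading is left out so the snippet runs without any key files.

```python
import ssl


def make_server_context(require_client_cert: bool) -> ssl.SSLContext:
    """Build a server-side TLS context (hypothetical helper for this sketch)."""
    # Purpose.CLIENT_AUTH produces a context meant for the server side of a
    # connection. By default it does not ask clients for a certificate.
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    if require_client_cert:
        # mTLS: the handshake fails unless the client presents a certificate
        # signed by a CA the server trusts.
        ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx


# Roughly the posture of the client/scheduler comms when TLS is configured.
comm_ctx = make_server_context(require_client_cert=True)

# Roughly the posture of the HTTP routes: encrypted, but any client may connect.
http_ctx = make_server_context(require_client_cert=False)

print(comm_ctx.verify_mode, http_ctx.verify_mode)
```

The only difference between the two contexts is `verify_mode`, which is why standard TLS on the HTTP routes gives encryption but no authentication.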
So after this change, someone who had gone to the trouble of setting up mTLS for their cluster (or was using the defaults of their cluster deployment system) would, by default, be running an unauthenticated endpoint that lets anyone who can reach :8787 (the dashboard port) affect cluster state.
I think we should do two things:
- Short-term: disable the HTTP API if TLS is specified on the scheduler. This is a reasonable default.
- Long-term: figure out a security posture and authentication for the HTTP API that's consistent with the security posture of other things that can affect cluster state (aka the client).
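The short-term proposal could look something like the sketch below. Everything in it is an assumption for illustration: the route labels and `select_http_routes` are invented names, not real distributed module paths or functions. The only point is the rule "serve state-changing routes by default only when TLS is not configured".

```python
# Hypothetical route labels, invented for this sketch; they are not
# real module paths from distributed.
READ_ONLY_ROUTES = ["dashboard", "health", "metrics"]
STATE_CHANGING_ROUTES = ["api"]  # e.g. the route exposing retire_workers


def select_http_routes(tls_enabled: bool) -> list[str]:
    """Return the HTTP routes to serve by default (hypothetical helper)."""
    routes = list(READ_ONLY_ROUTES)
    if not tls_enabled:
        # Without TLS the cluster has no authentication anywhere, so the
        # HTTP API does not weaken the existing security posture.
        routes += STATE_CHANGING_ROUTES
    return routes


print(select_http_routes(tls_enabled=False))
print(select_http_routes(tls_enabled=True))
```

A user who wants the API on a TLS-secured cluster would then have to opt in explicitly, rather than getting it by default.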