Add basic JupyterHub + Dask deployment #57
| name: daskdev/dask-notebook | ||
| tag: latest | ||
| extraEnv: | ||
| EXTRA_PIP_PACKAGES: gcsfs xarray |
How hard would it be to put xarray-master in here instead? Also, if we can add the optional xarray dependencies (scipy/netcdf4/zarr), that will allow us to start testing a few workflows.
We can add anything that is pip- or conda-installable. Are there particular branches that you'd like us to target?
I would also be quite happy to walk you through how to manipulate the cluster if you're interested. Changing this stuff is surprisingly easy.
Conda installing scipy/netcdf4 each time is taking a while. I'm building and pushing new docker images.
I've created new docker images and included the files in this PR. We now have scipy and netCDF4 installed in the image and pip install gcsfs, xarray, and zarr from git master on each image load.
I'm happy to wait before merging.
OK, for fuse it looks like the docker container needs to be run with elevated privileges. Locally I do the following:
How do I specify flags like
| @@ -0,0 +1,30 @@ | |||
| # Start cluster on Google cloud | |||
| gcloud container clusters create pangeo --num-nodes=10 --machine-type=n1-standard-2 --zone=us-central1-b --cluster-version=1.8.4-gke.1 | |||
| gcloud container clusters get-credentials pangeo --zone us-central1-b --project pangeo-181919 | |||
At some point, using preemptible VMs might save us lots of money
https://cloud.google.com/kubernetes-engine/docs/concepts/preemptible-vm
I agree. There is some delay in launching new nodes, but long term this is a good strategy. Short term I'm just leaving a few nodes on for a while; I judge the extra expense to be worth it to reduce friction when getting started. Please let me know if you feel differently and I can adjust.
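For reference, a rough sketch of what the preemptible option could look like. The function below only prints a `gcloud` command for adding a preemptible node pool so it can be reviewed first; it does not run anything. The pool name and node count are assumptions made up for illustration, while the cluster name, zone, and machine type are copied from the script in this PR.

```shell
#!/bin/sh
# Sketch only: prints the command for review instead of executing it.
# Pool name and node count are hypothetical arguments; cluster name, zone,
# and machine type are taken from the cluster-creation script in this PR.
preemptible_pool_cmd() {
    pool_name="$1"
    num_nodes="$2"
    echo "gcloud container node-pools create ${pool_name}" \
         "--cluster=pangeo" \
         "--zone=us-central1-b" \
         "--machine-type=n1-standard-2" \
         "--num-nodes=${num_nodes}" \
         "--preemptible"
}

# Example with an assumed pool name and size.
preemptible_pool_cmd preemptible-pool 3
```

Keeping a small non-preemptible pool alongside would preserve the low-friction start described above, since preemptible nodes can be reclaimed at any time.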
| gcloud container clusters get-credentials pangeo --zone us-central1-b --project pangeo-181919 | ||
| # Set up Kubernetes | ||
| kubectl create clusterrolebinding cluster-admin-binding --clusterrole=cluster-admin --user=mrocklinWgmail.com |
| xarray \ | ||
| netcdf4 \ | ||
| scipy \ | ||
| && conda clean -tipsy |
Maybe we should be using a standard environment file here?
Sure, I would be fine with that
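One possible shape for that, as a minimal sketch: a shell helper that writes an environment file containing the packages visible in the diff excerpt above. The file name `environment.yml`, the environment name `pangeo`, and the `conda-forge` channel are assumptions, not something this PR specifies.

```shell
#!/bin/sh
# Sketch only: writes a conda environment file listing the packages from the
# Dockerfile excerpt (xarray, netcdf4, scipy). The environment name and the
# conda-forge channel are assumptions for illustration.
write_environment_file() {
    target="$1"
    cat > "$target" <<'EOF'
name: pangeo
channels:
  - conda-forge
dependencies:
  - xarray
  - netcdf4
  - scipy
EOF
}

# The Dockerfile could then install from the file with something like
# (assumed usage, not taken from this PR):
#   conda env update --file environment.yml && conda clean -tipsy
```

A single file like this could be shared between the notebook and worker images so the two stay in sync.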
| clientSecret: SECRET | ||
| callbackUrl: "http://pangeo.pydata.org/hub/oauth_callback" | ||
| admin: |
This needs to be under 'auth' for it to take effect.
This no longer relies on the dask-foo images.
Fuse now works on the notebook and worker image.
Adds .daskernetes.yaml file to define workers
| COPY prepare.sh /usr/bin/prepare.sh | ||
| RUN chmod +x /usr/bin/prepare.sh | ||
| RUN mkdir /home/$NB_USER/examples && chown -R $NB_USER /home/$NB_USER/examples | ||
| COPY examples/ /home/$NB_USER/examples |
@yuvipanda will this work or will the user's home directory be overridden by jupyterhub?
Also add a --death-timeout for the workers to clean them up
It will be overwritten. You can put them in a non-home directory and copy them over after the container starts. Or you can turn off persistent storage completely, and that will make these show up.
…On Jan 6, 2018 1:57 PM, "Matthew Rocklin" ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In gce/notebook/Dockerfile
<#57 (comment)>:
> +RUN jupyter labextension install @***@***.***
+
+RUN pip install click jedi kubernetes --upgrade
+
+RUN pip install git+https://github.com/yuvipanda/daskernetes \
+ git+https://github.com/zarr-developers/zarr \
+ ***@***.***/zarr_set_attrs \
+ ***@***.*** \
+ fusepy
+
+USER root
+RUN mkdir /gcs && chown -R $NB_USER /gcs
+COPY prepare.sh /usr/bin/prepare.sh
+RUN chmod +x /usr/bin/prepare.sh
+RUN mkdir /home/$NB_USER/examples && chown -R $NB_USER /home/$NB_USER/examples
+COPY examples/ /home/$NB_USER/examples
@yuvipanda <https://github.com/yuvipanda> will this work or will the
user's home directory be overridden by jupyterhub?
Persistent storage seems pretty valuable. I'll copy them over in the
startup script. Thanks!
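A minimal sketch of that copy step, assuming the examples are baked into the image somewhere outside the home directory. The source path `/opt/examples` and the function name are hypothetical; only `/home/$NB_USER/examples` comes from the Dockerfile in this PR.

```shell
#!/bin/sh
# Sketch only: copy bundled examples into the user's home after the
# persistent volume has been mounted over it.
# $1: source directory inside the image (hypothetical, e.g. /opt/examples)
# $2: destination on the persistent volume (e.g. /home/$NB_USER/examples)
copy_examples() {
    src="$1"
    dest="$2"
    mkdir -p "$dest"
    for f in "$src"/*; do
        # Skip the unexpanded glob when the source directory is empty.
        [ -e "$f" ] || continue
        name=$(basename "$f")
        # Leave anything the user already has so their edits are not clobbered.
        [ -e "$dest/$name" ] || cp -R "$f" "$dest/$name"
    done
}
```

In a startup script such as `prepare.sh` this would be called as `copy_examples /opt/examples "/home/$NB_USER/examples"`. Existing files are skipped, so repeated logins keep whatever the user changed.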
Things from this we have to upstream (from looking at customizations we had to perform with modify_pod_hook):
I'll file bugs in z2jh for these and link from here appropriately.
Should we consider merging this?
Unless there is any opposition, I'll merge this tomorrow afternoon.
This is a vanilla JupyterHub deployment alongside a vanilla Dask deployment. It includes instructions and helm config files to set things up.