Add basic JupyterHub + Dask deployment #57

Merged
mrocklin merged 20 commits into pangeo-data:master from mrocklin:gce
Jan 19, 2018

Conversation

@mrocklin
Member

This is a vanilla JupyterHub deployment alongside a vanilla Dask deployment. It includes instructions and helm config files to set things up.

Comment thread gce/jupyter-config.yaml Outdated
name: daskdev/dask-notebook
tag: latest
extraEnv:
EXTRA_PIP_PACKAGES: gcsfs xarray
Member

How hard would it be to put xarray-master in here instead? Also, if we can add the optional xarray dependencies (scipy/netcdf4/zarr), that will allow us to start testing a few workflows.

Member Author

We can install anything that is pip or conda installable. Are there particular branches that you'd like us to target?

Member Author

I would also be quite happy to walk you through how to manipulate the cluster if you're interested. Changing this stuff is surprisingly easy.

Member Author

Conda installing scipy/netcdf4 each time is taking a while. I'm building and pushing new docker images.

Member Author

I've created new Docker images and included the files in this PR. scipy and netCDF4 are now installed in the image, and gcsfs, xarray, and zarr are pip-installed from git master each time the image starts.
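
As a rough illustration of installing from git master through the `EXTRA_PIP_PACKAGES` variable shown in the config above, the sketch below builds pip VCS requirement strings. The helper name `pip_git_requirement` and the GitHub organizations listed are assumptions, not taken from this PR; check each project's current repository location and default branch before relying on them.

```python
def pip_git_requirement(org, repo, branch="master"):
    """Return a pip VCS requirement that installs `repo` from a git branch."""
    return f"git+https://github.com/{org}/{repo}.git@{branch}"


# Assumed repository locations for the three packages named in the comment.
repos = [("dask", "gcsfs"), ("pydata", "xarray"), ("zarr-developers", "zarr")]

# EXTRA_PIP_PACKAGES is a space-separated list, as in the config snippet above.
extra_pip_packages = " ".join(pip_git_requirement(org, repo) for org, repo in repos)
print(extra_pip_packages)
```

The resulting string would replace the plain `gcsfs xarray` value, at the cost of a slower start because each package is built from source.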

Member

@jhamman jhamman left a comment

@mrocklin - What do you want to do here? Should we merge this and then incrementally add functionality to these configs? Autoscaling would be an obvious feature to work in but that doesn't have to happen immediately.

@mrocklin
Member Author

mrocklin commented Jan 5, 2018

> @mrocklin - What do you want to do here? Should we merge this and then incrementally add functionality to these configs? Autoscaling would be an obvious feature to work in but that doesn't have to happen immediately.

I'm happy to wait before merging.

@mrocklin
Member Author

mrocklin commented Jan 5, 2018

OK, for FUSE it looks like the Docker container needs to run with elevated privileges. Locally I do the following:

docker run -it --device /dev/fuse --cap-add SYS_ADMIN --privileged daskdev/pangeo-notebook

How do I specify flags like --device /dev/fuse --cap-add SYS_ADMIN --privileged in the config.yaml file? cc @yuvipanda
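
One way to think about the question: at the Kubernetes level, `--privileged` and `--cap-add` correspond to fields of a container's `securityContext`. The sketch below shows that mapping as plain data. The function name `security_context_from_docker_flags` is hypothetical, and how the resulting block is wired into the JupyterHub helm config is not covered here. `--device` has no direct per-container field; a privileged container generally already has access to host devices, so it is left out.

```python
import json


def security_context_from_docker_flags(privileged=False, cap_add=()):
    """Translate two `docker run` flags into a Kubernetes container securityContext."""
    context = {}
    if privileged:
        # docker run --privileged
        context["privileged"] = True
    if cap_add:
        # docker run --cap-add NAME (repeatable)
        context["capabilities"] = {"add": list(cap_add)}
    return context


# Flags from the command above: --privileged --cap-add SYS_ADMIN
context = security_context_from_docker_flags(privileged=True, cap_add=["SYS_ADMIN"])
print(json.dumps({"securityContext": context}, indent=2))
```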

Comment thread gce/setup.sh
@@ -0,0 +1,30 @@
# Start cluster on Google cloud
gcloud container clusters create pangeo --num-nodes=10 --machine-type=n1-standard-2 --zone=us-central1-b --cluster-version=1.8.4-gke.1
gcloud container clusters get-credentials pangeo --zone us-central1-b --project pangeo-181919
Member

At some point, using preemptible VMs might save us lots of money
https://cloud.google.com/kubernetes-engine/docs/concepts/preemptible-vm

Member Author

I agree. There is some delay in launching new nodes, but long term this is a good strategy. Short term I'm just leaving a few nodes on for a while. My judgment is that the extra expense is worth it to reduce friction while getting started. Please let me know if you feel differently and I can adjust.

Member

This is good for now.

Comment thread gce/setup.sh Outdated
gcloud container clusters get-credentials pangeo --zone us-central1-b --project pangeo-181919

# Set up Kubernetes
kubectl create clusterrolebinding cluster-admin-binding --clusterrole=cluster-admin --user=mrocklinWgmail.com
Member

typo in email address?

Member Author

Thanks

Comment thread gce/notebook/Dockerfile
xarray \
netcdf4 \
scipy \
&& conda clean -tipsy
Member

Maybe we should be using a standard environment file here?

Member Author

Sure, I would be fine with that

Comment thread gce/jupyter-config.yaml Outdated
clientSecret: SECRET
callbackUrl: "http://pangeo.pydata.org/hub/oauth_callback"

admin:
Member

This needs to be under 'auth' for it to take effect.

This no longer relies on the dask-foo images.
Fuse now works on the notebook and worker image
Adds .daskernetes.yaml file to define workers
Comment thread gce/notebook/Dockerfile Outdated
COPY prepare.sh /usr/bin/prepare.sh
RUN chmod +x /usr/bin/prepare.sh
RUN mkdir /home/$NB_USER/examples && chown -R $NB_USER /home/$NB_USER/examples
COPY examples/ /home/$NB_USER/examples
Member Author

@yuvipanda, will this work, or will JupyterHub override the user's home directory?

Also add a --death-timeout for the workers to clean them up
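
A minimal sketch of what adding that flag to the worker command might look like. `dask-worker` accepts `--death-timeout`, the number of seconds a worker waits for a scheduler before closing itself, which is what lets orphaned workers clean themselves up. The 60-second value and the helper name `dask_worker_args` are assumptions for illustration, not values from this PR.

```python
def dask_worker_args(death_timeout_seconds, scheduler_address=None):
    """Build a dask-worker command line that exits when no scheduler is reachable."""
    args = ["dask-worker"]
    if scheduler_address is not None:
        args.append(scheduler_address)
    # The worker shuts down after waiting this many seconds for a scheduler.
    args += ["--death-timeout", str(death_timeout_seconds)]
    return args


args = dask_worker_args(60)
print(" ".join(args))
```
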
@yuvipanda
Member

yuvipanda commented Jan 6, 2018 via email

@mrocklin
Member Author

mrocklin commented Jan 6, 2018 via email

@yuvipanda
Member

yuvipanda commented Jan 12, 2018

Things from this that we have to upstream (from looking at the customizations we had to perform with modify_pod_hook):

  1. Privileged notebook access ("Privileged notebook containers", jupyterhub/zero-to-jupyterhub-k8s#300), although you probably want FlexVolume or a CSI provider in the long term; see the linked issue
  2. Notebook capabilities add / remove
  3. Set default URL for launching, which allows us to set JupyterLab instead of notebook as the default ("Allow making JupyterLab default thing to launch", jupyterhub/zero-to-jupyterhub-k8s#375)
  4. Set service account to be used ("allow setting service account from config.ayml", jupyterhub/zero-to-jupyterhub-k8s#404)

I'll file bugs in z2jh for these and link from here appropriately.
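
For readers unfamiliar with `modify_pod_hook`, here is a hedged sketch of a hook covering items 1, 2, and 4 above. It is not the hook used in this PR. KubeSpawner calls the hook with the spawner and the pod object and expects the pod back. The service account name is hypothetical, the notebook container is assumed to be the first container in the pod, and the `SimpleNamespace` fallback exists only so the sketch runs where the `kubernetes` package is not installed.

```python
from types import SimpleNamespace

try:
    from kubernetes.client import V1Capabilities, V1SecurityContext
except ImportError:
    # Stand-ins so the sketch runs without the kubernetes package installed.
    V1Capabilities = V1SecurityContext = SimpleNamespace

SERVICE_ACCOUNT = "example-service-account"  # hypothetical name


def modify_pod_hook(spawner, pod):
    """Adjust the notebook pod before KubeSpawner submits it."""
    notebook = pod.spec.containers[0]  # assumed to be the notebook container
    # Items 1 and 2: privileged access plus an added capability (needed for FUSE).
    notebook.security_context = V1SecurityContext(
        privileged=True,
        capabilities=V1Capabilities(add=["SYS_ADMIN"]),
    )
    # Item 4: run the pod under a specific service account.
    pod.spec.service_account_name = SERVICE_ACCOUNT
    return pod


# In jupyterhub_config.py the hook would be registered with:
#   c.KubeSpawner.modify_pod_hook = modify_pod_hook
# and item 3 (JupyterLab by default) would be set with:
#   c.Spawner.default_url = "/lab"

# Exercise the hook on a minimal stand-in pod.
pod = SimpleNamespace(
    spec=SimpleNamespace(
        containers=[SimpleNamespace(security_context=None)],
        service_account_name=None,
    )
)
pod = modify_pod_hook(None, pod)
print(pod.spec.containers[0].security_context.privileged)
```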

@rabernat
Member

Should we consider merging this?

@jhamman
Member

jhamman commented Jan 19, 2018

Unless there is any opposition, I'll merge this tomorrow afternoon.

@mrocklin mrocklin merged commit 7171fd3 into pangeo-data:master Jan 19, 2018
@mrocklin mrocklin deleted the gce branch January 19, 2018 14:06
4 participants