```yaml
- name: Install Nvidia Container Toolkit
  become: yes
  ansible.builtin.apt:
    pkg:
      - nvidia-docker2
  notify:
    - Restart docker
  when: gpu

- name: Ensure Nvidia persistence daemon is started
  become: yes
  ansible.builtin.systemd:
    name: nvidia-persistenced
    # state: started makes the task actually start the unit, as its name says
    state: started
  when: gpu
```
Not a blocker: the two `when:` conditions can be combined using an Ansible `block`.
Happy to combine them, but I kept them separate to decouple the `Restart docker` handler as much as possible. Since most of the jobs the compute node runs are Docker-based, we want to avoid restarting Docker unnecessarily. Probably a minor optimization in this case, though.
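A minimal sketch of the `block` suggestion, assuming these two tasks sit next to each other in the same tasks file. Handlers are notified per task, so `notify: Restart docker` stays on the package task alone and Docker is only restarted when that task reports a change; the persistence-daemon task never triggers it. `become: yes` is hoisted to the block here, which also applies it to the systemd task, and `state: started` is an assumption based on the task's name.

```yaml
- name: Configure Nvidia container runtime on GPU hosts
  when: gpu
  become: yes
  block:
    - name: Install Nvidia Container Toolkit
      ansible.builtin.apt:
        pkg:
          - nvidia-docker2
      # Only this task notifies the handler, so Docker restarts
      # only when the package install actually changes something.
      notify:
        - Restart docker

    - name: Ensure Nvidia persistence daemon is started
      ansible.builtin.systemd:
        name: nvidia-persistenced
        # Assumed intent: start the unit (the task name says "is started").
        state: started
```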
In that case, we can use Docker's `live-restore` option: https://docs.docker.com/config/containers/live-restore/
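A sketch of enabling `live-restore` from the playbook, assuming Docker reads its config from the default `/etc/docker/daemon.json` and that the `Restart docker` handler already exists. The `nvidia-docker2` package ships its own `daemon.json` that registers the `nvidia` runtime, so this sketch writes that runtime entry back alongside `live-restore` rather than dropping it; the exact runtime entry is an assumption to check against what the package installed on the host. It is limited to GPU hosts for the same reason.

```yaml
- name: Enable Docker live-restore so containers survive daemon restarts
  become: yes
  ansible.builtin.copy:
    dest: /etc/docker/daemon.json
    # Assumed runtime entry: keep it in sync with what nvidia-docker2 installs,
    # otherwise overwriting daemon.json drops the nvidia runtime.
    content: |
      {
        "live-restore": true,
        "runtimes": {
          "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
          }
        }
      }
    mode: "0644"
  notify:
    - Restart docker
  when: gpu
```

The live-restore docs note that on Linux the setting can be picked up by reloading the daemon instead of restarting it, so notifying a reload handler (hypothetical, e.g. `ansible.builtin.systemd` with `state: reloaded`) would avoid stopping running containers the first time it is turned on.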
```yaml
# Nvidia
- name: Get Nvidia drivers apt key
  ansible.builtin.get_url:
    url: https://developer.download.nvidia.com/compute/cuda/repos/{{ nvidia_distribution }}/x86_64/cuda-keyring_1.0-1_all.deb
    dest: /tmp/cuda-keyring.deb
  when: gpu

- name: Add Nvidia Keyring
  become: yes
  ansible.builtin.apt:
    deb: /tmp/cuda-keyring.deb
  when: gpu

- name: Get Nvidia Container Toolkit GPG key
  become: yes
  ansible.builtin.shell:
    cmd: curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | gpg --yes --dearmor -o {{ nvidia_container_toolkit_key_path }}
    creates: "{{ nvidia_container_toolkit_key_path }}"
  when: gpu

- name: Add Nvidia Container Toolkit Repository
  become: yes
  ansible.builtin.apt_repository:
    repo: deb [signed-by={{ nvidia_container_toolkit_key_path }}] https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/$(ARCH) /
    state: present
  when: gpu

- name: Install required system packages for gpu build
  become: yes
  ansible.builtin.apt:
    pkg:
      - cuda-drivers
    state: latest
    update_cache: true
  when: gpu
```
Not a blocker: these tasks could also share a single `when:` by combining them in a `block`.
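A sketch of the same `block` pattern applied to this hunk, hoisting the shared `when: gpu` and `become: yes` to the block level. One small behavioral difference from the original: the key download now also runs as root. The block's own name is a placeholder; the task bodies are unchanged from the diff.

```yaml
# Nvidia
- name: Install Nvidia drivers and container toolkit repository
  when: gpu      # evaluated for every task in the block
  become: yes    # also applies to the get_url download
  block:
    - name: Get Nvidia drivers apt key
      ansible.builtin.get_url:
        url: https://developer.download.nvidia.com/compute/cuda/repos/{{ nvidia_distribution }}/x86_64/cuda-keyring_1.0-1_all.deb
        dest: /tmp/cuda-keyring.deb

    - name: Add Nvidia Keyring
      ansible.builtin.apt:
        deb: /tmp/cuda-keyring.deb

    - name: Get Nvidia Container Toolkit GPG key
      ansible.builtin.shell:
        cmd: curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | gpg --yes --dearmor -o {{ nvidia_container_toolkit_key_path }}
        creates: "{{ nvidia_container_toolkit_key_path }}"

    - name: Add Nvidia Container Toolkit Repository
      ansible.builtin.apt_repository:
        repo: deb [signed-by={{ nvidia_container_toolkit_key_path }}] https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/$(ARCH) /
        state: present

    - name: Install required system packages for gpu build
      ansible.builtin.apt:
        pkg:
          - cuda-drivers
        state: latest
        update_cache: true
```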
Just a general note, again not a blocker: it's good to delete anything that gets downloaded under
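As a minimal sketch of that kind of cleanup for this PR, assuming it runs after the `Add Nvidia Keyring` task: `/tmp/cuda-keyring.deb` is the `dest` used by `Get Nvidia drivers apt key`, and once the keyring package is installed the `.deb` is no longer needed.

```yaml
- name: Remove downloaded Nvidia keyring package
  become: yes   # harmless if the file is owned by the connecting user
  ansible.builtin.file:
    path: /tmp/cuda-keyring.deb
    state: absent
  when: gpu
```

The trade-off is that `get_url` will download the file again on every run, since its `dest` no longer exists.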
Since my comments aren't blocking, I'll approve this to move it along.
👍 💯 Agreed. I'm going to merge as is. We can make `/tmp/` cleanup (and probably disk-space monitoring in general) a separate project.
Separates the requester and compute nodes onto separate EC2 instances. There is currently one requester instance and one compute instance.
Factors the IPFS install steps out into a tasks file.
This PR creates a new requester/compute node, essentially a new Plex instance, running at ec2-18-208-163-46.compute-1.amazonaws.com. If we're happy with how this is working, I think we can move the private IP over to this node, then decommission the old compute node in a separate PR.
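One way the split described above could be laid out, assuming inventory groups named `requester` and `compute` and the factored-out IPFS steps living in `tasks/ipfs.yml` (all three names are hypothetical): each node type gets its own play against its own EC2 host, and both plays reuse the same IPFS tasks file.

```yaml
# Hypothetical layout: inventory groups "requester" and "compute",
# IPFS install steps factored out into tasks/ipfs.yml.
- name: Configure requester node
  hosts: requester
  tasks:
    - name: Install IPFS
      ansible.builtin.include_tasks: tasks/ipfs.yml

- name: Configure compute node
  hosts: compute
  tasks:
    - name: Install IPFS
      ansible.builtin.include_tasks: tasks/ipfs.yml
```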