
GH-524 staging environment#532

Merged
alabdao merged 6 commits into main from ops/524-staging-environment
Jul 21, 2023

Conversation

@alabdao
Contributor

@alabdao alabdao commented Jul 20, 2023

Split out Docker setup for potential use on other nodes.

Putting custom vars into an extra-vars file to be referenced on the CLI
while executing the script.

Fixes #524
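A minimal sketch of the extra-vars pattern described above, with a hypothetical filename and variable names (not taken from this PR):

```yaml
# staging.extra-vars.yml (hypothetical filename and values)
env_name: staging
ipfs_path: /opt/local/ipfs
```

This file would then be referenced on the CLI, e.g. `ansible-playbook provision.yml --extra-vars @staging.extra-vars.yml` (playbook name hypothetical).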

hevans66 and others added 4 commits July 18, 2023 15:22
Split out docker setup for potentially used on other nodes.

Putting custom vars into extra-vars file to be referenced on cli
while executing script.
@vercel

vercel bot commented Jul 20, 2023

The latest updates on your projects.

Name Status Preview Updated (UTC)
docs ✅ Ready Visit Preview Jul 21, 2023 2:36pm

@alabdao alabdao temporarily deployed to ci July 20, 2023 21:26 — with GitHub Actions Inactive
@alabdao alabdao requested a review from hevans66 July 20, 2023 21:27

# slow_start = 60

# TODO: need to figure out healthcheck for IPFS
Contributor


Yeah, this is a rabbit hole I started down at some point.

Contributor

@hevans66 hevans66 left a comment


This is looking great; it's certainly a more complete setup than we had before.

Couple of notes:

  • This does not set up a receptor instance for staging. I think that's ok for now, but I think as soon as the receptor starts doing fancier things (like actually rejecting jobs based on criteria) we will want to add a receptor for testing purposes.
  • I'm a little worried about the requester instances having an auto scaling group, mostly because I don't really know what it means for two bacalhau requester nodes (that are not peered together) to share the same set of compute nodes. What would happen if one requester node accepts a job and hands it off to a compute node, then the CLI later requests the job status and that request goes to a different requester node? The requester node is not doing any computation, so I don't anticipate ever really needing more than one. Unless the idea is to never have n > 1 for this ASG.
  • This still won't automatically run the Ansible provision scripts when an instance launches, right? For now we're still running ansible-playbook from the command line?

@@ -1,109 +1,67 @@
- name: Provision Bacalhau Compute Instance
remote_user: ubuntu
hosts: tag_Type_compute_only:&tag_Env_prod
Contributor


Are you limiting to specific Envs when running ansible-playbook? Or is there some magic I am missing?

Contributor Author


Yeah, using --limit tag_Env_staging while executing the ansible-playbook command.
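A minimal sketch of the invocation being described, combining the --limit flag from this thread with the extra-vars file from the PR description (the playbook and extra-vars filenames are hypothetical):

```shell
# Target only staging-tagged hosts and load the custom extra-vars file.
# "provision.yml" and "staging.extra-vars.yml" are hypothetical names.
ansible-playbook provision.yml \
  --limit tag_Env_staging \
  --extra-vars @staging.extra-vars.yml
```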

@alabdao
Contributor Author

alabdao commented Jul 21, 2023

This is looking great; it's certainly a more complete setup than we had before.

Couple of notes:

* This does not set up a receptor instance for staging. I think that's ok for now, but I think as soon as the receptor starts doing fancier things (like actually rejecting jobs based on criteria) we will want to add a receptor for testing purposes.

Yes, the receptor will definitely come later.

* I'm a little worried about the requester instances having an auto scaling group. Mostly because I don't really know what it means for two bacalhau requester nodes (that are not peered together)
  to share the same set of compute nodes. What would happen if one requester node accepts a job and hands it off to a compute node, then the cli later requests the job status and that request goes to a different requester node? The requester node is not doing any computation so I don't anticipate ever really needing more than one. Unless the idea is to never n > 1 for this asg.
  • The need for the ASG is to have HA capability in case of a node becoming unhealthy. Work needs to be done to make Bacalhau and IPFS state preservable; EFS would probably do the job.
  • Needed an LB target to terminate public traffic. It's a step towards nodes NOT having public IP addresses, instead being accessed via bastion/VPN/EC2 Instance Connect Endpoint.
  • Easier bootstrapping mechanism.
  • At some point we'll probably have multiple requesters with stickiness enabled so a client hits the same node.
* This still won't automatically run the ansible provision scripts when an instance launches right? For now were still running ansible-playbook from command line?
  • Yes, correct. This is a step towards that: having compute nodes be completely dynamic with a headless setup.
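One possible shape for that eventual launch-time provisioning (purely a hedged sketch of a common pattern, not what this PR implements; the repo URL and playbook name are hypothetical) is an EC2 user-data script that runs ansible-pull at boot:

```shell
#!/bin/bash
# Hypothetical EC2 user-data: install Ansible, then pull and apply the
# provisioning playbook from a (hypothetical) infra repo at instance launch.
apt-get update
apt-get install -y ansible git
ansible-pull \
  --url https://github.com/example-org/infra.git \
  provision-compute.yml
```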

@alabdao alabdao temporarily deployed to ci July 21, 2023 14:34 — with GitHub Actions Inactive
@alabdao alabdao temporarily deployed to ci July 21, 2023 14:35 — with GitHub Actions Inactive
@alabdao alabdao merged commit 9538de2 into main Jul 21, 2023
@alabdao alabdao deleted the ops/524-staging-environment branch July 21, 2023 16:04

@thetechnocrat-dev thetechnocrat-dev left a comment


Looks good to me, but I'll defer to @hevans66 on the final approval. I do think the big upcoming strategic decision is how we organize the infrastructure code for the private versus public clusters.

ipfs_path: /opt/local/ipfs
tasks:
# Must provide limit flag to ensure running against current environment
- fail:


Cool, didn't know about this trick
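The trick in the diff above can be sketched as a guard task that aborts unless a --limit was supplied, using Ansible's built-in ansible_limit magic variable (a sketch consistent with the diff context, not necessarily the PR's exact code):

```yaml
tasks:
  # Must provide limit flag to ensure running against the intended environment
  - fail:
      msg: "Run with --limit tag_Env_<env> so the play targets one environment"
    when: ansible_limit is not defined
```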



Development

Successfully merging this pull request may close these issues.

Setup Plex Staging environment

3 participants