Deployment Considerations documentation #9933

Merged
jacobtomlinson merged 8 commits into dask:main from gjoseph92:deployment-considerations
Feb 13, 2023
@gjoseph92
Collaborator

This document tries to cover some of the infrastructure challenges outside of Dask that people commonly run into when setting up serious (production, multi-tenant) Dask deployments. The goal is to give people who might be thinking of setting one up a more realistic picture of what it takes to run a production-grade Dask deployment.

This is spun out from https://github.com/dask/dask/pull/9912/files#r1096578227, and based loosely on @mrocklin's PyData NYC talk: https://www.youtube.com/watch?v=5hUkUj1VYW4.

cc @scharlottej13 @jacobtomlinson

@github-actions github-actions bot added the documentation Improve or add to documentation label Feb 8, 2023
Member

@jacobtomlinson jacobtomlinson left a comment

I really like the way this has been laid out. It talks through a lot of important factors that advanced users will want to think about, especially when rolling Dask out within orgs. As you say, much of this is out of scope for core Dask.

I like how it starts with the challenges and then neatly directs users to other projects and companies that solve those problems. That seems like a really great way to point folks toward other projects and commercial offerings.


Thanks to the efforts of the open-source community, there are tools to deploy Dask :ref:`pretty much anywhere <deployment-options>`—if you can get computers to talk to each other, you can probably turn them into a Dask cluster.

**However, getting Dask running is often not the last step, but the first step.** This document attempts to cover some of the things *outside of Dask* you may have to think about when managing a Dask deployment.
Member

I like how this is clearly setting the stage that these things are out of scope or on the periphery for Dask.

Additional challenges can include getting local packages or scripts onto the cluster (and ensuring they're up to date), as well as packages installed from private Git or PyPI repos.


Observability
Member

This is super important, but I feel like this is one of the last things folks think about. I would probably move this below other sections like cost and credentials.

Collaborator Author

That was actually why I moved it up here; I know people don't usually think about it up front, but I wanted to make it more prominent since it's so important.

Also note that I mixed log retention in with metrics. Maybe those are worth splitting; I think log retention should be quite high on the list (you're really not going to have a good time if you don't even keep logs around), but metrics usually come a bit later in your deployment journey.

- What are we spending it on? (machines, machines that should have been turned off, network egress that shouldn't have happened, etc.)
- Who/what is responsible?

Non-commercial deployment tools generally don't build in this sort of monitoring. Organizations that need it either end up building their own tools, or turning to commercial deployment offerings.
Member

Maybe soften this a bit.

Suggested change
- Non-commercial deployment tools generally don't build in this sort of monitoring. Organizations that need it either end up building their own tools, or turning to commercial deployment offerings.
+ Many deployment tools generally don't build in this sort of monitoring. Organizations that need it either end up building their own tools, or turning to commercial deployment offerings.

Collaborator Author

I'd originally written that, but then couldn't think of any non-commercial tools that actually did have built-in capabilities for cost monitoring. Is there something I'm not thinking of?

Arguably Coiled doesn't even do what I've described here. Coiled can tell you how you spent your Coiled bill, but for your AWS bill, you still have to look in the Cost Explorer yourself (though this is facilitated by tags Coiled adds to all your Dask infrastructure).
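
To make the tag-based attribution idea concrete, here is a minimal sketch of answering "who/what is responsible?" from tagged billing records. Everything in it is an assumption for illustration: the record layout, the `owner` tag name, and the `cost_by_tag` helper are hypothetical and do not correspond to any Dask, Coiled, or AWS API.

```python
from collections import defaultdict


def cost_by_tag(records, tag):
    """Sum cost per value of `tag`; resources without the tag go to 'untagged'."""
    totals = defaultdict(float)
    for rec in records:
        # Hypothetical record layout: {"resource": str, "cost_usd": float, "tags": dict}
        key = rec.get("tags", {}).get(tag, "untagged")
        totals[key] += rec["cost_usd"]
    return dict(totals)


# Made-up billing rows, standing in for an export from a cloud cost tool.
records = [
    {"resource": "worker-1", "cost_usd": 1.5, "tags": {"owner": "alice"}},
    {"resource": "worker-2", "cost_usd": 2.5, "tags": {"owner": "alice"}},
    {"resource": "scheduler", "cost_usd": 0.5, "tags": {"owner": "bob"}},
    {"resource": "nat-gateway", "cost_usd": 4.0, "tags": {}},
]

totals = cost_by_tag(records, "owner")
print(totals)
```

The `untagged` bucket is the interesting part: it surfaces spend (forgotten machines, network egress) that nobody has claimed.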


You may also have other systems on restricted networks that workers need to access to read and write data, or call APIs. Connecting to those networks could add complexity.

Some organizations may have additional network security policies, such as requiring all traffic to be encrypted. Dask supports this with :doc:`TLS <tls>`, which requires additional configuration and certificate management.
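
As a rough illustration of the extra configuration TLS involves, the sketch below assembles keyword arguments for `distributed.security.Security`. The helper name `tls_security_kwargs` and the file names are hypothetical, and sharing one certificate/key pair across the client, scheduler, and workers is a simplifying assumption; a real deployment may issue separate credentials per role.

```python
def tls_security_kwargs(ca_file, cert_file, key_file):
    """Build keyword arguments for distributed.security.Security.

    Assumes a single cert/key pair is shared by every role (a simplification).
    """
    kwargs = {"tls_ca_file": ca_file, "require_encryption": True}
    for role in ("client", "scheduler", "worker"):
        kwargs[f"tls_{role}_cert"] = cert_file
        kwargs[f"tls_{role}_key"] = key_file
    return kwargs


# Placeholder paths; the certificates themselves must be created and
# distributed to every machine separately.
kwargs = tls_security_kwargs("ca.pem", "cert.pem", "key.pem")

# With distributed installed, usage would look something like this
# (the scheduler address is a placeholder):
#   from distributed import Client
#   from distributed.security import Security
#   client = Client("tls://scheduler-address:8786", security=Security(**kwargs))
```

Even in this simplified form, the certificate files have to exist on the client, the scheduler, and every worker, which is where most of the management overhead comes from.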
Member

Side note, but Dask Cloud Provider turns this on by default. I wonder if we should do that in more deployment tooling and make it more of an opt-out.

@gjoseph92 gjoseph92 changed the title from "RFC: Deployment Considerations documentation" to "Deployment Considerations documentation" Feb 10, 2023
@gjoseph92 gjoseph92 marked this pull request as ready for review February 10, 2023 04:36
@gjoseph92
Collaborator Author

@jacobtomlinson I think I've addressed your comments!

Member

@jacobtomlinson jacobtomlinson left a comment

Thanks for writing this up @gjoseph92

@jacobtomlinson jacobtomlinson merged commit 07b76c2 into dask:main Feb 13, 2023
@gjoseph92 gjoseph92 deleted the deployment-considerations branch February 13, 2023 20:39