If a workflow is struggling, there are often a number of plausible fixes, including:
- Changing chunk/partition sizes
- Increasing cluster size
- Choosing a bigger instance type
We don't always have strong guidance for users about which of these to choose. It would be good to include some tests parameterized across these dimensions to better inform Coiled users. I know @gjoseph92 has some work in progress toward that end, and @ntabris has expressed interest.
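As a rough illustration of what "parameterized across these dimensions" could mean, here is a minimal sketch that sweeps the full Cartesian product of chunk size, worker count, and instance type and times a workload for each combination. Everything here is hypothetical: the specific values in `CHUNK_SIZES_MB`, `N_WORKERS`, and `INSTANCE_TYPES`, the `Config` class, and the `run_workload` callable are placeholders, not existing Coiled or Dask APIs. A real test would create a cluster and run a workload inside `run_workload`.

```python
import itertools
import time
from dataclasses import dataclass

# Hypothetical example values for each tuning dimension.
CHUNK_SIZES_MB = [50, 100, 200]
N_WORKERS = [5, 10, 20]
INSTANCE_TYPES = ["m6i.large", "m6i.xlarge", "m6i.2xlarge"]


@dataclass(frozen=True)
class Config:
    """One point in the parameter grid (hypothetical structure)."""
    chunk_size_mb: int
    n_workers: int
    instance_type: str


def parameter_grid():
    """Yield every combination of chunk size, cluster size, and instance type."""
    for chunk, workers, instance in itertools.product(
        CHUNK_SIZES_MB, N_WORKERS, INSTANCE_TYPES
    ):
        yield Config(chunk, workers, instance)


def sweep(run_workload):
    """Call run_workload(config) for each grid point; return (config, seconds) pairs."""
    results = []
    for config in parameter_grid():
        start = time.perf_counter()
        run_workload(config)  # placeholder: would start a cluster and run the workflow
        results.append((config, time.perf_counter() - start))
    return results


if __name__ == "__main__":
    # Stub workload that does nothing, just to show the shape of the output.
    for config, seconds in sweep(lambda config: None):
        print(config, f"{seconds:.6f}s")
```

With pytest, stacking three `@pytest.mark.parametrize` decorators (one per dimension) on a single test function produces the same set of combinations, which would let each configuration show up as its own test case in the results.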