A cluster recently experienced an incident in which a burst of distsql traffic seemingly led to a spike in CPU utilization, which resulted in node liveness heartbeat failures that ultimately left the cluster in a sorry state.
This issue is a placeholder for further details about the specific incident, as well as for discussion of ways to mitigate such failures.