intervalccl: make OverlapCoveringMerge O(n log m) instead of O(nm)

Instead of scanning each of the m input coverings for each of the n output ranges, use a heap to find the next boundary. In practice this shouldn't matter much for either of the current uses of OverlapCoveringMerge, both of which have a reasonably low number of input coverings, but we may as well cover the worst case.
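
As a rough illustration of the heap idea (not the actual intervalccl code), here is a minimal sketch that merges the boundaries of several coverings with container/heap. The `span`, `cursor`, `boundaryHeap`, and `mergeBoundaries` names are hypothetical, keys are plain ints rather than byte slices, payload handling is left out, and each covering is assumed to be sorted and non-overlapping.

```go
package main

import "container/heap"

// span is a hypothetical half-open range [start, end) with int keys.
type span struct{ start, end int }

// cursor tracks the next unread boundary of a single covering.
type cursor struct {
	covering []span
	idx      int  // index of the current span
	atEnd    bool // false: next boundary is start; true: next is end
}

// key returns the boundary this cursor currently points at.
func (c *cursor) key() int {
	if c.atEnd {
		return c.covering[c.idx].end
	}
	return c.covering[c.idx].start
}

// advance moves to the next boundary and reports whether one exists.
func (c *cursor) advance() bool {
	if !c.atEnd {
		c.atEnd = true
		return true
	}
	c.idx++
	c.atEnd = false
	return c.idx < len(c.covering)
}

// boundaryHeap is a min-heap of cursors ordered by their next boundary.
type boundaryHeap []*cursor

func (h boundaryHeap) Len() int            { return len(h) }
func (h boundaryHeap) Less(i, j int) bool  { return h[i].key() < h[j].key() }
func (h boundaryHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *boundaryHeap) Push(x interface{}) { *h = append(*h, x.(*cursor)) }
func (h *boundaryHeap) Pop() interface{} {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// mergeBoundaries returns the sorted, distinct boundaries of all coverings.
// Each boundary costs O(log m) heap work, where m is the number of coverings.
func mergeBoundaries(coverings [][]span) []int {
	h := &boundaryHeap{}
	for _, c := range coverings {
		if len(c) > 0 {
			*h = append(*h, &cursor{covering: c})
		}
	}
	heap.Init(h)

	var out []int
	for h.Len() > 0 {
		c := (*h)[0] // cursor with the smallest next boundary
		k := c.key()
		if len(out) == 0 || out[len(out)-1] != k {
			out = append(out, k)
		}
		if c.advance() {
			heap.Fix(h, 0) // key grew; restore heap order
		} else {
			heap.Pop(h) // covering exhausted
		}
	}
	return out
}
```

Consecutive boundaries in the result delimit the candidate output ranges. A real merge would also need to track which coverings are active between each pair of boundaries in order to assemble the payloads.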

This change should also be benchmark-driven, in case the priority queue makes it slower for small inputs.