I will be opening a series of PRs to optimize PromQL "join" (binary vector operations, AND, OR, UNLESS) performance. I am opening this issue to make the work easier to track. For each optimization PR, I will add a comment here with the cumulative performance improvement (relative to master), while each individual PR will contain incremental benchmark results (relative to the previous PR).
PromQL join performance depends heavily on the length of label sets, and in practice we see label sets that are at least hundreds of bytes long. I am therefore using a new benchmark that contains more labels (#17130) to evaluate join performance.
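
To illustrate why label set length matters, here is a minimal sketch of a hash-based AND join. It is not the Prometheus engine code: `Label`, `Sample`, `signature`, and `vectorAnd` are hypothetical names made up for this example, and the sketch ignores `on`/`ignoring`, metric name handling, and hash collisions. The point is only that computing a match key touches every byte of every label, on both sides of the join.

```go
package main

import "hash/fnv"

// Label and Sample are hypothetical stand-ins, not Prometheus types.
type Label struct{ Name, Value string }

type Sample struct {
	Labels []Label // assumed to be sorted by Name
	Value  float64
}

// signature hashes every byte of every label name and value,
// so its cost grows linearly with the size of the label set.
func signature(ls []Label) uint64 {
	h := fnv.New64a()
	sep := []byte{0xff} // separator so ("ab","c") and ("a","bc") differ
	for _, l := range ls {
		h.Write([]byte(l.Name))
		h.Write(sep)
		h.Write([]byte(l.Value))
		h.Write(sep)
	}
	return h.Sum64()
}

// vectorAnd keeps the samples of lhs whose label signature also
// appears in rhs. Each sample on either side is hashed once.
func vectorAnd(lhs, rhs []Sample) []Sample {
	seen := make(map[uint64]struct{}, len(rhs))
	for _, s := range rhs {
		seen[signature(s.Labels)] = struct{}{}
	}
	var out []Sample
	for _, s := range lhs {
		if _, ok := seen[signature(s.Labels)]; ok {
			out = append(out, s)
		}
	}
	return out
}
```

In a sketch like this, a join over N series with label sets of a few hundred bytes each spends most of its time in `signature`, which is why a benchmark with short label sets would understate the cost.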