https://github.com/benchmark-action/github-action-benchmark/pull/196 The "flaky" tests now fail almost consistently. They still fail after several retries, so we need to fix them.