Closed
Labels
- external: This issue is unrelated to the STL
- resolved: Successfully resolved without a commit
Description
Extracted from #5502 by @AlexGuteniev:
🐛 memmove performance bug
After the initial implementation, I observed that some of the benchmarks exhibited an unexpected slowdown. The cause was a surprisingly slow memmove. I've created a benchmark repro of this problem.
Benchmark
    #include <benchmark/benchmark.h>
    #include <cstddef>
    #include <cstring>
    using namespace std;

    // Page-aligned 1 MiB buffer; source and destination both point into it.
    alignas(4096) unsigned char v[1024 * 1024];

    void bm_memmove(benchmark::State& state) {
        const auto size = static_cast<size_t>(state.range(0));
        const auto n = static_cast<ptrdiff_t>(state.range(1));
        // Positive n: source is n bytes ahead of destination.
        // Negative n: destination is -n bytes ahead of source.
        const size_t n1 = n < 0 ? 0 : n;
        const size_t n0 = n < 0 ? -n : 0;
        benchmark::DoNotOptimize(v);
        for (auto _ : state) {
            memmove(v + n0, v + n1, size);
            benchmark::DoNotOptimize(v);
        }
    }

    BENCHMARK(bm_memmove)->ArgsProduct({{8191, 8193}, {-5, +5}});
    BENCHMARK_MAIN();

Results on i5-1235U (Alder Lake)
    -------------------------------------------------------------
    Benchmark                   Time             CPU   Iterations
    -------------------------------------------------------------
    bm_memmove/8191/-5       71.4 ns         71.5 ns      8960000
    bm_memmove/8193/-5       71.1 ns         71.5 ns      8960000
    bm_memmove/8191/5        62.6 ns         61.0 ns      8960000
    bm_memmove/8193/5        1903 ns         1925 ns       373333
Results on i7-8750H (Coffee Lake)
    -------------------------------------------------------------
    Benchmark                   Time             CPU   Iterations
    -------------------------------------------------------------
    bm_memmove/8191/-5        143 ns          141 ns      4977778
    bm_memmove/8193/-5        145 ns          146 ns      4480000
    bm_memmove/8191/5        77.2 ns         76.7 ns      8960000
    bm_memmove/8193/5        80.9 ns         80.2 ns      8960000
Analysis
All I know or suspect so far:
- The problem exists on Alder Lake (Intel Core 12th gen) but does not exist on Coffee Lake (Intel Core 8th gen) or Skylake (Intel Core 6th gen)
- The problematic instruction is rep movsb, which is used in memmove
- The problematic behavior is recreated for me when the size is greater than 8192 and the pointer difference is smaller than 64
- Clang on Linux is also affected, as shown by reproducing the issue here: https://quick-bench.com/q/HgY3kPAaUIqkfmzwz_NFeoTcj3U
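To narrow down the size and pointer-difference thresholds listed above, a sweep over nearby values might help. The following is only a minimal sketch, not part of the original repro: `time_memmove_ns` and `print_sweep` are hypothetical helper names, the chosen sizes and offsets are arbitrary probes around 8192 and 64, and it uses `std::chrono` instead of Google Benchmark so that it has no external dependencies. The timings are therefore coarser than the benchmark results.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Page-aligned 1 MiB buffer, mirroring the one in the repro.
alignas(4096) static unsigned char buf[1024 * 1024];

// Hypothetical helper: average time in nanoseconds of one
// memmove(buf + dst_off, buf + src_off, size) call over `reps` repetitions.
double time_memmove_ns(std::size_t dst_off, std::size_t src_off,
                       std::size_t size, int reps) {
    assert(reps > 0);
    assert(dst_off + size <= sizeof(buf) && src_off + size <= sizeof(buf));

    // Calling through a volatile function pointer forces a real call on
    // every iteration, so the compiler cannot elide or inline it.
    void* (*volatile move)(void*, const void*, std::size_t) = std::memmove;

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < reps; ++i) {
        move(buf + dst_off, buf + src_off, size);
    }
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / reps;
}

// Hypothetical helper: prints timings for sizes around 8192 and
// pointer differences around 64.
void print_sweep() {
    const std::size_t sizes[] = {4096, 8191, 8192, 8193, 16384};
    const std::size_t offsets[] = {1, 5, 32, 63, 64, 65, 128};
    for (std::size_t size : sizes) {
        for (std::size_t off : offsets) {
            // Destination at buf, source `off` bytes ahead of it:
            // the same direction as the slow "/8193/5" case.
            const double ns = time_memmove_ns(0, off, size, 100000);
            std::printf("size=%zu offset=%zu: %.1f ns\n", size, off, ns);
        }
    }
}
```

Calling `print_sweep()` from `main` on an affected and an unaffected CPU and comparing the two outputs should show where the slowdown starts, assuming it depends only on size and pointer difference as suspected.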
I appreciate any help in investigating the issue further.
Ideally, we should report this issue to the VCRuntime maintainers.
But I feel we should first try to gather more information so that we can report it better.