With 45 threads, summarizing the metrics, specifically computing the per-second percentiles, takes a very large share of CPU cycles (more than 80% in the case below).
./memtier_benchmark -p 12000 --hide-histogram --pipeline=10 -c 168 -t 45 -d 1000 --key-maximum=9625503 --key-pattern=R:R --ratio=1:0 --test-time=1200 --run-count=2
Profile detail:
