Add refresh/merge/flush totals in summary#615
Conversation
Due to elastic#608 it's likely we need to benchmark scenarios without using the node-stats telemetry device. At the same time we want to get a general idea of how many refreshes/merges/flushes happened (in total) by accessing the index stats. Add total count for merges/refresh/flush in summary output; this is collected from `_all/primaries` in `_stats`. Also modify summary description to clarify the values are totals from primary shards. Finally fix bug where index stats where time/count == 0 got skipped from the summary. Closes elastic#614
danielmitterdorfer
left a comment
There was a problem hiding this comment.
Thanks for the PR. I left a couple of comments and suggestions.
esrally/reporter.py
Outdated
| self.line("Total Young Gen GC", "", stats.young_gc_time, "s", convert.ms_to_seconds), | ||
| self.line("Total Old Gen GC", "", stats.old_gc_time, "s", convert.ms_to_seconds) | ||
| self.line("Total primaries Young Gen GC", "", stats.young_gc_time, "s", convert.ms_to_seconds), | ||
| self.line("Total primaries Old Gen GC", "", stats.old_gc_time, "s", convert.ms_to_seconds) |
There was a problem hiding this comment.
I don't think this is correct as GC times are gathered on node-level.
There was a problem hiding this comment.
Good catch thanks, will fix.
esrally/reporter.py
Outdated
| self.line("Total {}".format(name), baseline_total, contender_total, "", unit, | ||
| self.line("Total primaries {}".format(name), baseline_total, contender_total, "", unit, | ||
| treat_increase_as_improvement=False, formatter=convert.ms_to_minutes), | ||
| self.line("Min {} per shard".format(name), baseline_per_shard.get("min"), contender_per_shard.get("min"), "", unit, |
There was a problem hiding this comment.
Should also say here: "... per primary shard"?
There was a problem hiding this comment.
As there can be >1 primary shards, how about "... across primary shards"?
There was a problem hiding this comment.
or "... of primary shards"?
esrally/reporter.py
Outdated
| self.line("Total primaries Young Gen GC", baseline_stats.young_gc_time, contender_stats.young_gc_time, "", "s", | ||
| treat_increase_as_improvement=False, formatter=convert.ms_to_seconds), | ||
| self.line("Total Old Gen GC", baseline_stats.old_gc_time, contender_stats.old_gc_time, "", "s", | ||
| self.line("Total primaries Old Gen GC", baseline_stats.old_gc_time, contender_stats.old_gc_time, "", "s", |
There was a problem hiding this comment.
Similarly, I don't think that the GC times should mention primaries.
esrally/mechanic/telemetry.py
Outdated
| doc = { | ||
| "name": name, | ||
| "value": value, | ||
| "unit": "", |
There was a problem hiding this comment.
I don't think we should specify unit at all for a count?
| counts = [] | ||
| self.index_count(counts, stats, "merges_total_count", ["merges", "total"]) | ||
| self.index_count(counts, stats, "refresh_total_count", ["refresh", "total"]) | ||
| self.index_count(counts, stats, "flush_total_count", ["flush", "total"]) |
There was a problem hiding this comment.
I think these new metric keys should be documented in https://esrally.readthedocs.io/en/stable/metrics.html#metric-keys. Also, the corresponding docs for the summary report should be amended.
esrally/reporter.py
Outdated
|
|
||
| def report_total_counts(self, stats): | ||
| lines = [] | ||
| lines.extend(self.report_total_count("merge count", stats.merge_count)) |
There was a problem hiding this comment.
I think it would be nicer if the output is grouped by action (merge, refresh, flush) instead of all times together and all counts together.
esrally/reporter.py
Outdated
| lines.extend(self.report_total_time("flush time", | ||
| baseline_stats.flush_time, baseline_stats.flush_time_per_shard, | ||
| contender_stats.flush_time, contender_stats.flush_time_per_shard)) | ||
| lines.extend(self.report_total_count("merge count", |
There was a problem hiding this comment.
I think it would be nicer if the output is grouped by action (merge, refresh, flush) instead of all times together and all counts together.
And group total counts directly below total times in all summary reports.
danielmitterdorfer
left a comment
There was a problem hiding this comment.
Thanks for iterating. LGTM
Use cumulative instead of total for summary reports and comparisons and clarify that min/median/max are across primary shards. Relates: elastic/elasticsearch#35594 (comment)
|
@danielmitterdorfer in 3d12203 I switched to the new "cumulative" terminology from "totals" and made the clarifications for "across primary shards" for min/median/max metrics; would be appreciated if you could take a quick look before merge. |
danielmitterdorfer
left a comment
There was a problem hiding this comment.
Thanks, I think it's really clearer now (albeit the command line report becoming really large by now). LGTM!
Due to #608 it's likely we need to benchmark scenarios without using
the node-stats telemetry device. At the same time we want to get a
general idea of how many refreshes/merges/flushes happened (in total)
by accessing the index stats.
Add total count for merges/refresh/flush in summary output; this is
collected from
_all/primariesin_stats.Also modify summary description to clarify the values are totals from
primary shards.
Finally fix bug where index stats where time/count == 0 got skipped
from the summary.
Closes #614