Skip to content

Add refresh/merge/flush totals in summary#615

Merged
dliappis merged 3 commits intoelastic:masterfrom
dliappis:add-total-counts
Dec 14, 2018
Merged

Add refresh/merge/flush totals in summary#615
dliappis merged 3 commits intoelastic:masterfrom
dliappis:add-total-counts

Conversation

@dliappis
Copy link
Contributor

Due to #608 it's likely we need to benchmark scenarios without using
the node-stats telemetry device. At the same time we want to get a
general idea of how many refreshes/merges/flushes happened (in total)
by accessing the index stats.

Add total count for merges/refresh/flush in summary output; this is
collected from _all/primaries in _stats.

Also modify summary description to clarify the values are totals from
primary shards.

Finally fix bug where index stats where time/count == 0 got skipped
from the summary.

Closes #614

Due to elastic#608 it's likely we need to benchmark scenarios without using
the node-stats telemetry device. At the same time we want to get a
general idea of how many refreshes/merges/flushes happened (in total)
by accessing the index stats.

Add total count for merges/refresh/flush in summary output; this is
collected from `_all/primaries` in `_stats`.

Also modify summary description to clarify the values are totals from
primary shards.

Finally fix bug where index stats where time/count == 0 got skipped
from the summary.

Closes elastic#614
@dliappis dliappis added enhancement Improves the status quo :Telemetry Telemetry Devices that gather additional metrics labels Dec 10, 2018
@dliappis dliappis added this to the 1.0.3 milestone Dec 10, 2018
Copy link
Member

@danielmitterdorfer danielmitterdorfer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. I left a couple of comments and suggestions.

self.line("Total Young Gen GC", "", stats.young_gc_time, "s", convert.ms_to_seconds),
self.line("Total Old Gen GC", "", stats.old_gc_time, "s", convert.ms_to_seconds)
self.line("Total primaries Young Gen GC", "", stats.young_gc_time, "s", convert.ms_to_seconds),
self.line("Total primaries Old Gen GC", "", stats.old_gc_time, "s", convert.ms_to_seconds)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is correct as GC times are gathered on node-level.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch thanks, will fix.

self.line("Total {}".format(name), baseline_total, contender_total, "", unit,
self.line("Total primaries {}".format(name), baseline_total, contender_total, "", unit,
treat_increase_as_improvement=False, formatter=convert.ms_to_minutes),
self.line("Min {} per shard".format(name), baseline_per_shard.get("min"), contender_per_shard.get("min"), "", unit,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should also say here: "... per primary shard"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As there can be >1 primary shards, how about "... across primary shards"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or "... of primary shards"?

self.line("Total primaries Young Gen GC", baseline_stats.young_gc_time, contender_stats.young_gc_time, "", "s",
treat_increase_as_improvement=False, formatter=convert.ms_to_seconds),
self.line("Total Old Gen GC", baseline_stats.old_gc_time, contender_stats.old_gc_time, "", "s",
self.line("Total primaries Old Gen GC", baseline_stats.old_gc_time, contender_stats.old_gc_time, "", "s",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, I don't think that the GC times should mention primaries.

doc = {
"name": name,
"value": value,
"unit": "",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should specify unit at all for a count?

counts = []
self.index_count(counts, stats, "merges_total_count", ["merges", "total"])
self.index_count(counts, stats, "refresh_total_count", ["refresh", "total"])
self.index_count(counts, stats, "flush_total_count", ["flush", "total"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these new metric keys should be documented in https://esrally.readthedocs.io/en/stable/metrics.html#metric-keys. Also, the corresponding docs for the summary report should be amended.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will address.


def report_total_counts(self, stats):
lines = []
lines.extend(self.report_total_count("merge count", stats.merge_count))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nicer if the output is grouped by action (merge, refresh, flush) instead of all times together and all counts together.

lines.extend(self.report_total_time("flush time",
baseline_stats.flush_time, baseline_stats.flush_time_per_shard,
contender_stats.flush_time, contender_stats.flush_time_per_shard))
lines.extend(self.report_total_count("merge count",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nicer if the output is grouped by action (merge, refresh, flush) instead of all times together and all counts together.

And group total counts directly below total times in all summary reports.
Copy link
Member

@danielmitterdorfer danielmitterdorfer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for iterating. LGTM

Use cumulative instead of total for summary reports and comparisons
and clarify that min/median/max are across primary shards.

Relates: elastic/elasticsearch#35594 (comment)
@dliappis
Copy link
Contributor Author

@danielmitterdorfer in 3d12203 I switched to the new "cumulative" terminology from "totals" and made the clarifications for "across primary shards" for min/median/max metrics; would be appreciated if you could take a quick look before merge.

Copy link
Member

@danielmitterdorfer danielmitterdorfer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I think it's really clearer now (albeit the command line report becoming really large by now). LGTM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement Improves the status quo :Telemetry Telemetry Devices that gather additional metrics

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add total merges and refreshes in summary output

2 participants