Add timing stats to publication process#76771
Conversation
This commit introduces into the node stats API various statistics to track the time that the elected master spends in various phases of the cluster state publication process. Relates elastic#76625
DaveCTurner
left a comment
There was a problem hiding this comment.
The changes to Coordinator and MasterService are pretty much the only content here, everything else is just plumbing and other boilerplate. Adding 19 statistics in one go generates a lot of noise.
| - is_false: nodes.$master.roles | ||
|
|
||
| --- | ||
| "Master timing stats": |
There was a problem hiding this comment.
Really just asserting that the stats really do come out of the API and that we do the right thing with ?human.
| return ifSet(masterApplyElapsedMillis); | ||
| } | ||
|
|
||
| private static long ifSet(long millis) { |
There was a problem hiding this comment.
Used by the maybeGet functions for failed-publication stats since the failure could happen at any stage.
| } | ||
| } | ||
|
|
||
| private static class TimingStatisticsTracker { |
There was a problem hiding this comment.
All methods synchronized (but very fast) just to avoid getting partially-updated stats.
|
Pinging @elastic/es-distributed (Team:Distributed) |
dakrone
left a comment
There was a problem hiding this comment.
This generally looks good, thanks for working on this @DaveCTurner! My one concern however is using the CachedTimeThread for calculating the times rather than raw System.nanoTime(). I think if we have a bunch of sub-200ms operations they'll appear to have completed instantaneously while the count is increasing, unless I am mistaken in the implementation. This inaccuracy seems like it will be very confusing for a user (and not to mention trying to graph/track stats like these).
|
|
||
| `success`:: | ||
| (object) | ||
| Contains statistics about cluster state update attempts that did not change the |
There was a problem hiding this comment.
I think these are update attempts that did change the cluster state successfully?
There was a problem hiding this comment.
Bah yes. Copy/paste might have been involved a bit here.
| */ | ||
| public class ClusterStatePublicationEvent { | ||
|
|
||
| private static final long NOT_SET = -1L; |
There was a problem hiding this comment.
Out of curiosity, why not just use 0 for NOT_SET instead of having a separate flag for it, especially since when retrieving the stats we adjust NOT_SET back to 0 anyway?
There was a problem hiding this comment.
Really just to support the assertion that these things are set once and only once on each event. I added a comment in bd8ef0c.
| final PublicationTransportHandler.PublicationContext publicationContext = | ||
| publicationHandler.newPublicationContext(clusterStatePublicationEvent); | ||
| clusterStatePublicationEvent.setPublicationContextConstructionElapsedMillis( | ||
| transportService.getThreadPool().relativeTimeInMillis() - publicationContextConstructionStartMillis); |
There was a problem hiding this comment.
Is there a concern that if this is < 200ms that it won't be counted with the use of the CachedTimeThread that only updates the clock every 200ms? Should this use "real" time instead?
There was a problem hiding this comment.
My thinking was that if these things are only taking 200ms or so then we probably don't care, or more precisely we only care if they're happening with high frequency and in the high frequency case we will pick up the occasional 200ms clock tick with a probability that matches the mean duration, so on balance it'll all work out.
Also it's a pain to write tests that assert things about time when using System::nanoTime. I can fix up the assertions that say that at least T ms elapsed, but I'd just have to drop the ones with an upper bound. Maybe it's not that bad, the only one that I can't easily add an artificial delay to is the publication context construction.
I'll try out some ideas next week. Do you think we should accumulate the timers in nanoseconds too? Long.MAX_VALUE nanoseconds is ~300 years, I don't think there's much risk of overflow.
There was a problem hiding this comment.
Ok in 9b0c5f3 I added a method to expose the raw (uncached) relative time to ThreadPool which we can still override to solve the testing pain, but which has better granularity. I stuck with milliseconds.
henningandersen
left a comment
There was a problem hiding this comment.
LGTM.
I would also prefer a more granular time measurement.
| `commit_time`:: | ||
| (<<time-units,time value>>) | ||
| The cumulative amount of time spent waiting for a successful cluster state | ||
| update to _commit_, which happens when a majority of the master nodes have | ||
| written the state to disk. | ||
|
|
||
| `completion_time`:: | ||
| (<<time-units,time value>>) | ||
| The cumulative amount of time spent waiting for a successful cluster state | ||
| update to _complete_, which happens when all the nodes have applied the cluster | ||
| state except for the elected master. |
There was a problem hiding this comment.
Perhaps clarify that these are from publication start, i.e., IIUC includes publication_time?
dakrone
left a comment
There was a problem hiding this comment.
LGTM, thanks for using the raw time!
|
Thanks both! |
This commit introduces into the node stats API various statistics to track the time that the elected master spends in various phases of the cluster state publication process. Relates #76625
Adjusts the wire compatibility constraints and reverts commit 18e657a.
* master: (21 commits) [Test] More robust assertions for sorting and pagination (elastic#76654) [Test] Fix filename check on Windows (elastic#76807) Upgrade build scan plugin to 3.6.4 (elastic#76784) Remove keystore initial_md5sum (elastic#76835) Don't export docker images on assemble (elastic#76817) Fix testMasterStatsOnSuccessfulUpdate (elastic#76844) AwaitsFix for elastic#76840 Make Releasing Aggregation Buffers Safer (elastic#76741) Re-enable BWC tests after backport of elastic#76771 (elastic#76839) Dispatch large bulk requests to write thread (elastic#76736) Disable BWC tests for elastic#76771 Pull down beats artifacts when performing release tests Add timing stats to publication process (elastic#76771) Fix BanFailureLoggingTests some more (elastic#76668) Mention "warn threshold" in master service slowlog (elastic#76815) Fix DockerTests.test010Install Re-enable tests affected by elastic#75097 (elastic#76814) Fix testRecoveryIsCancelledAfterDeletingTheIndex (elastic#76644) Test fix -WildcardFieldMapperTests bad test data. (elastic#76819) Updating supported version after backporting the feature (elastic#76794) ... # Conflicts: # server/src/main/java/org/elasticsearch/action/bulk/TransportBulkAction.java
This commit introduces into the node stats API various statistics to
track the time that the elected master spends in various phases of the
cluster state publication process.
Relates #76625