Skip to content

Improve slow logging in MasterService#45086

Merged
DaveCTurner merged 6 commits intoelastic:masterfrom
DaveCTurner:2019-08-01-tighter-timeout-warning-in-master-service
Aug 6, 2019
Merged

Improve slow logging in MasterService#45086
DaveCTurner merged 6 commits intoelastic:masterfrom
DaveCTurner:2019-08-01-tighter-timeout-warning-in-master-service

Conversation

@DaveCTurner
Copy link
Copy Markdown
Member

Adds a tighter threshold for logging a warning about slowness in the
MasterService instead of relying on the cluster service's 30-second warning
threshold. This new threshold applies to the computation of the cluster state
update in isolation, so we get a warning if computing a new cluster state
update takes longer than 10 seconds even if it is subsequently applied quickly.
It also applies independently to the length of time it takes to notify the
cluster state tasks on completion of publication, in case any of these
notifications holds up the master thread for too long.

Relates #45007

Adds a tighter threshold for logging a warning about slowness in the
`MasterService` instead of relying on the cluster service's 30-second warning
threshold. This new threshold applies to the computation of the cluster state
update in isolation, so we get a warning if computing a new cluster state
update takes longer than 10 seconds even if it is subsequently applied quickly.
It also applies independently to the length of time it takes to notify the
cluster state tasks on completion of publication, in case any of these
notifications holds up the master thread for too long.

Relates elastic#45007
@DaveCTurner DaveCTurner added >enhancement :Distributed/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. v8.0.0 v7.4.0 labels Aug 1, 2019
@elasticmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-distributed

@DaveCTurner
Copy link
Copy Markdown
Member Author

@elasticmachine please run elasticsearch-ci/2 (unrelated watcher failure it seems)

Copy link
Copy Markdown
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two minor comments, looking good o.w.

taskInputs.summary,
previousClusterState.nodes(),
previousClusterState.routingTable(),
previousClusterState.getRoutingNodes()),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this call is potentially expensive (as it lazily builds the routing nodes), so I wonder if we should still guard this logging call with logger.isTraceEnabled()

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think passing a message supplier is enough to avoid that, see 45c1919.

@DaveCTurner DaveCTurner requested a review from ywelsch August 6, 2019 08:46
@DaveCTurner
Copy link
Copy Markdown
Member Author

Failures look infrastructural. You do you, Jenkins.

@elasticmachine please run elasticsearch-ci/bwc
@elasticmachine please run elasticsearch-ci/oss-distro-docs

Copy link
Copy Markdown
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@DaveCTurner DaveCTurner merged commit 6143ebf into elastic:master Aug 6, 2019
@DaveCTurner DaveCTurner deleted the 2019-08-01-tighter-timeout-warning-in-master-service branch August 6, 2019 14:09
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Aug 6, 2019
Adds a tighter threshold for logging a warning about slowness in the
`MasterService` instead of relying on the cluster service's 30-second warning
threshold. This new threshold applies to the computation of the cluster state
update in isolation, so we get a warning if computing a new cluster state
update takes longer than 10 seconds even if it is subsequently applied quickly.
It also applies independently to the length of time it takes to notify the
cluster state tasks on completion of publication, in case any of these
notifications holds up the master thread for too long.

Relates elastic#45007
DaveCTurner added a commit that referenced this pull request Aug 6, 2019
Adds a tighter threshold for logging a warning about slowness in the
`MasterService` instead of relying on the cluster service's 30-second warning
threshold. This new threshold applies to the computation of the cluster state
update in isolation, so we get a warning if computing a new cluster state
update takes longer than 10 seconds even if it is subsequently applied quickly.
It also applies independently to the length of time it takes to notify the
cluster state tasks on completion of publication, in case any of these
notifications holds up the master thread for too long.

Relates #45007
Backport of #45086
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. >enhancement v7.4.0 v8.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants