Skip to content

Only log warning when actually failing shards#28558

Merged
dnhatn merged 4 commits intoelastic:masterfrom
dnhatn:log-failed-shard
Feb 8, 2018
Merged

Only log warning when actually failing shards#28558
dnhatn merged 4 commits intoelastic:masterfrom
dnhatn:log-failed-shard

Conversation

@dnhatn
Copy link
Copy Markdown
Member

@dnhatn dnhatn commented Feb 8, 2018

Currently the master node logs a warning message whenever it receives a failed shard request. However, this can be noisy because

  • Multiple failed shard requests can be issued for a single shard
  • Failed shard requests can be still issued for an already failed shard

This commit moves the log-warn to AllocationService in which the failing shard action actually happens. This is another prerequisite step in order to not ignore the shard not-available exceptions in the replication.

Relates #28534

if (failedShardEntry.markAsStale()) {
allocation.removeAllocationId(failedShard);
}
logger.warn(new ParameterizedMessage("failing shard [{}]", failedShardEntry), failedShardEntry.getFailure());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also add logging information for the "mark as stale" part here as well as in removeStaleIdsWithoutRoutings?

@dnhatn
Copy link
Copy Markdown
Member Author

dnhatn commented Feb 8, 2018

@ywelsch, I've added a logging for stale shards. The failedShardEntry already includes the mark as stale information. Please have another look. Thank you!

indexMetaDataBuilder.putInSyncAllocationIds(shardNumber, remainingInSyncAllocations);
}
// Only log the stale shards which have been actually removed.
if (oldInSyncAllocations.size() != remainingInSyncAllocations.size()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is not necessary. idsToRemove already contains the info that's needed (it does not contain non-existing ids, you can even add an assertion for that if you like).
Also I would change the message to "{} marking unavailable shards as stale: {}"

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed b0cd954

Copy link
Copy Markdown
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dnhatn
Copy link
Copy Markdown
Member Author

dnhatn commented Feb 8, 2018

Thanks @ywelsch for reviewing.

@dnhatn dnhatn changed the title Only log warning of failing shard when actually failing it Only log warning when actually failing shards Feb 8, 2018
@dnhatn dnhatn merged commit 5b8870f into elastic:master Feb 8, 2018
@dnhatn dnhatn deleted the log-failed-shard branch February 8, 2018 20:37
dnhatn added a commit that referenced this pull request Feb 8, 2018
Currently the master node logs a warning message whenever it receives a 
failed shard request. However, this can be noisy because

- Multiple failed shard requests can be issued for a single shard
- Failed shard requests can be still issued for an already failed shard

This commit moves the log-warn to AllocationService in which the failing 
shard action actually happens. This is another prerequisite step in 
order to not ignore the shard not-available exceptions in the
replication.

Relates #28534
@lcawl lcawl added :Search/Search Search-related issues that do not fall into other categories and removed :Allocation labels Feb 13, 2018
@clintongormley clintongormley added :Distributed/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. and removed :Search/Search Search-related issues that do not fall into other categories labels Feb 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. >enhancement v6.3.0 v7.0.0-beta1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants