CCR Read Exceptions stack monitoring alert #79990

@ravikesarwani

Add a new out-of-the-box Stack Monitoring alert for CCR read exceptions:

Provide a way for users to be alerted when CCR is enabled on the cluster and some indices are NOT being replicated.

Condition

  • What to alert on: whether the latest CCR stats call reports a read exception (indices.shards.read_exceptions)
  • Alert re-notify interval => 6 hours
  • Group => Accumulate 1 alert for the cluster (include all indices/shards that have exceptions)
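
As a rough illustration of the condition and grouping above, here is a minimal Python sketch. It is not the Kibana implementation. It assumes the stats payload is a dict shaped like `indices[].shards[].read_exceptions`, and the function names, the `cluster_uuid` parameter, and the alert dict layout are all hypothetical.

```python
from typing import Any, Optional


def collect_read_exceptions(stats: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one entry per follower shard whose read_exceptions list is non-empty."""
    affected = []
    for index in stats.get("indices", []):
        for shard in index.get("shards", []):
            exceptions = shard.get("read_exceptions", [])
            if exceptions:
                affected.append(
                    {
                        "index": index.get("index"),
                        "shard_id": shard.get("shard_id"),
                        "exceptions": exceptions,
                    }
                )
    return affected


def build_cluster_alert(cluster_uuid: str, stats: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Accumulate all affected indices/shards into a single alert for the cluster.

    Returns None when no shard reports a read exception (nothing to alert on).
    The alert layout here is an assumption made for illustration only.
    """
    affected = collect_read_exceptions(stats)
    if not affected:
        return None
    return {"cluster_uuid": cluster_uuid, "affected_shards": affected}
```

Grouping at the cluster level means a user receives one notification listing every failing index and shard, rather than one notification per shard.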

Schedule

  • Run the check every 1 minute
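
To show how the 1-minute check and the 6-hour re-notify interval could interact, here is a small Python sketch. The names `should_notify` and `count_notifications` are hypothetical, and the throttling rule (notify on the first firing check, then again only once the re-notify interval has elapsed) is one plausible reading of the requirement, not confirmed behavior.

```python
from datetime import datetime, timedelta
from typing import Optional

CHECK_INTERVAL = timedelta(minutes=1)   # schedule from the issue
RENOTIFY_INTERVAL = timedelta(hours=6)  # re-notify interval from the issue


def should_notify(now: datetime, last_notified: Optional[datetime], firing: bool) -> bool:
    """Decide whether a check at `now` should send a notification."""
    if not firing:
        return False
    if last_notified is None:
        return True  # first time the condition is seen
    return now - last_notified >= RENOTIFY_INTERVAL


def count_notifications(start: datetime, duration: timedelta) -> int:
    """Simulate checks every CHECK_INTERVAL while the condition fires continuously."""
    sent = 0
    last_notified: Optional[datetime] = None
    now = start
    while now < start + duration:
        if should_notify(now, last_notified, firing=True):
            sent += 1
            last_notified = now
        now += CHECK_INTERVAL
    return sent
```

Under these assumptions, a condition that fires continuously is evaluated every minute but produces a notification only once per 6-hour window.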

Troubleshooting actions

  • Link users to the “CCR” tab in Stack Monitoring, which shows information such as the leader index, an indication of how far the follower index is lagging behind the leader index, the last fetch time, the number of operations synced, and error messages.
  • If you select a shard, you can see graphs for the fetch and operation delays. You can also see advanced information, which contains the results from the get follower stats API.
  • Link users to Stack Management to manage cross-cluster replication.
  • Inspect replication statistics under Cross-Cluster Replication in the Stack Management side navigation.
  • Manage auto-follow patterns.

Helpful Notes:
