[CCR] Add time since last auto follow fetch to auto follow stats#36542
Conversation
For each remote cluster the auto follow coordinator, starts an auto follower that checks the remote cluster state and determines whether an index needs to be auto followed. The time since last auto follow is reported per remote cluster and gives insight whether the auto follow process is alive. Relates to elastic#33007 Originates from elastic#35895
|
Pinging @elastic/es-distributed |
| private final long numberOfFailedRemoteClusterStateRequests; | ||
| private final long numberOfSuccessfulFollowIndices; | ||
| private final NavigableMap<String, ElasticsearchException> recentAutoFollowErrors; | ||
| private final NavigableMap<String, Long> trackingRemoteClusters; |
There was a problem hiding this comment.
I wonder whether all stats should be reported per remote cluster, so numberOfSuccessfulFollowIndices, recentAutoFollowErrors etc. would then be reported on a per remote cluster bases. This makes more sense to me now considering how the auto follow coordinator currently operates.
dnhatn
left a comment
There was a problem hiding this comment.
Thanks @martijnvg. This looks good. I left some comments.
| return Arrays.asList( | ||
| ccrLicenseChecker, | ||
| new AutoFollowCoordinator(client, clusterService, ccrLicenseChecker) | ||
| new AutoFollowCoordinator(client, clusterService, ccrLicenseChecker, System::nanoTime) |
There was a problem hiding this comment.
Maybe use ThreadPool::relativeTimeInMillis.
| private final Supplier<ClusterState> followerClusterStateSupplier; | ||
| private final LongSupplier relativeTimeProvider; | ||
|
|
||
| private volatile long lastAutoFollowTime = -1; |
There was a problem hiding this comment.
Can we add the time unit to the variable names (relativeTimeProvider and lastAutoFollowTime)?
| long numberOfFailedRemoteClusterStateRequests, | ||
| long numberOfSuccessfulFollowIndices, | ||
| NavigableMap<String, ElasticsearchException> recentAutoFollowErrors, | ||
| NavigableMap<String, Long> trackingRemoteClusters |
There was a problem hiding this comment.
Not sure about this parameter name (trackingRemoteClusters).
There was a problem hiding this comment.
Agreed, current name is not so good. I'm thinking about auto_followed_cluster and also including the last_seen_metadata version together with time_since_last_auto_follow_millis.
fixed monitor mapping tests
…_auto_follow_fetch
|
@dnhatn Thanks for look and I've updated the PR. |
dnhatn
left a comment
There was a problem hiding this comment.
I left some comments around the time unit. Feel free to merge it after addressing these.
| ClusterService clusterService, | ||
| CcrLicenseChecker ccrLicenseChecker) { | ||
| CcrLicenseChecker ccrLicenseChecker, | ||
| LongSupplier relativeNanoTimeProvider) { |
There was a problem hiding this comment.
Yikes, I completely missed that. Good catch!
| long lastSeenMetadataVersion = entry.getValue().metadataVersion; | ||
| if (lastAutoFollowTimeInNanos != -1) { | ||
| long timeSinceLastAutoFollowInMillis = | ||
| TimeUnit.NANOSECONDS.toMillis(relativeNanoTimeProvider.getAsLong() - lastAutoFollowTimeInNanos); |
There was a problem hiding this comment.
Since the time unit is ms, we should remove this conversion.
| "cluster_name": { | ||
| "type": "keyword" | ||
| }, | ||
| "time_since_last_auto_follow_started_millis": { |
There was a problem hiding this comment.
maybe drop started in the field name?
| private final Supplier<ClusterState> followerClusterStateSupplier; | ||
| private final LongSupplier relativeTimeProvider; | ||
|
|
||
| private volatile long lastAutoFollowTimeInNanos = -1; |
|
run the gradle build tests 1 |
…_auto_follow_fetch
|
I wonder if we should call this |
👍 I will rename it to |
) For each remote cluster the auto follow coordinator, starts an auto follower that checks the remote cluster state and determines whether an index needs to be auto followed. The time since last auto follow is reported per remote cluster and gives insight whether the auto follow process is alive. Relates to #33007 Originates from #35895
* master: (30 commits) Revert "[Geo] Integrate Lucene's LatLonShape (BKD Backed GeoShapes) as default `geo_shape` indexing approach (elastic#35320)" Deprecate types in get_source and exist_source (elastic#36426) Fix duplicate phrase in shrink/split error message (elastic#36734) ingest: support default pipelines + bulk upserts (elastic#36618) TESTS:Debug Log. IndexStatsIT#testFilterCacheStats [Geo] Integrate Lucene's LatLonShape (BKD Backed GeoShapes) as default `geo_shape` indexing approach (elastic#35320) [TEST] fix float comparison in RandomObjects#getExpectedParsedValue Initialize startup `CcrRepositories` (elastic#36730) ingest: fix on_failure with Drop processor (elastic#36686) SNAPSHOTS: Adjust BwC Versions in Restore Logic (elastic#36718) [Painless] Add boxed type to boxed type casts for method/return (elastic#36571) Do not resolve addresses in remote connection info (elastic#36671) Add back one line removed by mistake regarding java version check and COMPAT jvm parameter existence Fixing line length for EnvironmentTests and RecoveryTests (elastic#36657) SQL: Fix translation of LIKE/RLIKE keywords (elastic#36672) [DOCS] Adds monitoring requirement for ingest node (elastic#36665) SNAPSHOTS: Disable BwC Tests Until elastic#36659 Landed (elastic#36709) Add doc's sequence number + primary term to GetResult and use it for updates (elastic#36680) [CCR] Add time since last auto follow fetch to auto follow stats (elastic#36542) Watcher accounts constructed lazily (elastic#36656) ...
|
Would it be possible to update the documentation? https://www.elastic.co/guide/en/elasticsearch/reference/6.7/ccr-get-stats.html |
For each remote cluster the auto follow coordinator, starts an auto
follower that checks the remote cluster state and determines whether an
index needs to be auto followed. The time since last auto follow is
reported per remote cluster and gives insight whether the auto follow
process is alive.
Relates to #33007
Originates from #35895