Introduce long polling for changes#33683
Conversation
Rather than scheduling pings to the leader index when we are caught up to the leader, this commit introduces long polling for changes. We will fire off a request to the leader which if we are already caught up will enter a poll on the leader side to listen for global checkpoint changes. These polls will timeout after a default of one minute, but can also be specified when creating the following task. We use these time outs as a way to keep statistics up to date, to not exaggerate time since last fetches, and to avoid pipes being broken.
|
Pinging @elastic/es-distributed |
|
@elasticmachine run gradle build tests |
martijnvg
left a comment
There was a problem hiding this comment.
This looks good! I left two questions about ShardChangesAction.TransportAction#asyncShardOperation(...).
x-pack/plugin/ccr/src/main/java/org/elasticsearch/xpack/ccr/action/ShardChangesAction.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/ccr/src/main/java/org/elasticsearch/xpack/ccr/action/ShardChangesAction.java
Show resolved
Hide resolved
…polling * elastic/master: SQL: Return correct catalog separator in JDBC (elastic#33670) [CCR] Add validation for max_retry_delay (elastic#33648) [CCR] Add monitoring mapping verification test (elastic#33662) CORE: Disable Setting Type Validation (elastic#33660) (elastic#33669) Revert "Use serializable exception in GCP listeners (elastic#33657)" Adding index refresh (elastic#33647) [DOCS] Moves securing-communications to docs (elastic#33640) [HLRC][ML] Add ML delete datafeed API to HLRC (elastic#33667) Mute testRecoveryWithConcurrentIndexing TEST: decrease logging level in the flush test DOC: Add SQL section on client applications Fix race in global checkpoint listeners test Use serializable exception in GCP listeners (elastic#33657)
|
@martijnvg I pushed some commits. |
dnhatn
left a comment
There was a problem hiding this comment.
I did not know asyncShardOperation before. This is really nice.
| final SeqNoStats seqNoStats = indexShard.seqNoStats(); | ||
|
|
||
| if (request.getFromSeqNo() > seqNoStats.getGlobalCheckpoint()) { | ||
| assert request.getFromSeqNo() == 1 + seqNoStats.getGlobalCheckpoint(); |
There was a problem hiding this comment.
I am not sure if this assertion always holds.
|
@elasticmachine run gradle build tests |
* master: (24 commits) Only notify ready global checkpoint listeners (elastic#33690) Don't count hits via the collector if the hit count can be computed from index stats. (elastic#33701) Expose retries for CCR fetch failures (elastic#33694) Test fix - Graph vertices could appear in different orders based on map insertion sequence (elastic#33709) Structured audit logging (elastic#31931) Core: Add DateFormatter interface for java time parsing (elastic#33467) [CCR] Check whether the rejected execution exception has the shutdown flag set (elastic#33703) Mute ClusterDisruptionIT#testSendingShardFailure Revert "Mute FullClusterRestartSettingsUpgradeIT" Adjust BWC version on settings upgrade test (elastic#33650) [ML] Allow overrides for some file structure detection decisions (elastic#33630) Adapt skip version for doc_values format deprecation [TEST] wait for no initializing shards [Docs] Minor fix in `has_child` javadoc comment (elastic#33674) Mute FullClusterRestartSettingsUpgradeIT [Kerberos] Add realm name & UPN to user metadata (elastic#33338) [TESTS] Disable specific locales for RestrictedTrustManagerTest (elastic#33299) SQL: Return functions in JDBC driver metadata (elastic#33672) SCRIPTING: Move terms_set Context to its Own Class (elastic#33602) AwaitsFix testRestoreMinmal ...
|
This job triggered CI during a migration of the master. Kicking off an additional build for you manually... Jenkins, test this please. |
|
@elasticmachine run gradle build tests |
* master: Add script to cache dependencies (elastic#33726) [DOCS] Moves security reference to docs folder (elastic#33643) Cleanup assertions in global checkpoint listeners (elastic#33722) [CCR] Move ccr tests in core module back to ccr module (elastic#33711) HLRC: ML PUT Calendar (elastic#33362) [Tests] Fix randomization in StringTermsIT (elastic#33678) Pin TLS1.2 in SSLConfigurationReloaderTests
|
To fix the build failures here, I want to integrate #33731 and then I want to add a check to the pending tasks check in the REST test to skip the shard changes action tasks that can be outstanding from polls. 😇 |
* master: Move CCR REST tests to a sub-project of ccr Move CCR REST tests to ccr sub-project (elastic#33731) Move CCR monitoring tests to ccr sub-project (elastic#33730)
|
This pull request requires #33738 and then it can be merged. |
* master: Do not count shard changes tasks against REST tests (elastic#33738) [HLRC][ML] Add ML get datafeed API to HLRC (elastic#33715)
Rather than scheduling pings to the leader index when we are caught up to the leader, this commit introduces long polling for changes. We will fire off a request to the leader which if we are already caught up will enter a poll on the leader side to listen for global checkpoint changes. These polls will timeout after a default of one minute, but can also be specified when creating the following task. We use these time outs as a way to keep statistics up to date, to not exaggerate time since last fetches, and to avoid pipes being broken.
Rather than scheduling pings to the leader index when we are caught up to the leader, this commit introduces long polling for changes. We will fire off a request to the leader which if we are already caught up will enter a poll on the leader side to listen for global checkpoint changes. These polls will timeout after a default of one minute, but can also be specified when creating the following task. We use these time outs as a way to keep statistics up to date, to not exaggerate time since last fetches, and to avoid pipes being broken.
Relates #32651