Skip to content

Add timeout for ccr recovery action#37840

Merged
Tim-Brooks merged 14 commits intoelastic:masterfrom
Tim-Brooks:individual_action_timeouts
Jan 29, 2019
Merged

Add timeout for ccr recovery action#37840
Tim-Brooks merged 14 commits intoelastic:masterfrom
Tim-Brooks:individual_action_timeouts

Conversation

@Tim-Brooks
Copy link
Copy Markdown
Contributor

This is related to #35975. It adds a action timeout setting that allows
timeouts to be applied to the individual transport actions that are
used during a ccr recovery.

This is related to elastic#35975. It adds a action timeout setting that allows
timeouts to be applied to the individual transport actions that are
used during a ccr recovery.
@Tim-Brooks Tim-Brooks added >non-issue v7.0.0 :Distributed/CCR Issues around the Cross Cluster State Replication features v6.7.0 labels Jan 25, 2019
@Tim-Brooks Tim-Brooks requested a review from ywelsch January 25, 2019 00:24
@elasticmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-distributed

@Tim-Brooks
Copy link
Copy Markdown
Contributor Author

@ywelsch

  1. What are you thoughts on the default value for this?
  2. I'm not sure how we want to pursue testing? It looks like we do not test the individual action timeouts for normal recovery. I could insert a wait or something using the CcrRestoreSourceService callbacks. But I had planned on removing those eventually as it seems weird to have callbacks dedicated for testing purposes.

Copy link
Copy Markdown
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code and default value looks good (we can always increase it later if we see the need for it). For testing, you could set a low timeout, and then inject a failure at an arbitrary point (i.e. randomly on one of the CCR recovery actions) using MockTransportService, see RecoveryActionBlocker as an example in IndexRecoveryIT, and then check that the restore fails in a timely fashion.

@Tim-Brooks Tim-Brooks requested a review from ywelsch January 26, 2019 06:42
Copy link
Copy Markdown
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

settingsRequest = new ClusterUpdateSettingsRequest();
TimeValue defaultValue = CcrSettings.INDICES_RECOVERY_ACTION_TIMEOUT_SETTING.getDefault(Settings.EMPTY);
settingsRequest.persistentSettings(Settings.builder().put(CcrSettings.INDICES_RECOVERY_ACTION_TIMEOUT_SETTING.getKey(),
defaultValue));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setting it to null also resets to the default


public void testIndividualActionsTimeout() throws Exception {
ClusterUpdateSettingsRequest settingsRequest = new ClusterUpdateSettingsRequest();
TimeValue timeValue = new TimeValue(100);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer TimeValue.timeValueMillis(100). It's clearer what the 100 are.

@Tim-Brooks
Copy link
Copy Markdown
Contributor Author

@elasticmachine run elasticsearch-ci/2

@Tim-Brooks Tim-Brooks merged commit f3f9cab into elastic:master Jan 29, 2019
Tim-Brooks added a commit to Tim-Brooks/elasticsearch that referenced this pull request Jan 30, 2019
This is related to elastic#35975. It adds a action timeout setting that allows
timeouts to be applied to the individual transport actions that are
used during a ccr recovery.
Tim-Brooks added a commit that referenced this pull request Jan 31, 2019
This is related to #35975. It adds a action timeout setting that allows
timeouts to be applied to the individual transport actions that are
used during a ccr recovery.
@Tim-Brooks Tim-Brooks deleted the individual_action_timeouts branch December 18, 2019 14:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed/CCR Issues around the Cross Cluster State Replication features >non-issue v6.7.0 v7.0.0-beta1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants