Adjust dynamic timeout for get_segment_files operation to prevent request timeouts #4392
Closed
Description
The GetSegmentFiles transport request times out under the current timeout of 1 minute, which is taken from the recovery setting indices.recovery.internal_action_retry_timeout.
To choose a better timeout, we can set it dynamically based on the total size of the segment files (from FileStoreMetadata) and the cluster's network bandwidth.
Since we do not know the cluster's network bandwidth, we can experiment with a timeout value that accounts for the size of the segment files.
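As a rough illustration of the size-based idea, the sketch below derives a timeout from the total segment file size and an assumed throughput, then clamps it between a floor and a ceiling. Everything in it is hypothetical: the class name SegmentFilesTimeoutSketch, the method timeoutFor, the 10 MiB/s default throughput, the 2x safety factor, and the 1 minute / 1 hour bounds are placeholders for experimentation, not values or APIs taken from OpenSearch.

```java
import java.time.Duration;

// Hypothetical sketch: none of these names or constants come from OpenSearch.
final class SegmentFilesTimeoutSketch {
    // Floor matches the 1 minute timeout mentioned in this issue.
    static final Duration MIN_TIMEOUT = Duration.ofMinutes(1);
    // Ceiling is an arbitrary assumption so a huge shard cannot block forever.
    static final Duration MAX_TIMEOUT = Duration.ofHours(1);
    // Assumed throughput, used because the real network bandwidth is unknown.
    static final long ASSUMED_BYTES_PER_SECOND = 10L * 1024 * 1024;
    // Headroom multiplier for slow disks, retries, and contention (assumption).
    static final double SAFETY_FACTOR = 2.0;

    private SegmentFilesTimeoutSketch() {}

    // Estimates a timeout as (bytes / throughput) * safety factor, clamped to
    // the range [MIN_TIMEOUT, MAX_TIMEOUT].
    static Duration timeoutFor(long totalSegmentBytes, long bytesPerSecond) {
        if (totalSegmentBytes < 0 || bytesPerSecond <= 0) {
            throw new IllegalArgumentException("size must be >= 0 and throughput > 0");
        }
        double seconds = (double) totalSegmentBytes / bytesPerSecond * SAFETY_FACTOR;
        // The cast saturates at Long.MAX_VALUE for extreme inputs; the clamp
        // below then reduces the result to MAX_TIMEOUT.
        long millis = (long) Math.ceil(seconds * 1000.0);
        Duration estimate = Duration.ofMillis(millis);
        if (estimate.compareTo(MIN_TIMEOUT) < 0) {
            return MIN_TIMEOUT;
        }
        if (estimate.compareTo(MAX_TIMEOUT) > 0) {
            return MAX_TIMEOUT;
        }
        return estimate;
    }

    // Convenience overload that uses the assumed default throughput.
    static Duration timeoutFor(long totalSegmentBytes) {
        return timeoutFor(totalSegmentBytes, ASSUMED_BYTES_PER_SECOND);
    }
}
```

With these placeholder values, a 10 GiB transfer at the assumed 10 MiB/s gets a timeout of 2048 seconds, while very small transfers keep the existing 1 minute floor. The total size would presumably be the sum of the lengths of the files being requested; the throughput and safety factor are the parts that would need tuning from benchmarks.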
Caused by: org.opensearch.transport.ReceiveTimeoutTransportException: [seed][10.9.0.166:9300][internal:index/shard/replication/get_segment_files] request_id [552738] timed out after [599988ms]
Failure stack trace from benchmarking
[2022-09-02T09:34:08,220][ERROR][o.o.i.r.SegmentReplicationTargetService] [data-e20223d0] replication failure
org.opensearch.OpenSearchException: Segment Replication failed
at org.opensearch.indices.replication.SegmentReplicationTargetService$3.onFailure(SegmentReplicationTargetService.java:293) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.action.ActionListener$1.onFailure(ActionListener.java:88) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.action.ActionRunnable.onFailure(ActionRunnable.java:103) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:54) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.common.util.concurrent.OpenSearchExecutors$DirectExecutorService.execute(OpenSearchExecutors.java:341) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.common.util.concurrent.ListenableFuture.notifyListener(ListenableFuture.java:120) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.common.util.concurrent.ListenableFuture.lambda$done$0(ListenableFuture.java:112) [opensearch-2.2.0.jar:2.2.0]
at java.util.ArrayList.forEach(ArrayList.java:1511) [?:?]
at org.opensearch.common.util.concurrent.ListenableFuture.done(ListenableFuture.java:112) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.common.util.concurrent.BaseFuture.setException(BaseFuture.java:178) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.common.util.concurrent.ListenableFuture.onFailure(ListenableFuture.java:149) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.action.StepListener.innerOnFailure(StepListener.java:82) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.action.NotifyOnceListener.onFailure(NotifyOnceListener.java:62) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.action.ActionListener$4.onFailure(ActionListener.java:190) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.action.ActionListener$6.onFailure(ActionListener.java:309) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.action.support.RetryableAction$RetryingListener.onFinalFailure(RetryableAction.java:201) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.action.support.RetryableAction$RetryingListener.onFailure(RetryableAction.java:193) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:74) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1379) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.transport.TransportService$TimeoutHandler.run(TransportService.java:1270) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:747) [opensearch-2.2.0.jar:2.2.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: org.opensearch.transport.ReceiveTimeoutTransportException: [seed][10.9.0.166:9300][internal:index/shard/replication/get_segment_files] request_id [552738] timed out after [599988ms]
at org.opensearch.transport.TransportService$TimeoutHandler.run(TransportService.java:1273) ~[opensearch-2.2.0.jar:2.2.0]
... 4 more