Make CWL retry indefinitely for retryable errors when no DLQ configured #6355
kkondaka merged 2 commits into opensearch-project:main
Signed-off-by: Krishna Kondaka <krishkdk@amazon.com>
  public static final String CLOUDWATCH_LOGS_EVENTS_SUCCEEDED = "cloudWatchLogsEventsSucceeded";
  public static final String CLOUDWATCH_LOGS_EVENTS_FAILED = "cloudWatchLogsEventsFailed";
  public static final String CLOUDWATCH_LOGS_REQUESTS_FAILED = "cloudWatchLogsRequestsFailed";
+ public static final String CLOUDWATCH_LOGS_REQUEST_MULTI_FAILED = "cloudWatchLogsRequestMultipleFailures";
Are you trying to track requests that are retried more than once? If so, would this name be clearer: cloudWatchLogsRequestsWithMultipleRetries
This is a separate metric for multiple failures, so that alarms can be set up on them. According to the CloudWatch Logs team, a single failure may be OK, but multiple failures are serious enough to warrant an alarm and immediate action.
What are "multiple failures" exactly? Different error responses for the same request on retries?
  .logGroup(cloudWatchLogsSinkConfig.getLogGroup())
  .logStream(cloudWatchLogsSinkConfig.getLogStream())
- .retryCount(cloudWatchLogsSinkConfig.getMaxRetries())
+ .retryCount(dlqPushHandler == null ? Integer.MAX_VALUE : cloudWatchLogsSinkConfig.getMaxRetries())
Do we want to retry all errors indefinitely? Should some be non-retryable?
This applies only to retryable errors (CloudWatchLogsException or SdkClientException). Non-retryable errors (all other exceptions) are never retried.
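For readers following this thread, here is a minimal sketch of the behavior described above. It is not the code from this PR. `CwlRetrySketch`, `RetryableException`, `effectiveRetryCount`, and `sendWithRetries` are hypothetical names, `RetryableException` stands in for the two SDK exception types mentioned in the reply, and treating `retryCount` as a cap on total attempts is an assumption.

```java
// Hypothetical sketch only; names and structure are not taken from the PR.
final class CwlRetrySketch {

    // Stand-in for the retryable SDK exceptions named in the review reply
    // (CloudWatchLogsException, SdkClientException).
    static class RetryableException extends RuntimeException {
        RetryableException(String message) {
            super(message);
        }
    }

    // Mirrors the changed builder line: with no DLQ, the cap becomes Integer.MAX_VALUE.
    static int effectiveRetryCount(boolean dlqConfigured, int maxRetries) {
        return dlqConfigured ? maxRetries : Integer.MAX_VALUE;
    }

    // Runs the request until it succeeds and returns the number of attempts made.
    // Assumption: retryCount caps the total number of attempts.
    // A RetryableException is retried until the cap is reached, then rethrown.
    // Any other exception propagates on the first occurrence (never retried).
    static int sendWithRetries(Runnable request, int retryCount) {
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                request.run();
                return attempts;
            } catch (RetryableException e) {
                if (attempts >= retryCount) {
                    throw e;
                }
                // A real sink would back off here before the next attempt.
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated request that fails twice with a retryable error, then succeeds.
        int attempts = sendWithRetries(() -> {
            if (++calls[0] < 3) {
                throw new RetryableException("simulated throttle");
            }
        }, effectiveRetryCount(false, 3));
        System.out.println("succeeded after " + attempts + " attempts");
    }
}
```

With a DLQ configured, the same call would give up once the configured maximum is reached, which is the point where a real sink could hand the events to the DLQ.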
…ed (opensearch-project#6355)
* Make CWL retry indefinitely for retryable errors when no DLQ configured
  Signed-off-by: Krishna Kondaka <krishkdk@amazon.com>
* Added tests
  Signed-off-by: Krishna Kondaka <krishkdk@amazon.com>
---------
Signed-off-by: Krishna Kondaka <krishkdk@amazon.com>
Signed-off-by: Nathan Wand <wandna@amazon.com>

…ed (opensearch-project#6355)
* Make CWL retry indefinitely for retryable errors when no DLQ configured
  Signed-off-by: Krishna Kondaka <krishkdk@amazon.com>
* Added tests
  Signed-off-by: Krishna Kondaka <krishkdk@amazon.com>
---------
Signed-off-by: Krishna Kondaka <krishkdk@amazon.com>
Signed-off-by: Simon ELBAZ <elbazsimon9@gmail.com>

…ed (opensearch-project#6355)
* Make CWL retry indefinitely for retryable errors when no DLQ configured
  Signed-off-by: Krishna Kondaka <krishkdk@amazon.com>
* Added tests
  Signed-off-by: Krishna Kondaka <krishkdk@amazon.com>
---------
Signed-off-by: Krishna Kondaka <krishkdk@amazon.com>
Description
Make CWL retry indefinitely for retryable errors when no DLQ configured.
Also adds a new metric to count multiple failures (every 5 failures).
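One possible reading of "every 5 failures" is sketched below. It is an assumption, not the code from this PR: the multiple-failures counter is bumped each time the failure count for the request currently being retried reaches a multiple of 5. `MultiFailureMetricSketch` and its methods are hypothetical, and `AtomicLong` stands in for the real metric counters.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch only; the PR's actual counting logic may differ.
final class MultiFailureMetricSketch {

    // Taken from "every 5 failures" in the description.
    static final int FAILURES_PER_INCREMENT = 5;

    // Stand-ins for cloudWatchLogsRequestsFailed and cloudWatchLogsRequestMultipleFailures.
    private final AtomicLong requestsFailed = new AtomicLong();
    private final AtomicLong requestMultiFailed = new AtomicLong();

    // Assumption: failures are tracked per in-flight request and reset on success.
    private int failuresForCurrentRequest = 0;

    void onRequestFailure() {
        requestsFailed.incrementAndGet();
        failuresForCurrentRequest++;
        if (failuresForCurrentRequest % FAILURES_PER_INCREMENT == 0) {
            requestMultiFailed.incrementAndGet();
        }
    }

    void onRequestSuccess() {
        failuresForCurrentRequest = 0;
    }

    long requestsFailedCount() {
        return requestsFailed.get();
    }

    long requestMultiFailedCount() {
        return requestMultiFailed.get();
    }

    public static void main(String[] args) {
        MultiFailureMetricSketch metrics = new MultiFailureMetricSketch();
        for (int i = 0; i < 12; i++) {
            metrics.onRequestFailure();
        }
        System.out.println("failed=" + metrics.requestsFailedCount()
                + " multiFailed=" + metrics.requestMultiFailedCount());
    }
}
```

Under this reading, an alarm on the second counter fires only for requests that keep failing, while isolated failures show up in the first counter alone.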
Issues Resolved
Resolves #6300
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.