Send RetryInfo on OTel Timeouts #4294
KarstenSchnitter merged 20 commits into opensearch-project:main
@dlvenable this PR shows how to add a
@KarstenSchnitter, what do you think about this configuration? Here is a code example of a nested configuration: The actual implementation is SqsOptions, which is another simple POJO class.
@KarstenSchnitter, what do you have remaining to make this PR ready for review? We did discuss making it configurable, but is there anything else to add?
I am mostly lacking time to make the required changes 😉:
Data Prepper sends `RESOURCE_EXHAUSTED` gRPC responses whenever a buffer is full or a circuit breaker is active. These statuses do not contain retry info. In the OpenTelemetry protocol, this implies a non-retryable error, which leads to message drops, e.g. in the OTel Collector. To apply proper back pressure in these scenarios, retry info is added to the status. Signed-off-by: Karsten Schnitter <k.schnitter@sap.com>
Implementation of exponential backoff. The idea is to start with a minimum delay on the first timeout or circuit breaker activation. If the next such event happens within twice the last delay after the previous event, the delay is doubled until a maximum delay is reached. The maximum delay is used from then on, until a sufficiently long period (the maximum delay) passes without an event. Then the delay is reset to the minimum. TODO: Make minimum and maximum delay configurable. Signed-off-by: Karsten Schnitter <k.schnitter@sap.com>
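The backoff rule described in this commit message can be sketched in plain Java as follows. This is a minimal illustration, not the class from this PR: the name `BackoffDelaySketch`, the method `nextDelay`, and the explicit `now` parameter are hypothetical. It also assumes a single reset rule (an event arriving later than twice the last delay resets to the minimum), which simplifies the "sufficiently long period" wording above.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical sketch of the described backoff; not the code from this PR. */
final class BackoffDelaySketch {
    private final Duration minDelay;
    private final Duration maxDelay;
    private Instant lastEvent;   // null until the first event is seen
    private Duration lastDelay;  // delay handed out for the previous event

    BackoffDelaySketch(Duration minDelay, Duration maxDelay) {
        if (minDelay.isNegative() || minDelay.isZero() || maxDelay.compareTo(minDelay) < 0) {
            throw new IllegalArgumentException("need 0 < minDelay <= maxDelay");
        }
        this.minDelay = minDelay;
        this.maxDelay = maxDelay;
    }

    /** Returns the delay to advertise for a timeout or circuit breaker event at {@code now}. */
    synchronized Duration nextDelay(Instant now) {
        boolean firstEvent = lastEvent == null;
        // Assumption: "quiet long enough" means more than twice the last delay has passed.
        boolean quietLongEnough = !firstEvent
                && Duration.between(lastEvent, now).compareTo(lastDelay.multipliedBy(2)) > 0;
        if (firstEvent || quietLongEnough) {
            lastDelay = minDelay;                 // start over at the minimum
        } else {
            Duration doubled = lastDelay.multipliedBy(2);
            // Double the delay, but never exceed the maximum.
            lastDelay = doubled.compareTo(maxDelay) > 0 ? maxDelay : doubled;
        }
        lastEvent = now;
        return lastDelay;
    }
}
```

In a gRPC service, the returned value would presumably be copied into the `retry_delay` field of the `RetryInfo` detail attached to the `RESOURCE_EXHAUSTED` status. Passing `now` explicitly is a design choice of this sketch that keeps the calculation testable without a real clock.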
55b91da to
85a7a18
(cherry picked from commit 0d45f77) Signed-off-by: Tomas Longo <tomas.longo@sap.com>
(cherry picked from commit f8ac48e) Signed-off-by: Tomas Longo <tomas.longo@sap.com>
(cherry picked from commit ff675dc) Signed-off-by: Tomas Longo <tomas.longo@sap.com>
(cherry picked from commit 43ba7ee) Signed-off-by: Tomas Longo <tomas.longo@sap.com>
(cherry picked from commit 1f90615) Signed-off-by: Tomas Longo <tomas.longo@sap.com>
(cherry picked from commit 2977c1f) Signed-off-by: Tomas Longo <tomas.longo@sap.com>
(cherry picked from commit 473db0e) Signed-off-by: Tomas Longo <tomas.longo@sap.com>
(cherry picked from commit 6ef9b7e) Signed-off-by: Tomas Longo <tomas.longo@sap.com>
(cherry picked from commit 091e9a6) Signed-off-by: Tomas Longo <tomas.longo@sap.com>
(cherry picked from commit a588b09) Signed-off-by: Tomas Longo <tomas.longo@sap.com>
Signed-off-by: Tomas Longo <tomas.longo@sap.com>
Add RetryInfo Configuration
I got help from Tomas Longo, who provided the missing configuration and tests. We also tested that the RetryInfo is correctly picked up by the OpenTelemetry Collector. With this change, Data Prepper exerts back pressure when the circuit breakers are active.
Signed-off-by: Tomas Longo <tomas.longo@sap.com>
Add java time module to tests
Signed-off-by: Tomas Longo <tomas.longo@sap.com>
fe555e6 to
b29246a
There was a slight issue with the initialisation of the
dlvenable
left a comment
Thank you @KarstenSchnitter, this should be a good improvement for OTel clients.
...t/java/org/opensearch/dataprepper/plugins/source/oteltrace/OTelTraceSourceRetryInfoTest.java
...ource/src/main/java/org/opensearch/dataprepper/plugins/source/oteltrace/RetryInfoConfig.java
...rce/src/main/java/org/opensearch/dataprepper/plugins/source/otelmetrics/RetryInfoConfig.java
...source/src/main/java/org/opensearch/dataprepper/plugins/source/otellogs/RetryInfoConfig.java
Co-authored-by: David Venable <dlv@amazon.com> Signed-off-by: Karsten Schnitter <k.schnitter@sap.com>
Co-authored-by: David Venable <dlv@amazon.com> Signed-off-by: Karsten Schnitter <k.schnitter@sap.com>
Signed-off-by: Karsten Schnitter <k.schnitter@sap.com>
@dlvenable I renamed the tests. Can you have a look again? I think that the build failures are caused by different components, not this changeset.
dlvenable
left a comment
Thank you @KarstenSchnitter!
Description
Data Prepper sends `RESOURCE_EXHAUSTED` gRPC responses whenever a buffer is full or a circuit breaker is active. These statuses do not contain retry info. In the OpenTelemetry protocol, this implies a non-retryable error, which leads to message drops, e.g. in the OTel Collector. To apply proper back pressure in these scenarios, retry info is added to the status.
Issues Resolved
Resolves #4119
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.