This repository was archived by the owner on Sep 21, 2023. It is now read-only.

Add infinite retry, fix monitoring #296

Merged
fearful-symmetry merged 5 commits into elastic:main from fearful-symmetry:shipper-retry on Apr 18, 2023

@fearful-symmetry
Contributor

What does this PR do?

Closes #262

This PR adds infinite retry functionality and completely refactors the Elasticsearch health monitor. In detail:

  • Setting max_retries to -1 enables "infinite" retry, which simply sets the Elasticsearch MaxRetries setting to MaxInt.
  • Because many common error conditions are now retried instead of propagating to OnFailure, the monitor instead watches the raw HTTP status codes returned by the Elasticsearch transport and checks how their counts change between reporting periods.
  • The monitor marks the Elasticsearch output as unhealthy if, within a given reporting period, zero successful HTTP status codes and more than zero errors are reported.
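The delta-based health rule described above can be sketched in Go roughly as follows. This is a minimal illustration, not the shipper's actual code: the statusCounts and healthMonitor types, the record and check methods, and the choice to count every non-2xx status code as an error are all assumptions made for the example.

```go
package main

import "fmt"

// statusCounts holds cumulative counts of HTTP responses seen by the
// Elasticsearch transport. Hypothetical type, for illustration only.
type statusCounts struct {
	success uint64 // 2xx responses
	failure uint64 // any other status code (assumption)
}

// record classifies one raw HTTP status code; only 2xx counts as success.
func (c *statusCounts) record(code int) {
	if code >= 200 && code < 300 {
		c.success++
	} else {
		c.failure++
	}
}

// healthMonitor keeps the previous snapshot so that each check only looks
// at the activity of the current reporting period (the delta).
type healthMonitor struct {
	last statusCounts
}

// check returns false (unhealthy) only when the period saw zero successes
// and at least one error; an idle period with no traffic stays healthy.
func (m *healthMonitor) check(current statusCounts) bool {
	successDelta := current.success - m.last.success
	failureDelta := current.failure - m.last.failure
	m.last = current
	return !(successDelta == 0 && failureDelta > 0)
}

func main() {
	var counts statusCounts
	m := &healthMonitor{}
	// Status codes observed in four consecutive reporting periods.
	periods := [][]int{
		{200, 200, 201}, // all successful
		{200, 429, 503}, // mixed: still healthy
		{503, 503},      // only errors: unhealthy
		{},              // no traffic: healthy
	}
	for i, codes := range periods {
		for _, code := range codes {
			counts.record(code)
		}
		fmt.Printf("period %d healthy=%v\n", i, m.check(counts))
	}
}
```

Run as-is, this reports healthy=false only for the third period. Working on deltas of cumulative counters means the monitor never needs to reset the transport's counters; it only remembers the last snapshot it saw.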

Why is it important?

We want infinite retry.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.md or CHANGELOG-developer.md.

How to test this PR locally

  • Pull down and build
  • Run either standalone or with agent
  • Set max_retries to -1:
outputs:
  default:
    log_level: debug
    type: elasticsearch
    hosts: [https://127.0.0.1:9200]
    username: "elastic"
    password: "changeme"
    bulk_max_size: 4096
    max_retries: -1
    ssl:
      verification_mode: none
    shipper:
      log_level: debug
      enabled: true
      bulk_max_size: 2048
  • Ensure that events send properly
  • Make Elasticsearch unavailable (for example, stop it)
  • Run again and ensure that the shipper reports a degraded state

@fearful-symmetry fearful-symmetry added the Team:Elastic-Agent label (Label for the Agent team) Apr 13, 2023
@fearful-symmetry fearful-symmetry requested a review from a team as a code owner April 13, 2023 21:57
@fearful-symmetry fearful-symmetry self-assigned this Apr 13, 2023
@fearful-symmetry fearful-symmetry requested review from belimawr and cmacknz and removed request for a team April 13, 2023 21:57
@mergify
Contributor

mergify bot commented Apr 13, 2023

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @fearful-symmetry? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fix up this pull request, you need to add the backport labels for the needed branches, such as:

  • backport-v8.\d.0 is the label to automatically backport to the 8.\d branch. \d is the digit

@elasticmachine
Contributor

elasticmachine commented Apr 13, 2023

💚 Build Succeeded

Build stats

  • Start Time: 2023-04-18T17:03:57.150+0000

  • Duration: 18 min 22 sec

❕ Flaky test report

No test was executed to be analysed.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

@pierrehilbert pierrehilbert requested a review from leehinman April 17, 2023 07:45
Contributor

@leehinman leehinman left a comment

LGTM

Contributor

@faec faec left a comment

Looks good, a few suggested cleanups

Labels

Team:Elastic-Agent Label for the Agent team

Development

Successfully merging this pull request may close these issues.

Shipper doesn't support infinite retry / robust failure handling

4 participants