[8.0](backport #29413) [Heartbeat] Defer monitor / ICMP errors to monitor runtime / ES #29892

Merged
andrewvc merged 2 commits into 8.0 from mergify/bp/8.0/pr-29413 on Jan 19, 2022

@mergify mergify bot commented Jan 18, 2022

This is an automatic backport of pull requests #29413 and #29900, done by Mergify.


This PR improves the error behavior of all monitors in general, and of some specific ICMP-related errors as well. These two items are combined in one PR because the overall theme is improving the ICMP error experience, and improving ICMP required improving all monitors.

Fixes #29346
and incremental progress toward #29692

General monitor improvements
Generally speaking, per #29692 we are trying to send monitor output to ES wherever possible. With this PR we now send any monitor initialization errors (such as a lack of ICMP kernel capabilities) during monitor creation to ES. We do this by allowing the monitor to initialize and run on schedule, even though we know it will always send the same error message. This lets users more easily debug issues in Kibana.

ICMP Specific Improvement
This PR also removes a broken IP capability check that caused heartbeat to be unable to start. We now just rely on return codes from attempts to actually send packets. This is the more specific fix for #29346. I was not able to reproduce the exact customer-reported issue, where the user somehow disabled ipv6 in a way that affected the ICMP loop. I tried disabling ipv6 fully with sudo sysctl net.ipv6.conf.all.disable_ipv6=1, but that didn't yield the error in #29346.

The logic is now simplified. There's no truly reliable way to know whether you can send an ipv6 (or ipv4) ping before you send it (settings can change at any time! network cards can disappear!), so we just let the errors surface as the check is executed. This is also generally a better UX in that the errors will now be visible in the Uptime app, not just the logs.

It should be noted that the ipv4 and ipv6 boolean options are documented to affect only how DNS lookups happen. With this change the behavior matches the docs.

Note that ICMP is a bit weird in that there's a single ICMP loop in heartbeat, and all monitors are really just interacting with that.

Removal of .synthetics
This also adds the .synthetics folder to the git ignore list. The folder has long been inconvenient for devs because it dirties the git working tree.

(cherry picked from commit 616db13)
@mergify mergify bot requested a review from a team as a code owner January 18, 2022 15:11
@mergify mergify bot added the backport label Jan 18, 2022
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Jan 18, 2022
elasticmachine commented Jan 18, 2022

💚 Build Succeeded

Build stats

  • Start Time: 2022-01-18T22:22:50.310+0000

  • Duration: 67 min 20 sec

  • Commit: 51a6275

Test stats 🧪

Test Results
Failed 0
Passed 3295
Skipped 71
Total 3366

💚 Flaky test report

Tests succeeded.


@andrewvc andrewvc added the Team:obs-ds-hosted-services Label for the Observability Hosted Services team label Jan 18, 2022
@elasticmachine

Pinging @elastic/uptime (Team:Uptime)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Jan 18, 2022
Fixes broken macos python e2e test
@andrewvc andrewvc merged commit f3cd790 into 8.0 Jan 19, 2022
@andrewvc andrewvc deleted the mergify/bp/8.0/pr-29413 branch January 19, 2022 02:42
Labels

backport Team:obs-ds-hosted-services Label for the Observability Hosted Services team
