Skip to content
This repository was archived by the owner on Sep 21, 2023. It is now read-only.

Refactor monitoring to make the shipper more beats-like, and compatible with Agent#289

Merged
fearful-symmetry merged 13 commits intoelastic:mainfrom
fearful-symmetry:expvar-monitoring-changes
Mar 31, 2023
Merged

Refactor monitoring to make the shipper more beats-like, and compatible with Agent#289
fearful-symmetry merged 13 commits intoelastic:mainfrom
fearful-symmetry:expvar-monitoring-changes

Conversation

@fearful-symmetry
Copy link
Copy Markdown
Contributor

@fearful-symmetry fearful-symmetry commented Mar 27, 2023

What does this PR do?

Part of fix for #267

This accomplishes a few things:

  • changes the CLI and config layout to make the shipper's monitoring flags and config values interoperable with beats
  • Completely rewrites the shipper monitoring to use the registry code used by other beats
  • Adds a /shipper and /stats and /state endpoint to http monitoring
  • Adds a config flag http.expvar that re-exports expvar metrics
  • Fixes a bug where the shipper would not remove the gRPC endpoint on close
  • fixes a bug where the shipper wasn't reporting a diagnostics callback for queue metrics
  • Edits the .golangci.yml lint file to match that in beats, since we're running into issues with cgo again.

Note that this currently won't work with agent, as changes on the agent side are needed to make agent scrape metrics from the shipper.

Why is it important?

Needed to integrate the shipper with other agent monitoring

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.md or CHANGELOG-developer.md.

Author's Checklist

How to test this PR locally

  • pull and build
  • Run the shipper in standalone mode with http.enabled: true
  • cURL (or whatever) the /, /shipper, /stats and /state endpoints, make sure they're reporting valid data.

@fearful-symmetry fearful-symmetry added the Team:Elastic-Agent Label for the Agent team label Mar 27, 2023
@fearful-symmetry fearful-symmetry requested a review from a team as a code owner March 27, 2023 22:10
@fearful-symmetry fearful-symmetry self-assigned this Mar 27, 2023
@fearful-symmetry fearful-symmetry requested review from faec and ycombinator and removed request for a team March 27, 2023 22:10
@mergify
Copy link
Copy Markdown
Contributor

mergify bot commented Mar 27, 2023

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @fearful-symmetry? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit

@elasticmachine
Copy link
Copy Markdown
Contributor

elasticmachine commented Mar 27, 2023

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2023-03-30T23:39:17.608+0000

  • Duration: 19 min 45 sec

❕ Flaky test report

No test was executed to be analysed.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

@fearful-symmetry
Copy link
Copy Markdown
Contributor Author

Whoop, go linter doesn't like my go.mod replace line for elastic/elastic-agent-system-metrics#80

@fearful-symmetry
Copy link
Copy Markdown
Contributor Author

Okay, just noticed a major problem with this while testing. The shipper is built on the assumption that any of its sub-components can be arbitrarily stopped and start, but the monitoring code used is mostly global, and will usually panic if you try to re-register or update something. Will start poking at a fix.

Copy link
Copy Markdown
Contributor

@faec faec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks good to me (pending the segfaults in the tests which look pretty fixable), but I will hold off approving while you investigate the global state issues. (Considering how big this already is, it might make sense to check in a somewhat-brittle prototype first and do in-place refactors to make it more modular, as long as the failure mode is not too disastrous?)

@fearful-symmetry
Copy link
Copy Markdown
Contributor Author

fearful-symmetry commented Mar 28, 2023

@leehinman you've run into this before on windows, right?

error creating API for registry reporter: parse "unix://C:\\Users\\jenkins\\AppData\\Local\\Temp\\mon1680036856.sock": invalid port ":\\Users\\jenkins\\AppData\\Local\\Temp\\mon1680036856.sock" after host

@leehinman
Copy link
Copy Markdown
Contributor

@leehinman you've run into this before on windows, right?

error creating API for registry reporter: parse "unix://C:\\Users\\jenkins\\AppData\\Local\\Temp\\mon1680036856.sock": invalid port ":\\Users\\jenkins\\AppData\\Local\\Temp\\mon1680036856.sock" after host

yeah, that looks a lot like what was happening with the grpc socket. On windows you need to use a named pipe.

"github.com/elastic/elastic-agent-libs/api/npipe

@fearful-symmetry
Copy link
Copy Markdown
Contributor Author

Alright, lets see what this does...

@fearful-symmetry
Copy link
Copy Markdown
Contributor Author

/test

@fearful-symmetry
Copy link
Copy Markdown
Contributor Author

Kinda baffled by the CI error:

[2023-03-29T00:14:27.444Z]    ⨯ build failed after 84.06s error=failed to build for darwin_arm64: exit status 2: # github.com/elastic/elastic-agent-shipper/monitoring

[2023-03-29T00:14:27.444Z] monitoring/queuemon.go:216:9: undefined: report.SetupInfoUserMetrics

tried building locally on an ARM macbook, works fine.

@fearful-symmetry
Copy link
Copy Markdown
Contributor Author

Alright, can reproduce it with mage package on a Macbook:

   ⨯ build failed after 19.73s error=failed to build for darwin_arm64: exit status 2: # github.com/elastic/elastic-agent-shipper/monitoring
monitoring/queuemon.go:216:9: undefined: report.SetupInfoUserMetrics

@fearful-symmetry
Copy link
Copy Markdown
Contributor Author

My first thought is that it was a cgo issue, but if that was the case, it would have failed on SetupMetrics first, and not SetupInfoUserMetrics.

@ycombinator
Copy link
Copy Markdown

@fearful-symmetry Shot in the dark, but could the build tags here have something to do with the error?

https://github.com/elastic/elastic-agent-system-metrics/blob/bea3acfa41eca45dfb07d3a7624dfc04baa2012a/report/report.go#L18-L19

@ycombinator
Copy link
Copy Markdown

ycombinator commented Mar 29, 2023

My first thought is that it was a cgo issue, but if that was the case, it would have failed on SetupMetrics first, and not SetupInfoUserMetrics.

Not necessarily, since SetupMetrics is defined twice, once in setup.go and once in setup_other.go, each file having complementary build tags. However, SetupInfoUserMetrics is only defined in setup.go.

@fearful-symmetry
Copy link
Copy Markdown
Contributor Author

Not necessarily, since SetupMetrics is defined twice, once in setup.go and once in setup_other.go, each file having complementary build tags.

@ycombinator argh, I didn't even notice there was a setup_other.go

@fearful-symmetry
Copy link
Copy Markdown
Contributor Author

Put in elastic/elastic-agent-system-metrics#82 since that code doesn't require cgo.

However, that raises another question, should mage package have cgo disabled?

@fearful-symmetry
Copy link
Copy Markdown
Contributor Author

/test

@fearful-symmetry fearful-symmetry requested a review from faec March 29, 2023 19:17
Copy link
Copy Markdown
Contributor

@leehinman leehinman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown

@joshdover joshdover left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the test!

@fearful-symmetry fearful-symmetry merged commit deef7b8 into elastic:main Mar 31, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Team:Elastic-Agent Label for the Agent team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants