Cherry-pick #27242 to 7.x: Add support for CgroupsV2 in beats, migrate away from gosigar #27346

Merged
fearful-symmetry merged 1 commit into elastic:7.x from fearful-symmetry:backport_27242_7.x
Aug 12, 2021

@fearful-symmetry fearful-symmetry commented Aug 12, 2021

Cherry-pick of PR #27242 to 7.x branch. Original message:

What does this PR do?

This PR accomplishes a few different things:

  • Adds support for cgroupsV2
  • Continues our deprecation of elastic/gosigar, by moving the cgroups code into libbeat. A considerable portion of the code here is pre-existing code from gosigar that's been reorganized.
  • Moves the opt library from metricbeat/internal/metrics to libbeat, since we use it here too.
  • Refactors the existing cgroups V1 code to be more in line with the refactor of the system module, removing the MapStr manipulation code in favor of having the data format hard-coded into the structs, and makes the cgroups code overall more "metrics first"
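
To illustrate the last point, here is a minimal sketch of what "data format hard-coded into the structs" can look like. The type names (`BytesValue`, `MemoryEvent`) and field layout are hypothetical, and `encoding/json` stands in for whatever serialization Beats actually uses; this is not the code from the PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// BytesValue and MemoryEvent are hypothetical types for this sketch.
// The event layout lives in the struct tags rather than in code that
// builds and rearranges a map by hand.
type BytesValue struct {
	Bytes uint64 `json:"bytes"`
}

type MemoryEvent struct {
	Usage BytesValue `json:"usage"`
	// Limit is a pointer so that it is omitted when no limit is reported.
	Limit *BytesValue `json:"limit,omitempty"`
}

// render serializes an event using the layout declared on the structs.
func render(ev MemoryEvent) (string, error) {
	out, err := json.Marshal(ev)
	return string(out), err
}

func main() {
	s, err := render(MemoryEvent{Usage: BytesValue{Bytes: 1024}})
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

With this shape, adding or renaming a reported field is a change to the struct definition, and optional values drop out of the output without extra map-editing code.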

Note that supporting both cgroups V1 and V2 involves some compromises: the two versions structure themselves and report data quite differently, so most of the metrics APIs require the consumer to differentiate between V1 and V2 if they want to access raw metrics. V1 and V2 can also coexist on the same system, so this must happen on a pid-by-pid basis.
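
As a rough sketch of what a per-pid check can look like: each line of /proc/[pid]/cgroup has the form hierarchy-ID:controller-list:path, and cgroups V2 entries use hierarchy ID 0 with an empty controller list. The `CgroupVersion` type and `classifyCgroupFile` function below are hypothetical names for illustration, not the API added in libbeat.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// CgroupVersion is a hypothetical type for this sketch.
type CgroupVersion int

const (
	VersionUnknown CgroupVersion = iota
	VersionV1
	VersionV2
	VersionHybrid
)

// classifyCgroupFile inspects the contents of a /proc/[pid]/cgroup file
// and reports which cgroup versions the process belongs to.
func classifyCgroupFile(contents string) CgroupVersion {
	var sawV1, sawV2 bool
	scanner := bufio.NewScanner(strings.NewReader(contents))
	for scanner.Scan() {
		// Split into at most three parts; the path itself may contain colons.
		fields := strings.SplitN(scanner.Text(), ":", 3)
		if len(fields) != 3 {
			continue // skip blank or malformed lines
		}
		if fields[0] == "0" && fields[1] == "" {
			sawV2 = true // unified (V2) hierarchy entry
		} else {
			sawV1 = true // V1 controller or named hierarchy
		}
	}
	switch {
	case sawV1 && sawV2:
		return VersionHybrid
	case sawV2:
		return VersionV2
	case sawV1:
		return VersionV1
	default:
		return VersionUnknown
	}
}

func main() {
	hybrid := "12:memory:/user.slice\n1:name=systemd:/user.slice\n0::/user.slice/session-1.scope\n"
	fmt.Println(classifyCgroupFile(hybrid) == VersionHybrid)
}
```

A real implementation would read the file for each pid it inspects and would still need to decide which hierarchy to pull metrics from when both are present.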

Also, I'm still testing this on V1 and mixed v1/v2 systems, but the code is otherwise ready for review.

There's also the issue of dashboards. The system Docker dashboard relies on many of the V1 fields that aren't present in V2, and I'm not really sure how to deal with them. Last I tried, we don't really have any mechanism for dashboards to operate with the logic of "use this field if present, otherwise use this other field." We may also just want to re-write the dashboard entirely to just use fields that are present in V1 and V2, at the expense of losing some of the visualizations.

Also keep in mind that most of system/process is going to be aggressively refactored after this, so any issues, unless they're serious, with the code in libbeat/metric/system/process and metricbeat/system/process will almost certainly be fixed as part of 7.16.

What's up with cgroupsV2

Cgroups V2 introduces a few major changes compared to cgroups V1, which necessitated a lot of extra code:

  • A new hierarchy for processes. This "unified" hierarchy requires new logic for code that wants to track processes in a cgroup
  • New controllers. The various resource controllers (cpuacct, memory, etc) have changed dramatically, and in a few cases have merged with other controllers for V2. This requires entirely new metrics code.
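
As one example of the difference in reporting: V1's cpuacct.usage is a single nanosecond counter, while V2's cpu.stat is a "flat keyed" file with one key/value pair per line (usage_usec, user_usec, system_usec, and so on). The following is a minimal sketch of a parser for that format; `parseFlatKeyed` is a hypothetical helper, not the function used in this PR.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseFlatKeyed parses the "flat keyed" format used by cgroups V2 files
// such as cpu.stat: one "key value" pair per line, with integer values.
func parseFlatKeyed(contents string) (map[string]uint64, error) {
	stats := make(map[string]uint64)
	scanner := bufio.NewScanner(strings.NewReader(contents))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) == 0 {
			continue // skip blank lines
		}
		if len(fields) != 2 {
			return nil, fmt.Errorf("unexpected line %q", scanner.Text())
		}
		val, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			return nil, fmt.Errorf("parsing value for %s: %w", fields[0], err)
		}
		stats[fields[0]] = val
	}
	return stats, scanner.Err()
}

func main() {
	sample := "usage_usec 1500\nuser_usec 1000\nsystem_usec 500\n"
	stats, err := parseFlatKeyed(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(stats["usage_usec"], stats["user_usec"], stats["system_usec"])
}
```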

File structure

For ease of browsing through this PR (GitHub honestly isn't great at presenting large PRs), here's the breakdown of what's in here:

libbeat/metric/system/cgroup // The entirety of the cgroup implementation. The files in this base directory are (mostly) unchanged files from gosigar.
├── cgcommon // Files that are shared by cgv1 and cgv2. Mostly struct definitions and helper code
├── cgv1 // Metrics for cgroups v1. This is almost entirely code from elastic/gosigar that's been refactored.
├── cgv2 // Metrics for cgroups v2. This is entirely new code.
└── testdata // Data used by the various *_test.go files

This also includes:

  • libbeat/opt, which is mostly existing code that was moved from metricbeat/internal.
  • Changes to cmd/instance, add_process_metadata and add_docker_metadata to migrate away from gosigar.
  • Changes to metricbeat/internal to deal with moving the libbeat/opt code.

Why is it important?

Cgroups V2 is coming, and is already default on Fedora. It's supported by most other software at this point.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

  • Pull down and build
  • Note which cgroups version your host is using. You can check with grep cgroup /proc/self/mountinfo
  • Test the system/process metricset, as well as add_process_metadata and add_docker_metadata
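
The mountinfo check in the second step can also be done programmatically. In each mountinfo line, the filesystem type is the first field after the " - " separator: "cgroup" indicates a V1 mount and "cgroup2" a V2 mount. The sketch below is an illustration under that assumption; `cgroupMounts` is a hypothetical helper, and the sample lines are made up for the example.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// cgroupMounts scans /proc/self/mountinfo-style content and reports
// whether cgroup V1 and/or V2 filesystems are mounted.
func cgroupMounts(mountinfo string) (hasV1, hasV2 bool) {
	scanner := bufio.NewScanner(strings.NewReader(mountinfo))
	for scanner.Scan() {
		// Fields after the " - " separator: fstype, source, super options.
		parts := strings.SplitN(scanner.Text(), " - ", 2)
		if len(parts) != 2 {
			continue
		}
		fields := strings.Fields(parts[1])
		if len(fields) == 0 {
			continue
		}
		switch fields[0] {
		case "cgroup":
			hasV1 = true
		case "cgroup2":
			hasV2 = true
		}
	}
	return hasV1, hasV2
}

func main() {
	// Made-up sample resembling a hybrid host.
	sample := "35 25 0:30 / /sys/fs/cgroup/unified rw,nosuid shared:9 - cgroup2 cgroup2 rw,nsdelegate\n" +
		"36 25 0:31 / /sys/fs/cgroup/memory rw,nosuid shared:10 - cgroup cgroup rw,memory\n"
	v1, v2 := cgroupMounts(sample)
	fmt.Println(v1, v2)
}
```

On a real host, the input would be the contents of /proc/self/mountinfo; seeing both types mounted suggests a mixed V1/V2 setup, which is the case worth testing most carefully.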

…c#27242)

* finish first round of cgroupsv2 support

* fix tests

* remove old code, fix PctOpt

* fix imports

* go mod tidy

* try to fix make notice again

* try again

* remove unneeded test files

* fix test paths

* fix crossbuild issues

* fix tests, remove debug statement

* change metadata, fight with mapping defs

* fix v1 test, remove debug line

* somewhat hacky fix for fields issues

* fix v1 fetch, update fields again

* fix omitempty issue

* make update

* remove older test

* fix tests, cgv1 logic

* remove old debug statement

* fix issue with how ubuntu mixes cgroups

* clean up error handling in libbeat

* changes based on feedback, increased error verbosity to try to fix baffling CI errors

* fix fields, add more error messages for weird CI bug

* I give up, add tons of debug statements

* fix issue with docker containers running under hybrid cgroups

* fix hostfs state check

* fix more broken tests

* fix names, log levels

* more changes, docs, test

* still making the mapping checks happy

* fix libbeat code I broke

(cherry picked from commit d898533)
@botelastic botelastic bot added the needs_team label Aug 12, 2021
@elasticmachine

Pinging @elastic/integrations (Team:Integrations)

@botelastic botelastic bot removed the needs_team label Aug 12, 2021
@elasticmachine

💚 Build Succeeded

Build stats

  • Start Time: 2021-08-12T20:15:28.556+0000

  • Duration: 158 min 50 sec

  • Commit: 98faf9c

Test stats 🧪

Test Results
Failed 0
Passed 51991
Skipped 5151
Total 57142

💚 Flaky test report

Tests succeeded.

@fearful-symmetry fearful-symmetry merged commit e1cdc00 into elastic:7.x Aug 12, 2021
@zube zube bot removed the [zube]: Done label Nov 11, 2021

mergify bot commented Nov 11, 2021

This pull request now has conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b backport_27242_7.x upstream/backport_27242_7.x
git merge upstream/7.x
git push upstream backport_27242_7.x

Labels

backport, Team:Integrations