Support Shipper monitoring in elastic-agent #2427

Merged
fearful-symmetry merged 3 commits into elastic:main from fearful-symmetry:shipper-monitoring
Apr 6, 2023

fearful-symmetry commented Mar 30, 2023

What does this PR do?

See elastic/elastic-agent-shipper#267

This requires elastic/elastic-agent-shipper#289

This adds shipper monitoring support to elastic-agent. Right now I'm opening this as a draft PR for a few reasons:

  • The index field in the config doesn't seem to do anything; we just create a data stream, so I'm not sure it needs to be there at all.
  • I'm not 100% sure the dataset and the various id fields are set correctly.

Right now, when run with monitoring enabled, this results in two documents per period: one with queue metrics, and one with system resource metrics.

The complete document for resource metrics
{
  "_index": ".ds-metrics-elastic_agent.shipper-default-2023.03.29-000001",
  "_id": "QaoANIcBsNMa2qX4bRHy",
  "_version": 1,
  "_score": 0,
  "_source": {
    "@timestamp": "2023-03-30T19:30:48.071663908Z",
    "event": {
      "duration": 7959499,
      "dataset": "elastic_agent.shipper",
      "module": "http"
    },
    "agent": {
      "name": "shoebill.nest",
      "type": "metricbeat",
      "version": "8.8.0",
      "id": "90c33586-728c-406a-b9a9-0e0eb52a2a81",
      "ephemeral_id": "1388106b-11a9-4c02-94fd-bb07715ab947"
    },
    "metricset": {
      "name": "json",
      "period": 10000
    },
    "host": {
      "mac": [
        "02-42-1E-01-77-4F"
      ],
      "hostname": "shoebill.nest",
      "name": "shoebill.nest",
      "architecture": "x86_64",
      "os": {
        "type": "linux",
        "platform": "fedora",
        "version": "36 (Server Edition)",
        "family": "redhat",
        "name": "Fedora Linux",
        "kernel": "6.0.7-200.fc36.x86_64",
        "codename": "Thirty Six"
      },
      "id": "f1d831c8916f41339bd9b4fc73c6de97",
      "containerized": false,
      "ip": [
        "192.168.1.96"
      ]
    },
    "service": {
      "address": "http://unix/stats",
      "type": "http"
    },
    "data_stream": {
      "namespace": "default",
      "type": "metrics",
      "dataset": "elastic_agent.shipper"
    },
    "system": {
      "process": {
        "cpu": {
          "user": {
            "time": {
              "ms": 1280
            },
            "ticks": 1280
          },
          "system": {
            "time": {
              "ms": 1230
            },
            "ticks": 1230
          },
          "total": {
            "time": {
              "ms": 2510
            },
            "value": 2510,
            "ticks": 2510
          }
        },
        "memory": {
          "size": 90620184
        },
        "fd": {
          "open": 23,
          "limit": {
            "soft": 524288,
            "hard": 524288
          }
        },
        "cgroup": {
          "memory": {
            "id": "session-1749.scope",
            "mem": {
              "usage": {
                "bytes": 14167662592
              }
            }
          },
          "cpu": {
            "id": "session-1749.scope",
            "stats": {
              "throttled": {
                "ns": 0,
                "periods": 0
              },
              "periods": 0
            }
          }
        }
      }
    },
    "ecs": {
      "version": "8.0.0"
    },
    "elastic_agent": {
      "id": "90c33586-728c-406a-b9a9-0e0eb52a2a81",
      "snapshot": false,
      "version": "8.8.0",
      "process": "shipper"
    }
  },
  "fields": {
    "system.process.cpu.system.ticks": [
      1230
    ],
    "elastic_agent.version": [
      "8.8.0"
    ],
    "elastic_agent.process": [
      "shipper"
    ],
    "system.process.cpu.total.value": [
      2510
    ],
    "host.hostname": [
      "shoebill.nest"
    ],
    "host.mac": [
      "02-42-1E-01-77-4F",
      "02-42-A3-24-5B-D0",
      "02-42-BC-C0-AC-20",
      "0A-00-27-00-00-00",
      "0A-00-27-00-00-01",
      "0A-00-27-00-00-02",
      "5E-B3-C9-E0-AD-96",
      "6E-80-26-76-B5-A4",
      "74-86-7A-F0-D9-44",
      "74-86-7A-F0-D9-45",
      "74-86-7A-F0-D9-46",
      "74-86-7A-F0-D9-47",
      "E2-79-4F-36-E2-DD"
    ],
    "service.type": [
      "http"
    ],
    "system.process.memory.size": [
      90620184
    ],
    "host.os.version": [
      "36 (Server Edition)"
    ],
    "system.process.cgroup.cpu.stats.periods": [
      0
    ],
    "system.process.fd.open": [
      23
    ],
    "system.process.fd.limit.soft": [
      524288
    ],
    "host.os.name": [
      "Fedora Linux"
    ],
    "system.process.cgroup.cpu.stats.throttled.ns": [
      0
    ],
    "system.process.cgroup.cpu.stats.throttled.periods": [
      0
    ],
    "agent.name": [
      "shoebill.nest"
    ],
    "host.name": [
      "shoebill.nest"
    ],
    "system.process.cgroup.cpu.id": [
      "session-1749.scope"
    ],
    "host.os.type": [
      "linux"
    ],
    "data_stream.type": [
      "metrics"
    ],
    "host.architecture": [
      "x86_64"
    ],
    "agent.id": [
      "90c33586-728c-406a-b9a9-0e0eb52a2a81"
    ],
    "host.containerized": [
      false
    ],
    "ecs.version": [
      "8.0.0"
    ],
    "service.address": [
      "http://unix/stats"
    ],
    "agent.version": [
      "8.8.0"
    ],
    "host.os.family": [
      "redhat"
    ],
    "system.process.cgroup.memory.id": [
      "session-1749.scope"
    ],
    "system.process.fd.limit.hard": [
      524288
    ],
    "system.process.cpu.user.ticks": [
      1280
    ],
    "system.process.cpu.system.time.ms": [
      1230
    ],
    "system.process.cgroup.memory.mem.usage.bytes": [
      14167663000
    ],
    "host.ip": [
      "192.168.1.96",
      "fe80::7686:7aff:fef0:d944",
      "192.168.49.1",
      "172.17.0.1",
      "fe80::42:1eff:fe01:774f",
      "172.24.0.1",
      "fe80::42:bcff:fec0:ac20",
      "fe80::5cb3:c9ff:fee0:ad96",
      "fe80::e079:4fff:fe36:e2dd",
      "fe80::6c80:26ff:fe76:b5a4"
    ],
    "agent.type": [
      "metricbeat"
    ],
    "event.module": [
      "http"
    ],
    "host.os.kernel": [
      "6.0.7-200.fc36.x86_64"
    ],
    "elastic_agent.snapshot": [
      false
    ],
    "host.id": [
      "f1d831c8916f41339bd9b4fc73c6de97"
    ],
    "system.process.cpu.user.time.ms": [
      1280
    ],
    "system.process.cpu.total.time.ms": [
      2510
    ],
    "system.process.cpu.total.ticks": [
      2510
    ],
    "elastic_agent.id": [
      "90c33586-728c-406a-b9a9-0e0eb52a2a81"
    ],
    "data_stream.namespace": [
      "default"
    ],
    "metricset.period": [
      10000
    ],
    "host.os.codename": [
      "Thirty Six"
    ],
    "event.duration": [
      7959499
    ],
    "metricset.name": [
      "json"
    ],
    "@timestamp": [
      "2023-03-30T19:30:48.071Z"
    ],
    "host.os.platform": [
      "fedora"
    ],
    "data_stream.dataset": [
      "elastic_agent.shipper"
    ],
    "agent.ephemeral_id": [
      "1388106b-11a9-4c02-94fd-bb07715ab947"
    ],
    "event.dataset": [
      "elastic_agent.shipper"
    ]
  }
}
The complete document for shipper metrics
{
  "_index": ".ds-metrics-elastic_agent.shipper-default-2023.03.29-000001",
  "_id": "raoANIcBsNMa2qX4bRCM",
  "_version": 1,
  "_score": 0,
  "_source": {
    "@timestamp": "2023-03-30T19:30:45.056506183Z",
    "event": {
      "dataset": "elastic_agent.shipper",
      "module": "http",
      "duration": 907153
    },
    "elastic_agent": {
      "snapshot": false,
      "version": "8.8.0",
      "id": "90c33586-728c-406a-b9a9-0e0eb52a2a81",
      "process": "shipper"
    },
    "shipper": {
      "queue": {
        "is_full": false,
        "limit_reached_count": 0,
        "max_level": 4096,
        "unacked_read": 4,
        "current_level": 4
      }
    },
    "service": {
      "type": "http",
      "address": "http://unix/shipper"
    },
    "host": {
      "name": "shoebill.nest",
      "ip": [
        "192.168.1.96"
      ],
      "mac": [
        "02-42-1E-01-77-4F"
      ],
      "hostname": "shoebill.nest",
      "architecture": "x86_64",
      "os": {
        "version": "36 (Server Edition)",
        "family": "redhat",
        "name": "Fedora Linux",
        "kernel": "6.0.7-200.fc36.x86_64",
        "codename": "Thirty Six",
        "type": "linux",
        "platform": "fedora"
      },
      "id": "f1d831c8916f41339bd9b4fc73c6de97",
      "containerized": false
    },
    "data_stream": {
      "namespace": "default",
      "type": "metrics",
      "dataset": "elastic_agent.shipper"
    },
    "metricset": {
      "period": 10000,
      "name": "json"
    },
    "agent": {
      "version": "8.8.0",
      "id": "90c33586-728c-406a-b9a9-0e0eb52a2a81",
      "ephemeral_id": "1388106b-11a9-4c02-94fd-bb07715ab947",
      "name": "shoebill.nest",
      "type": "metricbeat"
    },
    "ecs": {
      "version": "8.0.0"
    }
  },
  "fields": {
    "shipper.queue.max_level": [
      4096
    ],
    "elastic_agent.version": [
      "8.8.0"
    ],
    "elastic_agent.process": [
      "shipper"
    ],
    "host.hostname": [
      "shoebill.nest"
    ],
    "host.mac": [
      "02-42-1E-01-77-4F"
    ],
    "service.type": [
      "http"
    ],
    "host.ip": [
      "192.168.1.96"
    ],
    "agent.type": [
      "metricbeat"
    ],
    "event.module": [
      "http"
    ],
    "host.os.version": [
      "36 (Server Edition)"
    ],
    "host.os.kernel": [
      "6.0.7-200.fc36.x86_64"
    ],
    "host.os.name": [
      "Fedora Linux"
    ],
    "shipper.queue.limit_reached_count": [
      0
    ],
    "agent.name": [
      "shoebill.nest"
    ],
    "shipper.queue.current_level": [
      4
    ],
    "elastic_agent.snapshot": [
      false
    ],
    "host.name": [
      "shoebill.nest"
    ],
    "host.id": [
      "f1d831c8916f41339bd9b4fc73c6de97"
    ],
    "shipper.queue.unacked_read": [
      4
    ],
    "host.os.type": [
      "linux"
    ],
    "elastic_agent.id": [
      "90c33586-728c-406a-b9a9-0e0eb52a2a81"
    ],
    "data_stream.namespace": [
      "default"
    ],
    "metricset.period": [
      10000
    ],
    "host.os.codename": [
      "Thirty Six"
    ],
    "data_stream.type": [
      "metrics"
    ],
    "shipper.queue.is_full": [
      false
    ],
    "host.architecture": [
      "x86_64"
    ],
    "event.duration": [
      907153
    ],
    "metricset.name": [
      "json"
    ],
    "@timestamp": [
      "2023-03-30T19:30:45.056Z"
    ],
    "agent.id": [
      "90c33586-728c-406a-b9a9-0e0eb52a2a81"
    ],
    "host.containerized": [
      false
    ],
    "host.os.platform": [
      "fedora"
    ],
    "ecs.version": [
      "8.0.0"
    ],
    "service.address": [
      "http://unix/shipper"
    ],
    "data_stream.dataset": [
      "elastic_agent.shipper"
    ],
    "agent.ephemeral_id": [
      "1388106b-11a9-4c02-94fd-bb07715ab947"
    ],
    "agent.version": [
      "8.8.0"
    ],
    "host.os.family": [
      "redhat"
    ],
    "event.dataset": [
      "elastic_agent.shipper"
    ]
  }
}
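
As a rough illustration of how the queue fields in the shipper metrics document above might be consumed, here is a minimal Go sketch. It assumes only the `shipper.queue` shape shown in the sample `_source`; the type and function names (`QueueMetrics`, `parseQueueMetrics`, `fillRatio`) are hypothetical and are not part of elastic-agent or the shipper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// QueueMetrics mirrors the "shipper.queue" object in the sample document.
// The Go type itself is hypothetical; only the JSON field names come from the PR.
type QueueMetrics struct {
	IsFull            bool  `json:"is_full"`
	LimitReachedCount int64 `json:"limit_reached_count"`
	MaxLevel          int64 `json:"max_level"`
	UnackedRead       int64 `json:"unacked_read"`
	CurrentLevel      int64 `json:"current_level"`
}

// source models only the part of _source needed here (hypothetical helper type).
type source struct {
	Shipper struct {
		Queue QueueMetrics `json:"queue"`
	} `json:"shipper"`
}

// parseQueueMetrics extracts shipper.queue from a document's _source JSON.
func parseQueueMetrics(src []byte) (QueueMetrics, error) {
	var s source
	if err := json.Unmarshal(src, &s); err != nil {
		return QueueMetrics{}, err
	}
	return s.Shipper.Queue, nil
}

// fillRatio returns current_level / max_level, or 0 when max_level is not positive.
func fillRatio(q QueueMetrics) float64 {
	if q.MaxLevel <= 0 {
		return 0
	}
	return float64(q.CurrentLevel) / float64(q.MaxLevel)
}

// sampleSource is the shipper.queue portion of the sample document, trimmed for brevity.
const sampleSource = `{"shipper":{"queue":{"is_full":false,"limit_reached_count":0,"max_level":4096,"unacked_read":4,"current_level":4}}}`

func main() {
	q, err := parseQueueMetrics([]byte(sampleSource))
	if err != nil {
		panic(err)
	}
	fmt.Printf("queue level %d of %d (%.4f%% full)\n", q.CurrentLevel, q.MaxLevel, fillRatio(q)*100)
}
```

The resource metrics document has no `shipper` object, so the same parser would return a zero-valued `QueueMetrics` for it; a real consumer would probably want to distinguish the two document kinds, for example by `service.address`.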

Why is it important?

We need monitoring for the shipper

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

How to test this PR locally

  • Pull down this PR
  • Pull down and build elastic/elastic-agent-shipper#289 (Refactor monitoring to make the shipper more beats-like, and compatible with Agent)
  • Build the agent with EXTERNAL=false mage dev:package
  • Unpack the tarball
  • Copy the shipper binary and the shipper.spec.yml file from specs/ to the components/ subdirectory of the unpacked tarball
  • Rename elastic-agent-shipper to shipper
  • Configure/enroll elastic-agent however you like
  • agent.monitoring.enabled must be set to true
  • Run the agent
  • Wait for .ds-metrics-elastic_agent.shipper-default-* events to ingest
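
One way to check the last step is to search the data stream for the newest shipper document. The sketch below only builds the request; it assumes the data stream is named `metrics-elastic_agent.shipper-default`, inferred from the `data_stream.*` fields and the backing index name in the sample documents. The helper names (`dataStream`, `searchBody`) are hypothetical, and how the request is sent (curl, Kibana Dev Tools, a client library) is left open.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dataStream joins type, dataset, and namespace into a data stream name.
// The naming scheme is an assumption inferred from the sample documents.
func dataStream(typ, dataset, namespace string) string {
	return typ + "-" + dataset + "-" + namespace
}

// searchBody builds a search body that returns the most recent document
// whose data_stream.dataset equals the given dataset.
func searchBody(dataset string) ([]byte, error) {
	body := map[string]interface{}{
		"size": 1,
		"sort": []interface{}{
			map[string]interface{}{"@timestamp": "desc"},
		},
		"query": map[string]interface{}{
			"term": map[string]interface{}{"data_stream.dataset": dataset},
		},
	}
	return json.Marshal(body)
}

func main() {
	b, err := searchBody("elastic_agent.shipper")
	if err != nil {
		panic(err)
	}
	// Print the request so it can be pasted into a tool of your choice.
	fmt.Printf("POST /%s/_search\n%s\n", dataStream("metrics", "elastic_agent.shipper", "default"), b)
}
```

If the setup works, the search should eventually return hits alternating between the `http://unix/stats` and `http://unix/shipper` service addresses seen in the sample documents.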

fearful-symmetry added the Team:Elastic-Agent label Mar 30, 2023
fearful-symmetry self-assigned this Mar 30, 2023

mergify bot commented Mar 30, 2023

This pull request does not have a backport label. Could you fix it @fearful-symmetry? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v./d./d./d is the label to automatically backport to the 8./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

elasticmachine commented Mar 30, 2023

💚 Build Succeeded

Build stats

  • Start Time: 2023-04-05T21:24:34.597+0000

  • Duration: 17 min 52 sec

Test stats 🧪

Test Results
Failed 0
Passed 5387
Skipped 23
Total 5410

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages.

  • run integration tests : Run the Elastic Agent Integration tests.

  • run end-to-end tests : Generate the packages and run the E2E Tests.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

elasticmachine commented Mar 30, 2023

🌐 Coverage report

Name          Metrics % (covered/total)   Diff
Packages      98.485% (65/66)             👍
Files         69.604% (158/227)           👍
Classes       68.736% (299/435)           👍
Methods       53.883% (909/1687)          👎 -0.032
Lines         38.896% (10179/26170)       👎 -0.134
Conditionals  100.0% (0/0)                💚

@fearful-symmetry

/test

fearful-symmetry marked this pull request as ready for review March 31, 2023 18:24
fearful-symmetry requested a review from a team as a code owner March 31, 2023 18:24
fearful-symmetry requested review from AndersonQ, cmacknz and michalpristas and removed request for a team March 31, 2023 18:24

blakerouse left a comment

Code-wise I don't see any issue with the PR other than my one inline comment.

Outside of the code, I am sad to see that we are still modifying Elastic Agent code to support collecting metrics from components that Elastic Agent runs. I was hoping that with the shipper being a newly built component that we would not follow the same path that we did with beats.

I would prefer to see effort put in to use something like open telemetry that is a unified protocol that the Elastic Agent could collect from all components. We need this for a nice developer experience for components with the Elastic Agent.

@cmacknz ^ for visibility

cmacknz commented Apr 5, 2023

> I would prefer to see effort put in to use something like open telemetry that is a unified protocol that the Elastic Agent could collect from all components. We need this for a nice developer experience for components with the Elastic Agent.

Agreed and this keeps adding friction for new inputs being added that are not based on Beats, like the Universal profiling agent. We just couldn't take the hit to the shipper schedule to absorb redesigning this as part of the shipper project. We'll have to do it separately.

name := "shipper" // in other beats this is the binary name, but we can hard-code it here.
if comp.ShipperSpec.Spec.Name != "" {
	name = comp.ShipperSpec.Spec.Name
}

Why are you placing this in an if? A spec must always have a name and is validated by the Elastic Agent.

@fearful-symmetry (Contributor Author)

Alright, wasn't sure if that field could be blank, so decided to err on the side of caution.
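
To make the trade-off in this thread concrete, here is a small Go sketch contrasting the defensive fallback from the diff with the simpler form that relies on spec validation. The `spec`, `shipperSpec`, and `component` types are hypothetical stand-ins that model only the fields used in the snippet; they are not the agent's real types, and this does not claim to show what was finally merged.

```go
package main

import "fmt"

// Hypothetical stand-ins for the agent's component types; only the
// fields referenced in the reviewed snippet are modeled.
type spec struct{ Name string }
type shipperSpec struct{ Spec spec }
type component struct{ ShipperSpec shipperSpec }

// nameDefensive mirrors the snippet under review: default to "shipper"
// and override only when the spec carries a non-empty name.
func nameDefensive(comp component) string {
	name := "shipper"
	if comp.ShipperSpec.Spec.Name != "" {
		name = comp.ShipperSpec.Spec.Name
	}
	return name
}

// nameValidated is the simpler form the reviewer suggests is sufficient,
// assuming the agent has already rejected specs with an empty name.
func nameValidated(comp component) string {
	return comp.ShipperSpec.Spec.Name
}

func main() {
	valid := component{ShipperSpec: shipperSpec{Spec: spec{Name: "shipper"}}}
	empty := component{}

	// With a validated spec, both variants agree.
	fmt.Println(nameDefensive(valid) == nameValidated(valid))
	// They differ only for an empty name, which validation is said to rule out.
	fmt.Printf("%q vs %q\n", nameDefensive(empty), nameValidated(empty))
}
```

The two variants diverge only on an input that, per the review comment, validation already rejects, which is why the `if` reads as redundant.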

Labels

backport-skip, Team:Elastic-Agent

Development

Successfully merging this pull request may close these issues.

Support monitoring the shipper from the Elastic Agent

4 participants