[Ingest Manager] Expose processes and their metrics #24788
michalpristas merged 30 commits into elastic:master
💚 Build Succeeded
Flaky test report: Tests succeeded.
simitt left a comment:
Thanks @michalpristas! I checked out your PR and it generally works as described. I left some minor comments.
The process endpoints need to be exposed via a TCP port, though, so that the information can be queried from other containers via HTTP requests. The port needs to be configurable. It's fine to do this in a follow-up PR, but it is a requirement for Cloud to be able to collect the information (required for 7.13).
const (
	procuctIDKey = "processID"
The PR uses product, program and process in different places for logic dealing with processes. For consistency, I think we should always use process, which reduces the mental overhead.
	metricsBytes, metricsErr := processMetrics(r.Context(), id)
	if metricsErr != nil {
		resp := errResponse{
			Type: "UNEXPECTED",
This is used a couple of times; maybe it's worth introducing a typeUnexpected constant.
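A minimal sketch of that suggestion, assuming errResponse serializes to the {"type", "reason"} shape shown in the error example of the PR description. The constant typeUnexpected comes from the suggestion above; the helpers newUnexpected and encode are hypothetical names, not code from the PR.

```go
package main

import "encoding/json"

// typeUnexpected replaces the repeated "UNEXPECTED" literal
// (name taken from the review suggestion).
const typeUnexpected = "UNEXPECTED"

// errResponse is assumed to match the error body from the PR description:
// {"type": "...", "reason": "..."}.
type errResponse struct {
	Type   string `json:"type"`
	Reason string `json:"reason"`
}

// newUnexpected builds an error body of the unexpected type;
// callers would pass err.Error() as the reason (hypothetical helper).
func newUnexpected(reason string) errResponse {
	return errResponse{Type: typeUnexpected, Reason: reason}
}

// encode marshals the body. A struct of two strings cannot fail to
// marshal, so the error is deliberately ignored in this sketch.
func encode(resp errResponse) string {
	b, _ := json.Marshal(resp)
	return string(b)
}
```

At a call site this would read `resp := newUnexpected(metricsErr.Error())`, so the literal lives in one place.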
I will expose it via a configurable TCP port in this PR; this is still a draft, made so that you can pick it up as soon as possible. @simitt, is it OK to make it configurable so that, when the configuration is missing, it won't be exposed, and I don't use ports when I don't need to? Is Cloud capable of updating this port before the agent starts, or do you need a static port from the start?
+1 on having it disabled by default. Will this be configurable through Fleet or not? The current preference would be that the person running Elastic Agent can set this, without it necessarily being available in Fleet.
As it is now, it's not configurable from Fleet.
@simitt I updated the solution with an HTTP endpoint; processes won't be exposed unless … As we have not received any feedback from Cloud yet, only …
Thanks @michalpristas; I'll give it a try as soon as possible.
I set this up on port 81.
For … I was a bit surprised that … Every time the …
# # Process stats are exposed only when this option is set.
# # It is up to the caller to make sure the port is usable and free.
# # By default, 0 is used, meaning the socket is used instead.
# port: 0
I suggest we use the same config blocks as we have in beats: https://github.com/elastic/beats/blob/master/filebeat/filebeat.reference.yml#L2507
We need a host setting to decide whether it should be exposed only on localhost or more broadly. This also allows us to add the enabled option.
Right, at least when it runs in Docker, localhost would not be sufficient.
Updated with http.enabled/host/port options.
Great. @simitt, I assume on Cloud we can just use this in the template and either make the port configurable or hardcode it.
Pinging @elastic/agent (Team:Agent)
# # When using IP addresses, it is recommended to only use localhost.
# host: localhost
# # Port on which the HTTP endpoint will bind. Default is 0, meaning the feature is disabled.
# port: 0
Now that we have enabled/disabled, do we still need to support 0?
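With an explicit enabled flag, port 0 no longer has to double as the "disabled" sentinel. A minimal sketch, assuming a config shaped like the MonitoringHTTPConfig in the diff further down: the endpoint is started only when Enabled is true, and host and port are joined with net.JoinHostPort, which also brackets IPv6 literals. listenAddr is a hypothetical helper; port 81 is simply the value used in the manual test earlier in this thread.

```go
package main

import (
	"net"
	"strconv"
)

// MonitoringHTTPConfig mirrors the http.enabled/host/port settings.
type MonitoringHTTPConfig struct {
	Enabled bool
	Host    string
	Port    int
}

// listenAddr returns the TCP address to bind and whether the endpoint
// should be started at all. A disabled config is never bound, whatever
// its Port value is.
func listenAddr(cfg MonitoringHTTPConfig) (string, bool) {
	if !cfg.Enabled {
		return "", false
	}
	return net.JoinHostPort(cfg.Host, strconv.Itoa(cfg.Port)), true
}
```

The returned address can then be passed to `net.Listen("tcp", addr)`; the socket path remains the default when the endpoint is disabled.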
		return
	}

	fmt.Fprint(w, string(metricsBytes))
Something I normally like is for metrics endpoints to be human-readable as well. In this context, I mean that we pretty-print the JSON. Unfortunately, here that means first handling the output as JSON so it can be pretty-printed with indentation. At the same time, that should not cause too much overhead, should it?
How do you specify pretty-printing with Metricbeat? We can pass the argument to Metricbeat if it is passed to the agent.
As far as I remember, there is a ?pretty flag: /stats/?pretty. I'm not sure whether that works over the socket. I was initially thinking of implementing it here so it works for all the outputs, including the agent, but I have no strong preference.
I think the pretty was a special flag provided by the expvar handler. Not sure we have had it implemented in Beats.
BTW, this is not a blocker; please ignore it for now.
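If pretty-printing were done on the agent side, one option is to re-indent the Beat's JSON only when the caller asks for it. A minimal sketch, assuming a ?pretty query parameter modeled on the /stats/?pretty flag mentioned above; maybePretty is a hypothetical helper. json.Indent rewrites the bytes token by token without decoding them into Go values, which keeps the overhead small.

```go
package main

import (
	"bytes"
	"encoding/json"
	"net/url"
)

// maybePretty returns body re-indented with two spaces when rawQuery
// contains a "pretty" parameter (with or without a value); otherwise the
// body is returned unchanged.
func maybePretty(body []byte, rawQuery string) ([]byte, error) {
	q, err := url.ParseQuery(rawQuery)
	if err != nil {
		return nil, err
	}
	if _, ok := q["pretty"]; !ok {
		return body, nil
	}
	var buf bytes.Buffer
	// json.Indent also rejects invalid JSON, so a malformed Beat
	// response surfaces as an error instead of being passed through.
	if err := json.Indent(&buf, body, "", "  "); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}
```

In the handler this would sit just before writing the response, e.g. `out, err := maybePretty(metricsBytes, r.URL.RawQuery)`.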
@michalpristas How do I reach the metric data from elastic-agent itself?
type MonitoringHTTPConfig struct {
	Enabled bool   `yaml:"enabled" config:"enabled"`
	Host    string `yaml:"host" config:"host"`
	Port    int    `yaml:"port" config:"port"`
}
Let's add a 'positive' validator (see the unpack docs). Then, if the port is configured but empty, parsing the configuration will fail and report the setting that failed. Enabled will then be used to decide whether to start the server.
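A minimal sketch of the requested rule. Per the review, the intended mechanism is a validate:"positive" struct tag evaluated by the config library while unpacking; to keep the sketch runnable with the standard library only, the same rule is expressed as a Validate method, which go-ucfg (the library Beats uses for config) also calls when a type defines one, as far as I know. The upper bound of 65535 and the error texts are additions for illustration.

```go
package main

import "fmt"

// MonitoringHTTPConfig as in the diff, with the suggested validation tag.
// The tag is only metadata here; nothing in this sketch interprets it.
type MonitoringHTTPConfig struct {
	Enabled bool   `yaml:"enabled" config:"enabled"`
	Host    string `yaml:"host" config:"host"`
	Port    int    `yaml:"port" config:"port" validate:"positive"`
}

// Validate rejects non-positive ports (what a "positive" validator
// enforces) and, as an extra check, ports above 65535. Enabled is
// deliberately not consulted: it only decides whether the server starts.
func (c *MonitoringHTTPConfig) Validate() error {
	if c.Port <= 0 {
		return fmt.Errorf("http.port must be positive, got %d", c.Port)
	}
	if c.Port > 65535 {
		return fmt.Errorf("http.port must be at most 65535, got %d", c.Port)
	}
	return nil
}
```

In this sketch an unset port (0) fails validation, so the defaults would need to carry a positive port.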
func processMetrics(ctx context.Context, id string) ([]byte, int, error) {
	detail, err := parseID(id)
	if err != nil {
		return nil, http.StatusInternalServerError, err
This is not an internal error; the user provided invalid input.
Length limit is 104 on unix.
		return nil, 0, errorWithStatus(http.StatusInternalServerError, err)
	}

	return rb, resp.StatusCode, nil
I don't know, but I would rather proxy whatever is retrieved from the Beat than mix a 200 status with an error message.
[Ingest Manager] Expose processes and their metrics (elastic#24788)
What does this PR do?
Added /processes and /processes/{processID} endpoints to the HTTP server. The agent has its server on unix:///tmp/elastic-agent/elastic-agent.sock, or npipe:///elastic-agent on Windows; this is not configurable.
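A minimal sketch of querying the new endpoint over the agent's Unix socket with Go's standard library. The response types are inferred from the /processes example below; processInfo, unixClient, decodeProcesses and listProcesses are hypothetical names, not code from the PR. The "unix" host in the URL is only a placeholder, the same pattern visible in the error example (http://unix/stats).

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net"
	"net/http"
)

// processInfo mirrors one entry of the /processes response.
type processInfo struct {
	ID     string `json:"id"`
	PID    string `json:"pid"`
	Binary string `json:"binary"`
	Source struct {
		Kind    string   `json:"kind"`
		Outputs []string `json:"outputs"`
	} `json:"source"`
}

// unixClient returns a client whose connections all go to socketPath,
// whatever host the request URL names.
func unixClient(socketPath string) *http.Client {
	return &http.Client{Transport: &http.Transport{
		DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
			var d net.Dialer
			return d.DialContext(ctx, "unix", socketPath)
		},
	}}
}

// decodeProcesses parses a /processes response body.
func decodeProcesses(data []byte) ([]processInfo, error) {
	var out struct {
		Processes []processInfo `json:"processes"`
	}
	if err := json.Unmarshal(data, &out); err != nil {
		return nil, err
	}
	return out.Processes, nil
}

// listProcesses GETs /processes through c and decodes the result.
func listProcesses(ctx context.Context, c *http.Client) ([]processInfo, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, "http://unix/processes", nil)
	if err != nil {
		return nil, err
	}
	resp, err := c.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("GET /processes: status %d: %s", resp.StatusCode, data)
	}
	return decodeProcesses(data)
}
```

Usage would be `listProcesses(ctx, unixClient("/tmp/elastic-agent/elastic-agent.sock"))`. On Windows the agent listens on a named pipe instead, which needs a different dialer (not shown).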
Example of /processes:
{ "processes": [{ "id": "filebeat-default-monitoring", "pid": "8025", "binary": "filebeat", "source": { "kind": "internal", "outputs": ["default"] } }, { "id": "metricbeat-default-monitoring", "pid": "8043", "binary": "metricbeat", "source": { "kind": "internal", "outputs": ["default"] } }, { "id": "metricbeat-default", "pid": "7998", "binary": "metricbeat", "source": { "kind": "configured", "outputs": ["default"] } }] }
Example of /processes/metricbeat-default:
{ "beat": { "cgroup": { "cpu": { "cfs": { "period": { "us": 100000 }, "quota": { "us": 0 } }, "id": "user.slice", "stats": { "periods": 0, "throttled": { "ns": 0, "periods": 0 } } }, "cpuacct": { "id": "user.slice", "total": { "ns": 994162833024 } }, "memory": { "id": "user.slice", "mem": { "limit": { "bytes": 9223372036854771712 }, "usage": { "bytes": 1766760448 } } } }, "cpu": { "system": { "ticks": 150, "time": { "ms": 156 } }, "total": { "ticks": 220, "time": { "ms": 232 }, "value": 220 }, "user": { "ticks": 70, "time": { "ms": 76 } } }, "handles": { "limit": { "hard": 1048576, "soft": 1024 }, "open": 17 }, "info": { "ephemeral_id": "aad52edf-4229-4927-bb30-c67ce9934499", "uptime": { "ms": 26728 } }, "memstats": { "gc_next": 16893216, "memory_alloc": 14370808, "memory_sys": 75056128, "memory_total": 34212120, "rss": 85266432 }, "runtime": { "goroutines": 58 } }, "libbeat": { "config": { "module": { "running": 4, "starts": 4, "stops": 0 }, "reloads": 1, "scans": 1 }, "output": { "events": { "acked": 0, "active": 0, "batches": 0, "dropped": 0, "duplicates": 0, "failed": 0, "toomany": 0, "total": 0 }, "read": { "bytes": 0, "errors": 0 }, "type": "elasticsearch", "write": { "bytes": 0, "errors": 0 } }, "pipeline": { "clients": 4, "events": { "active": 35, "dropped": 0, "failed": 0, "filtered": 0, "published": 35, "retry": 44, "total": 35 }, "queue": { "acked": 0 } } }, "metricbeat": { "system": { "cpu": { "events": 3, "failures": 0, "success": 3 }, "filesystem": { "events": 12, "failures": 0, "success": 12 }, "memory": { "events": 3, "failures": 0, "success": 3 }, "network": { "events": 17, "failures": 0, "success": 17 } } }, "system": { "cpu": { "cores": 4 }, "load": { "1": 0.58, "15": 1.43, "5": 1.43, "norm": { "1": 0.145, "15": 0.3575, "5": 0.3575 } } } }
In case of an error, e.g.:
{ "type": "UNEXPECTED", "reason": "failed fetching metrics: Get \"http://unix/stats\": dial unix /tmp/elastic-agent/default/metricbeat/metricbeat.sock: connect: no such file or directory" }
Why is it important?
Fixes: #24091
Checklist
CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.