[7.x](backport #1153) feat: check how many processes of a process are running in the host #1154
mdelapenya merged 4 commits into 7.x
…1153)

* fix: use docker's stdcopy to separate stdout from stderr

  This will allow removing the initial bytes when reading outputs from command execution in a container

* feat: support checking the number of occurrences of a process in a container

  It uses pgrep to get all pids for a process, and then iterates through them to get the runnable status for each pid. If the process must be started in the host, then it will check that the pid is in the S status (to skip zombie processes)

* fix: check for only one filebeat instance

* fix: check for empty response when listing agent's workdir

(cherry picked from commit 78a0d49)
💚 Build Succeeded
Will double-check why the tests find only 1 filebeat instance in 7.x. Do not merge until this is resolved.
    When the "elastic-agent" process is in the "started" state on the host
    Then the "filebeat" process is in the "started" state on the host
    And the "metricbeat" process is in the "started" state on the host
    Then there are "2" instances of the "filebeat" process in the "started" state
@nchaulet this scenario is passing in master (this PR is a backport from #1153) but failing for both 7.x and 7.13 (see #1155)
It's weird that the 4 scenarios checking for 2 filebeat instances are failing in both maintenance branches. The logs say that only one filebeat process is in the running state. We run "ps -q $PID -o state --no-headers" for each filebeat PID, waiting until it is in the S state, which is the one we observed in the containers.
From 'man ps':
// D uninterruptible sleep (usually IO)
// R running or runnable (on run queue)
// S interruptible sleep (waiting for an event to complete)
// T stopped by job control signal
// t stopped by debugger during the tracing
// W paging (not valid since the 2.6.xx kernel)
// X dead (should never be seen)
// Z defunct ("zombie") process, terminated but not reaped by its parent
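The check described above can be sketched as follows: list the PIDs with `pgrep`, query each PID's state with `ps`, and count those in the expected state. This is a minimal illustration under stated assumptions, not the PR's implementation. The `runner` type, `parsePIDs`, and `countInState` are hypothetical names, and the command runner is injected so that the example runs without a container.

```go
package main

import (
	"fmt"
	"strings"
)

// runner abstracts command execution (hypothetical helper type), so the
// sketch can be exercised with canned output instead of a real container.
type runner func(name string, args ...string) (string, error)

// parsePIDs splits pgrep-style output (one PID per line) into a slice.
func parsePIDs(output string) []string {
	return strings.Fields(output)
}

// countInState returns how many processes named `process` report the given
// one-letter state (for example "S" for interruptible sleep).
func countInState(run runner, process, state string) (int, error) {
	out, err := run("pgrep", process)
	if err != nil {
		// Note: a real pgrep exits non-zero when nothing matches.
		return 0, err
	}

	count := 0
	for _, pid := range parsePIDs(out) {
		st, err := run("ps", "-q", pid, "-o", "state", "--no-headers")
		if err != nil {
			continue // the process may have exited between pgrep and ps
		}
		if strings.TrimSpace(st) == state {
			count++
		}
	}
	return count, nil
}

func main() {
	// Canned data: PID 101 is sleeping, PID 202 is a zombie.
	states := map[string]string{"101": "S", "202": "Z"}
	fake := func(name string, args ...string) (string, error) {
		if name == "pgrep" {
			return "101\n202\n", nil
		}
		return states[args[1]] + "\n", nil // args[1] is the PID after -q
	}

	n, err := countInState(fake, "filebeat", "S")
	if err != nil {
		panic(err)
	}
	fmt.Println(n) // prints 1: the zombie is skipped
}
```

Outside of tests, a runner could wrap `exec.Command(name, args...).Output()` from `os/exec`, or a command executed inside the container.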
@EricDavisX I'm totally confused with this scenario in the maintenance branches. Do you see why it is failing here?
Sorry I wasn't as responsive the last 2 days, I was heads-down on triaging other critical fixes. I am back from the edge now. I think this is resolved though, yes?
Yep, we reduced the number of processes to 1 (FB and MB), but we did not change how the policy is created/assigned. How could the tests be notified about this kind of change (apart from breaking)?
mdelapenya left a comment:
As @ph suggested:
It's because it doesn't need to start another filebeat or metricbeat in this context, the policy only has the fleet-server integration.
If a system integration was added to the policy you will get 2 FB/2MB
We are updating the expected number of filebeat instances from 2 to 1.

This is an automatic backport of pull request #1153 done by Mergify.