This repository was archived by the owner on Sep 17, 2024. It is now read-only.

feat: check how many processes of a process are running in the host #1153

Merged
mdelapenya merged 4 commits into elastic:master from mdelapenya:218-backend-processes
May 10, 2021

Conversation

@mdelapenya
Contributor

What does this PR do?

It creates a new step that abstracts the logic to get the number of processes in a desired state, and calls this new step from the existing one (with 1 as the number of occurrences).

We had to refactor the existing code that waits for processes, as it only checked that a process existed in the host using pgrep -n -l, which reports only the newest (most recently started) process matching the name.

With this new approach, we are switching to pgrep -d , $THE_PROCESS, which prints every pid of the process separated by commas. For each pid in that list we execute ps -q $PID -o state --no-headers (Linux only) to get its current state and discard zombie processes (we detected that the elastic-agent on CentOS creates 3 processes, 2 of them in the zombie state; cc/ @michalpristas).

Finally, we refactored the existing method that gets the output of executing a command in a container, to avoid receiving extra bytes at the beginning of the output. In the previous code we kept a list of bytes to remove, but now we simply use a Docker client helper utility (stdcopy) that splits stdout from stderr in the container output, making it possible to retrieve just the output. See https://stackoverflow.com/a/57132902 for further information.

Why is it important?

It provides a more comprehensive check of the background processes that the elastic-agent spawns in the host, as we want to make sure the number of processes is correct.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have run the Unit tests for the CLI, and they are passing locally
  • I have run the End-2-End tests for the suite I'm working on, and they are passing locally
  • I have noticed new Go dependencies (run make notice in the proper directory)

How to test this PR locally

Tests for the backend processes:

SUITE="fleet" TAGS="backend_processes" TIMEOUT_FACTOR=3 LOG_LEVEL=TRACE BEATS_USE_CI_SNAPSHOT=false DEVELOPER_MODE=false make -C e2e functional-test

Considerations for the stand-alone mode using the Docker image

We noticed that the elastic-agent running in the Docker image in stand-alone mode spawns only 1 filebeat process, instead of the 2 spawned when using the TAR/DEB/RPM installers. Maybe @michalpristas can help out here.

SUITE="fleet" TAGS="stand_alone_agent" TIMEOUT_FACTOR=3 LOG_LEVEL=TRACE BEATS_USE_CI_SNAPSHOT=false DEVELOPER_MODE=false make -C e2e functional-test

Related issues

This will allow removing the initial bytes when reading outputs from command
execution in a container
…ntainer

It uses pgrep to get all pids for a process, and then iterates through them
to get the runnable status for each pid. If the process must be started
in the host, then it will check that the pid is in the S status (to skip
zombie processes)
@mdelapenya mdelapenya self-assigned this May 10, 2021
@mdelapenya mdelapenya requested a review from a team May 10, 2021 10:59
@mdelapenya
Contributor Author

As discussed with @michalpristas online, the Docker image has a default configuration in which it collects metrics but not logs, yet monitors everything; hence there are 2 metricbeat processes and 1 filebeat process.

@elasticmachine
Contributor

elasticmachine commented May 10, 2021

💚 Build Succeeded


Build stats

  • Build Cause: Pull request #1153 updated

  • Start Time: 2021-05-10T11:47:30.517+0000

  • Duration: 22 min 36 sec

  • Commit: 6ffa7c7

Test stats 🧪

Test Results
Failed 0
Passed 159
Skipped 0
Total 159

💚 Flaky test report

Tests succeeded.

@mdelapenya mdelapenya marked this pull request as ready for review May 10, 2021 12:53
@mdelapenya mdelapenya merged commit 78a0d49 into elastic:master May 10, 2021
mergify bot pushed a commit that referenced this pull request May 10, 2021
…1153)

* fix: use docker's stdcopy to separate stdout from stderr

This will allow removing the initial bytes when reading outputs from command
execution in a container

* feat: support checking the number of occurrences of a process in a container

It uses pgrep to get all pids for a process, and then iterates through them
to get the runnable status for each pid. If the process must be started
in the host, then it will check that the pid is in the S status (to skip
zombie processes)

* fix: check for only one filebeat instance

* fix: check for empty response when listing agent's workdir

(cherry picked from commit 78a0d49)
mergify bot pushed a commit that referenced this pull request May 10, 2021
…1153)

* fix: use docker's stdcopy to separate stdout from stderr

This will allow removing the initial bytes when reading outputs from command
execution in a container

* feat: support checking the number of occurrences of a process in a container

It uses pgrep to get all pids for a process, and then iterates through them
to get the runnable status for each pid. If the process must be started
in the host, then it will check that the pid is in the S status (to skip
zombie processes)

* fix: check for only one filebeat instance

* fix: check for empty response when listing agent's workdir

(cherry picked from commit 78a0d49)
@mdelapenya
Contributor Author

@adam-stokes I'm going to send a follow-up PR adding this to the deployer struct, so that we can start decoupling things in preparation for multiplatform support.

mdelapenya added a commit to mdelapenya/e2e-testing that referenced this pull request May 12, 2021
* master:
  feat: simplify the initialisation of versions (elastic#1159)
  chore(mergify): delete upstream branches on merge (elastic#1158)
  chore: abstract process checks to the deployer (elastic#1156)
  feat: check how many processes of a process are running in the host (elastic#1153)
mdelapenya added a commit that referenced this pull request May 14, 2021
…are running in the host (#1155)

* feat: check how many processes of a process are running in the host (#1153)

* fix: use docker's stdcopy to separate stdout from stderr

This will allow removing the initial bytes when reading outputs from command
execution in a container

* feat: support checking the number of occurrences of a process in a container

It uses pgrep to get all pids for a process, and then iterates through them
to get the runnable status for each pid. If the process must be started
in the host, then it will check that the pid is in the S status (to skip
zombie processes)

* fix: check for only one filebeat instance

* fix: check for empty response when listing agent's workdir

(cherry picked from commit 78a0d49)

* fix: check for 1 filebeat process only

* fix: check for 1 metricbeat process only

Co-authored-by: Manuel de la Peña <mdelapenya@gmail.com>
mdelapenya added a commit that referenced this pull request May 14, 2021
… running in the host (#1154)

* feat: check how many processes of a process are running in the host (#1153)

* fix: use docker's stdcopy to separate stdout from stderr

This will allow removing the initial bytes when reading outputs from command
execution in a container

* feat: support checking the number of occurrences of a process in a container

It uses pgrep to get all pids for a process, and then iterates through them
to get the runnable status for each pid. If the process must be started
in the host, then it will check that the pid is in the S status (to skip
zombie processes)

* fix: check for only one filebeat instance

* fix: check for empty response when listing agent's workdir

(cherry picked from commit 78a0d49)

* fix: check for 1 filebeat process only

* fix: check for 1 metricbeat process only

Co-authored-by: Manuel de la Peña <mdelapenya@gmail.com>
@mdelapenya mdelapenya deleted the 218-backend-processes branch May 19, 2021 05:49

Development

Successfully merging this pull request may close these issues.

[Fleet] Parametrize and functionalize the check for the number of beats processes (currently only validates that 1 is running but 2 are the default)
