
Elastic Agent: Child process management issues, incomplete beat uninstall, skipped/corrupted install #30067

@aleksmaus

Description


We are seeing some cases in the field with osquerybeat where the install is corrupted on Windows.
https://discuss.elastic.co/t/osquery-manger-integration-wont-work-on-windows/295529/3

Osquerybeat runs a couple of child processes, so the whole chain looks like this:
agent->osquerybeat->osqueryd->osquery-extension

On Windows, it looks like when osquerybeat is deleted/uninstalled, its process may have been killed by the agent, leaving osqueryd.exe running as an orphan. The install directory then cannot be deleted, because on Windows a file that is in use cannot be removed.
The next time the agent installs osquerybeat, it skips the install step because the osquerybeat install directory already exists. The osquerybeat install ends up corrupted, and osquerybeat.exe can't be started because it doesn't exist on disk.

On Windows, the osquerybeat implementation uses the following approach to kill the whole process tree when needed:

```go
// Force-kill (/F) the process with this PID together with its whole child tree (/T).
exec.Command("taskkill", "/F", "/T", "/PID", fmt.Sprint(cmd.Process.Pid)).Run()
```

Maybe the agent should do something similar? That would help in cases where the agent kills only the intermediate child (osquerybeat) and leaves its children running.

It seems there are a couple of things that could be done to improve the situation:

  1. Better tracking of child processes and a cleaner kill of the whole process tree.
  2. Possibly some install-state metadata on disk that would allow the product to be reinstalled properly, even when the install directory was not fully deleted/cleaned up.
