We are seeing some cases in the field with osquerybeat where the install is corrupted on Windows.
https://discuss.elastic.co/t/osquery-manger-integration-wont-work-on-windows/295529/3
Osquerybeat runs a couple of child processes, so the whole process chain looks like this:
agent -> osquerybeat -> osqueryd -> osquery-extension
On Windows, it looks like when osquerybeat is being deleted/uninstalled, the osquerybeat process may have been killed by the agent, leaving osqueryd.exe running as an orphan. Because that file is still in use, the install directory cannot be deleted (this is especially a problem on Windows, where files in use cannot be removed).
The next time the agent installs osquerybeat, it skips the install step because the osquerybeat install directory already exists. The osquerybeat install ends up corrupted, and osquerybeat.exe can't be started because it no longer exists on disk.
The osquerybeat implementation on Windows (`beats/x-pack/osquerybeat/internal/osqd/osqueryd_windows.go`, line 45 at commit d00c2fe) uses the following approach to kill the whole process tree when needed:

```go
exec.Command("taskkill", "/F", "/T", "/PID", fmt.Sprint(cmd.Process.Pid)).Run()
```
Maybe the agent should do something similar? That would help in the cases where the agent kills only the intermediate child (osquerybeat) and leaves osqueryd.exe and osquery-extension running.
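As a rough sketch of what that could look like on the agent side, the hypothetical `killProcessTree` helper below wraps the same `taskkill /F /T /PID <pid>` invocation that osquerybeat uses, but returns the error and taskkill's output instead of discarding them. The helper names are assumptions, not existing agent code, and the command itself is only available on Windows.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
)

// taskkillArgs builds the same arguments as the osquerybeat Windows code:
// /F forces termination, /T also terminates child processes of the PID.
func taskkillArgs(pid int) []string {
	return []string{"/F", "/T", "/PID", strconv.Itoa(pid)}
}

// killProcessTree is a hypothetical helper the agent could call instead of
// killing only its direct child. Unlike the original one-liner, it does not
// discard the error, and it includes taskkill's output for diagnostics.
// Windows only: taskkill.exe does not exist on other platforms.
func killProcessTree(pid int) error {
	out, err := exec.Command("taskkill", taskkillArgs(pid)...).CombinedOutput()
	if err != nil {
		return fmt.Errorf("taskkill pid %d: %w (output: %s)", pid, err, out)
	}
	return nil
}

func main() {
	// Print the command line that would be executed for a sample PID.
	fmt.Println(taskkillArgs(4242))
}
```

One caveat, as far as we understand taskkill: `/T` walks the tree starting from the given PID, so this only helps if it runs while the intermediate process (osquerybeat) is still alive. Once osquerybeat has exited, the orphaned osqueryd.exe can no longer be reached through its PID, which is why better tracking of child processes matters as well.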
It seems there are a couple of things that could be done to improve the situation:
- Better tracking of child processes and a cleaner process tree kill.
- Possibly, some install state metadata on disk that would allow the product to be properly reinstalled even in cases where the install directory was not properly deleted/cleaned up.