Fix to only deny-list scheduled queries when watchdog is enabled #8541
We've been seeing some scheduled queries getting unexpectedly deny-listed when osquery is not running with watchdog enabled (example_denied_query_with_disabled_watchdog.txt), or when watchdog itself did not stop osquery. Based on the scheduled query definition of the deny-list field, I would expect that it would be deny-listed only by watchdog.
However, there appears to be a bug in `config.cpp`, as it does not differentiate between a `failed_query` caused by the watchdog and one caused by osquery dying by another means. Various signals sent to the osquery process could cause an `executing_query` to become a `failed_query`, even if it was not because watchdog killed it for high resource consumption.

I've applied a simple fix to verify that the `disable_watchdog` flag is `false` before deny-listing a `failed_query`, and I've also added some logging for when a scheduled query gets skipped per expiry period or osquery initialization, or when it expires from the denylist.

I've tested this fix a decent amount on a macOS instance. I've included the verbose logs to show the fix in action:
I was testing with a 90-second deny query duration.
`stress_test_pass`'s interval was set to `30`. `stress_test_fail`'s interval was set to `60`.
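
For readers who want the shape of the guard without opening the diff, here is a minimal, self-contained sketch of the idea. It is not the code in `config.cpp`: `ScheduleState`, `recordFailedQuery`, and `isDenylisted` are hypothetical names invented for illustration, and the watchdog setting is passed in as a plain `bool` rather than read from the real flag.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for the schedule's persisted state.
// These are NOT osquery's actual types or member names.
struct ScheduleState {
  // Query name -> Unix time (seconds) at which the deny-list entry expires.
  std::map<std::string, std::uint64_t> denylist;
  // Name of the query that was recorded as executing at the last shutdown.
  std::string executing_query;
};

// Called at startup. If a query was still marked as executing, treat it as a
// failed query, but only deny-list it when the watchdog is enabled.
// Returns true if the query was added to the deny-list.
bool recordFailedQuery(ScheduleState& state,
                       bool disable_watchdog,
                       std::uint64_t now,
                       std::uint64_t deny_duration) {
  if (state.executing_query.empty()) {
    return false; // Nothing was running when the process stopped.
  }
  const std::string failed_query = state.executing_query;
  state.executing_query.clear(); // Always reset the marker.

  if (disable_watchdog) {
    // Without the watchdog, the process died for some other reason
    // (for example a signal), so the query is not blamed.
    return false;
  }
  state.denylist[failed_query] = now + deny_duration;
  return true;
}

// Returns true while the named query is still deny-listed at time `now`.
// Expired entries are removed as a side effect.
bool isDenylisted(ScheduleState& state,
                  const std::string& name,
                  std::uint64_t now) {
  auto it = state.denylist.find(name);
  if (it == state.denylist.end()) {
    return false;
  }
  if (it->second <= now) {
    state.denylist.erase(it); // Entry has expired; the query may run again.
    return false;
  }
  return true;
}
```

With a 90-second deny duration as in the test above, a query that failed at time `t` under an enabled watchdog would be skipped until `t + 90`, while the same failure with the watchdog disabled would leave the deny-list untouched.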