Skip to content

Conversation

@Micah-Kolide
Copy link
Contributor

@Micah-Kolide Micah-Kolide commented Jan 31, 2025

We've been seeing some scheduled queries getting unexpectedly deny-listed when osquery is not running with watchdog enabled (example_denied_query_with_disabled_watchdog.txt), or when watchdog itself did not stop osquery. Based on the scheduled query definition of the deny-list field, I would expect that it would be deny-listed only by watchdog.

denylist: a boolean to determine if this query may be denylisted (when stopped by the Watchdog for excessive resource consumption), default true

However, there appears to be a bug in config.cpp as it does not differentiate between a failed_query by the watchdog or osquery dying by another means. Various signals sent to the osquery process could cause an executing_query to become a failed_query, even if it was not because watchdog killed it from high resource consumption.

I've applied a simple fix to verify that the disable_watchdog flag is false before deny-listing a failed_query, and I've also added some logging for when a scheduled query gets skipped per expiry period or osquery initialization, or when it expires from the denylist.

I've tested this fix a decent amount on a MacOS instance. I've included the verbose logs to show the fix in action:

1st init (watchdog disabled):
I0203 10:14:46.972573 107137088 init.cpp:413] osquery initialized [version=5.15.0-4-g745cdba37]
...
I0203 10:15:00.072829 1870835712 scheduler.cpp:120] Executing scheduled query stress_test_pass: select * from time
I0203 10:15:00.073619 1870835712 query.cpp:119] Storing initial results for new scheduled query: stress_test_pass
I0203 10:15:00.073698 1870835712 scheduler.cpp:201] Found results for query: stress_test_pass
...
I0203 10:15:40.208810 1870835712 scheduler.cpp:120] Executing scheduled query stress_test_fail: select * from apps CROSS JOIN users
I0203 10:15:58.768437 1870835712 query.cpp:119] Storing initial results for new scheduled query: stress_test_fail
I0203 10:15:59.021412 1870835712 scheduler.cpp:201] Found results for query: stress_test_fail

2nd init (watchdog enabled):
I0203 10:16:11.884200 107137088 init.cpp:413] osquery initialized [version=5.15.0-4-g745cdba37]
...
I0203 10:16:31.042694 1817571328 scheduler.cpp:120] Executing scheduled query stress_test_pass: select * from time
I0203 10:16:31.077574 1817571328 scheduler.cpp:201] Found results for query: stress_test_pass
...
I0203 10:16:36.055464 1817571328 scheduler.cpp:120] Executing scheduled query stress_test_fail: select * from apps CROSS JOIN users
I0203 10:16:38.922310 1839607808 watcher.cpp:538] pid: 50750, cpu: 1140ms/1440ms, memory: 28.47MB/3MB
W0203 10:16:38.922374 1839607808 watcher.cpp:425] osqueryd worker (50750) stopping: Memory limits exceeded: 29851648 bytes (limit is 3MB)
...
I0203 10:16:43.228420 107137088 init.cpp:410] osquery worker initialized [watcher=50749]
...
W0203 10:16:43.301170 107137088 config.cpp:362] Scheduled query may have failed: stress_test_fail
...
W0203 10:16:43.324247 107137088 config.cpp:508] The caller does not want denied queries, skipping denied scheduled query: stress_test_fail
...
I0203 10:17:00.395078 1872801792 scheduler.cpp:120] Executing scheduled query stress_test_pass: select * from time
I0203 10:17:00.420019 1872801792 scheduler.cpp:201] Found results for query: stress_test_pass
...
I0203 10:18:14.687698 1872801792 config.cpp:494] Scheduled denylisted query has expired: stress_test_fail
...
I0203 10:18:25.728713 1872801792 scheduler.cpp:120] Executing scheduled query stress_test_fail: select * from apps CROSS JOIN users
I0203 10:18:28.367564 1839607808 watcher.cpp:538] pid: 50784, cpu: 1052ms/1440ms, memory: 26.66MB/3MB
W0203 10:18:28.367623 1839607808 watcher.cpp:425] osqueryd worker (50784) stopping: Memory limits exceeded: 27951104 bytes (limit is 3MB)
...
I0203 10:18:32.669792 107137088 init.cpp:410] osquery worker initialized [watcher=50749]

I was testing with a 90 second deny query duration.
stress_test_pass's interval was set to 30.
stress_test_fail's interval was set to 60.

@Micah-Kolide Micah-Kolide requested review from a team as code owners January 31, 2025 21:16
directionless
directionless previously approved these changes Feb 3, 2025
Copy link
Member

@directionless directionless left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this makes sense to me. Because we call this so often, we need to add a little state tracking to not spam the logs.

We could additional check for Flag::getValue("disable_watchdog") down around line 502, but maybe we don't need to with the change on input. In a day everything would settle.

@zwass @lucasmrod Either of you want to raise something? I'll merge this in a bit if you don't.

Co-authored-by: seph <seph@kolide.co>
Copy link
Member

@zwass zwass left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@directionless directionless merged commit 4aff953 into osquery:master Feb 5, 2025
22 checks passed
@directionless directionless added bug query-scheduler Query scheduler related. labels Feb 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug query-scheduler Query scheduler related.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants