Skip to content

Conversation

@Micah-Kolide
Copy link
Contributor

Currently when processing the IN constraint in xFilter, sqlite core likes to call xFilter once per value. This PR tells sqlite core to hand off all values on an IN constraint to xFilter to reduce the amount of calls to xFilter.

Example:

Current osquery:

osquery> SELECT * FROM users WHERE uid IN (1, 501);
osquery planner: xBestIndex Evaluating constraints for table: users [index=0 column=0 term=0 usable=1]
osquery planner: xBestIndex Adding index constraint for table: users [column=uid arg_index=1 op=2]
osquery planner: xBestIndex Recording constraint set for table: users [cost=1.000000 size=1 idx=0]
osquery planner: xBestIndex Evaluating constraints for table: users [index=0 column=0 term=0 usable=0]
osquery planner: xBestIndex Recording constraint set for table: users [cost=1000000.000000 size=0 idx=1]
osquery planner: xOpen Opening cursor (0) for table: users
osquery planner: xFilter Filtering called for table: users [constraint_count=2 argc=1 idx=0]
osquery planner: xFilter Adding constraint to cursor (0): uid = 1
osquery planner: Scanning rows for cursor (0)
osquery planner: xFilter users generate returned row count:1
osquery planner: xFilter Filtering called for table: users [constraint_count=2 argc=1 idx=0]
osquery planner: xFilter Adding constraint to cursor (0): uid = 501
osquery planner: Scanning rows for cursor (0)
osquery planner: xFilter users generate returned row count:1
osquery planner: Closing cursor (0)


This PR's build of osquery:

osquery> SELECT * FROM users WHERE uid IN (1, 501);
osquery planner: xBestIndex Evaluating constraints for table: users [index=0 column=0 term=0 usable=1]
osquery planner: xBestIndex Adding index constraint for table: users [column=uid arg_index=1 op=2]
osquery planner: xBestIndex Recording constraint set for table: users [cost=1.000000 size=1 idx=0]
osquery planner: xOpen Opening cursor (0) for table: users
osquery planner: xFilter Filtering called for table: users [constraint_count=1 argc=1 idx=0]
osquery planner: xFilter Adding constraint to cursor (0): uid = 1
osquery planner: xFilter Adding constraint to cursor (0): uid = 501
osquery planner: Scanning rows for cursor (0)
osquery planner: xFilter users generate returned row count:2
osquery planner: Closing cursor (0)

@Micah-Kolide Micah-Kolide requested review from a team as code owners January 31, 2024 19:56
Copy link
Member

@directionless directionless left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty neat!

I'm not sure who a good reviewer is. @zwass, @Smjert either of you feel okay here?

@zwass
Copy link
Member

zwass commented Feb 13, 2024

Do we think there are any table implementations that rely on this behavior? Any thoughts on a strategy to determine that?

@directionless
Copy link
Member

Do we think there are any table implementations that rely on this behavior? Any thoughts on a strategy to determine that?

Feels unlikely. When I've brought this up previously, people thought (a) didn't work this way or (b) was inherent to the implementation and unfixable.

@Smjert
Copy link
Member

Smjert commented Feb 13, 2024

Yeah definitely it wasn't possible to do before, they added sqlite3_vtab_in 2 years ago.

The question here is if the IN operator gets translated to EQUALS in our code.

The problem I see here is that if the tables do not account for this (multiple constraints values for the same constraint), then the result is wrong, because they will filter only with the first value, while with the old method, the table logic would've been called multiple times, each with one of the values in the IN list, so it would've worked correctly.

If IN constraints are matched by EQUALS then we have to check the tables as I mentioned above

Here is a case, in the users table, where it should work fine for instance, but again, not all tables loop on the returned constraints.

if (context.constraints["uid"].exists(EQUALS)) {
    auto uids = context.constraints["uid"].getAll(EQUALS);
    for (const auto& uid : uids) {
      auto const auid_exp = tryTo<long>(uid, 10);
      if (auid_exp.isValue()) {
        getpwuid_r(auid_exp.get(), &pwd, buf.get(), bufsize, &pwd_results);
        if (pwd_results != nullptr) {
          genUser(pwd_results, results);
        }
      }
    }
  }

Also I think due to this, none of the table logic inside the loop has been tested with a loop count > 1. So I suspect we will encounter bugs.

@directionless
Copy link
Member

We talked about this briefly at Office Hours today.

@Smjert suggest that incorrect loops should be relatively to audit for -- we can examine tables that have getAll. But I'm not sure we can make comprehensive tests or catch things.

@Smjert
Copy link
Member

Smjert commented Feb 14, 2024

We talked about this briefly at Office Hours today.

@Smjert suggest that incorrect loops should be relatively to audit for -- we can examine tables that have getAll. But I'm not sure we can make comprehensive tests or catch things.

Just to clarify, there are 2 potential issues:

  1. Tables that do not handle multiple constraint values on a column
  2. Tables that do handle it, but the looping code has never been tested (which I fear is all of them)

For 1., the code will use getAll as the example above. The problematic code won't loop over the elements and will access only the first.

@Micah-Kolide
Copy link
Contributor Author

Sorry that I've been silent on this thread, I've been testing/fixing the various table generations this would affect. This was a big oversight I made on my part for the initial push.

For now I'm restricting this to only apply to indexed columns and not columns where additional=True as I felt refactoring all of the generates that would use it would be too big a scope for this PR.

@Micah-Kolide Micah-Kolide force-pushed the micah/allow_xfilter_to_process_in_constraint branch from bf05a19 to c2edd50 Compare February 16, 2024 06:49
@Micah-Kolide Micah-Kolide force-pushed the micah/allow_xfilter_to_process_in_constraint branch from 66f7878 to c5baea5 Compare February 26, 2024 20:40
@Micah-Kolide
Copy link
Contributor Author

Micah-Kolide commented Feb 26, 2024

I believe I've pushed all of the changes needed for safely utilizing this sqlite optimization. I'm still doing testing and will update the pr once I complete that.

@Micah-Kolide
Copy link
Contributor Author

I had realized what I was trying to do with the index and additional options didn't quite make sense to use for the IN optimization and I wanted to create another option for this change.

I found there was an old artifact in the ColumnOptions enum for an optimized option. The description for this option fits what I'm trying to achieve with this PR. It's not being used anywhere, so I've repurposed it to be used for the IN constraint optimization.

@Micah-Kolide
Copy link
Contributor Author

Micah-Kolide commented Jul 17, 2024

I've removed all of the places where I modified the table generate's use of query context. I'll put those changes into another PR if this one goes through. I figured it'd be more trackable this way and less to review for this PR.

Smjert
Smjert previously requested changes Jul 24, 2024
@Micah-Kolide Micah-Kolide force-pushed the micah/allow_xfilter_to_process_in_constraint branch from 394b989 to 5180c98 Compare August 5, 2024 18:08
@Micah-Kolide Micah-Kolide requested a review from Smjert August 5, 2024 18:10
@Micah-Kolide Micah-Kolide force-pushed the micah/allow_xfilter_to_process_in_constraint branch from 28dfe2d to 8c8cc36 Compare September 4, 2024 18:37
@Micah-Kolide Micah-Kolide force-pushed the micah/allow_xfilter_to_process_in_constraint branch from 8c8cc36 to cd11317 Compare September 18, 2024 21:25
@Micah-Kolide
Copy link
Contributor Author

I removed the optimized changes from all of the tables except for curl_certificate to keep as an example. I still have those in another branch, but I wanted to make this one a bit more digestible.

@directionless directionless added this to the 5.14 milestone Oct 8, 2024
Copy link
Member

@directionless directionless left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I can follow all of the virtual table code, but I'm willing to approve. We should be prepared to iterate if there's an issue.

description("Inspect TLS certificates by connecting to input hostnames.")
schema([
Column("hostname", TEXT, "Hostname to CURL (domain[:port], e.g. osquery.io)", required=True),
Column("hostname", TEXT, "Hostname to CURL (domain[:port], e.g. osquery.io)", required=True, optimized=True),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The generate function looks like it should handle this optimization. Yes.

@directionless directionless changed the title Process IN constraints all at once in xFilter Add column optimization support to allow processing IN constraints all at once in xFilter Oct 8, 2024
@directionless directionless dismissed Smjert’s stale review October 9, 2024 02:56

requested changes were made

@directionless directionless merged commit 81aab71 into osquery:master Oct 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants