Change split query implementation to fetch dependents by keys, without reevaluating principal query #12776

@smitpatel

Description

Currently, collection include queries are executed as split queries. To select only the related data in the second query, we inner join it with the first query. This works well whenever the first query is simple. If the first query involves any client evaluation, however, the second query ends up performing that inner join on the client, which means we fetch the first table twice. Even without client evaluation, if the first query involves multiple joins for filtering and ordering, the server recomputes the same work for the second query.

At least for scenarios with a single-column FK, an alternative could be to use key values to filter the related data table. This is a trade-off between N + 1 queries (each using a single key value) and 2 queries (where the second one re-runs the whole first query to obtain the key values).
Since we don't want to do N + 1, this involves buffering some results on the client side. For example, suppose the default buffer size is 100. Then, for a Customer-Orders query:

  • We run the first query to fetch Customers.
  • We iterate over 100 records, materialize Customer objects, and buffer them internally.
  • We use the key values from those buffered results to run the second query against the Orders table.
  • We combine the Orders results while iterating (the same way split query include works today) and return the results to the caller.
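
The four steps above can be sketched as follows. This is a minimal in-memory Python sketch, not EF Core code: `Customer`, `Order`, `include_orders_by_keys` and the `fetch_orders` callback (standing in for an Orders query filtered by a list of key values, e.g. an `IN` list) are hypothetical names, and it assumes the single-column FK case described above.

```python
from dataclasses import dataclass, field
from itertools import islice
from typing import Callable, Iterable, Iterator

@dataclass
class Customer:
    id: int
    orders: list = field(default_factory=list)

@dataclass
class Order:
    id: int
    customer_id: int

def include_orders_by_keys(
    customers: Iterable[Customer],
    fetch_orders: Callable[[list[int]], list[Order]],
    batch_size: int = 100,
) -> Iterator[Customer]:
    """Yield customers with their orders loaded, one buffered chunk at a time."""
    it = iter(customers)  # results of the first (principal) query
    while True:
        # Buffer up to batch_size principal rows.
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        # Run one dependents query filtered by the buffered key values.
        by_id = {c.id: c for c in chunk}
        for order in fetch_orders(list(by_id)):
            # Fix up each order onto its owning customer.
            by_id[order.customer_id].orders.append(order)
        # Hand the completed chunk back to the caller.
        yield from chunk
```

Because only `batch_size` principals are buffered at a time, memory stays bounded, and every chunk issues the same dependents query shape with a different key list, so in principle that translated query can be reused across chunks.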

This lets us reuse the same SelectExpression for the second query multiple times without missing the second-level cache (#12777). We don't run out of memory, because we buffer only one chunk of results at a time. In this example, fetching N customers takes N/100 + 1 queries (rounding N/100 up): one for Customers plus one Orders query per chunk.
It also avoids all of the issues described in the first paragraph, because the second query no longer re-evaluates the principal query.
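
The trade-off in query counts can be checked with a small calculation, assuming N principal rows and a buffer size B (100 in the example above). The function names are hypothetical; the current join-based split query is modeled as always issuing two queries, as the description states.

```python
def n_plus_one_count(principal_rows: int) -> int:
    # One principal query plus one dependents query per principal row.
    return 1 + principal_rows

def join_split_count(principal_rows: int) -> int:
    # Current approach: two queries regardless of N (the second re-runs the first).
    return 2

def key_batched_count(principal_rows: int, batch_size: int = 100) -> int:
    # One principal query plus one dependents query per chunk, rounded up.
    return 1 + (principal_rows + batch_size - 1) // batch_size
```

For 250 customers with the default buffer of 100, this gives 251 queries for N + 1, 2 for the join-based split query, and 4 for the key-batched approach.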
