-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Description
Our parameter-based query cache currently operates on parameter nullability; this is a mechanism we had in previous versions which allows us to vary SQL based on actual parameter value nullability and is potentially important for perf (#17543). This issue proposes a more general design that could unlock parameter-based caching for other scenarios - not just null checks on parameters.
Scenarios
- To translate
listParam.Contains(x), we expand the list parameter to a constant in the SQL, resulting inx IN (constant1, constant2...), since there are no list parameters. Instead, we could expand to a parameterized list:x IN (@p1, @p2), based on the number of elements in the given list (i.e. we'd cache one version for 1 param, another for 2 params, etc.). This would allow us to avoid the additional execution-time visitation, and utilise database query plans in a much better way (we need to think about cache explosion and eviction though). This is tracked by IN() list queries are not parameterized, causing increased SQL Server CPU usage #13617, but this feature could be a building block. - For String.StartsWith, if a parameterized pattern doesn't contain wildcards we can translate to a simple LikeExpression without an additional
LEFT(LEN())expression as we do now (although this would require exposure to method translators). We could even rewrite the search pattern to escape the wildcard characters. - For String.IndexOf, string.Contains, we currently have a complex expression to handle the case of an empty search string (Query: simplify IndexOf translation #18773). We could sniff that and produce simple/optimized SQL for non-empty strings.
- Methods which accept an enum could be translated to different commands. An example is methods accepting StringComparison (see Add ability to explicitly force query part evaluation during query generation #20610).
- Our current null caching mechanism looks at all parameters, regardless of whether their nullability is actually important. A more fine-grained approach would allow us to generate different cache entries only when the nullability of the parameter actually matters.
- We can do certain optimizations based on boolean parameters (e.g.
@p ? A : B) - PostgreSQL has a construct which allow sending a list parameter:
x = ANY (@listParam). However, depending on whether the parameter contains any nulls, an additional null check needs to be added (OR x IS NULL). We could cache two command versions - one for when the list contains null, and another when it doesn't.- Similarly, for one-element arrays we could render
x = @listParamElement, which apparently is more efficient (PG apparently doesn't reuse generic query plans with array parameters? /cc @vonzshik). For empty arrays we could also optimize away.
- Similarly, for one-element arrays we could render
- Any other scenario where some value (zero, empty string, whatever) is a special case where we want to produce different SQL.
Proposal
Delegate-based parameter cache lookup
For the following, let's concentrate on reimplementing the existing nullability-based caching.
- A visitor encounters a case where we want a different cache entry based on a parameter's value (e.g. some parameter whose nullability is important).
- The visitor registers a lambda on the QueryCompilationContext. The lambda accepts a QueryContext, and returns a bool which will becomes part of the cache key. For example, assume our nullability visitor decides that the nullability of parameter X is important; it would register a delegate accepting (another) QueryContext, and returning true if parameter X is null or not.
- At the end of the pipeline, we'd gather all registered lambdas and combine a ParameterBasedCacheKeyGenerator from them. As each lambda returns a single bool, the combined returned value (the cache key) would be a series of bits (possibly to be represented as a BitArray).
- We can have two SQLs for boolean parameters, at least in some cases. For example, for
boolParam ? x : ywe would elide the condition entirely in SQL. See also Additional refactoring of Null Semantics #19168. - The cache key generator would be executed over the current QueryContext, yielding the cache key for the current RelationalCommand, which would be added to the parameter-based query cache. The cache key generator would be stored as well.
- The next time a query makes its way to the parameter-based cache, the cache key generator is invoked on the new QueryContext and the result used for the cache lookup.
Assumption/limitation:
- For a given query, the same cache key generator must be generated regardless of parameter values. That is, we can't have different lambdas registered based on different parameter values.
- Cache key lambdas can only generate bool, although it's conceivable that a richer return value could be useful in some cases (e.g. 3 possible versions of the SQL). As a workaround, multiple bits can be combined if we ever need this.
QueryContextDependentExpression
The above takes care of nullability-based caching, where the visitor(s) involved run after the parameter-based cache has been consulted. However, we also want to vary SQL for parts of the tree that are handled before the parameter-based cache is consulted, and therefore cannot yet access the QueryContext. For example, we want to emit different SQLs for StartsWith, which is translated early. To support this, we can introduce a new expression type, QueryContextDependentExpression, which can be "resolved" into another, concrete expression, later in the pipeline when the QueryContext is known.
- A visitor or method translator (e.g. for StartsWith) embeds a QueryContextDependentExpression. This expression wraps a lambda which accepts a QueryContext, and returns the concrete expression to replace the QueryContextDependentExpression for that context.
- Later, after we pass parameter-based cache and the QueryContext is available, a visitor would traverse the tree to resolve these expressions. When finding a QueryContextDependentExpression, it would invoke the lambda with the current QueryContext and return the result.
- The same visitor must also register a bool-returning cache key lambda as above. This lambda would also need to be provided by the original visitor which generated the QueryContextDependentExpression.
Optional: client-side parameter transformations
While not technically part of this proposal, this feature is important to support the scenarios above. Ideally, we'd have a way to perform arbitrary parameter transformations before sending out parameters. Scenarios for this include:
- Escape special characters in LIKE patterns (e.g. for StartsWith/EndsWith/Contains). It seems like this is what EF6 does (Incorrect string matching behavior on SQL Server with whitespace patterns #19402 (comment)).
- Convert .NET 0-based string offsets into 1-based SQL offsets where appropriate.
We can discuss later whether a new parameter would be added or the existing value would be mutated.