-
Notifications
You must be signed in to change notification settings - Fork 8.3k
Query rewrite rules (RFC) #80084
Description
Company or project name
ClickHouse
Use case
These rules allow for matching queries by patterns of their syntax tree and rewriting them according to user-defined rules for query optimizations, or to reject or limit certain queries.
Describe the solution you'd like
A new entity, "rule":
CREATE RULE rule_name AS
(
SELECT date, sum(hits) FROM stats WHERE page = {name:String} GROUP BY date
)
REWRITE TO
(
SELECT date, hits FROM totals WHERE page = {name:String}
)
A rewrite rule has the name, the source parameterized query, and the resulting parameterized query.
DROP RULE rule_name
ALTER RULE rule_name AS ...
These queries: CREATE RULE, DROP RULE, ALTER RULE, will require separate grants.
Other than REWRITE, we can introduce other rule types, such as:
CREATE RULE rule_name AS
(
SELECT date, sum(hits) FROM stats WHERE page = {name:String} GROUP BY date
)
REJECT WITH 'Exception message'
The rules are saved in a storage, that can be a local or Keeper storage, similarly to named collections, which is configured in query_rules_storage in the configuration file.
By default, no rules are active. The rules are activated by a new query/user-level setting, query_rules, which is an Array(String) listing the names of active rules. The rules are applied in the order of their listing in query_rules. If a rule does not exist, the query throws an exception.
The rules don't require a separate grant for their usage, as the access rights are controlled at the table level.
When a rule is applied to a query, we do AST traversal of the original query and the rule, using the natural (depth-first) order, and capturing parameters. The matching does not require any backtracking. If the query matches, we take the resulting query from the rule and substitute the captured parameters there.
For the purpose of query matching, we have to implement new types of query parameters: Expression, ExpressionList, Subquery. #76247 However, it is ok to keep the capabilities limited.
system.query_log records the original query before the application of query rules (for initial queries). The list of successfully applied rules will be recorded in the new column, applied_rules.
A new, system.query_rules table provides introspection for all existing query rules.
Describe alternatives you've considered
This feature provides an ad-hoc way for query matching and rewriting.
The language is narrow-focused, and there is no intention to make it powerful.
It can be used to implement optimizations for queries of a certain pattern, which is useful for dealing with systems that perform many uniform queries. It does not help with human-generated queries that have a lot of variability.
A certain scenario, which appears often: blocking queries from runaway scripts and external bots. This can easily be done by rewriting to SELECT throwIf(1) or a REJECT WITH rule. Today it can be done with an external script, while true; do clickhouse-client --query "KILL QUERY WHERE normalizedQueryHash(query) = 1234567890123456789"; sleep 1; done
Additional context
No response