-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Description
Describe the bug
When the scalar subquery contains aggregation, our current rewriting strategy(Kim' method) is do the aggregation and then join(Left out join), this will lead the Count bug.
Need to handle the unmatched rows and construct the computing Project
To Reproduce
No response
Expected behavior
No response
Additional context
Paper: Orthogonal optimization of subqueries and aggregation
Aggregate, then join.
This execution strategy was original proposed by Kim [11]. It is possible to aggregate directly over the orders table to obtain the total sales per customer and later join with the customer table. This alse allows pushing the aggregate condition below the join. The SQL formulation uses a derived table, not a subquery.
select c custkey
from customer,
(select o custkey from orders
group by c custkey
having 1000000 < sum(o totalprice))
as AggResult
where o custkey = c custkeySection 3.2 Moving GroupBy around an outerjoin
In the result of an outerjoin, an unmatched row appears exactly once. Therefore given that our grouping columns include the key of the outer relation, a group that has an unmatched row cannot have any other row. The aggregate
functions use only the columns from the non-outer relation. For an unmatched row all these columns are NULL. Therefore the property of aggregate expressions that is important to us is the result of applying it to NULLs. If the result is
NULL as it is for most simple aggregate expressions, we need do nothing more. The outerjoin will automatically provide
the NULLs we need. For the aggregate expressions which do not result in a NULL, we need to add a project which
for each unmatched row sets the aggregate result to the appropriate constant value.
Note that this constant can be calculated at compile time. For count(*), the value of this constant is zero.
Formally we have
GA,F (S LOJpR) = πc(S LOJp(GA−columns(S),F R))
where the computing project πc introduces the non-NULL results if necessary.
No response