-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Closed
Labels
bugSomething isn't workingSomething isn't working
Description
Describe the bug
https://github.com/apache/datafusion/tree/main/benchmarks/queries/clickbench#extended-queries
The "extended" queries are not part of the official ClickBench benchmark. Instead they are used to test other DataFusion features that are not covered by the standard benchmark.
Recently I tried to run Q5 for benchmarking and I got an error:
Error during planning: WITHIN GROUP clause is required when calling ordered set aggregate function(approx_percentile_cont)
To Reproduce
Run this query in datafusion-cli
SELECT "ClientIP", "WatchID", COUNT(*) c, MIN("ResponseStartTiming") tmin, APPROX_PERCENTILE_CONT("ResponseStartTiming", 0.95) tp95, MAX("ResponseStartTiming") tmax
FROM 'hits.parquet'
WHERE "JavaEnable" = 0 -- filters to 32M of 100M rows
GROUP BY "ClientIP", "WatchID"
HAVING c > 1
ORDER BY tp95 DESC
LIMIT 10;For example:
(venv) andrewlamb@Andrews-MacBook-Pro-2:~/Downloads$ ~/Software/datafusion/target/debug/datafusion-cli
DataFusion CLI v47.0.0
> SELECT "ClientIP", "WatchID", COUNT(*) c, MIN("ResponseStartTiming") tmin, APPROX_PERCENTILE_CONT("ResponseStartTiming", 0.95) tp95, MAX("ResponseStartTiming") tmax
FROM 'hits.parquet'
WHERE "JavaEnable" = 0 -- filters to 32M of 100M rows
GROUP BY "ClientIP", "WatchID"
HAVING c > 1
ORDER BY tp95 DESC
LIMIT 10;
Error during planning: WITHIN GROUP clause is required when calling ordered set aggregate function(approx_percentile_cont)Expected behavior
DF 47 runs the query
(venv) andrewlamb@Andrews-MacBook-Pro-2:~/Downloads$ ~/Software/datafusion-cli/datafusion-cli-47.0.0
DataFusion CLI v47.0.0
> SELECT "ClientIP", "WatchID", COUNT(*) c, MIN("ResponseStartTiming") tmin, APPROX_PERCENTILE_CONT("ResponseStartTiming", 0.95) tp95, MAX("ResponseStartTiming") tmax
FROM 'hits.parquet'
WHERE "JavaEnable" = 0 -- filters to 32M of 100M rows
GROUP BY "ClientIP", "WatchID"
HAVING c > 1
ORDER BY tp95 DESC
LIMIT 10;
+-------------+---------------------+---+------+------+------+
| ClientIP | WatchID | c | tmin | tp95 | tmax |
+-------------+---------------------+---+------+------+------+
| 1611957945 | 6655575552203051303 | 2 | 0 | 0 | 0 |
| -1402644643 | 8566928176839891583 | 2 | 0 | 0 | 0 |
+-------------+---------------------+---+------+------+------+
2 row(s) fetched.
Elapsed 5.360 seconds.
Additional context
However, it seems like postgres also rejects such queries without WIHINGROUP:
See this dbfiddle: https://www.db-fiddle.com/f/5dwiFr16TvBF8zF6f2TmSz/0
So I believe the right fix is to update the benchmark query
Metadata
Metadata
Assignees
Labels
bugSomething isn't workingSomething isn't working
