-
Notifications
You must be signed in to change notification settings - Fork 190
[FEATURE] PPL ClickBench queries should produce results consistent with ClickHouse/Spark #4760
Copy link
Copy link
Open
Labels
PPLPiped processing languagePiped processing languagebugSomething isn't workingSomething isn't workingenhancementNew feature or requestNew feature or request
Description
Is your feature request related to a problem?
Check 2 queries in PPLClickBenchIT, it produce in-consistent results compare to spark clickbench results.
Q43
Current
- PPL
source=hits
| where CounterID = 62 and EventDate >= '2013-07-01 00:00:00' and EventDate <= '2013-07-15 00:00:00' and IsRefresh = 0 and DontCountHits = 0
| eval M = date_format(EventTime, '%Y-%m-%d %H:00:00')
| stats bucket_nullable=false count() as PageViews by M
| sort M
| head 10 from 1000
Expected
- ppl
source=hits | where CounterID = 62 and EventDate >= '2013-07-01 00:00:00' and EventDate <= '2013-07-15 00:00:00' and IsRefresh = 0 and DontCountHits = 0 | stats bucket_nullable=false count by span(EventTime, 1m) as M | sort M | head 10 from 1000
- spark sql
SELECT DATE_TRUNC('minute', EventTime) AS M, COUNT(*) AS PageViews FROM hits WHERE CounterID = 62 AND EventDate >= '2013-07-14' AND EventDate <= '2013-07-15' AND IsRefresh = 0 AND DontCountHits = 0 GROUP BY DATE_TRUNC('minute', EventTime) ORDER BY DATE_TRUNC('minute', EventTime) LIMIT 10 OFFSET 1000;
- results
+-------------------+---------+
| M|PageViews|
+-------------------+---------+
|2013-07-15 12:40:00| 513|
|2013-07-15 12:41:00| 457|
|2013-07-15 12:42:00| 470|
|2013-07-15 12:43:00| 468|
|2013-07-15 12:44:00| 453|
|2013-07-15 12:45:00| 462|
|2013-07-15 12:46:00| 481|
|2013-07-15 12:47:00| 458|
|2013-07-15 12:48:00| 466|
|2013-07-15 12:49:00| 467|
+-------------------+---------+
Q28
Current
- ppl
source=hits
| where URL != ''
| stats bucket_nullable=false avg(length(URL)) as l, count() as c by CounterID
| where c > 100000
| sort - l
| head 25
- results
l,c,CounterID
453.1628285069236,2938865,233773
261.55616482670234,2510103,245438
230.68442591186866,3574007,122612
198.08936562473812,238660,234004
187.9562693941447,323229,1634
180.36009889818135,120528,786
139.11823037965326,216408,114157
123.96623033619909,146907,515
121.86100788770537,858171,256004
116.91921850037136,374306,95427
106.89180332891429,7115413,199550
103.7586784846365,494614,220992
95.01398682515553,163797,196239
92.1134607276574,396093,96948
91.9427013479645,738150,62
90.26380938339348,311998,188878
90.00957407022646,120325,249603
85.8864014117864,8527069,3922
85.03551145479048,124664,191697
82.76263550290444,131178,97467
82.05937991006292,802561,186300
74.73223497987863,169223,230962
74.65631730856313,253961,77639
74.13011535042938,605286,146891
73.96797959706166,507770,38
Expected
- spark sql
SELECT CounterID, AVG(length(URL)) AS l, COUNT(*) AS c FROM hits WHERE URL <> '' GROUP BY CounterID HAVING COUNT(*) > 100000 ORDER BY l DESC LIMIT 25;
+---------+------------------+-------+
|CounterID| l| c|
+---------+------------------+-------+
| 233773| 453.1628285069236|2938865|
| 245438|261.55616482670234|2510103|
| 122612|230.68442591186866|3574007|
| 234004|198.08936562473812| 238660|
| 1634| 187.9562693941447| 323229|
| 786|180.36009889818135| 120528|
| 114157|139.11823037965326| 216408|
| 515|123.96623033619909| 146907|
| 256004|121.86100672243644| 858171|
| 95427|116.91921850037136| 374306|
| 199550|106.89180332891429|7115413|
| 220992| 103.7586784846365| 494614|
| 196239| 95.01398682515553| 163797|
| 96948| 92.1134607276574| 396093|
| 62| 91.9427013479645| 738150|
| 188878| 90.26380938339348| 311998|
| 249603| 90.00957407022646| 120325|
| 3922| 85.88640117723921|8527069|
| 191697| 85.03551145479048| 124664|
| 97467| 82.76263550290444| 131178|
+---------+------------------+-------+
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
PPLPiped processing languagePiped processing languagebugSomething isn't workingSomething isn't workingenhancementNew feature or requestNew feature or request
Type
Projects
Status
Done