Skip to content

colexec: add distinct mode to hashTable#45704

Merged
knz merged 1 commit intocockroachdb:masterfrom
Azhng:azhng-distinct-ht
Mar 6, 2020
Merged

colexec: add distinct mode to hashTable#45704
knz merged 1 commit intocockroachdb:masterfrom
Azhng:azhng-distinct-ht

Conversation

@Azhng
Copy link
Copy Markdown
Contributor

@Azhng Azhng commented Mar 4, 2020

Previously hashTable will buffer all tuples before building head and
same vector.
Now hashTable supports distinct mode where it only buffers
the distinct tuples from upstream operator. This removes the need of traversing
through the head and same vectors. Instead, now user of the hashTable
can directly build hashTable in distinct mode and copy the buffered tuples.

Closes #44404

Release note: None

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@Azhng Azhng force-pushed the azhng-distinct-ht branch 3 times, most recently from c6942b9 to edc6951 Compare March 4, 2020 18:24
@Azhng Azhng changed the title WIP: colexec: add distinct mode to hashTable colexec: add distinct mode to hashTable Mar 4, 2020
Copy link
Copy Markdown
Contributor Author

@Azhng Azhng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Benchmark:

benchstat ~/.local/tmp/master.c10.benchmark ~/.local/tmp/dht.c10.benchmark
name                                                      old time/op    new time/op     delta
UnorderedDistinct/numCols=1/nulls=false/numBatches=1-8      56.0µs ±15%     54.7µs ± 3%     ~     (p=0.143 n=10+10)
UnorderedDistinct/numCols=1/nulls=false/numBatches=16-8      951µs ± 2%     1238µs ± 2%  +30.17%  (p=0.000 n=10+9)
UnorderedDistinct/numCols=1/nulls=false/numBatches=256-8    24.9ms ± 2%     34.1ms ± 2%  +37.02%  (p=0.000 n=9+9)
UnorderedDistinct/numCols=1/nulls=true/numBatches=1-8       90.2µs ± 1%     73.5µs ± 1%  -18.51%  (p=0.000 n=7+9)
UnorderedDistinct/numCols=1/nulls=true/numBatches=16-8      1.38ms ± 1%     1.29ms ± 3%   -6.51%  (p=0.000 n=10+9)
UnorderedDistinct/numCols=1/nulls=true/numBatches=256-8     31.1ms ± 3%     27.4ms ± 3%  -11.84%  (p=0.000 n=9+9)
UnorderedDistinct/numCols=2/nulls=false/numBatches=1-8      80.8µs ± 7%     74.2µs ±10%   -8.10%  (p=0.006 n=8+10)
UnorderedDistinct/numCols=2/nulls=false/numBatches=16-8     1.32ms ± 2%     1.77ms ±14%  +34.44%  (p=0.000 n=8+10)
UnorderedDistinct/numCols=2/nulls=false/numBatches=256-8    41.2ms ± 2%     54.8ms ± 1%  +33.06%  (p=0.000 n=10+9)
UnorderedDistinct/numCols=2/nulls=true/numBatches=1-8        144µs ± 2%      128µs ± 2%  -11.44%  (p=0.000 n=9+9)
UnorderedDistinct/numCols=2/nulls=true/numBatches=16-8      2.33ms ± 3%     2.34ms ± 4%     ~     (p=0.842 n=9+10)
UnorderedDistinct/numCols=2/nulls=true/numBatches=256-8     61.3ms ± 6%     57.6ms ± 4%   -6.02%  (p=0.000 n=10+9)

name                                                      old speed      new speed       delta
UnorderedDistinct/numCols=1/nulls=false/numBatches=1-8     148MB/s ±16%    150MB/s ± 3%     ~     (p=0.143 n=10+10)
UnorderedDistinct/numCols=1/nulls=false/numBatches=16-8    138MB/s ± 2%    106MB/s ± 2%  -23.18%  (p=0.000 n=10+9)
UnorderedDistinct/numCols=1/nulls=false/numBatches=256-8  84.4MB/s ± 2%   61.6MB/s ± 1%  -27.02%  (p=0.000 n=9+9)
UnorderedDistinct/numCols=1/nulls=true/numBatches=1-8     90.9MB/s ± 1%  111.5MB/s ± 1%  +22.72%  (p=0.000 n=7+9)
UnorderedDistinct/numCols=1/nulls=true/numBatches=16-8    94.6MB/s ± 1%  101.2MB/s ± 3%   +6.98%  (p=0.000 n=10+9)
UnorderedDistinct/numCols=1/nulls=true/numBatches=256-8   67.5MB/s ± 3%   76.6MB/s ± 3%  +13.42%  (p=0.000 n=9+9)
UnorderedDistinct/numCols=2/nulls=false/numBatches=1-8     203MB/s ± 6%    221MB/s ±10%   +9.03%  (p=0.006 n=8+10)
UnorderedDistinct/numCols=2/nulls=false/numBatches=16-8    199MB/s ± 2%    149MB/s ±13%  -25.27%  (p=0.000 n=8+10)
UnorderedDistinct/numCols=2/nulls=false/numBatches=256-8   102MB/s ± 2%     77MB/s ± 1%  -24.85%  (p=0.000 n=10+9)
UnorderedDistinct/numCols=2/nulls=true/numBatches=1-8      114MB/s ± 2%    128MB/s ± 2%  +12.92%  (p=0.000 n=9+9)
UnorderedDistinct/numCols=2/nulls=true/numBatches=16-8     112MB/s ± 3%    112MB/s ± 4%     ~     (p=0.842 n=9+10)
UnorderedDistinct/numCols=2/nulls=true/numBatches=256-8   68.5MB/s ± 6%   72.8MB/s ± 4%   +6.34%  (p=0.000 n=10+9)

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

@Azhng Azhng requested a review from a team March 4, 2020 18:51
@Azhng Azhng force-pushed the azhng-distinct-ht branch 3 times, most recently from a1d78a5 to 84d2387 Compare March 4, 2020 20:21
Copy link
Copy Markdown
Contributor Author

@Azhng Azhng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebased to master to pick up new benchmark, the number looks quite good

benchstat ~/.local/tmp/master.new.c10.benchmark ~/.local/tmp/dht.new.c10.benchmark
name                                                                        old time/op    new time/op    delta
Distinct/Unordered/newTupleProbability=0.001/rows=4096/cols=2/ordCols=0-8      290µs ± 1%     291µs ± 2%      ~     (p=0.780 n=9+10)
Distinct/Unordered/newTupleProbability=0.001/rows=4096/cols=4/ordCols=0-8      429µs ± 2%     359µs ± 1%   -16.22%  (p=0.000 n=10+9)
Distinct/Unordered/newTupleProbability=0.001/rows=65536/cols=2/ordCols=0-8    3.46ms ± 1%    1.90ms ± 1%   -45.00%  (p=0.000 n=10+10)
Distinct/Unordered/newTupleProbability=0.001/rows=65536/cols=4/ordCols=0-8    5.54ms ± 1%    2.78ms ± 2%   -49.90%  (p=0.000 n=8+10)
Distinct/Unordered/newTupleProbability=0.010/rows=4096/cols=2/ordCols=0-8      291µs ± 1%     291µs ± 1%      ~     (p=0.739 n=10+10)
Distinct/Unordered/newTupleProbability=0.010/rows=4096/cols=4/ordCols=0-8      417µs ± 1%     363µs ± 2%   -13.10%  (p=0.000 n=9+9)
Distinct/Unordered/newTupleProbability=0.010/rows=65536/cols=2/ordCols=0-8    3.82ms ± 3%    1.95ms ± 2%   -48.88%  (p=0.000 n=10+9)
Distinct/Unordered/newTupleProbability=0.010/rows=65536/cols=4/ordCols=0-8    6.75ms ± 1%    2.86ms ± 1%   -57.57%  (p=0.000 n=10+8)
Distinct/Unordered/newTupleProbability=0.100/rows=4096/cols=2/ordCols=0-8      300µs ± 1%     311µs ± 2%    +3.73%  (p=0.000 n=10+10)
Distinct/Unordered/newTupleProbability=0.100/rows=4096/cols=4/ordCols=0-8      439µs ± 9%     395µs ± 1%   -10.13%  (p=0.000 n=10+9)
Distinct/Unordered/newTupleProbability=0.100/rows=65536/cols=2/ordCols=0-8    4.27ms ± 4%    2.37ms ± 1%   -44.54%  (p=0.000 n=10+9)
Distinct/Unordered/newTupleProbability=0.100/rows=65536/cols=4/ordCols=0-8    6.81ms ± 5%    3.42ms ± 1%   -49.86%  (p=0.000 n=10+10)

name                                                                        old speed      new speed      delta
Distinct/Unordered/newTupleProbability=0.001/rows=4096/cols=2/ordCols=0-8    226MB/s ± 1%   226MB/s ± 2%      ~     (p=0.780 n=9+10)
Distinct/Unordered/newTupleProbability=0.001/rows=4096/cols=4/ordCols=0-8    306MB/s ± 2%   365MB/s ± 1%   +19.35%  (p=0.000 n=10+9)
Distinct/Unordered/newTupleProbability=0.001/rows=65536/cols=2/ordCols=0-8   303MB/s ± 1%   552MB/s ± 1%   +81.82%  (p=0.000 n=10+10)
Distinct/Unordered/newTupleProbability=0.001/rows=65536/cols=4/ordCols=0-8   378MB/s ± 1%   755MB/s ± 2%   +99.62%  (p=0.000 n=8+10)
Distinct/Unordered/newTupleProbability=0.010/rows=4096/cols=2/ordCols=0-8    225MB/s ± 1%   225MB/s ± 1%      ~     (p=0.739 n=10+10)
Distinct/Unordered/newTupleProbability=0.010/rows=4096/cols=4/ordCols=0-8    314MB/s ± 1%   362MB/s ± 2%   +15.07%  (p=0.000 n=9+9)
Distinct/Unordered/newTupleProbability=0.010/rows=65536/cols=2/ordCols=0-8   275MB/s ± 3%   537MB/s ± 2%   +95.58%  (p=0.000 n=10+9)
Distinct/Unordered/newTupleProbability=0.010/rows=65536/cols=4/ordCols=0-8   311MB/s ± 1%   732MB/s ± 1%  +135.67%  (p=0.000 n=10+8)
Distinct/Unordered/newTupleProbability=0.100/rows=4096/cols=2/ordCols=0-8    219MB/s ± 1%   211MB/s ± 2%    -3.59%  (p=0.000 n=10+10)
Distinct/Unordered/newTupleProbability=0.100/rows=4096/cols=4/ordCols=0-8    299MB/s ± 9%   331MB/s ± 3%   +10.79%  (p=0.000 n=10+10)
Distinct/Unordered/newTupleProbability=0.100/rows=65536/cols=2/ordCols=0-8   245MB/s ± 4%   442MB/s ± 1%   +80.23%  (p=0.000 n=10+9)
Distinct/Unordered/newTupleProbability=0.100/rows=65536/cols=4/ordCols=0-8   308MB/s ± 4%   614MB/s ± 1%   +99.36%  (p=0.000 n=10+10)

name                                                                        old alloc/op   new alloc/op   delta
Distinct/Unordered/newTupleProbability=0.001/rows=4096/cols=2/ordCols=0-8      932kB ± 0%    1164kB ± 0%   +24.97%  (p=0.000 n=9+10)
Distinct/Unordered/newTupleProbability=0.001/rows=4096/cols=4/ordCols=0-8     1.16MB ± 0%    1.20MB ± 0%    +3.01%  (p=0.000 n=9+8)
Distinct/Unordered/newTupleProbability=0.001/rows=65536/cols=2/ordCols=0-8    8.17MB ± 0%    1.17MB ± 0%   -85.68%  (p=0.000 n=10+8)
Distinct/Unordered/newTupleProbability=0.001/rows=65536/cols=4/ordCols=0-8    14.0MB ± 0%     1.2MB ± 0%   -91.34%  (p=0.000 n=8+9)
Distinct/Unordered/newTupleProbability=0.010/rows=4096/cols=2/ordCols=0-8      932kB ± 0%    1166kB ± 0%   +25.14%  (p=0.000 n=8+8)
Distinct/Unordered/newTupleProbability=0.010/rows=4096/cols=4/ordCols=0-8     1.16MB ± 0%    1.20MB ± 0%    +3.34%  (p=0.000 n=6+9)
Distinct/Unordered/newTupleProbability=0.010/rows=65536/cols=2/ordCols=0-8    8.17MB ± 0%    1.20MB ± 0%   -85.29%  (p=0.000 n=8+9)
Distinct/Unordered/newTupleProbability=0.010/rows=65536/cols=4/ordCols=0-8    14.0MB ± 0%     1.3MB ± 0%   -90.92%  (p=0.000 n=8+8)
Distinct/Unordered/newTupleProbability=0.100/rows=4096/cols=2/ordCols=0-8      932kB ± 0%    1181kB ± 0%   +26.70%  (p=0.000 n=8+9)
Distinct/Unordered/newTupleProbability=0.100/rows=4096/cols=4/ordCols=0-8     1.16MB ± 0%    1.23MB ± 0%    +5.84%  (p=0.000 n=7+8)
Distinct/Unordered/newTupleProbability=0.100/rows=65536/cols=2/ordCols=0-8    8.17MB ± 0%    1.77MB ± 0%   -78.27%  (p=0.000 n=8+10)
Distinct/Unordered/newTupleProbability=0.100/rows=65536/cols=4/ordCols=0-8    14.0MB ± 0%     2.2MB ± 0%   -83.99%  (p=0.000 n=9+10)

name                                                                        old allocs/op  new allocs/op  delta
Distinct/Unordered/newTupleProbability=0.001/rows=4096/cols=2/ordCols=0-8        102 ± 0%        76 ± 0%   -25.49%  (p=0.000 n=10+10)
Distinct/Unordered/newTupleProbability=0.001/rows=4096/cols=4/ordCols=0-8        168 ± 0%       108 ± 0%   -35.71%  (p=0.000 n=10+10)
Distinct/Unordered/newTupleProbability=0.001/rows=65536/cols=2/ordCols=0-8       622 ± 0%       200 ± 0%   -67.85%  (p=0.000 n=10+10)
Distinct/Unordered/newTupleProbability=0.001/rows=65536/cols=4/ordCols=0-8     1.21k ± 0%     0.37k ± 0%   -69.21%  (p=0.000 n=10+10)
Distinct/Unordered/newTupleProbability=0.010/rows=4096/cols=2/ordCols=0-8        102 ± 0%        80 ± 0%   -21.57%  (p=0.000 n=10+10)
Distinct/Unordered/newTupleProbability=0.010/rows=4096/cols=4/ordCols=0-8        168 ± 0%       124 ± 0%   -26.19%  (p=0.000 n=10+10)
Distinct/Unordered/newTupleProbability=0.010/rows=65536/cols=2/ordCols=0-8       622 ± 0%       216 ± 0%   -65.27%  (p=0.000 n=10+10)
Distinct/Unordered/newTupleProbability=0.010/rows=65536/cols=4/ordCols=0-8     1.21k ± 0%     0.40k ± 0%   -67.22%  (p=0.000 n=10+10)
Distinct/Unordered/newTupleProbability=0.100/rows=4096/cols=2/ordCols=0-8        102 ± 0%        90 ± 0%   -11.76%  (p=0.000 n=10+10)
Distinct/Unordered/newTupleProbability=0.100/rows=4096/cols=4/ordCols=0-8        168 ± 0%       144 ± 0%   -14.29%  (p=0.000 n=10+10)
Distinct/Unordered/newTupleProbability=0.100/rows=65536/cols=2/ordCols=0-8       622 ± 0%       239 ± 0%   -61.58%  (p=0.000 n=10+10)
Distinct/Unordered/newTupleProbability=0.100/rows=65536/cols=4/ordCols=0-8     1.21k ± 0%     0.44k ± 0%   -63.82%  (p=0.000 n=10+10)

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

@Azhng Azhng force-pushed the azhng-distinct-ht branch 4 times, most recently from b309846 to 277f414 Compare March 4, 2020 21:31
Copy link
Copy Markdown
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to see this! Do those benchmark numbers include nulls? It would also be nice to see the benchmarks of the HashJoiner, just to make sure there are no regressions.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

Copy link
Copy Markdown
Contributor Author

@Azhng Azhng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the new benchmark include nulls. But I double checked the benchmark from before, it was actually slower without the nulls. (not exactly sure why). I will modify the test to include nulls

HashJoiner benchmark:
Looks like there's some accident minor performance boosts:

benchstat ~/.local/tmp/master.hj.c10.benchmark ~/.local/tmp/dht.hj.c10.benchmark
name                                                                  old time/op    new time/op    delta
HashJoiner/nulls=false/fullOuter=false/distinct=true/rows=2048-8         262µs ± 3%     264µs ±13%    ~     (p=0.353 n=10+10)
HashJoiner/nulls=false/fullOuter=false/distinct=true/rows=262144-8      18.8ms ± 1%    18.3ms ± 1%  -2.86%  (p=0.000 n=10+8)
HashJoiner/nulls=false/fullOuter=false/distinct=true/rows=4194304-8      259ms ± 3%     254ms ± 1%  -1.98%  (p=0.000 n=9+10)
HashJoiner/nulls=false/fullOuter=false/distinct=false/rows=2048-8        282µs ± 1%     280µs ± 5%    ~     (p=0.515 n=8+10)
HashJoiner/nulls=false/fullOuter=false/distinct=false/rows=262144-8     25.5ms ± 2%    26.1ms ± 5%    ~     (p=0.211 n=9+10)
HashJoiner/nulls=false/fullOuter=false/distinct=false/rows=4194304-8     1.07s ± 8%     1.03s ± 3%  -4.01%  (p=0.004 n=10+10)
HashJoiner/nulls=false/fullOuter=true/distinct=true/rows=2048-8          273µs ± 2%     268µs ± 3%  -1.86%  (p=0.046 n=8+9)
HashJoiner/nulls=false/fullOuter=true/distinct=true/rows=262144-8       20.9ms ±10%    20.3ms ± 6%    ~     (p=0.211 n=9+10)
HashJoiner/nulls=false/fullOuter=true/distinct=true/rows=4194304-8       285ms ± 5%     280ms ± 2%    ~     (p=0.156 n=9+10)
HashJoiner/nulls=false/fullOuter=true/distinct=false/rows=2048-8         292µs ± 1%     284µs ± 1%  -2.71%  (p=0.000 n=9+10)
HashJoiner/nulls=false/fullOuter=true/distinct=false/rows=262144-8      27.2ms ± 2%    27.0ms ± 1%  -0.99%  (p=0.010 n=10+9)
HashJoiner/nulls=false/fullOuter=true/distinct=false/rows=4194304-8      1.08s ± 1%     1.09s ± 2%    ~     (p=0.905 n=9+10)
HashJoiner/nulls=true/fullOuter=false/distinct=true/rows=2048-8          339µs ± 1%     335µs ± 2%  -1.07%  (p=0.027 n=9+8)
HashJoiner/nulls=true/fullOuter=false/distinct=true/rows=262144-8       28.9ms ± 3%    28.6ms ± 1%  -1.10%  (p=0.043 n=10+8)
HashJoiner/nulls=true/fullOuter=false/distinct=true/rows=4194304-8       422ms ± 1%     414ms ± 1%  -1.96%  (p=0.000 n=8+10)
HashJoiner/nulls=true/fullOuter=false/distinct=false/rows=2048-8         357µs ± 3%     344µs ± 2%  -3.63%  (p=0.000 n=8+9)
HashJoiner/nulls=true/fullOuter=false/distinct=false/rows=262144-8      35.5ms ± 6%    34.9ms ± 1%  -1.52%  (p=0.035 n=9+10)
HashJoiner/nulls=true/fullOuter=false/distinct=false/rows=4194304-8      1.24s ± 5%     1.22s ± 1%  -1.68%  (p=0.008 n=9+8)
HashJoiner/nulls=true/fullOuter=true/distinct=true/rows=2048-8           350µs ± 1%     347µs ± 1%  -0.88%  (p=0.000 n=8+8)
HashJoiner/nulls=true/fullOuter=true/distinct=true/rows=262144-8        30.0ms ± 1%    30.9ms ± 3%  +2.85%  (p=0.000 n=9+10)
HashJoiner/nulls=true/fullOuter=true/distinct=true/rows=4194304-8        447ms ± 2%     442ms ± 2%    ~     (p=0.089 n=10+10)
HashJoiner/nulls=true/fullOuter=true/distinct=false/rows=2048-8          366µs ± 1%     360µs ± 3%  -1.60%  (p=0.006 n=10+9)
HashJoiner/nulls=true/fullOuter=true/distinct=false/rows=262144-8       37.0ms ± 2%    37.1ms ± 2%    ~     (p=0.549 n=9+10)
HashJoiner/nulls=true/fullOuter=true/distinct=false/rows=4194304-8       1.30s ± 1%     1.29s ± 3%    ~     (p=0.393 n=10+10)

name                                                                  old speed      new speed      delta
HashJoiner/nulls=false/fullOuter=false/distinct=true/rows=2048-8       500MB/s ± 3%   498MB/s ±12%    ~     (p=0.353 n=10+10)
HashJoiner/nulls=false/fullOuter=false/distinct=true/rows=262144-8     892MB/s ± 1%   919MB/s ± 1%  +2.94%  (p=0.000 n=10+8)
HashJoiner/nulls=false/fullOuter=false/distinct=true/rows=4194304-8   1.04GB/s ± 3%  1.06GB/s ± 1%  +2.01%  (p=0.000 n=9+10)
HashJoiner/nulls=false/fullOuter=false/distinct=false/rows=2048-8      464MB/s ± 1%   469MB/s ± 5%    ~     (p=0.515 n=8+10)
HashJoiner/nulls=false/fullOuter=false/distinct=false/rows=262144-8    658MB/s ± 2%   644MB/s ± 5%    ~     (p=0.211 n=9+10)
HashJoiner/nulls=false/fullOuter=false/distinct=false/rows=4194304-8   251MB/s ± 8%   261MB/s ± 3%  +4.04%  (p=0.004 n=10+10)
HashJoiner/nulls=false/fullOuter=true/distinct=true/rows=2048-8        480MB/s ± 2%   489MB/s ± 3%  +1.90%  (p=0.046 n=8+9)
HashJoiner/nulls=false/fullOuter=true/distinct=true/rows=262144-8      803MB/s ± 9%   826MB/s ± 6%    ~     (p=0.211 n=9+10)
HashJoiner/nulls=false/fullOuter=true/distinct=true/rows=4194304-8     943MB/s ± 5%   957MB/s ± 2%    ~     (p=0.156 n=9+10)
HashJoiner/nulls=false/fullOuter=true/distinct=false/rows=2048-8       449MB/s ± 1%   462MB/s ± 1%  +2.79%  (p=0.000 n=9+10)
HashJoiner/nulls=false/fullOuter=true/distinct=false/rows=262144-8     616MB/s ± 2%   622MB/s ± 1%  +1.00%  (p=0.010 n=10+9)
HashJoiner/nulls=false/fullOuter=true/distinct=false/rows=4194304-8    248MB/s ± 1%   247MB/s ± 2%    ~     (p=0.905 n=9+10)
HashJoiner/nulls=true/fullOuter=false/distinct=true/rows=2048-8        387MB/s ± 1%   391MB/s ± 2%  +1.09%  (p=0.027 n=9+8)
HashJoiner/nulls=true/fullOuter=false/distinct=true/rows=262144-8      580MB/s ± 2%   586MB/s ± 1%  +1.10%  (p=0.043 n=10+8)
HashJoiner/nulls=true/fullOuter=false/distinct=true/rows=4194304-8     636MB/s ± 1%   648MB/s ± 1%  +2.00%  (p=0.000 n=8+10)
HashJoiner/nulls=true/fullOuter=false/distinct=false/rows=2048-8       367MB/s ± 3%   381MB/s ± 2%  +3.75%  (p=0.000 n=8+9)
HashJoiner/nulls=true/fullOuter=false/distinct=false/rows=262144-8     473MB/s ± 6%   480MB/s ± 1%  +1.49%  (p=0.035 n=9+10)
HashJoiner/nulls=true/fullOuter=false/distinct=false/rows=4194304-8    217MB/s ± 5%   221MB/s ± 1%  +1.67%  (p=0.008 n=9+8)
HashJoiner/nulls=true/fullOuter=true/distinct=true/rows=2048-8         375MB/s ± 1%   378MB/s ± 1%  +0.89%  (p=0.000 n=8+8)
HashJoiner/nulls=true/fullOuter=true/distinct=true/rows=262144-8       558MB/s ± 1%   543MB/s ± 2%  -2.76%  (p=0.000 n=9+10)
HashJoiner/nulls=true/fullOuter=true/distinct=true/rows=4194304-8      600MB/s ± 2%   608MB/s ± 2%    ~     (p=0.089 n=10+10)
HashJoiner/nulls=true/fullOuter=true/distinct=false/rows=2048-8        358MB/s ± 1%   364MB/s ± 3%  +1.64%  (p=0.006 n=10+9)
HashJoiner/nulls=true/fullOuter=true/distinct=false/rows=262144-8      454MB/s ± 2%   453MB/s ± 2%    ~     (p=0.549 n=9+10)
HashJoiner/nulls=true/fullOuter=true/distinct=false/rows=4194304-8     207MB/s ± 1%   208MB/s ± 3%    ~     (p=0.393 n=10+10)

name                                                                  old alloc/op   new alloc/op   delta
HashJoiner/nulls=false/fullOuter=false/distinct=true/rows=2048-8         872kB ± 0%     872kB ± 0%  +0.01%  (p=0.000 n=10+10)
HashJoiner/nulls=false/fullOuter=false/distinct=true/rows=262144-8      49.9MB ± 0%    49.9MB ± 0%    ~     (p=0.183 n=9+9)
HashJoiner/nulls=false/fullOuter=false/distinct=true/rows=4194304-8      739MB ± 0%     739MB ± 0%  +0.00%  (p=0.009 n=9+10)
HashJoiner/nulls=false/fullOuter=false/distinct=false/rows=2048-8        893kB ± 0%     893kB ± 0%  +0.01%  (p=0.000 n=10+10)
HashJoiner/nulls=false/fullOuter=false/distinct=false/rows=262144-8     52.3MB ± 0%    52.3MB ± 0%    ~     (p=0.165 n=10+8)
HashJoiner/nulls=false/fullOuter=false/distinct=false/rows=4194304-8     777MB ± 0%     777MB ± 0%    ~     (p=0.106 n=9+8)
HashJoiner/nulls=false/fullOuter=true/distinct=true/rows=2048-8          875kB ± 0%     875kB ± 0%  +0.01%  (p=0.000 n=9+10)
HashJoiner/nulls=false/fullOuter=true/distinct=true/rows=262144-8       50.1MB ± 0%    50.1MB ± 0%  +0.00%  (p=0.000 n=8+9)
HashJoiner/nulls=false/fullOuter=true/distinct=true/rows=4194304-8       743MB ± 0%     743MB ± 0%    ~     (p=0.652 n=9+8)
HashJoiner/nulls=false/fullOuter=true/distinct=false/rows=2048-8         896kB ± 0%     896kB ± 0%  +0.01%  (p=0.000 n=7+9)
HashJoiner/nulls=false/fullOuter=true/distinct=false/rows=262144-8      52.5MB ± 0%    52.5MB ± 0%  +0.00%  (p=0.000 n=8+8)
HashJoiner/nulls=false/fullOuter=true/distinct=false/rows=4194304-8      781MB ± 0%     781MB ± 0%  +0.00%  (p=0.021 n=9+9)
HashJoiner/nulls=true/fullOuter=false/distinct=true/rows=2048-8          872kB ± 0%     872kB ± 0%  +0.01%  (p=0.000 n=8+9)
HashJoiner/nulls=true/fullOuter=false/distinct=true/rows=262144-8       49.9MB ± 0%    49.9MB ± 0%    ~     (p=0.164 n=10+8)
HashJoiner/nulls=true/fullOuter=false/distinct=true/rows=4194304-8       739MB ± 0%     739MB ± 0%  +0.00%  (p=0.000 n=9+8)
HashJoiner/nulls=true/fullOuter=false/distinct=false/rows=2048-8         893kB ± 0%     893kB ± 0%  +0.01%  (p=0.000 n=10+8)
HashJoiner/nulls=true/fullOuter=false/distinct=false/rows=262144-8      52.3MB ± 0%    52.3MB ± 0%  +0.00%  (p=0.000 n=9+9)
HashJoiner/nulls=true/fullOuter=false/distinct=false/rows=4194304-8      777MB ± 0%     777MB ± 0%  +0.00%  (p=0.013 n=8+8)
HashJoiner/nulls=true/fullOuter=true/distinct=true/rows=2048-8           875kB ± 0%     875kB ± 0%  +0.01%  (p=0.000 n=9+8)
HashJoiner/nulls=true/fullOuter=true/distinct=true/rows=262144-8        50.1MB ± 0%    50.1MB ± 0%  +0.00%  (p=0.000 n=10+8)
HashJoiner/nulls=true/fullOuter=true/distinct=true/rows=4194304-8        743MB ± 0%     743MB ± 0%    ~     (p=0.098 n=10+9)
HashJoiner/nulls=true/fullOuter=true/distinct=false/rows=2048-8          896kB ± 0%     896kB ± 0%  +0.01%  (p=0.001 n=8+6)
HashJoiner/nulls=true/fullOuter=true/distinct=false/rows=262144-8       52.5MB ± 0%    52.5MB ± 0%  +0.00%  (p=0.000 n=9+9)
HashJoiner/nulls=true/fullOuter=true/distinct=false/rows=4194304-8       781MB ± 0%     781MB ± 0%  +0.00%  (p=0.025 n=9+9)

name                                                                  old allocs/op  new allocs/op  delta
HashJoiner/nulls=false/fullOuter=false/distinct=true/rows=2048-8           156 ± 0%       157 ± 0%  +0.64%  (p=0.000 n=10+10)
HashJoiner/nulls=false/fullOuter=false/distinct=true/rows=262144-8       1.31k ± 0%     1.31k ± 0%  +0.08%  (p=0.000 n=10+10)
HashJoiner/nulls=false/fullOuter=false/distinct=true/rows=4194304-8      16.8k ± 0%     16.8k ± 0%  +0.01%  (p=0.000 n=10+10)
HashJoiner/nulls=false/fullOuter=false/distinct=false/rows=2048-8          158 ± 0%       159 ± 0%  +0.63%  (p=0.000 n=10+10)
HashJoiner/nulls=false/fullOuter=false/distinct=false/rows=262144-8      1.31k ± 0%     1.31k ± 0%  +0.08%  (p=0.000 n=10+10)
HashJoiner/nulls=false/fullOuter=false/distinct=false/rows=4194304-8     16.8k ± 0%     16.8k ± 0%    ~     (p=0.137 n=10+8)
HashJoiner/nulls=false/fullOuter=true/distinct=true/rows=2048-8            158 ± 0%       159 ± 0%  +0.63%  (p=0.000 n=10+10)
HashJoiner/nulls=false/fullOuter=true/distinct=true/rows=262144-8        1.31k ± 0%     1.31k ± 0%  +0.08%  (p=0.000 n=10+10)
HashJoiner/nulls=false/fullOuter=true/distinct=true/rows=4194304-8       16.8k ± 0%     16.8k ± 0%  +0.00%  (p=0.015 n=10+9)
HashJoiner/nulls=false/fullOuter=true/distinct=false/rows=2048-8           160 ± 0%       161 ± 0%  +0.63%  (p=0.000 n=10+10)
HashJoiner/nulls=false/fullOuter=true/distinct=false/rows=262144-8       1.31k ± 0%     1.31k ± 0%  +0.08%  (p=0.000 n=10+10)
HashJoiner/nulls=false/fullOuter=true/distinct=false/rows=4194304-8      16.8k ± 0%     16.8k ± 0%  +0.01%  (p=0.003 n=10+9)
HashJoiner/nulls=true/fullOuter=false/distinct=true/rows=2048-8            156 ± 0%       157 ± 0%  +0.64%  (p=0.000 n=10+10)
HashJoiner/nulls=true/fullOuter=false/distinct=true/rows=262144-8        1.31k ± 0%     1.31k ± 0%  +0.08%  (p=0.000 n=10+10)
HashJoiner/nulls=true/fullOuter=false/distinct=true/rows=4194304-8       16.8k ± 0%     16.8k ± 0%  +0.01%  (p=0.000 n=9+8)
HashJoiner/nulls=true/fullOuter=false/distinct=false/rows=2048-8           158 ± 0%       159 ± 0%  +0.63%  (p=0.000 n=10+10)
HashJoiner/nulls=true/fullOuter=false/distinct=false/rows=262144-8       1.31k ± 0%     1.31k ± 0%  +0.08%  (p=0.000 n=10+10)
HashJoiner/nulls=true/fullOuter=false/distinct=false/rows=4194304-8      16.8k ± 0%     16.8k ± 0%  +0.01%  (p=0.001 n=9+10)
HashJoiner/nulls=true/fullOuter=true/distinct=true/rows=2048-8             158 ± 0%       159 ± 0%  +0.63%  (p=0.000 n=10+10)
HashJoiner/nulls=true/fullOuter=true/distinct=true/rows=262144-8         1.31k ± 0%     1.31k ± 0%  +0.08%  (p=0.000 n=10+10)
HashJoiner/nulls=true/fullOuter=true/distinct=true/rows=4194304-8        16.8k ± 0%     16.8k ± 0%  +0.00%  (p=0.023 n=10+8)
HashJoiner/nulls=true/fullOuter=true/distinct=false/rows=2048-8            160 ± 0%       161 ± 0%  +0.63%  (p=0.000 n=10+10)
HashJoiner/nulls=true/fullOuter=true/distinct=false/rows=262144-8        1.31k ± 0%     1.31k ± 0%  +0.08%  (p=0.000 n=10+10)
HashJoiner/nulls=true/fullOuter=true/distinct=false/rows=4194304-8       16.8k ± 0%     16.8k ± 0%  +0.00%  (p=0.007 n=10+9)

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

@Azhng Azhng force-pushed the azhng-distinct-ht branch from 277f414 to 2e3e462 Compare March 4, 2020 23:29
Copy link
Copy Markdown
Contributor Author

@Azhng Azhng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added nulls into benchmark,

New Benchmark

benchstat ~/.local/tmp/master.nulls.c10.benchmark ~/.local/tmp/dht.nulls.c10.benchmark
name                                                                                       old time/op    new time/op    delta
Distinct/Unordered/hasNulls=false/newTupleProbability=0.001/rows=4096/cols=2/ordCols=0-8      294µs ± 1%     286µs ± 1%    -2.92%  (p=0.000 n=9+9)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.001/rows=4096/cols=4/ordCols=0-8      419µs ± 1%     362µs ± 1%   -13.70%  (p=0.000 n=9+9)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.001/rows=65536/cols=2/ordCols=0-8    3.50ms ± 2%    1.91ms ± 1%   -45.28%  (p=0.000 n=9+10)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.001/rows=65536/cols=4/ordCols=0-8    5.62ms ± 2%    2.84ms ± 3%   -49.44%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.010/rows=4096/cols=2/ordCols=0-8      291µs ± 1%     290µs ± 1%      ~     (p=0.387 n=9+9)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.010/rows=4096/cols=4/ordCols=0-8      419µs ± 1%     364µs ± 0%   -13.11%  (p=0.000 n=10+8)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.010/rows=65536/cols=2/ordCols=0-8    4.11ms ± 2%    1.96ms ± 1%   -52.44%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.010/rows=65536/cols=4/ordCols=0-8    6.58ms ± 3%    2.84ms ± 4%   -56.90%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.100/rows=4096/cols=2/ordCols=0-8      309µs ± 4%     313µs ± 0%    +0.99%  (p=0.021 n=10+8)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.100/rows=4096/cols=4/ordCols=0-8      442µs ± 4%     393µs ± 1%   -11.18%  (p=0.000 n=10+9)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.100/rows=65536/cols=2/ordCols=0-8    4.34ms ± 2%    2.34ms ± 0%   -46.06%  (p=0.000 n=10+9)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.100/rows=65536/cols=4/ordCols=0-8    6.94ms ± 3%    3.42ms ± 1%   -50.65%  (p=0.000 n=9+10)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.001/rows=4096/cols=2/ordCols=0-8       403µs ± 2%     379µs ± 2%    -5.79%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.001/rows=4096/cols=4/ordCols=0-8       628µs ± 1%     528µs ± 4%   -15.91%  (p=0.000 n=9+10)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.001/rows=65536/cols=2/ordCols=0-8     5.11ms ± 0%    3.06ms ± 1%   -40.07%  (p=0.000 n=8+10)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.001/rows=65536/cols=4/ordCols=0-8     9.16ms ± 2%    5.04ms ± 1%   -44.94%  (p=0.000 n=9+9)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.010/rows=4096/cols=2/ordCols=0-8       407µs ± 2%     380µs ± 1%    -6.56%  (p=0.000 n=10+9)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.010/rows=4096/cols=4/ordCols=0-8       643µs ± 2%     520µs ± 1%   -19.06%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.010/rows=65536/cols=2/ordCols=0-8     5.93ms ± 0%    3.09ms ± 2%   -47.86%  (p=0.000 n=9+10)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.010/rows=65536/cols=4/ordCols=0-8     12.3ms ± 1%     5.1ms ± 1%   -58.26%  (p=0.000 n=9+10)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.100/rows=4096/cols=2/ordCols=0-8       417µs ± 4%     402µs ± 2%    -3.54%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.100/rows=4096/cols=4/ordCols=0-8       647µs ± 3%     572µs ± 4%   -11.56%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.100/rows=65536/cols=2/ordCols=0-8     6.36ms ± 4%    3.75ms ± 4%   -40.99%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.100/rows=65536/cols=4/ordCols=0-8     11.2ms ± 3%     5.8ms ± 1%   -48.67%  (p=0.000 n=10+9)

name                                                                                       old speed      new speed      delta
Distinct/Unordered/hasNulls=false/newTupleProbability=0.001/rows=4096/cols=2/ordCols=0-8    222MB/s ± 3%   229MB/s ± 1%    +3.40%  (p=0.000 n=10+9)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.001/rows=4096/cols=4/ordCols=0-8    313MB/s ± 1%   362MB/s ± 1%   +15.88%  (p=0.000 n=9+9)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.001/rows=65536/cols=2/ordCols=0-8   300MB/s ± 2%   548MB/s ± 1%   +82.74%  (p=0.000 n=9+10)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.001/rows=65536/cols=4/ordCols=0-8   373MB/s ± 2%   738MB/s ± 3%   +97.79%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.010/rows=4096/cols=2/ordCols=0-8    225MB/s ± 1%   226MB/s ± 1%      ~     (p=0.374 n=9+9)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.010/rows=4096/cols=4/ordCols=0-8    313MB/s ± 1%   360MB/s ± 0%   +15.09%  (p=0.000 n=10+8)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.010/rows=65536/cols=2/ordCols=0-8   255MB/s ± 2%   536MB/s ± 1%  +110.25%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.010/rows=65536/cols=4/ordCols=0-8   319MB/s ± 3%   740MB/s ± 4%  +132.02%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.100/rows=4096/cols=2/ordCols=0-8    212MB/s ± 4%   210MB/s ± 0%    -1.00%  (p=0.019 n=10+8)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.100/rows=4096/cols=4/ordCols=0-8    296MB/s ± 4%   334MB/s ± 1%   +12.55%  (p=0.000 n=10+9)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.100/rows=65536/cols=2/ordCols=0-8   242MB/s ± 2%   448MB/s ± 0%   +85.36%  (p=0.000 n=10+9)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.100/rows=65536/cols=4/ordCols=0-8   302MB/s ± 3%   613MB/s ± 1%  +102.62%  (p=0.000 n=9+10)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.001/rows=4096/cols=2/ordCols=0-8     163MB/s ± 2%   173MB/s ± 2%    +6.15%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.001/rows=4096/cols=4/ordCols=0-8     209MB/s ± 1%   248MB/s ± 4%   +18.97%  (p=0.000 n=9+10)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.001/rows=65536/cols=2/ordCols=0-8    205MB/s ± 0%   343MB/s ± 1%   +66.85%  (p=0.000 n=8+10)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.001/rows=65536/cols=4/ordCols=0-8    229MB/s ± 2%   416MB/s ± 1%   +81.61%  (p=0.000 n=9+9)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.010/rows=4096/cols=2/ordCols=0-8     161MB/s ± 2%   172MB/s ± 1%    +7.00%  (p=0.000 n=10+9)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.010/rows=4096/cols=4/ordCols=0-8     204MB/s ± 2%   252MB/s ± 1%   +23.53%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.010/rows=65536/cols=2/ordCols=0-8    177MB/s ± 0%   339MB/s ± 2%   +91.80%  (p=0.000 n=9+10)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.010/rows=65536/cols=4/ordCols=0-8    171MB/s ± 1%   410MB/s ± 1%  +139.56%  (p=0.000 n=9+10)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.100/rows=4096/cols=2/ordCols=0-8     157MB/s ± 4%   163MB/s ± 2%    +3.64%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.100/rows=4096/cols=4/ordCols=0-8     203MB/s ± 3%   229MB/s ± 4%   +13.08%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.100/rows=65536/cols=2/ordCols=0-8    165MB/s ± 4%   279MB/s ± 4%   +69.46%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.100/rows=65536/cols=4/ordCols=0-8    187MB/s ± 3%   364MB/s ± 1%   +94.75%  (p=0.000 n=10+9)

name                                                                                       old alloc/op   new alloc/op   delta
Distinct/Unordered/hasNulls=false/newTupleProbability=0.001/rows=4096/cols=2/ordCols=0-8      932kB ± 0%    1164kB ± 0%   +24.92%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.001/rows=4096/cols=4/ordCols=0-8     1.16MB ± 0%    1.20MB ± 0%    +3.00%  (p=0.000 n=10+8)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.001/rows=65536/cols=2/ordCols=0-8    8.17MB ± 0%    1.18MB ± 0%   -85.60%  (p=0.000 n=8+9)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.001/rows=65536/cols=4/ordCols=0-8    14.0MB ± 0%     1.2MB ± 0%   -91.34%  (p=0.000 n=9+8)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.010/rows=4096/cols=2/ordCols=0-8      932kB ± 0%    1166kB ± 0%   +25.13%  (p=0.001 n=8+6)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.010/rows=4096/cols=4/ordCols=0-8     1.16MB ± 0%    1.20MB ± 0%    +3.33%  (p=0.000 n=8+9)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.010/rows=65536/cols=2/ordCols=0-8    8.17MB ± 0%    1.20MB ± 0%   -85.29%  (p=0.000 n=10+9)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.010/rows=65536/cols=4/ordCols=0-8    14.0MB ± 0%     1.2MB ± 0%   -91.16%  (p=0.000 n=9+9)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.100/rows=4096/cols=2/ordCols=0-8      932kB ± 0%    1180kB ± 0%   +26.69%  (p=0.000 n=8+8)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.100/rows=4096/cols=4/ordCols=0-8     1.16MB ± 0%    1.23MB ± 0%    +5.83%  (p=0.000 n=6+9)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.100/rows=65536/cols=2/ordCols=0-8    8.17MB ± 0%    1.77MB ± 0%   -78.27%  (p=0.000 n=8+9)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.100/rows=65536/cols=4/ordCols=0-8    14.0MB ± 0%     2.2MB ± 0%   -83.99%  (p=0.000 n=10+8)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.001/rows=4096/cols=2/ordCols=0-8       932kB ± 0%    1164kB ± 0%   +24.96%  (p=0.000 n=8+8)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.001/rows=4096/cols=4/ordCols=0-8      1.16MB ± 0%    1.20MB ± 0%    +3.16%  (p=0.000 n=8+8)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.001/rows=65536/cols=2/ordCols=0-8     8.17MB ± 0%    1.18MB ± 0%   -85.60%  (p=0.000 n=8+9)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.001/rows=65536/cols=4/ordCols=0-8     14.0MB ± 0%     1.2MB ± 0%   -91.34%  (p=0.000 n=8+9)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.010/rows=4096/cols=2/ordCols=0-8       932kB ± 0%    1166kB ± 0%   +25.13%  (p=0.000 n=9+8)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.010/rows=4096/cols=4/ordCols=0-8      1.16MB ± 0%    1.20MB ± 0%    +3.33%  (p=0.000 n=9+8)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.010/rows=65536/cols=2/ordCols=0-8     8.17MB ± 0%    1.20MB ± 0%   -85.29%  (p=0.000 n=8+8)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.010/rows=65536/cols=4/ordCols=0-8     14.0MB ± 0%     1.3MB ± 0%   -90.93%  (p=0.000 n=9+9)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.100/rows=4096/cols=2/ordCols=0-8       932kB ± 0%    1180kB ± 0%   +26.69%  (p=0.000 n=8+8)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.100/rows=4096/cols=4/ordCols=0-8      1.16MB ± 0%    1.23MB ± 0%    +5.83%  (p=0.000 n=10+9)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.100/rows=65536/cols=2/ordCols=0-8     8.17MB ± 0%    2.00MB ± 0%   -75.56%  (p=0.000 n=8+8)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.100/rows=65536/cols=4/ordCols=0-8     14.0MB ± 0%     2.2MB ± 0%   -83.99%  (p=0.000 n=10+9)

name                                                                                       old allocs/op  new allocs/op  delta
Distinct/Unordered/hasNulls=false/newTupleProbability=0.001/rows=4096/cols=2/ordCols=0-8        102 ± 0%        72 ± 0%   -29.41%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.001/rows=4096/cols=4/ordCols=0-8        168 ± 0%       108 ± 0%   -35.71%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.001/rows=65536/cols=2/ordCols=0-8       622 ± 0%       208 ± 0%   -66.56%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.001/rows=65536/cols=4/ordCols=0-8     1.21k ± 0%     0.37k ± 0%   -69.21%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.010/rows=4096/cols=2/ordCols=0-8        102 ± 0%        80 ± 0%   -21.57%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.010/rows=4096/cols=4/ordCols=0-8        168 ± 0%       124 ± 0%   -26.19%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.010/rows=65536/cols=2/ordCols=0-8       622 ± 0%       216 ± 0%   -65.27%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.010/rows=65536/cols=4/ordCols=0-8     1.21k ± 0%     0.39k ± 0%   -67.88%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.100/rows=4096/cols=2/ordCols=0-8        102 ± 0%        90 ± 0%   -11.76%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.100/rows=4096/cols=4/ordCols=0-8        168 ± 0%       144 ± 0%   -14.29%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.100/rows=65536/cols=2/ordCols=0-8       622 ± 0%       239 ± 0%   -61.58%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.100/rows=65536/cols=4/ordCols=0-8     1.21k ± 0%     0.44k ± 0%   -63.82%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.001/rows=4096/cols=2/ordCols=0-8         102 ± 0%        76 ± 0%   -25.49%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.001/rows=4096/cols=4/ordCols=0-8         168 ± 0%       120 ± 0%   -28.57%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.001/rows=65536/cols=2/ordCols=0-8        622 ± 0%       208 ± 0%   -66.56%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.001/rows=65536/cols=4/ordCols=0-8      1.21k ± 0%     0.37k ± 0%   -69.21%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.010/rows=4096/cols=2/ordCols=0-8         102 ± 0%        80 ± 0%   -21.57%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.010/rows=4096/cols=4/ordCols=0-8         168 ± 0%       124 ± 0%   -26.19%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.010/rows=65536/cols=2/ordCols=0-8        622 ± 0%       216 ± 0%   -65.27%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.010/rows=65536/cols=4/ordCols=0-8      1.21k ± 0%     0.40k ± 0%   -67.22%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.100/rows=4096/cols=2/ordCols=0-8         102 ± 0%        90 ± 0%   -11.76%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.100/rows=4096/cols=4/ordCols=0-8         168 ± 0%       144 ± 0%   -14.29%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.100/rows=65536/cols=2/ordCols=0-8        622 ± 0%       242 ± 0%   -61.09%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.100/rows=65536/cols=4/ordCols=0-8      1.21k ± 0%     0.44k ± 0%   -63.82%  (p=0.000 n=10+10)

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

Copy link
Copy Markdown
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice work! I have a bunch of nits and some comments, but nothing major.

nit: for the benchmarks, you could probably choose only 1 out of 4 "sheets" of information (I prefer looking at the second one, about the speed in MB/s), this will make it easier to digest. I'd also be interested to see the number for partiallyOrderedDistinct since that operator reuses this unorderedDistinct.

Reviewed 7 of 7 files at r1, 1 of 1 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Azhng)


pkg/sql/colexec/hashtable.go, line 29 at r1 (raw file):

const (
	// hashTableFullMode is the mode where hashTable buffer all input tuples and

nit: s/buffer all/buffers all/g and s/populate/populates/g.


pkg/sql/colexec/hashtable.go, line 33 at r1 (raw file):

	hashTableFullMode hashTableMode = iota

	// hashTableDistinctMode is the mode where hashTable only buffer distinct

nit: s/buffer distinct/buffers only distinct/g. Also, I'd omit the second sentence of the comment since it seems too specific to the use case of unorderedDistinct and probably adds the confusion.


pkg/sql/colexec/hashtable.go, line 90 at r1 (raw file):

	toCheck []uint64

	// hashBuffer stores the hash values of each tuple in the probe table.

nit: you could extend this comment to highlight that hashBuffer will be dynamically updated when we're looking for distinct tuples on the probe side in "distinct" mode.


pkg/sql/colexec/hashtable.go, line 94 at r1 (raw file):

	// keyCols stores the indices of the input batch which are key columns.
	keyCols []uint32

This slice seems to always be {0, 1, 2, ..., nKeys - 1}, can we eliminate it?


pkg/sql/colexec/hashtable.go, line 258 at r1 (raw file):

		ht.probeScratch.keyCols = probeKeyCols
		ht.probeScratch.first = make([]uint64, numBuckets)
		ht.probeScratch.next = make([]uint64, coldata.BatchSize()+1)

Why do we need plus ones here? I feel like for most of these slices we don't need it, and, instead, in the code we could have keyID - 1 or whatever when accessing them. It'd be nice to unify it with non-distinct mode.


pkg/sql/colexec/hashtable.go, line 289 at r1 (raw file):

		// ht.next is used to store the computed hash value of each key.
		ht.buildScratch.next = make([]uint64, ht.vals.Length()+1)

We should keep maybeAllocateUint64Array call here instead of using make directly.


pkg/sql/colexec/hashtable.go, line 341 at r1 (raw file):

// and length.
// NOTE: the hashTable *must* have been already built.
func (ht *hashTable) removeDuplicatesInBufferedBatch(batch coldata.Batch) {

It seems like these two removeDuplicates* functions are quite similar, could we refactor the code to support both with a single function?


pkg/sql/colexec/hashtable.go, line 358 at r1 (raw file):

		// Continue searching for the build table matching keys while the toCheck
		// array is non-empty.
		nToCheck =

nit: seems like an unnecessary line break.


pkg/sql/colexec/hashtable_tmpl.go, line 86 at r1 (raw file):

			// found.

			// {{if .UseProbeSel}}

nit: UseProbeSel and UseBuildSel probably should be added as arguments to the template function (similar to _SELECT_DISTINCT) to stay consistent.


pkg/sql/colexec/hashtable_tmpl.go, line 135 at r1 (raw file):

		// {{if .SelectDistinct}}
		if keyID == 0 {

nit: you could rearrange the code using if statement:

if keyID != 0 {
...
}
// {{if .SelectDistinct}}
else {
   ht.probeScratch.distinct[toCheck] = true
}
// {{end}}

pkg/sql/colexec/hashtable_tmpl.go, line 205 at r1 (raw file):

			if probeSel != nil {
				if buildSel != nil {
					_CHECK_COL_WITH_NULLS(

nit: I'd keep these templated function calls as one long line, similar to all others like it in this file.


pkg/sql/colexec/hashtable_tmpl.go, line 284 at r1 (raw file):

	switch probeType {
	// {{range $lTyp, $rTypToOverload := .}}
	// {{with $overload := index $rTypToOverload $lTyp}}

Oh, this looks fancy. What does it do?


pkg/sql/colexec/hashtable_tmpl.go, line 349 at r1 (raw file):

} // */}}

// findDistinctTuples find all tuples in probeVecs that are not present in

nit: s/find all/finds all/g.


pkg/sql/colexec/hashtable_tmpl.go, line 353 at r1 (raw file):

// keyIDs in headID buffer.
// NOTE: It assumes that probeVecs does not contain any duplicates itself.
// NOTE: It assumes that probSel has already being populated and it is not

We should probably enforce this assumption as the first line in the function.


pkg/sql/colexec/hashtable_tmpl.go, line 428 at r1 (raw file):

			visited[ht.probeScratch.headID[i]] = true

			// Compacting and dedup hash buffer.

nit: s/dedup/deduplicating/g.


pkg/sql/colexec/hashtable_tmpl.go, line 436 at r1 (raw file):

	}

	copy(visited, zeroBoolColumn)

Shouldn't we do this zero out before we use this buffer? I think usually the contract is that before you're reusing some slice, you make sure that it is in the state you expect it to be, and in this case we seem to expect visited to consist of only false values before the loop.


pkg/sql/colexec/hashtable_tmpl.go, line 442 at r1 (raw file):

} // */}}

// updateSel updates the selection vector in the given batch using the headID

This comment should be extended. It's not clear what this function's purpose is and what it does.


pkg/sql/colexec/hashtable_tmpl.go, line 458 at r1 (raw file):

}

// distinctCheck determines if the current key in the groupID buckets matches the

nit: s/groupID buckets/groupID bucket/g.


pkg/sql/colexec/partially_ordered_distinct.go, line 124 at r1 (raw file):

		input:         input,
		inputTypes:    inputTypes,
		windowedBatch: allocator.NewMemBatch(inputTypes),

Why is this change?


pkg/sql/colexec/unordered_distinct.go, line 67 at r1 (raw file):

	buildFinished bool

	// sel is a list of indices to select representing the distinct rows.

nit: the comment should be removed as well.


pkg/sql/colexec/unordered_distinct.go, line 89 at r1 (raw file):

	}

	// Since hashTable only buffers distinct tuple when it is built in distinct

nit: sounds like an incomplete sentence. Probably something like "We're using the hashTable in distinct mode, so it buffers only distinct tuples, as a result, we will be simply returning all buffered tuples."

Also, this assignment should be moved into the if block above.


pkg/sql/colexec/unordered_distinct.go, line 94 at r1 (raw file):

	// Create and return the next batch of input to a maximum size of
	// coldata.BatchSize(). The rows in the new batch are specified by the

nit: second sentence of the comment should be deleted.


pkg/sql/colexec/execgen/cmd/execgen/hashtable_gen.go, line 44 at r1 (raw file):

		`{{template "checkColBody" buildDict "Global" .Global "UseProbeSel" .UseProbeSel "UseBuildSel" .UseBuildSel "ProbeHasNulls" $7 "BuildHasNulls" $8 "AllowNullEquality" $9 "SelectDistinct" $10}}`)

	checkColWithNull := makeFunctionRegex("_CHECK_COL_WITH_NULLS", 8)

nit: s/checkColWithNull/checkColWithNulls/g.


pkg/sql/colexec/execgen/cmd/execgen/hashtable_gen.go, line 48 at r1 (raw file):

		`{{template "checkColWithNulls" buildDict "Global" . "UseProbeSel" $7 "UseBuildSel" $8}}`)

	checkColForDistinctWithNull := makeFunctionRegex("_CHECK_COL_FOR_DISTINCT_WITH_NULLS", 8)

ditto (missing s).

@Azhng Azhng force-pushed the azhng-distinct-ht branch from 2e3e462 to 17c089e Compare March 5, 2020 20:40
Copy link
Copy Markdown
Contributor Author

@Azhng Azhng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the benchmark for partiallyOrderedDistinct.
Haven't got a chance to run it on GCE worker yet. Will post the result once I have it set up.

name                                                                                              old speed      new speed       delta
Distinct/PartiallyOrdered/hasNulls=false/newTupleProbability=0.001/rows=4096/cols=2/ordCols=1-8    383MB/s ± 1%    420MB/s ± 9%   +9.73%  (p=0.001 n=8+9)
Distinct/PartiallyOrdered/hasNulls=false/newTupleProbability=0.001/rows=4096/cols=4/ordCols=2-8    525MB/s ± 1%    562MB/s ± 1%   +7.21%  (p=0.000 n=8+9)
Distinct/PartiallyOrdered/hasNulls=false/newTupleProbability=0.001/rows=65536/cols=2/ordCols=1-8   503MB/s ± 1%    624MB/s ± 1%  +24.17%  (p=0.000 n=9+9)
Distinct/PartiallyOrdered/hasNulls=false/newTupleProbability=0.001/rows=65536/cols=4/ordCols=2-8   684MB/s ± 1%    869MB/s ± 1%  +27.06%  (p=0.000 n=9+9)
Distinct/PartiallyOrdered/hasNulls=false/newTupleProbability=0.010/rows=4096/cols=2/ordCols=1-8    283MB/s ± 2%    325MB/s ± 2%  +14.85%  (p=0.000 n=10+10)
Distinct/PartiallyOrdered/hasNulls=false/newTupleProbability=0.010/rows=4096/cols=4/ordCols=2-8    447MB/s ± 1%    408MB/s ± 1%   -8.65%  (p=0.000 n=10+9)
Distinct/PartiallyOrdered/hasNulls=false/newTupleProbability=0.010/rows=65536/cols=2/ordCols=1-8   281MB/s ± 1%    321MB/s ± 2%  +14.10%  (p=0.000 n=10+9)
Distinct/PartiallyOrdered/hasNulls=false/newTupleProbability=0.010/rows=65536/cols=4/ordCols=2-8   539MB/s ± 0%    612MB/s ± 2%  +13.55%  (p=0.000 n=9+10)
Distinct/PartiallyOrdered/hasNulls=false/newTupleProbability=0.100/rows=4096/cols=2/ordCols=1-8    100MB/s ± 1%    109MB/s ± 2%   +9.35%  (p=0.000 n=10+10)
Distinct/PartiallyOrdered/hasNulls=false/newTupleProbability=0.100/rows=4096/cols=4/ordCols=2-8    134MB/s ± 2%    124MB/s ± 1%   -7.64%  (p=0.000 n=9+10)
Distinct/PartiallyOrdered/hasNulls=false/newTupleProbability=0.100/rows=65536/cols=2/ordCols=1-8   111MB/s ± 1%    115MB/s ± 2%   +3.98%  (p=0.000 n=10+9)
Distinct/PartiallyOrdered/hasNulls=false/newTupleProbability=0.100/rows=65536/cols=4/ordCols=2-8   121MB/s ± 1%    133MB/s ± 1%   +9.46%  (p=0.000 n=10+9)
Distinct/PartiallyOrdered/hasNulls=true/newTupleProbability=0.001/rows=4096/cols=2/ordCols=1-8     314MB/s ± 0%    311MB/s ± 1%   -0.92%  (p=0.000 n=8+10)
Distinct/PartiallyOrdered/hasNulls=true/newTupleProbability=0.001/rows=4096/cols=4/ordCols=2-8     406MB/s ± 1%    452MB/s ± 2%  +11.39%  (p=0.000 n=10+9)
Distinct/PartiallyOrdered/hasNulls=true/newTupleProbability=0.001/rows=65536/cols=2/ordCols=1-8    387MB/s ± 2%    402MB/s ± 5%     ~     (p=0.065 n=10+9)
Distinct/PartiallyOrdered/hasNulls=true/newTupleProbability=0.001/rows=65536/cols=4/ordCols=2-8    499MB/s ± 1%    474MB/s ± 1%   -4.94%  (p=0.000 n=10+9)
Distinct/PartiallyOrdered/hasNulls=true/newTupleProbability=0.010/rows=4096/cols=2/ordCols=1-8     269MB/s ± 1%    241MB/s ± 1%  -10.58%  (p=0.000 n=10+9)
Distinct/PartiallyOrdered/hasNulls=true/newTupleProbability=0.010/rows=4096/cols=4/ordCols=2-8     275MB/s ± 1%    345MB/s ± 1%  +25.65%  (p=0.000 n=10+10)
Distinct/PartiallyOrdered/hasNulls=true/newTupleProbability=0.010/rows=65536/cols=2/ordCols=1-8    262MB/s ± 1%    262MB/s ± 1%     ~     (p=0.968 n=9+10)
Distinct/PartiallyOrdered/hasNulls=true/newTupleProbability=0.010/rows=65536/cols=4/ordCols=2-8    346MB/s ± 1%    378MB/s ± 1%   +9.24%  (p=0.000 n=10+10)
Distinct/PartiallyOrdered/hasNulls=true/newTupleProbability=0.100/rows=4096/cols=2/ordCols=1-8    92.3MB/s ± 1%  103.3MB/s ± 1%  +11.98%  (p=0.000 n=8+9)
Distinct/PartiallyOrdered/hasNulls=true/newTupleProbability=0.100/rows=4096/cols=4/ordCols=2-8     119MB/s ± 1%    110MB/s ± 1%   -7.45%  (p=0.000 n=9+10)
Distinct/PartiallyOrdered/hasNulls=true/newTupleProbability=0.100/rows=65536/cols=2/ordCols=1-8   90.2MB/s ± 1%   94.2MB/s ± 1%   +4.38%  (p=0.000 n=10+10)
Distinct/PartiallyOrdered/hasNulls=true/newTupleProbability=0.100/rows=65536/cols=4/ordCols=2-8    116MB/s ± 2%    122MB/s ± 2%   +4.77%  (p=0.000 n=9+10)

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/sql/colexec/hashtable.go, line 94 at r1 (raw file):

Previously, yuzefovich wrote…

This slice seems to always be {0, 1, 2, ..., nKeys - 1}, can we eliminate it?

Done.


pkg/sql/colexec/hashtable.go, line 258 at r1 (raw file):

Previously, yuzefovich wrote…

Why do we need plus ones here? I feel like for most of these slices we don't need it, and, instead, in the code we could have keyID - 1 or whatever when accessing them. It'd be nice to unify it with non-distinct mode.

Updated. Only the next slices would require to have plus ones.

Done


pkg/sql/colexec/hashtable.go, line 289 at r1 (raw file):

Previously, yuzefovich wrote…

We should keep maybeAllocateUint64Array call here instead of using make directly.

Done.

My bad, probably slipped through when I was trying resolving the merge conflict.


pkg/sql/colexec/hashtable.go, line 341 at r1 (raw file):

Previously, yuzefovich wrote…

It seems like these two removeDuplicates* functions are quite similar, could we refactor the code to support both with a single function?

Done.


pkg/sql/colexec/hashtable_tmpl.go, line 135 at r1 (raw file):

Previously, yuzefovich wrote…

nit: you could rearrange the code using if statement:

if keyID != 0 {
...
}
// {{if .SelectDistinct}}
else {
   ht.probeScratch.distinct[toCheck] = true
}
// {{end}}

Turns out go compiler enforce that else must follow immediately after }, otherwise it's a syntax error and gofmt can't even fix it.


pkg/sql/colexec/hashtable_tmpl.go, line 284 at r1 (raw file):

Previously, yuzefovich wrote…

Oh, this looks fancy. What does it do?

This is because the overloads we pass in to the template is a map[coltypes.T]map[coltypes.T]*overloads. This is so that we can generate comparison codes for comparable but not identical types.
But in this function since we know we are going to be appending the probe table into build table they column types will have to be the same, so we only need to generate comparison code for identical types.

So in this case . is the .Global object (the two level map) and semantically this means:

for lTyp, rTypeToOverload := range .Global {
    overload := rTypeOverload[lTyp]
    ...
}

I could have just modify checkCol function and reuse it but I didn't want to add another boolean flag into that function's call path. This kinda make the code a little bit more duplicated but I feel it makes each individual function simpler.


pkg/sql/colexec/hashtable_tmpl.go, line 353 at r1 (raw file):

Previously, yuzefovich wrote…

We should probably enforce this assumption as the first line in the function.

Done.


pkg/sql/colexec/hashtable_tmpl.go, line 436 at r1 (raw file):

Previously, yuzefovich wrote…

Shouldn't we do this zero out before we use this buffer? I think usually the contract is that before you're reusing some slice, you make sure that it is in the state you expect it to be, and in this case we seem to expect visited to consist of only false values before the loop.

Done.

Zero out visited before and after the function. Since the underneath storage is also used elsewhere.


pkg/sql/colexec/hashtable_tmpl.go, line 442 at r1 (raw file):

Previously, yuzefovich wrote…

This comment should be extended. It's not clear what this function's purpose is and what it does.

Done.


pkg/sql/colexec/partially_ordered_distinct.go, line 124 at r1 (raw file):

Previously, yuzefovich wrote…

Why is this change?

Because when creating batches with NewMemBatchWithSize(inputType, 0), it causes the selection vector of that batch to be allocated with 0 capacity and length.
In the hash table, a lot of places assumes that once batch.SetSelection(true) is called, next time when we call batch.Selection() it will return a selection vector with length equal to coldata.BatchSize().

Correct me if my assumption is wrong.


pkg/sql/colexec/execgen/cmd/execgen/hashtable_gen.go, line 48 at r1 (raw file):

Previously, yuzefovich wrote…

ditto (missing s).

Done.

Copy link
Copy Markdown
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Azhng)


pkg/sql/colexec/hashtable.go, line 341 at r1 (raw file):

Previously, Azhng (Archer Zhang) wrote…

Done.

Nice!


pkg/sql/colexec/hashtable.go, line 33 at r3 (raw file):

	hashTableFullMode hashTableMode = iota

	// hashTableDistinctMode is the mode where hashTable only buffers only

nit: double "only."


pkg/sql/colexec/hashtable.go, line 341 at r3 (raw file):

// a slice of key columns of the probe table, number of tuple to check and the
// selection vector of the probe table and returns number of tuples that needs
// to be checked for next iteration. It populate the ht.probeScratch.headID to

nit: s/populate/populates/g.


pkg/sql/colexec/hashtable.go, line 342 at r3 (raw file):

// selection vector of the probe table and returns number of tuples that needs
// to be checked for next iteration. It populate the ht.probeScratch.headID to
// point to the keyIDs that need be included in probe table's selection vector.

nit: s/need be/need to be/g.


pkg/sql/colexec/hashtable_tmpl.go, line 135 at r1 (raw file):

Previously, Azhng (Archer Zhang) wrote…

Turns out go compiler enforce that else must follow immediately after }, otherwise it's a syntax error and gofmt can't even fix it.

I see, TIL.


pkg/sql/colexec/hashtable_tmpl.go, line 284 at r1 (raw file):

Previously, Azhng (Archer Zhang) wrote…

This is because the overloads we pass in to the template is a map[coltypes.T]map[coltypes.T]*overloads. This is so that we can generate comparison codes for comparable but not identical types.
But in this function since we know we are going to be appending the probe table into build table they column types will have to be the same, so we only need to generate comparison code for identical types.

So in this case . is the .Global object (the two level map) and semantically this means:

for lTyp, rTypeToOverload := range .Global {
    overload := rTypeOverload[lTyp]
    ...
}

I could have just modify checkCol function and reuse it but I didn't want to add another boolean flag into that function's call path. This kinda make the code a little bit more duplicated but I feel it makes each individual function simpler.

I haven't seen use of index directive before, thanks for the explanation, this makes sense. It'd be definitely useful to include the code snipped you posted as the templated comment into the code (i.e. surrounding it with //{{/* and //*/}}).


pkg/sql/colexec/hashtable_tmpl.go, line 436 at r1 (raw file):

Previously, Azhng (Archer Zhang) wrote…

Done.

Zero out visited before and after the function. Since the underneath storage is also used elsewhere.

I think the "after" zero out should be moved closer to where it is used elsewhere.


pkg/sql/colexec/partially_ordered_distinct.go, line 124 at r1 (raw file):

Previously, Azhng (Archer Zhang) wrote…

Because when creating batches with NewMemBatchWithSize(inputType, 0), it causes the selection vector of that batch to be allocated with 0 capacity and length.
In the hash table, a lot of places assumes that once batch.SetSelection(true) is called, next time when we call batch.Selection() it will return a selection vector with length equal to coldata.BatchSize().

Correct me if my assumption is wrong.

I see. We have NewMemBatchNoCols for such cases, can you try it out?


pkg/sql/colexec/unordered_distinct.go, line 88 at r3 (raw file):

		// We're using the hashTable in distinct mode, so it buffers only distinct
		// tuples, as a result, we will be simply returning all buffered tuples

nit: missing period.

Previously hashTable will buffer all tuples before building `head` and
`same` vector.
Now hashTable supports distinct mode where it only buffers
the distinct tuples from upstream operator. This removes the need of traversing
through the `head` and `same` vectors. Instead, now user of the hashTable
can directly build hashTable in distinct mode and copy the buffered tuples.

Release note: None
@Azhng Azhng force-pushed the azhng-distinct-ht branch from 17c089e to fc0dac9 Compare March 5, 2020 21:28
Copy link
Copy Markdown
Contributor Author

@Azhng Azhng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/sql/colexec/hashtable_tmpl.go, line 284 at r1 (raw file):

Previously, yuzefovich wrote…

I haven't seen use of index directive before, thanks for the explanation, this makes sense. It'd be definitely useful to include the code snipped you posted as the templated comment into the code (i.e. surrounding it with //{{/* and //*/}}).

Done.


pkg/sql/colexec/partially_ordered_distinct.go, line 124 at r1 (raw file):

Previously, yuzefovich wrote…

I see. We have NewMemBatchNoCols for such cases, can you try it out?

Done.

Copy link
Copy Markdown
Contributor Author

@Azhng Azhng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rerun the benchmark on the GCE workers:

Unordered distinct:

name                                                                                        old speed      new speed      delta
Distinct/Unordered/hasNulls=false/newTupleProbability=0.001/rows=4096/cols=2/ordCols=0-24    154MB/s ± 5%   156MB/s ± 3%      ~     (p=0.460 n=10+8)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.001/rows=4096/cols=4/ordCols=0-24    219MB/s ± 3%   236MB/s ±13%      ~     (p=0.143 n=10+10)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.001/rows=65536/cols=2/ordCols=0-24   136MB/s ± 3%   384MB/s ± 2%  +181.65%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.001/rows=65536/cols=4/ordCols=0-24   245MB/s ± 1%   518MB/s ± 2%  +111.55%  (p=0.000 n=9+10)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.010/rows=4096/cols=2/ordCols=0-24    154MB/s ± 4%   155MB/s ± 3%      ~     (p=0.754 n=10+10)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.010/rows=4096/cols=4/ordCols=0-24    214MB/s ± 4%   249MB/s ± 2%   +16.29%  (p=0.000 n=10+9)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.010/rows=65536/cols=2/ordCols=0-24   181MB/s ± 3%   373MB/s ± 2%  +106.24%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.010/rows=65536/cols=4/ordCols=0-24   225MB/s ± 6%   504MB/s ± 2%  +123.53%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.100/rows=4096/cols=2/ordCols=0-24    150MB/s ± 4%   147MB/s ± 4%      ~     (p=0.143 n=10+10)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.100/rows=4096/cols=4/ordCols=0-24    207MB/s ± 6%   204MB/s ± 6%      ~     (p=0.182 n=10+9)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.100/rows=65536/cols=2/ordCols=0-24   165MB/s ± 2%   301MB/s ± 4%   +82.26%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=false/newTupleProbability=0.100/rows=65536/cols=4/ordCols=0-24   201MB/s ± 3%   398MB/s ± 2%   +97.83%  (p=0.000 n=10+9)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.001/rows=4096/cols=2/ordCols=0-24     113MB/s ± 4%   122MB/s ± 4%    +8.02%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.001/rows=4096/cols=4/ordCols=0-24     141MB/s ± 3%   168MB/s ± 2%   +19.43%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.001/rows=65536/cols=2/ordCols=0-24    138MB/s ± 4%   236MB/s ± 2%   +71.44%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.001/rows=65536/cols=4/ordCols=0-24    156MB/s ± 3%   282MB/s ± 1%   +80.65%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.010/rows=4096/cols=2/ordCols=0-24     109MB/s ± 5%   117MB/s ± 1%    +6.74%  (p=0.000 n=10+9)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.010/rows=4096/cols=4/ordCols=0-24     143MB/s ± 3%   164MB/s ± 2%   +14.59%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.010/rows=65536/cols=2/ordCols=0-24    100MB/s ± 3%   226MB/s ± 2%  +125.29%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.010/rows=65536/cols=4/ordCols=0-24    142MB/s ± 3%   272MB/s ± 3%   +91.53%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.100/rows=4096/cols=2/ordCols=0-24     108MB/s ± 3%   112MB/s ± 3%    +3.43%  (p=0.001 n=10+10)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.100/rows=4096/cols=4/ordCols=0-24     132MB/s ± 3%   150MB/s ± 8%   +13.62%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.100/rows=65536/cols=2/ordCols=0-24    107MB/s ± 5%   191MB/s ± 4%   +78.89%  (p=0.000 n=10+10)
Distinct/Unordered/hasNulls=true/newTupleProbability=0.100/rows=65536/cols=4/ordCols=0-24    125MB/s ± 2%   237MB/s ± 4%   +89.61%  (p=0.000 n=10+10)

Partially ordered

name                                                                                               old speed      new speed      delta
Distinct/PartiallyOrdered/hasNulls=false/newTupleProbability=0.001/rows=4096/cols=2/ordCols=1-24    231MB/s ± 3%   306MB/s ± 3%  +32.19%  (p=0.000 n=10+10)
Distinct/PartiallyOrdered/hasNulls=false/newTupleProbability=0.001/rows=4096/cols=4/ordCols=2-24    345MB/s ± 3%   418MB/s ± 1%  +21.01%  (p=0.000 n=10+8)
Distinct/PartiallyOrdered/hasNulls=false/newTupleProbability=0.001/rows=65536/cols=2/ordCols=1-24   351MB/s ± 1%   444MB/s ± 2%  +26.69%  (p=0.000 n=10+10)
Distinct/PartiallyOrdered/hasNulls=false/newTupleProbability=0.001/rows=65536/cols=4/ordCols=2-24   486MB/s ± 1%   605MB/s ± 3%  +24.28%  (p=0.000 n=10+10)
Distinct/PartiallyOrdered/hasNulls=false/newTupleProbability=0.010/rows=4096/cols=2/ordCols=1-24    182MB/s ± 4%   195MB/s ± 2%   +6.92%  (p=0.000 n=9+10)
Distinct/PartiallyOrdered/hasNulls=false/newTupleProbability=0.010/rows=4096/cols=4/ordCols=2-24    317MB/s ± 1%   325MB/s ± 3%   +2.49%  (p=0.000 n=9+10)
Distinct/PartiallyOrdered/hasNulls=false/newTupleProbability=0.010/rows=65536/cols=2/ordCols=1-24   266MB/s ± 3%   233MB/s ± 2%  -12.54%  (p=0.000 n=10+10)
Distinct/PartiallyOrdered/hasNulls=false/newTupleProbability=0.010/rows=65536/cols=4/ordCols=2-24   405MB/s ± 2%   388MB/s ± 3%   -3.99%  (p=0.000 n=10+10)
Distinct/PartiallyOrdered/hasNulls=false/newTupleProbability=0.100/rows=4096/cols=2/ordCols=1-24   52.6MB/s ± 1%  72.5MB/s ± 2%  +37.79%  (p=0.000 n=10+10)
Distinct/PartiallyOrdered/hasNulls=false/newTupleProbability=0.100/rows=4096/cols=4/ordCols=2-24   72.1MB/s ± 1%  82.0MB/s ± 2%  +13.81%  (p=0.000 n=10+10)
Distinct/PartiallyOrdered/hasNulls=false/newTupleProbability=0.100/rows=65536/cols=2/ordCols=1-24  61.0MB/s ± 2%  74.9MB/s ± 2%  +22.74%  (p=0.000 n=10+10)
Distinct/PartiallyOrdered/hasNulls=false/newTupleProbability=0.100/rows=65536/cols=4/ordCols=2-24  77.3MB/s ± 2%  81.6MB/s ± 1%   +5.58%  (p=0.000 n=10+10)
Distinct/PartiallyOrdered/hasNulls=true/newTupleProbability=0.001/rows=4096/cols=2/ordCols=1-24     196MB/s ± 1%   225MB/s ± 3%  +14.81%  (p=0.000 n=10+10)
Distinct/PartiallyOrdered/hasNulls=true/newTupleProbability=0.001/rows=4096/cols=4/ordCols=2-24     256MB/s ± 1%   273MB/s ± 1%   +6.55%  (p=0.000 n=9+10)
Distinct/PartiallyOrdered/hasNulls=true/newTupleProbability=0.001/rows=65536/cols=2/ordCols=1-24    246MB/s ± 2%   191MB/s ± 1%  -22.35%  (p=0.000 n=10+9)
Distinct/PartiallyOrdered/hasNulls=true/newTupleProbability=0.001/rows=65536/cols=4/ordCols=2-24    314MB/s ± 2%   432MB/s ± 2%  +37.66%  (p=0.000 n=10+9)
Distinct/PartiallyOrdered/hasNulls=true/newTupleProbability=0.010/rows=4096/cols=2/ordCols=1-24     179MB/s ± 1%   160MB/s ± 4%  -10.54%  (p=0.000 n=10+10)
Distinct/PartiallyOrdered/hasNulls=true/newTupleProbability=0.010/rows=4096/cols=4/ordCols=2-24     180MB/s ± 2%   202MB/s ± 2%  +12.28%  (p=0.000 n=10+10)
Distinct/PartiallyOrdered/hasNulls=true/newTupleProbability=0.010/rows=65536/cols=2/ordCols=1-24    193MB/s ± 1%   175MB/s ± 1%   -9.37%  (p=0.000 n=10+10)
Distinct/PartiallyOrdered/hasNulls=true/newTupleProbability=0.010/rows=65536/cols=4/ordCols=2-24    309MB/s ± 2%   268MB/s ± 4%  -13.10%  (p=0.000 n=10+10)
Distinct/PartiallyOrdered/hasNulls=true/newTupleProbability=0.100/rows=4096/cols=2/ordCols=1-24    53.6MB/s ± 1%  61.7MB/s ± 4%  +14.98%  (p=0.000 n=10+10)
Distinct/PartiallyOrdered/hasNulls=true/newTupleProbability=0.100/rows=4096/cols=4/ordCols=2-24    72.6MB/s ± 2%  74.2MB/s ± 2%   +2.17%  (p=0.002 n=10+10)
Distinct/PartiallyOrdered/hasNulls=true/newTupleProbability=0.100/rows=65536/cols=2/ordCols=1-24   58.4MB/s ± 2%  68.1MB/s ± 0%  +16.58%  (p=0.000 n=10+8)
Distinct/PartiallyOrdered/hasNulls=true/newTupleProbability=0.100/rows=65536/cols=4/ordCols=2-24   90.2MB/s ± 2%  91.8MB/s ± 3%   +1.82%  (p=0.011 n=10+10)

Hash join

name                                                                   old speed      new speed      delta
HashJoiner/nulls=false/fullOuter=false/distinct=true/rows=2048-24       315MB/s ±11%   367MB/s ± 3%  +16.50%  (p=0.000 n=10+10)
HashJoiner/nulls=false/fullOuter=false/distinct=true/rows=262144-24     554MB/s ± 2%   582MB/s ± 2%   +5.07%  (p=0.000 n=10+9)
HashJoiner/nulls=false/fullOuter=false/distinct=true/rows=4194304-24    582MB/s ± 3%   619MB/s ± 3%   +6.38%  (p=0.000 n=10+10)
HashJoiner/nulls=false/fullOuter=false/distinct=false/rows=2048-24      300MB/s ± 9%   321MB/s ± 7%   +6.77%  (p=0.007 n=10+10)
HashJoiner/nulls=false/fullOuter=false/distinct=false/rows=262144-24    329MB/s ± 3%   350MB/s ± 4%   +6.50%  (p=0.000 n=10+9)
HashJoiner/nulls=false/fullOuter=false/distinct=false/rows=4194304-24   180MB/s ± 8%   201MB/s ± 5%  +11.32%  (p=0.000 n=10+10)
HashJoiner/nulls=false/fullOuter=true/distinct=true/rows=2048-24        284MB/s ± 5%   354MB/s ± 6%  +24.45%  (p=0.000 n=10+10)
HashJoiner/nulls=false/fullOuter=true/distinct=true/rows=262144-24      503MB/s ± 3%   562MB/s ± 3%  +11.65%  (p=0.000 n=10+10)
HashJoiner/nulls=false/fullOuter=true/distinct=true/rows=4194304-24     539MB/s ± 3%   570MB/s ± 1%   +5.84%  (p=0.000 n=10+8)
HashJoiner/nulls=false/fullOuter=true/distinct=false/rows=2048-24       297MB/s ± 5%   302MB/s ±10%     ~     (p=0.912 n=10+10)
HashJoiner/nulls=false/fullOuter=true/distinct=false/rows=262144-24     310MB/s ± 6%   339MB/s ± 9%   +9.27%  (p=0.002 n=10+10)
HashJoiner/nulls=false/fullOuter=true/distinct=false/rows=4194304-24    173MB/s ± 8%   183MB/s ± 2%   +5.57%  (p=0.003 n=10+9)
HashJoiner/nulls=true/fullOuter=false/distinct=true/rows=2048-24        257MB/s ± 8%   273MB/s ± 4%   +5.92%  (p=0.000 n=10+10)
HashJoiner/nulls=true/fullOuter=false/distinct=true/rows=262144-24      372MB/s ± 1%   392MB/s ± 1%   +5.41%  (p=0.000 n=9+7)
HashJoiner/nulls=true/fullOuter=false/distinct=true/rows=4194304-24     396MB/s ± 4%   405MB/s ± 4%   +2.38%  (p=0.014 n=10+9)
HashJoiner/nulls=true/fullOuter=false/distinct=false/rows=2048-24       252MB/s ± 5%   252MB/s ± 2%     ~     (p=0.971 n=10+10)
HashJoiner/nulls=true/fullOuter=false/distinct=false/rows=262144-24     262MB/s ± 6%   266MB/s ± 5%     ~     (p=0.353 n=10+10)
HashJoiner/nulls=true/fullOuter=false/distinct=false/rows=4194304-24    152MB/s ± 4%   153MB/s ± 4%     ~     (p=0.720 n=10+9)
HashJoiner/nulls=true/fullOuter=true/distinct=true/rows=2048-24         234MB/s ± 3%   267MB/s ± 5%  +14.01%  (p=0.000 n=10+10)
HashJoiner/nulls=true/fullOuter=true/distinct=true/rows=262144-24       365MB/s ± 2%   383MB/s ± 2%   +4.94%  (p=0.000 n=10+10)
HashJoiner/nulls=true/fullOuter=true/distinct=true/rows=4194304-24      373MB/s ± 4%   391MB/s ± 3%   +4.78%  (p=0.000 n=10+10)
HashJoiner/nulls=true/fullOuter=true/distinct=false/rows=2048-24        231MB/s ±10%   252MB/s ± 3%   +8.89%  (p=0.003 n=10+10)
HashJoiner/nulls=true/fullOuter=true/distinct=false/rows=262144-24      240MB/s ± 3%   259MB/s ± 4%   +7.69%  (p=0.000 n=10+10)
HashJoiner/nulls=true/fullOuter=true/distinct=false/rows=4194304-24     147MB/s ± 4%   150MB/s ± 4%     ~     (p=0.066 n=10+10)

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)

@mokaixu
Copy link
Copy Markdown
Contributor

mokaixu commented Mar 6, 2020

Great work Archer! 👍 @Azhng

Copy link
Copy Markdown
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree! :) :lgtm:

The numbers look very good! I think there might be something to investigate about the partially ordered distinct (maybe we don't need to zero out some slices when resetting), but that belongs in a separate PR.

Reviewed 4 of 4 files at r4.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained

@Azhng
Copy link
Copy Markdown
Contributor Author

Azhng commented Mar 6, 2020

Sounds good. Will investigate later.

TFTR!

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Mar 6, 2020

Merge conflict (retrying...)

craig bot pushed a commit that referenced this pull request Mar 6, 2020
45704: colexec: add distinct mode to hashTable r=Azhng a=Azhng

Previously hashTable will buffer all tuples before building `head` and
`same` vector.
Now hashTable supports distinct mode where it only buffers
the distinct tuples from upstream operator. This removes the need of traversing
through the `head` and `same` vectors. Instead, now user of the hashTable
can directly build hashTable in distinct mode and copy the buffered tuples.

Closes #44404

Release note: None

45727: cli: allow for `cockroach demo` to start in secure mode r=knz a=rohany

Fixes #45607.

Release note (cli change): This PR adds support for `cockroach demo`
to start in secure mode using the flag `--insecure=false`.

45759: storage: Do not create nil data pointers in rocksdb slices r=miretskiy a=miretskiy

Fixes #45524

Libroach's DBSlice data type gets converted to the rocksdb::Slice
whenever we need to access underlying rocksdb (iterators, get/set, etc).

However, internally, rocksdb::Slice asserts that the underlying data
pointer may not be null in the rocksdb::Slice::compare as observed in #45524

This change modifies our rocksdb bridge code to never generate
DBSlices with nullptr data.

Release notes: None

Co-authored-by: Azhng <archer.xn@gmail.com>
Co-authored-by: Rohan Yadav <rohany@alumni.cmu.edu>
Co-authored-by: Yevgeniy Miretskiy <yevgeniy@cockroachlabs.com>
@knz knz merged commit 17d5c24 into cockroachdb:master Mar 6, 2020
@yuzefovich
Copy link
Copy Markdown
Member

@knz should we do "bors r-" to not confuse bors?

@knz
Copy link
Copy Markdown
Contributor

knz commented Mar 6, 2020

I don't know :) but i think it won't hurt.

bors r-

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Mar 6, 2020

Canceled

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

colexec: improve unordered distinct

6 participants