expression: support least/greatest for int/real. by ywqzzy · Pull Request #3801 · pingcap/tiflash

ywqzzy · 2022-01-05T02:35:07Z

What problem does this PR solve?

Issue Number: close #3358

Problem Summary:

What is changed and how it works?

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)
No code

Release note

Support push down LEAST/GREATEST for integers and float to TiFlash.

ti-chi-bot · 2022-01-05T02:35:08Z

[REVIEW NOTIFICATION]

This pull request has been approved by:

fuzhe1989
windtalker

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Details

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

ywqzzy · 2022-01-05T02:38:17Z

/cc @fuzhe1989 @XuHuaiyu @dragonly @SeaRise

ywqzzy · 2022-01-05T02:42:18Z

How to make a benchmark?

make bench_dbms -j16
cd build/dbms
./bench_dbms

The benchmark result:

Running ./bench_dbms
Run on (32 X 3393.62 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x16)
  L1 Instruction 32 KiB (x16)
  L2 Unified 512 KiB (x16)
  L3 Unified 32768 KiB (x1)
Load Average: 2.08, 2.05, 3.70
--------------------------------------------------------------------------------------------
Benchmark                                                  Time             CPU   Iterations
--------------------------------------------------------------------------------------------
LeastBench/benchVec/iterations:100                  46317731 ns     46317127 ns          100
LeastBench/benchVecWithNullable/iterations:100      46645602 ns     46639713 ns          100
LeastBench/benchNormal/iterations:100              156318886 ns    156276623 ns          100
LeastBench/benchNormalWithNullable/iterations:100  121462177 ns    121433210 ns          100
LeastBench/benchVecMoreCols/iterations:100          29257609 ns     29257149 ns          100
LeastBench/benchNormalMoreCols/iterations:100      160558118 ns    160554176 ns          100

After addressing the review comments, the benchmark results are as follows:

Running ./bench_dbms
Run on (32 X 3393.63 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x16)
  L1 Instruction 32 KiB (x16)
  L2 Unified 512 KiB (x16)
  L3 Unified 32768 KiB (x1)
Load Average: 1.86, 1.79, 1.61
--------------------------------------------------------------------------------------------
Benchmark                                                  Time             CPU   Iterations
--------------------------------------------------------------------------------------------
LeastBench/benchVec/iterations:100                  40676307 ns     40674935 ns          100
LeastBench/benchVecWithNullable/iterations:100      42957459 ns     42956277 ns          100
LeastBench/benchNormal/iterations:100              146060945 ns    146052833 ns          100
LeastBench/benchNormalWithNullable/iterations:100  129120457 ns    129118243 ns          100
LeastBench/benchVecMoreCols/iterations:100          28195435 ns     28195082 ns          100
LeastBench/benchNormalMoreCols/iterations:100      159592506 ns    159591600 ns          100

After addressing the comments a second time, the benchmark results are as follows:

The row num is 10000.

The row num is 100000.

The row num is 1000000.

The row num is 10000000.
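The benchmark names contrast `benchVec*` with `benchNormal*`. The thread does not define them, so the following is only a sketch under the assumption that "Vec" means a column-at-a-time loop and "Normal" a row-at-a-time loop. `Column`, `leastRowBased` and `leastColumnwise` are hypothetical names for illustration, not TiFlash APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a numeric column; not a TiFlash type.
using Column = std::vector<std::int64_t>;

// Row at a time: for each row, scan across all argument columns.
Column leastRowBased(const std::vector<Column> & columns)
{
    assert(!columns.empty());
    const std::size_t rows = columns.front().size();
    Column result(rows);
    for (std::size_t row = 0; row < rows; ++row)
    {
        std::int64_t value = columns[0][row];
        for (std::size_t col = 1; col < columns.size(); ++col)
            value = std::min(value, columns[col][row]);
        result[row] = value;
    }
    return result;
}

// Column at a time: fold each argument column into the result with a tight
// inner loop over contiguous memory.
Column leastColumnwise(const std::vector<Column> & columns)
{
    assert(!columns.empty());
    Column result = columns.front();
    for (std::size_t col = 1; col < columns.size(); ++col)
    {
        assert(columns[col].size() == result.size());
        for (std::size_t row = 0; row < result.size(); ++row)
            result[row] = std::min(result[row], columns[col][row]);
    }
    return result;
}
```

Both functions return the same values; only the loop order differs. Whether this matches the difference actually measured by `LeastBench` is an assumption.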

fuzhe1989 · 2022-01-05T04:58:00Z

dbms/src/Functions/LeastGreatest.h

+enum class LeastGreatest
+{
+    Least,
+    Greatest
+};


How about converting it to a trait class, like:

struct LeastImpl
{
    static constexpr auto name = "tidbLeast";
    using ReturnType = xxx;
};
struct GreatestImpl
{
    static constexpr auto name = "tidbGreatest";
    using ReturnType = xxx;
};

I am not sure how to use the ReturnType, but I will change it according to the comments in the closed PR #3537.
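A minimal sketch of the trait-class idea, not the code that was merged. Only the struct names and the `name` strings come from the review comment. The `apply` members, the `foldArgs` helper, and the use of `std::string_view` for `name` are assumptions added for illustration.

```cpp
#include <string_view>

// Trait classes replacing `enum class LeastGreatest`.
// ASSUMPTION: each trait carries its own comparison in `apply`.
struct LeastImpl
{
    static constexpr std::string_view name = "tidbLeast";
    template <typename T>
    static constexpr T apply(T a, T b) { return b < a ? b : a; }
};

struct GreatestImpl
{
    static constexpr std::string_view name = "tidbGreatest";
    template <typename T>
    static constexpr T apply(T a, T b) { return a < b ? b : a; }
};

// Hypothetical helper: fold any number of arguments with the chosen trait.
// The trait is a template parameter, so no runtime branch on an enum is needed.
template <typename Impl, typename T, typename... Rest>
constexpr T foldArgs(T first, Rest... rest)
{
    T result = first;
    ((result = Impl::apply(result, static_cast<T>(rest))), ...);
    return result;
}

static_assert(foldArgs<LeastImpl>(7, 2, 9) == 2);
static_assert(foldArgs<GreatestImpl>(7, 2, 9) == 9);
```

Compared with the enum, each trait bundles the registered function name with its behavior, so a single function template can be instantiated once per trait.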

fuzhe1989 · 2022-01-07T01:22:48Z

dbms/src/DataTypes/NumberTraits.h

+    using Type = std::conditional_t<
+        std::is_floating_point_v<A> || std::is_floating_point_v<B>,
+        Float64,
+        std::conditional_t<sizeof(A) == 8 && sizeof(A) == sizeof(B), typename BinaryLeastSpecialCase<A, B>::Type, Int64>>;


I guess we don't need to check the size, but limiting the types to integer/float is necessary.

The size check is needed. If it is removed, the test below won't pass: the output column will be UInt64, because TiDB doesn't cast every int to Int64.

ASSERT_COLUMN_EQ(
    createColumn<Int64>({7, 2, 3, 3, 2}),
    executeFunction(
        func_name,
        createColumn<UInt16>({10, 2, 3, 4, 5}),
        createColumn<UInt16>({7, 6, 5, 3, 4}),
        createColumn<UInt16>({8, 9, 6, 3, 2})));

in TiDB it does return UInt64.
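To make the disagreement concrete, here is a compile-time sketch of the result-type rule in the quoted diff. The definition of `BinaryLeastSpecialCase` is not shown in the thread, so `SpecialCaseSketch` below is a purely hypothetical stand-in (two unsigned 64-bit inputs keep UInt64, anything else becomes Int64). The alias names mirror TiFlash's but are declared locally.

```cpp
#include <cstdint>
#include <type_traits>

using Int64 = std::int64_t;
using UInt64 = std::uint64_t;
using UInt16 = std::uint16_t;
using Float64 = double;

// ASSUMPTION: stand-in for BinaryLeastSpecialCase, which the thread does not show.
template <typename A, typename B>
struct SpecialCaseSketch
{
    using Type = std::conditional_t<std::is_unsigned_v<A> && std::is_unsigned_v<B>, UInt64, Int64>;
};

// Same shape as the quoted diff: floating point wins, 8-byte pairs go through
// the special case, and every narrower integer pair becomes Int64.
template <typename A, typename B>
struct LeastResultSketch
{
    using Type = std::conditional_t<
        std::is_floating_point_v<A> || std::is_floating_point_v<B>,
        Float64,
        std::conditional_t<sizeof(A) == 8 && sizeof(A) == sizeof(B), typename SpecialCaseSketch<A, B>::Type, Int64>>;
};

// With the size check, UInt16 inputs produce Int64, as the quoted test expects.
static_assert(std::is_same_v<LeastResultSketch<UInt16, UInt16>::Type, Int64>);
// Any floating-point operand produces Float64.
static_assert(std::is_same_v<LeastResultSketch<float, Int64>::Type, Float64>);
```

Under this rule, narrow unsigned inputs are widened to Int64, while the reviewer's point is that TiDB reports UInt64 for the same inputs. A later commit in this PR is titled "fix int type infer to be same with tidb".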

fuzhe1989 · 2022-01-07T05:01:42Z

dbms/src/Functions/LeastGreatest.h

@@ -188,15 +187,21 @@ class FunctionRowbasedLeastGreatest : public IFunction

        if (checkDataType<DataTypeInt64>(result_type.get()))


(May not be critical.) Another way to get the type is:

TypeIndex type_index = removeNullable(block.getByPosition(arguments[1]).type)->getTypeId();
switch (type_index) ...

However, I'm not sure which way is better.

I will check it out.

use castTypeToEither instead?
https://github.com/pingcap/tics/blob/0c9b631bc826c0b2ffed62b87f31a7d840f97018/dbms/src/Functions/FunctionsString.cpp#L4396-L4413
https://github.com/pingcap/tics/blob/0c9b631bc826c0b2ffed62b87f31a7d840f97018/dbms/src/Functions/FunctionsString.cpp#L4331

Maybe I can extract the type_index approach into a function.
I found similar logic in callOnBasicTypes().
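A sketch of what extracting the type_index switch into one function could look like. This is not TiFlash's `callOnBasicTypes()` or `castTypeToEither`, whose signatures are not shown here. `TypeIndexSketch`, `TypeTag`, `dispatchOnType` and `IsUnsignedVisitor` are hypothetical names, and the enum is reduced to three values.

```cpp
#include <cstdint>
#include <stdexcept>
#include <type_traits>

// Hypothetical, reduced stand-in for a runtime type index.
enum class TypeIndexSketch { Int64, UInt64, Float64 };

// Carries a type as a value so it can be passed to a generic callable.
template <typename T>
struct TypeTag { using type = T; };

// Maps the runtime index to a compile-time type once, in one place.
// Every caller then supplies a generic callable instead of its own switch.
template <typename F>
constexpr auto dispatchOnType(TypeIndexSketch index, F && f)
{
    switch (index)
    {
    case TypeIndexSketch::Int64: return f(TypeTag<std::int64_t>{});
    case TypeIndexSketch::UInt64: return f(TypeTag<std::uint64_t>{});
    case TypeIndexSketch::Float64: return f(TypeTag<double>{});
    }
    throw std::invalid_argument("unsupported type index");
}

// Example callable: reports whether the selected type is unsigned.
struct IsUnsignedVisitor
{
    template <typename T>
    constexpr bool operator()(TypeTag<T>) const { return std::is_unsigned_v<T>; }
};

static_assert(dispatchOnType(TypeIndexSketch::UInt64, IsUnsignedVisitor{}));
static_assert(!dispatchOnType(TypeIndexSketch::Float64, IsUnsignedVisitor{}));
```

The design choice is the same one the thread is circling: keep the list of supported types in a single dispatcher so that adding a type does not require touching every function that branches on it.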

fuzhe1989 · 2022-01-11T03:08:39Z

dbms/src/DataTypes/NumberTraits.h

+    using Type = std::conditional_t<
+        std::is_floating_point_v<A> || std::is_floating_point_v<B>,
+        Float64,
+        std::conditional_t<sizeof(A) == 8 && sizeof(A) == sizeof(B), typename BinaryLeastSpecialCase<A, B>::Type, Int64>>;


in TiDB it does return UInt64.

windtalker · 2022-02-28T05:45:52Z

dbms/src/Functions/greatest.cpp

    static Result apply(A a, B b)
    {
-        return static_cast<Result>(a) > static_cast<Result>(b) ? static_cast<Result>(a) : static_cast<Result>(b);
+        return accurate::greaterOp(a, b) ? static_cast<Result>(a) : static_cast<Result>(b);


Why not cast a/b to the ResultType before comparison?
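The thread does not record an answer to this question. One possible reason, shown here only as a sketch: when the operands mix signed and unsigned 64-bit integers, casting both to a single type before comparing can change the ordering. `greaterAfterCast` and `greaterSketch` are hypothetical helpers written for this example. The implementation of `accurate::greaterOp` is not shown in the thread and may differ.

```cpp
#include <cstdint>

// Cast-then-compare, as in the removed line of the diff.
template <typename Result, typename A, typename B>
constexpr bool greaterAfterCast(A a, B b)
{
    return static_cast<Result>(a) > static_cast<Result>(b);
}

// Hypothetical sign-aware comparison for one mixed pair (signed vs unsigned).
// A negative `a` can never be greater than an unsigned `b`.
constexpr bool greaterSketch(std::int64_t a, std::uint64_t b)
{
    return a >= 0 && static_cast<std::uint64_t>(a) > b;
}

// Casting -1 to an unsigned 64-bit type wraps to the maximum value, so the
// cast-based comparison claims that -1 > 1.
static_assert(greaterAfterCast<std::uint64_t>(std::int64_t{-1}, std::uint64_t{1}));
// The sign-aware comparison gives the mathematically correct answer.
static_assert(!greaterSketch(-1, 1));
```

If the result type is wide enough to hold both operands exactly, casting first is safe; the difference only appears for pairs such as bigint and bigint unsigned.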

guo-shaoge · 2022-03-02T11:27:49Z

dbms/src/Flash/Coprocessor/DAGUtils.cpp

-    {tipb::ScalarFuncSig::GreatestReal, "greatest"},
+    {tipb::ScalarFuncSig::GreatestInt, "tidbGreatest"},
+    {tipb::ScalarFuncSig::GreatestReal, "tidbGreatest"},
    {tipb::ScalarFuncSig::GreatestString, "greatest"},


Because greatest/least(bigint unsigned, bigint) will use Decimal, we need to enable GreatestDecimal too.

Do you mean greatest/least(bigint unsigned, bigint) will be rewritten to greatest/least(cast(bigint unsigned as decimal(20,0)), cast(bigint as decimal(20,0))) in TiDB?

guo-shaoge · 2022-03-02T11:28:41Z

dbms/src/Functions/greatest.cpp

 {
-    factory.registerFunction<FunctionGreatest>();
+    factory.registerFunction<FunctionTiDBGreatest>();
+    factory.registerFunction<FunctionBinaryGreatest>();


Maybe there is no need to register FunctionBinaryGreatest?

guo-shaoge · 2022-03-02T12:07:28Z

dbms/src/Functions/least.cpp

-struct LeastBaseImpl<A, B, true>
+struct BinaryLeastBaseImpl<A, B, true>
 {
    using ResultType = If<std::is_floating_point_v<A> || std::is_floating_point_v<B>, double, Decimal32>;


Maybe Decimal32 is not enough in some situations?

Since greatest/least(decimal) is not supported yet, we can ignore this.

windtalker

LGTM

guo-shaoge · 2022-03-03T06:07:50Z

/run-check-issue-triage-complete

guo-shaoge · 2022-03-03T06:08:09Z

/run-integration-test
/run-unit-test

sre-bot · 2022-03-03T06:24:55Z

Coverage for changed files

Filename                                     Regions    Missed Regions     Cover   Functions  Missed Functions  Executed       Lines      Missed Lines     Cover    Branches   Missed Branches     Cover
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
DataTypes/NumberTraits.h                           3                 3     0.00%           3                 3     0.00%           9                 9     0.00%           0                 0         -
Flash/Coprocessor/DAGUtils.cpp                   285               228    20.00%          35                23    34.29%         491               390    20.57%         320               226    29.38%
Functions/FunctionBinaryArithmetic.h             569               203    64.32%          40                 8    80.00%        1193               288    75.86%         274               103    62.41%
Functions/FunctionsStringSearch.cpp              645               333    48.37%          56                29    48.21%        1312               670    48.93%         410               215    47.56%
Functions/IFunction.h                             69                34    50.72%          60                31    48.33%          94                50    46.81%           6                 2    66.67%
Functions/LeastGreatest.h                         25                 6    76.00%          10                 4    60.00%          50                12    76.00%          10                 2    80.00%
Functions/greatest.cpp                             5                 0   100.00%           2                 0   100.00%           8                 0   100.00%           2                 0   100.00%
Functions/least.cpp                                5                 0   100.00%           2                 0   100.00%           8                 0   100.00%           2                 0   100.00%
Functions/tests/gtest_least_greatest.cpp         743               149    79.95%           2                 0   100.00%         262                 0   100.00%         230               117    49.13%
Interpreters/castColumn.cpp                        6                 2    66.67%           3                 1    66.67%          23                 4    82.61%           2                 1    50.00%
TestUtils/FunctionTestUtils.h                    113                 9    92.04%          33                 0   100.00%         264                 7    97.35%          48                 6    87.50%
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                           2468               967    60.82%         246                99    59.76%        3714              1430    61.50%        1304               672    48.47%

Coverage summary

Functions  MissedFunctions  Executed  Lines   MissedLines  Cover
16727      9483             43.31%    186998  95657        48.85%

full coverage report (for internal network access only)

windtalker · 2022-03-03T06:54:49Z

/merge

ti-chi-bot · 2022-03-03T06:54:50Z

@windtalker: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

ti-chi-bot · 2022-03-03T06:54:55Z

This pull request has been accepted and is ready to merge.

Details

Commit hash: 06ea3c2

sre-bot · 2022-03-03T07:21:02Z

Coverage for changed files

Filename                                     Regions    Missed Regions     Cover   Functions  Missed Functions  Executed       Lines      Missed Lines     Cover    Branches   Missed Branches     Cover
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
DataTypes/NumberTraits.h                           3                 3     0.00%           3                 3     0.00%           9                 9     0.00%           0                 0         -
Flash/Coprocessor/DAGUtils.cpp                   285               228    20.00%          35                23    34.29%         491               390    20.57%         320               226    29.38%
Functions/FunctionBinaryArithmetic.h             569               203    64.32%          40                 8    80.00%        1193               288    75.86%         274               103    62.41%
Functions/FunctionsStringSearch.cpp              645               333    48.37%          56                29    48.21%        1312               670    48.93%         410               215    47.56%
Functions/IFunction.h                             69                34    50.72%          60                31    48.33%          94                50    46.81%           6                 2    66.67%
Functions/LeastGreatest.h                         25                 6    76.00%          10                 4    60.00%          50                12    76.00%          10                 2    80.00%
Functions/greatest.cpp                             5                 0   100.00%           2                 0   100.00%           8                 0   100.00%           2                 0   100.00%
Functions/least.cpp                                5                 0   100.00%           2                 0   100.00%           8                 0   100.00%           2                 0   100.00%
Functions/tests/gtest_least_greatest.cpp         743               149    79.95%           2                 0   100.00%         262                 0   100.00%         230               117    49.13%
Interpreters/castColumn.cpp                        6                 2    66.67%           3                 1    66.67%          23                 4    82.61%           2                 1    50.00%
TestUtils/FunctionTestUtils.h                    113                 9    92.04%          33                 0   100.00%         264                 7    97.35%          48                 6    87.50%
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                           2468               967    60.82%         246                99    59.76%        3714              1430    61.50%        1304               672    48.47%

Coverage summary

Functions  MissedFunctions  Executed  Lines   MissedLines  Cover
16701      9455             43.39%    187154  95545        48.95%

full coverage report (for internal network access only)

sre-bot · 2022-03-03T08:07:13Z

Coverage for changed files

Filename                                     Regions    Missed Regions     Cover   Functions  Missed Functions  Executed       Lines      Missed Lines     Cover    Branches   Missed Branches     Cover
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
DataTypes/NumberTraits.h                           3                 3     0.00%           3                 3     0.00%           9                 9     0.00%           0                 0         -
Flash/Coprocessor/DAGUtils.cpp                   285               228    20.00%          35                23    34.29%         491               390    20.57%         320               226    29.38%
Functions/FunctionBinaryArithmetic.h             569               203    64.32%          40                 8    80.00%        1193               288    75.86%         274               103    62.41%
Functions/FunctionsStringSearch.cpp              645               333    48.37%          56                29    48.21%        1312               670    48.93%         410               215    47.56%
Functions/IFunction.h                             69                34    50.72%          60                31    48.33%          94                50    46.81%           6                 2    66.67%
Functions/LeastGreatest.h                         25                 6    76.00%          10                 4    60.00%          50                12    76.00%          10                 2    80.00%
Functions/greatest.cpp                             5                 0   100.00%           2                 0   100.00%           8                 0   100.00%           2                 0   100.00%
Functions/least.cpp                                5                 0   100.00%           2                 0   100.00%           8                 0   100.00%           2                 0   100.00%
Functions/tests/gtest_least_greatest.cpp         743               149    79.95%           2                 0   100.00%         262                 0   100.00%         230               117    49.13%
Interpreters/castColumn.cpp                        6                 2    66.67%           3                 1    66.67%          23                 4    82.61%           2                 1    50.00%
TestUtils/FunctionTestUtils.h                    113                 9    92.04%          33                 0   100.00%         264                 7    97.35%          48                 6    87.50%
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                           2468               967    60.82%         246                99    59.76%        3714              1430    61.50%        1304               672    48.47%

Coverage summary

Functions  MissedFunctions  Executed  Lines   MissedLines  Cover
16703      9456             43.39%    187166  95561        48.94%

full coverage report (for internal network access only)

ywqzzy added 5 commits January 4, 2022 15:45

update.

72f370d

Merge branch 'master' of https://github.com/pingcap/tics into least_n…

28df646

…umeric

update.

f1b878b

update test.

ced7c10

format.

1ea13ca

ti-chi-bot added the release-note-none and size/XXL labels Jan 5, 2022

clean

8691ac0

ti-chi-bot requested review from SeaRise, XuHuaiyu, dragonly and fuzhe1989 January 5, 2022 02:38

ywqzzy changed the title from "Support Least Function for Integer types." to "expression: support least/greatest for int/real." Jan 5, 2022

fuzhe1989 reviewed Jan 5, 2022

address part of the comments, Introduce bug in rowbased implementation.

e487e59

ywqzzy commented Jan 5, 2022

ywqzzy added 3 commits January 6, 2022 11:45

clean a little bit.

820e498

format.

a692537

remove virtual function call.

0e0ee60

ywqzzy requested a review from fuzhe1989 January 6, 2022 06:44

ywqzzy added 2 commits January 6, 2022 14:49

add test.

ca18916

update

0b3660a

fuzhe1989 reviewed Jan 7, 2022

address comments.

fec15f6

fuzhe1989 reviewed Jan 11, 2022

ywqzzy added 2 commits January 11, 2022 12:43

address comments.

f2793a6

format.

4b78450

windtalker reviewed Feb 28, 2022

guo-shaoge added 2 commits February 28, 2022 21:19

Merge branch 'master' of github.com:guo-shaoge/tics into least_numeric

8eab4e7

Merge branch 'master' of github.com:pingcap/tics into least_numeric

143f924

Signed-off-by: guo-shaoge <shaoge1994@163.com> Conflicts: dbms/CMakeLists.txt

ti-chi-bot added the do-not-merge/needs-linked-issue label and removed the needs-rebase label Feb 28, 2022

guo-shaoge reviewed Mar 2, 2022

guo-shaoge added 2 commits March 3, 2022 13:15

fix int type infer to be same with tidb

f134b24

Signed-off-by: guo-shaoge <shaoge1994@163.com>

fix

06ea3c2

Signed-off-by: guo-shaoge <shaoge1994@163.com>

windtalker approved these changes Mar 3, 2022

ti-chi-bot added the status/LGT2 and do-not-merge/needs-triage-completed labels and removed the status/LGT1 and do-not-merge/needs-linked-issue labels Mar 3, 2022

ti-chi-bot removed the do-not-merge/needs-triage-completed label Mar 3, 2022

ti-chi-bot added the status/can-merge label Mar 3, 2022

Merge branch 'master' into least_numeric

6f80bf5

Merge branch 'master' into least_numeric

db6fde8

ti-chi-bot merged commit 6ed8574 into pingcap:master Mar 3, 2022

This was referenced Mar 11, 2022

Cleanup benchmark #4232

Closed

cleanup benchmark #4233

Merged
