Skip to content

expression: support least/greatest for int/real.#3801

Merged
ti-chi-bot merged 31 commits intopingcap:masterfrom
ywqzzy:least_numeric
Mar 3, 2022
Merged

expression: support least/greatest for int/real.#3801
ti-chi-bot merged 31 commits intopingcap:masterfrom
ywqzzy:least_numeric

Conversation

@ywqzzy
Copy link
Contributor

@ywqzzy ywqzzy commented Jan 5, 2022

What problem does this PR solve?

Issue Number: close #3358

Problem Summary:

What is changed and how it works?

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Release note

Support push down LEAST/GREATEST for integers and float to TiFlash.

@ti-chi-bot
Copy link
Member

ti-chi-bot commented Jan 5, 2022

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • fuzhe1989
  • windtalker

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Details

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jan 5, 2022
@ywqzzy
Copy link
Contributor Author

ywqzzy commented Jan 5, 2022

@ywqzzy
Copy link
Contributor Author

ywqzzy commented Jan 5, 2022

How to make a benchmark?

make bench_dbms -j16
cd build/dbms
./bench_dbms

The benchmark result:

Running ./bench_dbms
Run on (32 X 3393.62 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x16)
  L1 Instruction 32 KiB (x16)
  L2 Unified 512 KiB (x16)
  L3 Unified 32768 KiB (x1)
Load Average: 2.08, 2.05, 3.70
--------------------------------------------------------------------------------------------
Benchmark                                                  Time             CPU   Iterations
--------------------------------------------------------------------------------------------
LeastBench/benchVec/iterations:100                  46317731 ns     46317127 ns          100
LeastBench/benchVecWithNullable/iterations:100      46645602 ns     46639713 ns          100
LeastBench/benchNormal/iterations:100              156318886 ns    156276623 ns          100
LeastBench/benchNormalWithNullable/iterations:100  121462177 ns    121433210 ns          100
LeastBench/benchVecMoreCols/iterations:100          29257609 ns     29257149 ns          100
LeastBench/benchNormalMoreCols/iterations:100      160558118 ns    160554176 ns          100

After I addressed the comments, The benchmark result is as follow:

Running ./bench_dbms
Run on (32 X 3393.63 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x16)
  L1 Instruction 32 KiB (x16)
  L2 Unified 512 KiB (x16)
  L3 Unified 32768 KiB (x1)
Load Average: 1.86, 1.79, 1.61
--------------------------------------------------------------------------------------------
Benchmark                                                  Time             CPU   Iterations
--------------------------------------------------------------------------------------------
LeastBench/benchVec/iterations:100                  40676307 ns     40674935 ns          100
LeastBench/benchVecWithNullable/iterations:100      42957459 ns     42956277 ns          100
LeastBench/benchNormal/iterations:100              146060945 ns    146052833 ns          100
LeastBench/benchNormalWithNullable/iterations:100  129120457 ns    129118243 ns          100
LeastBench/benchVecMoreCols/iterations:100          28195435 ns     28195082 ns          100
LeastBench/benchNormalMoreCols/iterations:100      159592506 ns    159591600 ns          100

After I addressed the comments the second time, The benchmark result is as follow:

The row num is 10000.
650bb429ebcd7e6d4f9fc39c59e9beb
The row num is 100000.
557a16afbb5b2c19ceaf8f6050be5d0
The row num is 1000000.
378a6d1a1ec12ac33f236cecb3e75ec
The row num is 10000000.
1641788838(1)

@ywqzzy ywqzzy changed the title Support Least Function for Integer types. expression: support least/greatest for int/real. Jan 5, 2022
Comment on lines +28 to +32
enum class LeastGreatest
{
Least,
Greatest
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enum class LeastGreatest
{
    Least,
    Greatest
};

how about convert it to a trait class like:

struct LeastImpl
{
    static constexpr auto name = "tidbLeast";
    using ReturnType = xxx;
};

struct GreatestImpl
{
    static constexpr auto name = "tidbGreatest";
    using ReturnType = xxx;
};

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure how to use the ReturnType. But I will change it according to the comments in this closed pr #3537

@ywqzzy ywqzzy requested a review from fuzhe1989 January 6, 2022 06:44
using Type = std::conditional_t<
std::is_floating_point_v<A> || std::is_floating_point_v<B>,
Float64,
std::conditional_t<sizeof(A) == 8 && sizeof(A) == sizeof(B), typename BinaryLeastSpecialCase<A, B>::Type, Int64>>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

guess we don't need to check the size, but limit the types to integer/float is necessary.

Copy link
Contributor Author

@ywqzzy ywqzzy Jan 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need it. If remove the size check,the test below won't pass, the output column will be UINT64, because tidb
don't cast every int to int64.

ASSERT_COLUMN_EQ(
        createColumn<Int64>({7, 2, 3, 3, 2}),
        executeFunction(
            func_name,
            createColumn<UInt16>({10, 2, 3, 4, 5}),
            createColumn<UInt16>({7, 6, 5, 3, 4}),
            createColumn<UInt16>({8, 9, 6, 3, 2})));

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in TiDB it does return UInt64.

@@ -188,15 +187,21 @@ class FunctionRowbasedLeastGreatest : public IFunction

if (checkDataType<DataTypeInt64>(result_type.get()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(may not be critical) another way to get the type is:

TypeIndex type_index = removeNullable(block.getByPosition(arguments[1]).type)->getTypeId();
switch (type_index)
...

However I'm not sure which way is better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will check it out

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I can extract the type_index way to a function.
In function callOnBasicTypes(), I have found the similar logic

using Type = std::conditional_t<
std::is_floating_point_v<A> || std::is_floating_point_v<B>,
Float64,
std::conditional_t<sizeof(A) == 8 && sizeof(A) == sizeof(B), typename BinaryLeastSpecialCase<A, B>::Type, Int64>>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in TiDB it does return UInt64.

static Result apply(A a, B b)
{
return static_cast<Result>(a) > static_cast<Result>(b) ? static_cast<Result>(a) : static_cast<Result>(b);
return accurate::greaterOp(a, b) ? static_cast<Result>(a) : static_cast<Result>(b);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not cast a/b to the ResultType before comparision?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@ti-chi-bot ti-chi-bot added do-not-merge/needs-linked-issue and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Feb 28, 2022
{tipb::ScalarFuncSig::GreatestReal, "greatest"},
{tipb::ScalarFuncSig::GreatestInt, "tidbGreatest"},
{tipb::ScalarFuncSig::GreatestReal, "tidbGreatest"},
{tipb::ScalarFuncSig::GreatestString, "greatest"},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because greatest/least(bigint unsigned , bigint) will use Decimal. So we need to enable GreatestDecimal too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean greatest/least(bigint unsigned , bigint) will be rewritten to greatest/least(cast(bigint unsigned as decimal(20,0)) , cast(bigint as decimal(20,0)) in TiDB?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

{
factory.registerFunction<FunctionGreatest>();
factory.registerFunction<FunctionTiDBGreatest>();
factory.registerFunction<FunctionBinaryGreatest>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe no need to register FunctionBinaryGreatest ?

struct LeastBaseImpl<A, B, true>
struct BinaryLeastBaseImpl<A, B, true>
{
using ResultType = If<std::is_floating_point_v<A> || std::is_floating_point_v<B>, double, Decimal32>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe Decimal32 is not enough in some situation?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since greatest/least(decimal) is not supported yet, we can ignore this.

Signed-off-by: guo-shaoge <shaoge1994@163.com>
Signed-off-by: guo-shaoge <shaoge1994@163.com>
Copy link
Contributor

@windtalker windtalker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. do-not-merge/needs-triage-completed and removed status/LGT1 Indicates that a PR has LGTM 1. do-not-merge/needs-linked-issue labels Mar 3, 2022
@guo-shaoge
Copy link
Contributor

/run-check-issue-triage-complete

@guo-shaoge
Copy link
Contributor

/run-integration-test
/run-unit-test

@sre-bot
Copy link
Collaborator

sre-bot commented Mar 3, 2022

Coverage for changed files

Filename                                     Regions    Missed Regions     Cover   Functions  Missed Functions  Executed       Lines      Missed Lines     Cover    Branches   Missed Branches     Cover
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
DataTypes/NumberTraits.h                           3                 3     0.00%           3                 3     0.00%           9                 9     0.00%           0                 0         -
Flash/Coprocessor/DAGUtils.cpp                   285               228    20.00%          35                23    34.29%         491               390    20.57%         320               226    29.38%
Functions/FunctionBinaryArithmetic.h             569               203    64.32%          40                 8    80.00%        1193               288    75.86%         274               103    62.41%
Functions/FunctionsStringSearch.cpp              645               333    48.37%          56                29    48.21%        1312               670    48.93%         410               215    47.56%
Functions/IFunction.h                             69                34    50.72%          60                31    48.33%          94                50    46.81%           6                 2    66.67%
Functions/LeastGreatest.h                         25                 6    76.00%          10                 4    60.00%          50                12    76.00%          10                 2    80.00%
Functions/greatest.cpp                             5                 0   100.00%           2                 0   100.00%           8                 0   100.00%           2                 0   100.00%
Functions/least.cpp                                5                 0   100.00%           2                 0   100.00%           8                 0   100.00%           2                 0   100.00%
Functions/tests/gtest_least_greatest.cpp         743               149    79.95%           2                 0   100.00%         262                 0   100.00%         230               117    49.13%
Interpreters/castColumn.cpp                        6                 2    66.67%           3                 1    66.67%          23                 4    82.61%           2                 1    50.00%
TestUtils/FunctionTestUtils.h                    113                 9    92.04%          33                 0   100.00%         264                 7    97.35%          48                 6    87.50%
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                           2468               967    60.82%         246                99    59.76%        3714              1430    61.50%        1304               672    48.47%

Coverage summary

Functions  MissedFunctions  Executed  Lines   MissedLines  Cover
16727      9483             43.31%    186998  95657        48.85%

full coverage report (for internal network access only)

@windtalker
Copy link
Contributor

/merge

@ti-chi-bot
Copy link
Member

@windtalker: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot
Copy link
Member

This pull request has been accepted and is ready to merge.

DetailsCommit hash: 06ea3c2

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Mar 3, 2022
@sre-bot
Copy link
Collaborator

sre-bot commented Mar 3, 2022

Coverage for changed files

Filename                                     Regions    Missed Regions     Cover   Functions  Missed Functions  Executed       Lines      Missed Lines     Cover    Branches   Missed Branches     Cover
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
DataTypes/NumberTraits.h                           3                 3     0.00%           3                 3     0.00%           9                 9     0.00%           0                 0         -
Flash/Coprocessor/DAGUtils.cpp                   285               228    20.00%          35                23    34.29%         491               390    20.57%         320               226    29.38%
Functions/FunctionBinaryArithmetic.h             569               203    64.32%          40                 8    80.00%        1193               288    75.86%         274               103    62.41%
Functions/FunctionsStringSearch.cpp              645               333    48.37%          56                29    48.21%        1312               670    48.93%         410               215    47.56%
Functions/IFunction.h                             69                34    50.72%          60                31    48.33%          94                50    46.81%           6                 2    66.67%
Functions/LeastGreatest.h                         25                 6    76.00%          10                 4    60.00%          50                12    76.00%          10                 2    80.00%
Functions/greatest.cpp                             5                 0   100.00%           2                 0   100.00%           8                 0   100.00%           2                 0   100.00%
Functions/least.cpp                                5                 0   100.00%           2                 0   100.00%           8                 0   100.00%           2                 0   100.00%
Functions/tests/gtest_least_greatest.cpp         743               149    79.95%           2                 0   100.00%         262                 0   100.00%         230               117    49.13%
Interpreters/castColumn.cpp                        6                 2    66.67%           3                 1    66.67%          23                 4    82.61%           2                 1    50.00%
TestUtils/FunctionTestUtils.h                    113                 9    92.04%          33                 0   100.00%         264                 7    97.35%          48                 6    87.50%
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                           2468               967    60.82%         246                99    59.76%        3714              1430    61.50%        1304               672    48.47%

Coverage summary

Functions  MissedFunctions  Executed  Lines   MissedLines  Cover
16701      9455             43.39%    187154  95545        48.95%

full coverage report (for internal network access only)

@sre-bot
Copy link
Collaborator

sre-bot commented Mar 3, 2022

Coverage for changed files

Filename                                     Regions    Missed Regions     Cover   Functions  Missed Functions  Executed       Lines      Missed Lines     Cover    Branches   Missed Branches     Cover
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
DataTypes/NumberTraits.h                           3                 3     0.00%           3                 3     0.00%           9                 9     0.00%           0                 0         -
Flash/Coprocessor/DAGUtils.cpp                   285               228    20.00%          35                23    34.29%         491               390    20.57%         320               226    29.38%
Functions/FunctionBinaryArithmetic.h             569               203    64.32%          40                 8    80.00%        1193               288    75.86%         274               103    62.41%
Functions/FunctionsStringSearch.cpp              645               333    48.37%          56                29    48.21%        1312               670    48.93%         410               215    47.56%
Functions/IFunction.h                             69                34    50.72%          60                31    48.33%          94                50    46.81%           6                 2    66.67%
Functions/LeastGreatest.h                         25                 6    76.00%          10                 4    60.00%          50                12    76.00%          10                 2    80.00%
Functions/greatest.cpp                             5                 0   100.00%           2                 0   100.00%           8                 0   100.00%           2                 0   100.00%
Functions/least.cpp                                5                 0   100.00%           2                 0   100.00%           8                 0   100.00%           2                 0   100.00%
Functions/tests/gtest_least_greatest.cpp         743               149    79.95%           2                 0   100.00%         262                 0   100.00%         230               117    49.13%
Interpreters/castColumn.cpp                        6                 2    66.67%           3                 1    66.67%          23                 4    82.61%           2                 1    50.00%
TestUtils/FunctionTestUtils.h                    113                 9    92.04%          33                 0   100.00%         264                 7    97.35%          48                 6    87.50%
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                           2468               967    60.82%         246                99    59.76%        3714              1430    61.50%        1304               672    48.47%

Coverage summary

Functions  MissedFunctions  Executed  Lines   MissedLines  Cover
16703      9456             43.39%    187166  95561        48.94%

full coverage report (for internal network access only)

@ti-chi-bot ti-chi-bot merged commit 6ed8574 into pingcap:master Mar 3, 2022
This was referenced Mar 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

support least()/greatest() in TiFlash

7 participants