perf: Optimize basic numeric upcast #15458

Closed
rui-mo wants to merge 1 commit into facebookincubator:main from rui-mo:wip_int_as_bigint

Contributor

@rui-mo rui-mo commented Nov 11, 2025

When the number of rows is large (e.g., around 300,000,000), casting from a narrower
integer type to a wider one, such as cast(integer as bigint), can become
time-consuming.

This PR optimizes the numeric upcast by performing the cast directly on the
raw values within loops, and drops the try-catch used for potential error handling.
Since an upcast guarantees that the source value fits within the range of the target
type, overflow handling is unnecessary in this case.

The performance gains are likely attributable to:

  1. Eliminating try-catch blocks when error handling is unnecessary.
  2. Improved auto-vectorization and lower function call overhead after replacing
    valueAt and set with direct access.
  3. Avoiding overflow checks.
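
The three points above can be illustrated with a minimal sketch. This is not the actual Velox code: `upcastPerRow` and `upcastRaw` are hypothetical names, and `std::vector` stands in for Velox's flat vectors. The first function has roughly the old shape (per-row accessor calls inside a try-catch), the second is a tight loop over raw values that a compiler can usually auto-vectorize.

```cpp
#include <cstddef>
#include <cstdint>
#include <exception>
#include <vector>

// Hypothetical "before" shape: every row goes through bounds-checked
// accessors inside a try-catch, which tends to block vectorization.
inline void upcastPerRow(
    const std::vector<int32_t>& in,
    std::vector<int64_t>& out) {
  out.resize(in.size());
  for (std::size_t row = 0; row < in.size(); ++row) {
    try {
      out.at(row) = static_cast<int64_t>(in.at(row));
    } catch (const std::exception&) {
      // A real engine would record a per-row error here. For a widening
      // cast this branch can never be taken.
    }
  }
}

// Hypothetical "after" shape: widening cannot overflow, so the loop works
// on raw pointers with no error handling and no per-row function calls.
inline void upcastRaw(
    const std::vector<int32_t>& in,
    std::vector<int64_t>& out) {
  out.resize(in.size());
  const int32_t* rawIn = in.data();
  int64_t* rawOut = out.data();
  const std::size_t size = in.size();
  for (std::size_t i = 0; i < size; ++i) {
    rawOut[i] = static_cast<int64_t>(rawIn[i]);
  }
}
```

Both functions produce the same output; only the shape of the loop differs.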

Optimized conversions include:

CAST(tinyint AS smallint)
CAST(tinyint AS integer)
CAST(tinyint AS bigint)
CAST(tinyint AS real)
CAST(tinyint AS double)
CAST(tinyint AS hugeint)

CAST(smallint AS integer)
CAST(smallint AS bigint)
CAST(smallint AS real)
CAST(smallint AS double)
CAST(smallint AS hugeint)

CAST(integer AS bigint)
CAST(integer AS real)
CAST(integer AS double)
CAST(integer AS hugeint)

CAST(bigint AS real)
CAST(bigint AS double)
CAST(bigint AS hugeint)

CAST(hugeint AS real)
CAST(hugeint AS double)

CAST(real AS double)

Before:

============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
numeric_upcast##cast_tinyint_nullable_as_smalli            1.05min    15.86m
numeric_upcast##cast_tinyint_as_smallint                   1.02min    16.28m
numeric_upcast##cast_tinyint_nullable_as_intege            1.17min    14.23m
numeric_upcast##cast_tinyint_as_integer                    1.08min    15.38m
numeric_upcast##cast_tinyint_nullable_as_bigint            1.70min     9.82m
numeric_upcast##cast_tinyint_as_bigint                     1.37min    12.20m
numeric_upcast##cast_tinyint_nullable_as_real              2.10min     7.93m
numeric_upcast##cast_tinyint_as_real                       2.18min     7.63m
numeric_upcast##cast_tinyint_nullable_as_double            2.49min     6.70m
numeric_upcast##cast_tinyint_as_double                     2.32min     7.18m
numeric_upcast##cast_tinyint_nullable_as_hugein            2.46min     6.78m
numeric_upcast##cast_tinyint_as_hugeint                    2.31min     7.21m
numeric_upcast##cast_smallint_nullable_as_integ            1.20min    13.94m
numeric_upcast##cast_smallint_as_integer                   1.09min    15.34m
numeric_upcast##cast_smallint_nullable_as_bigin            1.66min    10.05m
numeric_upcast##cast_smallint_as_bigint                    1.37min    12.17m
numeric_upcast##cast_smallint_nullable_as_real             2.20min     7.57m
numeric_upcast##cast_smallint_as_real                      2.31min     7.21m
numeric_upcast##cast_smallint_nullable_as_doubl            2.46min     6.78m
numeric_upcast##cast_smallint_as_double                    2.31min     7.22m
numeric_upcast##cast_smallint_nullable_as_hugei            2.59min     6.45m
numeric_upcast##cast_smallint_as_hugeint                   2.31min     7.21m
numeric_upcast##cast_integer_nullable_as_bigint            1.70min     9.83m
numeric_upcast##cast_integer_as_bigint                     1.37min    12.14m
numeric_upcast##cast_integer_nullable_as_real              1.72min     9.72m
numeric_upcast##cast_integer_as_real                       1.81min     9.19m
numeric_upcast##cast_integer_nullable_as_double            2.09min     7.98m
numeric_upcast##cast_integer_as_double                     1.94min     8.61m
numeric_upcast##cast_integer_nullable_as_hugein            2.09min     7.98m
numeric_upcast##cast_integer_as_hugeint                    1.94min     8.60m
numeric_upcast##cast_bigint_nullable_as_real               1.72min     9.67m
numeric_upcast##cast_bigint_as_real                        1.82min     9.16m
numeric_upcast##cast_bigint_nullable_as_double             2.09min     7.96m
numeric_upcast##cast_bigint_as_double                      1.94min     8.57m
numeric_upcast##cast_bigint_nullable_as_hugeint            2.08min     8.00m
numeric_upcast##cast_bigint_as_hugeint                     1.95min     8.56m
numeric_upcast##cast_hugeint_nullable_as_real              2.78min     6.00m
numeric_upcast##cast_hugeint_as_real                       2.49min     6.70m
numeric_upcast##cast_hugeint_nullable_as_double            2.57min     6.48m
numeric_upcast##cast_hugeint_as_double                     2.31min     7.21m
numeric_upcast##cast_real_nullable_as_double               2.20min     7.59m
numeric_upcast##cast_real_as_double                        1.92min     8.70m
----------------------------------------------------------------------------

After:

============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
numeric_upcast##cast_tinyint_nullable_as_smalli             25.15s    39.76m
numeric_upcast##cast_tinyint_as_smallint                    14.28s    70.05m
numeric_upcast##cast_tinyint_nullable_as_intege             36.05s    27.74m
numeric_upcast##cast_tinyint_as_integer                     18.55s    53.91m
numeric_upcast##cast_tinyint_nullable_as_bigint             54.77s    18.26m
numeric_upcast##cast_tinyint_as_bigint                      26.45s    37.81m
numeric_upcast##cast_tinyint_nullable_as_real               33.12s    30.19m
numeric_upcast##cast_tinyint_as_real                        20.59s    48.56m
numeric_upcast##cast_tinyint_nullable_as_double             55.68s    17.96m
numeric_upcast##cast_tinyint_as_double                      28.26s    35.38m
numeric_upcast##cast_tinyint_nullable_as_hugein             54.93s    18.21m
numeric_upcast##cast_tinyint_as_hugeint                     28.21s    35.45m
numeric_upcast##cast_smallint_nullable_as_integ             33.92s    29.48m
numeric_upcast##cast_smallint_as_integer                    18.45s    54.21m
numeric_upcast##cast_smallint_nullable_as_bigin             56.99s    17.55m
numeric_upcast##cast_smallint_as_bigint                     26.37s    37.92m
numeric_upcast##cast_smallint_nullable_as_real              32.48s    30.79m
numeric_upcast##cast_smallint_as_real                       19.67s    50.83m
numeric_upcast##cast_smallint_nullable_as_doubl             53.93s    18.54m
numeric_upcast##cast_smallint_as_double                     28.79s    34.73m
numeric_upcast##cast_smallint_nullable_as_hugei             53.07s    18.84m
numeric_upcast##cast_smallint_as_hugeint                    28.49s    35.10m
numeric_upcast##cast_integer_nullable_as_bigint            1.01min    16.51m
numeric_upcast##cast_integer_as_bigint                      27.04s    36.98m
numeric_upcast##cast_integer_nullable_as_real               35.55s    28.13m
numeric_upcast##cast_integer_as_real                        20.14s    49.66m
numeric_upcast##cast_integer_nullable_as_double             56.62s    17.66m
numeric_upcast##cast_integer_as_double                      28.39s    35.23m
numeric_upcast##cast_integer_nullable_as_hugein             57.36s    17.43m
numeric_upcast##cast_integer_as_hugeint                     28.49s    35.10m
numeric_upcast##cast_bigint_nullable_as_real                35.33s    28.31m
numeric_upcast##cast_bigint_as_real                         20.35s    49.14m
numeric_upcast##cast_bigint_nullable_as_double              56.84s    17.59m
numeric_upcast##cast_bigint_as_double                       28.41s    35.20m
numeric_upcast##cast_bigint_nullable_as_hugeint             58.48s    17.10m
numeric_upcast##cast_bigint_as_hugeint                      28.41s    35.20m
numeric_upcast##cast_hugeint_nullable_as_real              2.08min     8.01m
numeric_upcast##cast_hugeint_as_real                       2.01min     8.28m
numeric_upcast##cast_hugeint_nullable_as_double            1.89min     8.83m
numeric_upcast##cast_hugeint_as_double                     1.37min    12.13m
numeric_upcast##cast_real_nullable_as_double                59.15s    16.91m
numeric_upcast##cast_real_as_double                         29.15s    34.30m

----------------------------------------------------------------------------
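
For readers comparing the two tables, the speedup of a row is the "before" time divided by the "after" time, with both in the same unit. A small helper sketch (the function names are illustrative only):

```cpp
#include <stdexcept>

// Converts a benchmark time reported in minutes to seconds.
inline double minutesToSeconds(double minutes) {
  return minutes * 60.0;
}

// Speedup factor = time before / time after (both in seconds).
inline double speedup(double beforeSeconds, double afterSeconds) {
  if (afterSeconds <= 0.0) {
    throw std::invalid_argument("afterSeconds must be positive");
  }
  return beforeSeconds / afterSeconds;
}
```

For example, cast_integer_as_bigint goes from 1.37min to 27.04s, so `speedup(minutesToSeconds(1.37), 27.04)` is roughly 3x.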

@rui-mo rui-mo requested a review from majetideepak as a code owner November 11, 2025 08:17
@meta-cla meta-cla Bot added the CLA Signed label Nov 11, 2025
@rui-mo rui-mo changed the title from "misc: Optimize integral upcast" to "misc: Optimize integral upcast when all selected" Nov 11, 2025
Contributor Author

rui-mo commented Nov 11, 2025

cc: @zhouyuan

Comment thread velox/expression/CastExpr.cpp Outdated
Comment thread velox/benchmarks/basic/IntegralUpcastBenchmark.cpp Outdated
Comment thread velox/expression/tests/CastExprTest.cpp Outdated
@@ -3992,5 +3992,43 @@ TEST_F(CastExprTest, timeToTimestampCast) {
assertEqualVectors(expected, result);
}
}

TEST_F(CastExprTest, integeralUpcast) {
Collaborator

Same about tests, I think it's important to also check

  1. tinyint => integer
  2. tinyint => bigint
  3. smallint => bigint

Contributor Author

Added all relevant cases in the test, thanks.

Comment thread velox/expression/CastExpr.cpp Outdated
@@ -47,6 +47,50 @@ const tz::TimeZone* getTimeZoneFromConfig(const core::QueryConfig& config) {
return nullptr;
}

bool isIntegralType(const TypePtr& type) {
Collaborator

I think this optimization also should work for hugeint

Contributor Author

Hugeint is used to represent the decimal type. Casting an integer type to a decimal type requires rescaling to match the target scale, which is a different operation and needs special handling. Therefore, I excluded decimal from this optimization.

Collaborator

Hugeint isn't only used for decimal. I think hugeint can also be just a plain int128_t.
See example int128 https://clickhouse.com/docs/sql-reference/data-types/int-uint
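
To make the distinction in this thread concrete, here is a sketch that assumes a compiler with the `__int128` extension (GCC or Clang). The function names are hypothetical and are not Velox APIs. Widening to a plain 128-bit integer leaves the value unchanged, while converting to a decimal stored as an unscaled 128-bit integer needs a multiplication by 10^scale.

```cpp
#include <cstdint>

using int128_t = __int128; // GCC/Clang extension, assumed to be available.

// Plain integer widening: the value is unchanged, only the width grows.
inline int128_t toHugeint(int64_t value) {
  return static_cast<int128_t>(value);
}

// Integer to decimal: the unscaled representation is value * 10^scale,
// so this is a rescale rather than a simple widening. Overflow for very
// large scales is not checked in this sketch.
inline int128_t toUnscaledDecimal(int64_t value, int scale) {
  int128_t result = value;
  for (int i = 0; i < scale; ++i) {
    result *= 10;
  }
  return result;
}
```

With scale 2, the integer 123 becomes the unscaled value 12300, whereas `toHugeint(123)` stays 123.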

Comment thread velox/expression/CastExpr.cpp Outdated
Comment thread velox/expression/CastExpr.cpp Outdated
Collaborator

MBkkt commented Nov 11, 2025

@rui-mo Can you explain this to me?

You mentioned in the PR description that the hotspot is forEachSetBit.

But at the same time, you wrote that this PR optimizes only the case where all rows are selected:

This PR optimizes the integral upcast by performing the cast directly on the
raw values within loops when rows are all selected.

But in that case forEachSetBit shouldn't be called at all, right?

Because Velox has this code:

template <typename Callable>
inline void SelectivityVector::applyToSelected(Callable func) const {
  if (isAllSelected()) {
    const auto end = end_;
    for (vector_size_t row = begin_; row < end; ++row) {
      func(row);
    }
  } else {
    bits::forEachSetBit(bits_.data(), begin_, end_, func);
  }
}
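
For readers without the Velox sources at hand, the two paths in the snippet above can be reproduced in a self-contained sketch. `RowSelection` is a hypothetical stand-in for `SelectivityVector`, and the sparse path here simply tests every bit; a production implementation such as `bits::forEachSetBit` would presumably skip whole zero words.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a selectivity vector: one bit per row.
// `bits` must hold at least (size + 63) / 64 words.
struct RowSelection {
  std::vector<uint64_t> bits;
  std::size_t size = 0;
  bool allSelected = false;
};

template <typename Callable>
void applyToSelected(const RowSelection& rows, Callable func) {
  if (rows.allSelected) {
    // Dense path: a plain counted loop with no bit tests.
    for (std::size_t row = 0; row < rows.size; ++row) {
      func(row);
    }
    return;
  }
  // Sparse path: call func only for rows whose bit is set.
  for (std::size_t row = 0; row < rows.size; ++row) {
    if ((rows.bits[row / 64] >> (row % 64)) & 1ULL) {
      func(row);
    }
  }
}
```

When nulls are deselected before the cast, only the sparse path runs, which is why the two scenarios show different hotspots.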

@rui-mo rui-mo changed the title from "misc: Optimize integral upcast when all selected" to "misc: Optimize basic numeric upcast" Nov 12, 2025
@jinchengchenghh
Contributor

This optimization may be similar to the following:

// The compiler seems to be a little fickle with optimizations.
// Although rows.applyToSelected should do roughly the same thing, doing
// this here along with assigning rows.size() to a variable seems to help
// the compiler to inline hashOne showing a 50% performance improvement in
// benchmarks.

@jinchengchenghh
Contributor

We may need to optimize applyToSelected for all the functions

Comment thread velox/expression/CastExpr.cpp Outdated
return false;
}

#define VELOX_DYNAMIC_BASIC_NUMERIC_TEMPLATE_TYPE_DISPATCH( \
Contributor

Could we use VELOX_DYNAMIC_SCALAR_TYPE_DISPATCH and check std::is_arithmetic_v?

Contributor Author

I changed to VELOX_DYNAMIC_SCALAR_TEMPLATE_TYPE_DISPATCH, thanks.
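
The `std::is_arithmetic_v` check suggested here could look roughly like the sketch below. It does not use the Velox dispatch macros; `tryNumericUpcast` is a hypothetical helper that is only compiled into a real loop when both types are arithmetic and otherwise reports that the caller should fall back to the generic path. It requires C++17.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical kernel: the fast loop exists only for arithmetic types.
template <typename From, typename To>
bool tryNumericUpcast(const std::vector<From>& in, std::vector<To>& out) {
  if constexpr (std::is_arithmetic_v<From> && std::is_arithmetic_v<To>) {
    out.resize(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
      out[i] = static_cast<To>(in[i]);
    }
    return true;
  } else {
    // Non-arithmetic types: nothing is done, the caller is expected to
    // use its generic cast path instead.
    return false;
  }
}
```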

Comment thread velox/expression/CastExpr.cpp
Comment thread velox/expression/CastExpr.cpp Outdated
Comment thread velox/expression/CastExpr.cpp Outdated
Collaborator

@MBkkt MBkkt left a comment

Can you please explain why we see a speedup?
Is it because there are no overflow checks, or because of something else?

"and drops the try-catch"

Is it because of this? Maybe we can make this optimization more generic.
I think we can create an issue and mention in it that this code can be dropped once a more generic approach is implemented.

Contributor Author

rui-mo commented Nov 12, 2025

@MBkkt Sorry for the confusion. This PR improves performance in both scenarios. When not all rows are selected (for example, when null values are present and are deselected before casting), the performance hotspot is as follows.

[Image: has_null]

And when all rows are selected, the hotspot is as follows.

[Image: no_null]

Although Velox already applies the for-loop optimization you mentioned above when all rows are selected, CastExpr::applyToSelectedNoThrowLocal still includes a try-catch block for potential error handling, which is unnecessary in upcast cases. Eliminating this overhead is one of the reasons for the performance improvement in the all-selected scenario.
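
One way to picture the overhead described above is a row loop whose error handling is selected at compile time. This is only a sketch under assumptions: `applyRows` and its `CanThrow` flag are hypothetical and do not reflect how `CastExpr::applyToSelectedNoThrowLocal` is written.

```cpp
#include <cstddef>
#include <exception>
#include <stdexcept>
#include <vector>

// Hypothetical helper: applies func to each row. When CanThrow is true,
// exceptions are turned into per-row error flags; when it is false, the
// try-catch is compiled out entirely.
template <bool CanThrow, typename Func>
void applyRows(std::size_t numRows, std::vector<bool>& errors, Func func) {
  errors.assign(numRows, false);
  for (std::size_t row = 0; row < numRows; ++row) {
    if constexpr (CanThrow) {
      try {
        func(row);
      } catch (const std::exception&) {
        errors[row] = true;
      }
    } else {
      func(row);
    }
  }
}
```

An upcast kernel would be invoked with `CanThrow = false`, since a widening conversion has no failure case to report.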

I will include both nullable and non-nullable benchmark results in the PR description to verify that both scenarios see performance improvements.

Collaborator

MBkkt commented Nov 12, 2025

@rui-mo Thanks, I understand now.

I think it would be useful to move this optimization into the lambda that CastExpr supplies to SelectivityVector::applyToSelected, but that is a separate task.

Contributor

Yuhta commented Nov 12, 2025

I think applyToSelected is already optimized (see https://github.com/facebookincubator/velox/pull/10301/files#diff-3d8a3c26f9d059ef07d7c31f08b2c605c7e748206493887da6023d2a085f00b8), the code in https://github.com/facebookincubator/velox/blame/4056e41e6efadd20622e92e1b04162d59276d63c/velox/connectors/hive/HivePartitionFunction.cpp#L119-L123 is outdated and could be removed.

The only thing needed is to avoid try-catch when it is not necessary.

Contributor Author

rui-mo commented Nov 13, 2025

@MBkkt @Yuhta Thanks for sharing your insights. I ran a few experiments to investigate where the performance improvement comes from. For CAST(integer AS bigint), the original performance is as follows:

numeric_upcast##cast_integer_nullable_as_bigint            1.74min     9.55m
numeric_upcast##cast_integer_as_bigint                     1.38min    12.10m

After removing the try-catch from CastExpr::applyToSelectedNoThrowLocal and replacing callFollyTo with static_cast (bypassing overflow checks and error handling), I observed a slight performance gain, shown below.

numeric_upcast##cast_integer_nullable_as_bigint            1.66min    10.03m
numeric_upcast##cast_integer_as_bigint                     1.26min    13.22m
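
The difference between a range-checked conversion and a bare `static_cast` can be sketched as follows. `checkedNarrow` is a hypothetical stand-in for a checked conversion routine and is not the code behind `callFollyTo`; it only shows why the check is redundant in the widening direction.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// Narrowing needs a range check because an int64_t may not fit in int32_t.
inline int32_t checkedNarrow(int64_t value) {
  if (value < std::numeric_limits<int32_t>::min() ||
      value > std::numeric_limits<int32_t>::max()) {
    throw std::overflow_error("value out of range for int32_t");
  }
  return static_cast<int32_t>(value);
}

// Widening: every int32_t is representable as int64_t, so a range check
// here could never fail and a plain static_cast is sufficient.
inline int64_t widen(int32_t value) {
  return static_cast<int64_t>(value);
}
```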

I further removed the try-catch block from CastExpr::applyCastKernel, which resulted in the following performance.

numeric_upcast##cast_integer_nullable_as_bigint            1.33min    12.51m
numeric_upcast##cast_integer_as_bigint                      58.44s    17.11m

I then used perf record to identify the performance hotspot and found the results below, which show that the valueAt and set functions are the main time consumers.

[Image: Screenshot 2025-11-13 at 14 41 10]

In summary, the performance gains are likely attributable to:

  1. Eliminating try-catch blocks when error handling is unnecessary.
  2. Improved auto-vectorization and lower function call overhead after replacing valueAt and set with direct access.
  3. Avoiding overflow checks.

Comment thread velox/expression/CastExpr.cpp Outdated
@rui-mo rui-mo force-pushed the wip_int_as_bigint branch 2 times, most recently from 600fb40 to 04e7cbb Compare November 13, 2025 15:11
Collaborator

@MBkkt MBkkt left a comment

Overall looks good to me.

I think a more general approach to avoiding the try-catch overhead can be developed separately.

Comment thread velox/expression/CastExpr.cpp Outdated
Comment on lines +888 to +894
isIntegralType(fromType) && isBasicNumericType(toType) &&
((fromType->cppSizeInBytes() < toType->cppSizeInBytes()) ||
(fromType == INTEGER() && toType == REAL()) ||
(fromType == BIGINT() && toType == REAL()) ||
(fromType == BIGINT() && toType == DOUBLE()) ||
(fromType == HUGEINT() && toType == REAL()) ||
(fromType == HUGEINT() && toType == DOUBLE()))) {
Collaborator

real to double should also be OK?

Contributor Author

Nice catch, thanks. I added its support as well as tests and benchmarks.
I opened #15506 for the discussion of avoiding "try catch" in cast.
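
The condition discussed in this thread, extended with real to double, can be loosely mirrored at compile time. The sketch below is an assumption-laden illustration rather than the PR's code: `isSafeUpcast` is a hypothetical name, and it is only meant for signed 8 to 64-bit integers plus `float` and `double`. Note that an integer to floating-point cast always fits in range but may round, because `float` and `double` carry fewer mantissa bits than wide integers have value bits.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical compile-time mirror of the runtime check quoted above.
template <typename From, typename To>
constexpr bool isSafeUpcast() {
  if constexpr (std::is_integral_v<From> && std::is_arithmetic_v<To>) {
    // Integer source: either a strictly wider target, or a floating-point
    // target, whose range covers every 8 to 64-bit integer.
    return sizeof(From) < sizeof(To) || std::is_floating_point_v<To>;
  } else {
    // The only floating-point source accepted is real -> double.
    return std::is_same_v<From, float> && std::is_same_v<To, double>;
  }
}

static_assert(isSafeUpcast<int32_t, int64_t>(), "integer -> bigint");
static_assert(isSafeUpcast<int64_t, float>(), "bigint -> real");
static_assert(isSafeUpcast<float, double>(), "real -> double");
static_assert(!isSafeUpcast<int64_t, int32_t>(), "narrowing is excluded");
```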

Comment thread velox/expression/CastExpr.cpp
@rui-mo rui-mo force-pushed the wip_int_as_bigint branch from 3b02501 to 8f7b574 Compare January 21, 2026 09:54
@rui-mo rui-mo force-pushed the wip_int_as_bigint branch 2 times, most recently from bb061ed to 0c1a034 Compare February 10, 2026 09:57
@rui-mo rui-mo changed the title from "misc: Optimize basic numeric upcast" to "perf: Optimize basic numeric upcast" Feb 10, 2026
@rui-mo rui-mo closed this by deleting the head repository Mar 30, 2026
meta-codesync Bot pushed a commit that referenced this pull request Apr 10, 2026
Summary:
When the number of rows is large (e.g., around 300,000,000), casting from a narrower
integer type to a wider one, such as cast(integer as bigint), can become
time-consuming.

This PR optimizes the numeric upcast by performing the cast directly on the
raw values within loops, and drops the try-catch used for potential error handling.
Since an upcast guarantees that the source value fits within the range of the target
type, overflow handling is unnecessary in this case.

The performance gains are likely attributable to:
1) Eliminating try-catch blocks when error handling is unnecessary.
2) Improved auto-vectorization and lower function call overhead after replacing
`valueAt` and `set` with direct access.
3) Avoiding overflow checks.

Optimized conversions include:
```
CAST(tinyint AS smallint)
CAST(tinyint AS integer)
CAST(tinyint AS bigint)
CAST(tinyint AS real)
CAST(tinyint AS double)

CAST(smallint AS integer)
CAST(smallint AS bigint)
CAST(smallint AS real)
CAST(smallint AS double)

CAST(integer AS bigint)
CAST(integer AS real)
CAST(integer AS double)

CAST(bigint AS real)
CAST(bigint AS double)

CAST(real AS double)
```

Before:

```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
numeric_upcast##cast_tinyint_nullable_as_smalli             57.89s    17.27m
numeric_upcast##cast_tinyint_as_smallint                   1.09min    15.32m
numeric_upcast##cast_tinyint_nullable_as_intege             59.07s    16.93m
numeric_upcast##cast_tinyint_as_integer                    1.09min    15.25m
numeric_upcast##cast_tinyint_nullable_as_bigint            1.03min    16.12m
numeric_upcast##cast_tinyint_as_bigint                     1.13min    14.71m
numeric_upcast##cast_tinyint_nullable_as_real              1.87min     8.90m
numeric_upcast##cast_tinyint_as_real                       2.16min     7.71m
numeric_upcast##cast_tinyint_nullable_as_double            1.79min     9.29m
numeric_upcast##cast_tinyint_as_double                     2.06min     8.10m
numeric_upcast##cast_smallint_nullable_as_integ             59.30s    16.86m
numeric_upcast##cast_smallint_as_integer                   1.11min    15.01m
numeric_upcast##cast_smallint_nullable_as_bigin            1.02min    16.29m
numeric_upcast##cast_smallint_as_bigint                    1.14min    14.59m
numeric_upcast##cast_smallint_nullable_as_real             1.99min     8.37m
numeric_upcast##cast_smallint_as_real                      2.29min     7.26m
numeric_upcast##cast_smallint_nullable_as_doubl            1.80min     9.28m
numeric_upcast##cast_smallint_as_double                    2.03min     8.23m
numeric_upcast##cast_integer_nullable_as_bigint            1.03min    16.24m
numeric_upcast##cast_integer_as_bigint                     1.12min    14.89m
numeric_upcast##cast_integer_nullable_as_real              1.40min    11.88m
numeric_upcast##cast_integer_as_real                       1.64min    10.15m
numeric_upcast##cast_integer_nullable_as_double            1.44min    11.56m
numeric_upcast##cast_integer_as_double                     1.65min    10.09m
numeric_upcast##cast_bigint_nullable_as_real               1.41min    11.78m
numeric_upcast##cast_bigint_as_real                        1.65min    10.12m
numeric_upcast##cast_bigint_nullable_as_double             1.46min    11.44m
numeric_upcast##cast_bigint_as_double                      1.65min    10.09m
numeric_upcast##cast_real_nullable_as_double               1.43min    11.64m
numeric_upcast##cast_real_as_double                        1.69min     9.85m
----------------------------------------------------------------------------
```
After:

```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
numeric_upcast##cast_tinyint_nullable_as_smalli             15.12s    66.12m
numeric_upcast##cast_tinyint_as_smallint                  931.11ms      1.07
numeric_upcast##cast_tinyint_nullable_as_intege             16.61s    60.22m
numeric_upcast##cast_tinyint_as_integer                      2.21s   451.83m
numeric_upcast##cast_tinyint_nullable_as_bigint             19.33s    51.73m
numeric_upcast##cast_tinyint_as_bigint                       4.32s   231.37m
numeric_upcast##cast_tinyint_nullable_as_real               16.50s    60.62m
numeric_upcast##cast_tinyint_as_real                         2.83s   353.33m
numeric_upcast##cast_tinyint_nullable_as_double             19.13s    52.26m
numeric_upcast##cast_tinyint_as_double                       8.07s   123.97m
numeric_upcast##cast_smallint_nullable_as_integ             18.96s    52.76m
numeric_upcast##cast_smallint_as_integer                     1.78s   561.26m
numeric_upcast##cast_smallint_nullable_as_bigin             18.72s    53.41m
numeric_upcast##cast_smallint_as_bigint                      4.38s   228.11m
numeric_upcast##cast_smallint_nullable_as_real              16.34s    61.19m
numeric_upcast##cast_smallint_as_real                        1.71s   583.54m
numeric_upcast##cast_smallint_nullable_as_doubl             18.65s    53.62m
numeric_upcast##cast_smallint_as_double                      2.85s   351.36m
numeric_upcast##cast_integer_nullable_as_bigint             18.43s    54.26m
numeric_upcast##cast_integer_as_bigint                       3.47s   288.11m
numeric_upcast##cast_integer_nullable_as_real               16.58s    60.32m
numeric_upcast##cast_integer_as_real                         2.41s   414.34m
numeric_upcast##cast_integer_nullable_as_double             18.53s    53.97m
numeric_upcast##cast_integer_as_double                       3.34s   299.25m
numeric_upcast##cast_bigint_nullable_as_real                16.82s    59.45m
numeric_upcast##cast_bigint_as_real                          4.23s   236.48m
numeric_upcast##cast_bigint_nullable_as_double              19.15s    52.22m
numeric_upcast##cast_bigint_as_double                        4.56s   219.08m
numeric_upcast##cast_real_nullable_as_double                18.35s    54.50m
numeric_upcast##cast_real_as_double                          3.43s   291.53m
----------------------------------------------------------------------------
```

Replace: #15458

Pull Request resolved: #16967

Reviewed By: peterenescu

Differential Revision: D99139839

Pulled By: bikramSingh91

fbshipit-source-id: ffb57fcd6bcab15e32b72deba2f53c2aea8ba102
shrshi pushed a commit to patdevinwilson/velox that referenced this pull request Apr 13, 2026