perf: Optimize basic numeric upcast by rui-mo · Pull Request #15458 · facebookincubator/velox

rui-mo · 2025-11-11T08:17:26Z

When the row size is large (e.g., around 300,000,000), casting from a narrower
integer type to a wider one, such as cast(integer as bigint), can become
time-consuming.

This PR optimizes the numeric upcast by performing the cast directly on the
raw values within loops, and drops the try-catch used for potential error handling.
Since upcasts guarantee that the source value fits within the target type, overflow
handling is unnecessary in this case.

The performance gains are likely attributable to:

1) Eliminating try-catch blocks when error handling is unnecessary.
2) Improved auto-vectorization and lower function call overhead after replacing
valueAt and set with direct access.
3) Avoiding overflow checks.
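
A minimal sketch of the "direct access" idea, written against plain arrays rather than Velox vectors. The helper names `upcastRaw` and `upcastInt32ToInt64` are hypothetical and do not come from the PR; the point is only that a widening cast needs neither a try-catch nor an overflow check, which leaves a simple loop that compilers can usually vectorize.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical helper (not Velox code): widens every element of `src` into
// `dst` with a plain static_cast. The static_asserts restrict it to
// conversions where the target type is strictly wider and the direction is
// not floating-point to integer, so no value can overflow.
template <typename From, typename To>
void upcastRaw(const From* src, To* dst, std::size_t size) {
  static_assert(
      std::is_arithmetic_v<From> && std::is_arithmetic_v<To>,
      "numeric types only");
  static_assert(sizeof(From) < sizeof(To), "target must be strictly wider");
  static_assert(
      !(std::is_floating_point_v<From> && std::is_integral_v<To>),
      "floating-point to integer is not an upcast");
  assert(size == 0 || (src != nullptr && dst != nullptr));
  // No per-row accessor calls, no try-catch, no range check.
  for (std::size_t i = 0; i < size; ++i) {
    dst[i] = static_cast<To>(src[i]);
  }
}

// Example instantiation: the cast(integer as bigint) case from the text.
std::vector<int64_t> upcastInt32ToInt64(const std::vector<int32_t>& input) {
  std::vector<int64_t> output(input.size());
  upcastRaw(input.data(), output.data(), input.size());
  return output;
}
```

The same-width cases listed in this PR (for example integer to real) would need a different compile-time condition; this sketch leaves them out on purpose.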

Optimized conversions include:

CAST(tinyint AS smallint)
CAST(tinyint AS integer)
CAST(tinyint AS bigint)
CAST(tinyint AS real)
CAST(tinyint AS double)
CAST(tinyint AS hugeint)

CAST(smallint AS integer)
CAST(smallint AS bigint)
CAST(smallint AS real)
CAST(smallint AS double)
CAST(smallint AS hugeint)

CAST(integer AS bigint)
CAST(integer AS real)
CAST(integer AS double)
CAST(integer AS hugeint)

CAST(bigint AS real)
CAST(bigint AS double)
CAST(bigint AS hugeint)

CAST(hugeint AS real)
CAST(hugeint AS double)

CAST(real AS double)

Before:

============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
numeric_upcast##cast_tinyint_nullable_as_smalli            1.05min    15.86m
numeric_upcast##cast_tinyint_as_smallint                   1.02min    16.28m
numeric_upcast##cast_tinyint_nullable_as_intege            1.17min    14.23m
numeric_upcast##cast_tinyint_as_integer                    1.08min    15.38m
numeric_upcast##cast_tinyint_nullable_as_bigint            1.70min     9.82m
numeric_upcast##cast_tinyint_as_bigint                     1.37min    12.20m
numeric_upcast##cast_tinyint_nullable_as_real              2.10min     7.93m
numeric_upcast##cast_tinyint_as_real                       2.18min     7.63m
numeric_upcast##cast_tinyint_nullable_as_double            2.49min     6.70m
numeric_upcast##cast_tinyint_as_double                     2.32min     7.18m
numeric_upcast##cast_tinyint_nullable_as_hugein            2.46min     6.78m
numeric_upcast##cast_tinyint_as_hugeint                    2.31min     7.21m
numeric_upcast##cast_smallint_nullable_as_integ            1.20min    13.94m
numeric_upcast##cast_smallint_as_integer                   1.09min    15.34m
numeric_upcast##cast_smallint_nullable_as_bigin            1.66min    10.05m
numeric_upcast##cast_smallint_as_bigint                    1.37min    12.17m
numeric_upcast##cast_smallint_nullable_as_real             2.20min     7.57m
numeric_upcast##cast_smallint_as_real                      2.31min     7.21m
numeric_upcast##cast_smallint_nullable_as_doubl            2.46min     6.78m
numeric_upcast##cast_smallint_as_double                    2.31min     7.22m
numeric_upcast##cast_smallint_nullable_as_hugei            2.59min     6.45m
numeric_upcast##cast_smallint_as_hugeint                   2.31min     7.21m
numeric_upcast##cast_integer_nullable_as_bigint            1.70min     9.83m
numeric_upcast##cast_integer_as_bigint                     1.37min    12.14m
numeric_upcast##cast_integer_nullable_as_real              1.72min     9.72m
numeric_upcast##cast_integer_as_real                       1.81min     9.19m
numeric_upcast##cast_integer_nullable_as_double            2.09min     7.98m
numeric_upcast##cast_integer_as_double                     1.94min     8.61m
numeric_upcast##cast_integer_nullable_as_hugein            2.09min     7.98m
numeric_upcast##cast_integer_as_hugeint                    1.94min     8.60m
numeric_upcast##cast_bigint_nullable_as_real               1.72min     9.67m
numeric_upcast##cast_bigint_as_real                        1.82min     9.16m
numeric_upcast##cast_bigint_nullable_as_double             2.09min     7.96m
numeric_upcast##cast_bigint_as_double                      1.94min     8.57m
numeric_upcast##cast_bigint_nullable_as_hugeint            2.08min     8.00m
numeric_upcast##cast_bigint_as_hugeint                     1.95min     8.56m
numeric_upcast##cast_hugeint_nullable_as_real              2.78min     6.00m
numeric_upcast##cast_hugeint_as_real                       2.49min     6.70m
numeric_upcast##cast_hugeint_nullable_as_double            2.57min     6.48m
numeric_upcast##cast_hugeint_as_double                     2.31min     7.21m
numeric_upcast##cast_real_nullable_as_double               2.20min     7.59m
numeric_upcast##cast_real_as_double                        1.92min     8.70m
----------------------------------------------------------------------------

After:

============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
numeric_upcast##cast_tinyint_nullable_as_smalli             25.15s    39.76m
numeric_upcast##cast_tinyint_as_smallint                    14.28s    70.05m
numeric_upcast##cast_tinyint_nullable_as_intege             36.05s    27.74m
numeric_upcast##cast_tinyint_as_integer                     18.55s    53.91m
numeric_upcast##cast_tinyint_nullable_as_bigint             54.77s    18.26m
numeric_upcast##cast_tinyint_as_bigint                      26.45s    37.81m
numeric_upcast##cast_tinyint_nullable_as_real               33.12s    30.19m
numeric_upcast##cast_tinyint_as_real                        20.59s    48.56m
numeric_upcast##cast_tinyint_nullable_as_double             55.68s    17.96m
numeric_upcast##cast_tinyint_as_double                      28.26s    35.38m
numeric_upcast##cast_tinyint_nullable_as_hugein             54.93s    18.21m
numeric_upcast##cast_tinyint_as_hugeint                     28.21s    35.45m
numeric_upcast##cast_smallint_nullable_as_integ             33.92s    29.48m
numeric_upcast##cast_smallint_as_integer                    18.45s    54.21m
numeric_upcast##cast_smallint_nullable_as_bigin             56.99s    17.55m
numeric_upcast##cast_smallint_as_bigint                     26.37s    37.92m
numeric_upcast##cast_smallint_nullable_as_real              32.48s    30.79m
numeric_upcast##cast_smallint_as_real                       19.67s    50.83m
numeric_upcast##cast_smallint_nullable_as_doubl             53.93s    18.54m
numeric_upcast##cast_smallint_as_double                     28.79s    34.73m
numeric_upcast##cast_smallint_nullable_as_hugei             53.07s    18.84m
numeric_upcast##cast_smallint_as_hugeint                    28.49s    35.10m
numeric_upcast##cast_integer_nullable_as_bigint            1.01min    16.51m
numeric_upcast##cast_integer_as_bigint                      27.04s    36.98m
numeric_upcast##cast_integer_nullable_as_real               35.55s    28.13m
numeric_upcast##cast_integer_as_real                        20.14s    49.66m
numeric_upcast##cast_integer_nullable_as_double             56.62s    17.66m
numeric_upcast##cast_integer_as_double                      28.39s    35.23m
numeric_upcast##cast_integer_nullable_as_hugein             57.36s    17.43m
numeric_upcast##cast_integer_as_hugeint                     28.49s    35.10m
numeric_upcast##cast_bigint_nullable_as_real                35.33s    28.31m
numeric_upcast##cast_bigint_as_real                         20.35s    49.14m
numeric_upcast##cast_bigint_nullable_as_double              56.84s    17.59m
numeric_upcast##cast_bigint_as_double                       28.41s    35.20m
numeric_upcast##cast_bigint_nullable_as_hugeint             58.48s    17.10m
numeric_upcast##cast_bigint_as_hugeint                      28.41s    35.20m
numeric_upcast##cast_hugeint_nullable_as_real              2.08min     8.01m
numeric_upcast##cast_hugeint_as_real                       2.01min     8.28m
numeric_upcast##cast_hugeint_nullable_as_double            1.89min     8.83m
numeric_upcast##cast_hugeint_as_double                     1.37min    12.13m
numeric_upcast##cast_real_nullable_as_double                59.15s    16.91m
numeric_upcast##cast_real_as_double                         29.15s    34.30m

----------------------------------------------------------------------------

rui-mo · 2025-11-11T09:10:06Z

cc: @zhouyuan

MBkkt · 2025-11-11T10:41:48Z

@@ -3992,5 +3992,43 @@ TEST_F(CastExprTest, timeToTimestampCast) {
    assertEqualVectors(expected, result);
  }
 }
+
+TEST_F(CastExprTest, integeralUpcast) {


Same about tests, I think it's important to also check

tinyint => integer

tinyint => bigint

smallint => bigint

Added all relevant cases in the test, thanks.

MBkkt · 2025-11-11T10:42:36Z

@@ -47,6 +47,50 @@ const tz::TimeZone* getTimeZoneFromConfig(const core::QueryConfig& config) {
  return nullptr;
 }

+bool isIntegralType(const TypePtr& type) {


I think this optimization also should work for hugeint

Hugeint is used to represent the decimal type. Casting an integer type to a decimal type requires rescaling to match the target scale, which is a different operation and needs special handling. Therefore, I excluded decimal from this optimization.

Hugeint isn't only used for decimal. I think hugeint can also be a plain int128_t.
See example int128 https://clickhouse.com/docs/sql-reference/data-types/int-uint

MBkkt · 2025-11-11T18:45:02Z

@rui-mo Can you explain it to me?

You mentioned in the PR description that the hotspot is forEachSetBit.

But at the same time you wrote that this PR optimizes only the case where all rows are selected:

This PR optimizes the integral upcast by performing the cast directly on the
raw values within loops when rows are all selected.

But in that case forEachSetBit shouldn't be involved, right?

Because Velox has this code:

template <typename Callable>
inline void SelectivityVector::applyToSelected(Callable func) const {
  if (isAllSelected()) {
    const auto end = end_;
    for (vector_size_t row = begin_; row < end; ++row) {
      func(row);
    }
  } else {
    bits::forEachSetBit(bits_.data(), begin_, end_, func);
  }
}

jinchengchenghh · 2025-11-12T10:07:04Z

This optimization may be similar to the following:

velox/velox/connectors/hive/HivePartitionFunction.cpp

Lines 119 to 123 in 4056e41

    
// The compiler seems to be a little fickle with optimizations.
// Although rows.applyToSelected should do roughly the same thing, doing
// this here along with assigning rows.size() to a variable seems to help
// the compiler to inline hashOne showing a 50% performance improvement in
// benchmarks.

jinchengchenghh · 2025-11-12T10:12:42Z

We may need to optimize applyToSelected for all the functions

jinchengchenghh · 2025-11-12T10:16:33Z

+  return false;
+}
+
+#define VELOX_DYNAMIC_BASIC_NUMERIC_TEMPLATE_TYPE_DISPATCH(             \


Could we use VELOX_DYNAMIC_SCALAR_TYPE_DISPATCH and check std::is_arithmetic_v?

I changed to VELOX_DYNAMIC_SCALAR_TEMPLATE_TYPE_DISPATCH, thanks.

MBkkt

Can you please explain why we see a speedup?
Is it because there are no overflow checks, or because of something else?

and drops the try-catch .

Is it because of this? Maybe we can make this optimization more generic?
I think we can create an issue and mention in it that this code can be dropped once a more generic approach is implemented.

rui-mo · 2025-11-12T15:52:54Z

@MBkkt Sorry for the confusion. This PR improves performance in both scenarios. When not all rows are selected (for example, when null values are present and are deselected before casting), the performance hotspot is as follows.

And when all rows are selected, the hotspot is as follows.

Although Velox already applies the for-loop optimization you mentioned above when all rows are selected, CastExpr::applyToSelectedNoThrowLocal still includes a try-catch block for potential error handling, which is unnecessary in upcast cases. Eliminating this overhead is one of the reasons for the performance improvement in the all-selected scenario.

I will include both nullable and non-nullable benchmark results in the PR description to verify that both scenarios see performance improvements.
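
The two scenarios above can be sketched outside Velox as follows. This is only an illustration under stated assumptions: the selection mask is modeled as a `std::vector<uint64_t>` bitmap, `forEachSelected` and `upcastSelected` are hypothetical names, and `__builtin_ctzll` is a GCC/Clang builtin used here to find the lowest set bit. In both branches the per-row work is a bare `static_cast` on raw pointers, with no try-catch around it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a selection bitmap: bit i set means row i is
// selected. Calls func(row) for every selected row in [0, size).
template <typename Func>
void forEachSelected(
    const std::vector<uint64_t>& bits,
    std::size_t size,
    bool allSelected,
    Func func) {
  if (allSelected) {
    // Dense case: a plain counted loop, the bitmap is not consulted.
    for (std::size_t row = 0; row < size; ++row) {
      func(row);
    }
    return;
  }
  assert(bits.size() * 64 >= size);
  // Sparse case (e.g. null rows deselected): visit only the set bits.
  for (std::size_t w = 0; w * 64 < size; ++w) {
    uint64_t word = bits[w];
    while (word != 0) {
      const std::size_t row =
          w * 64 + static_cast<std::size_t>(__builtin_ctzll(word));
      if (row >= size) {
        break;
      }
      func(row);
      word &= word - 1; // clear the lowest set bit
    }
  }
}

// Widens the selected rows; deselected rows keep the default value 0.
std::vector<int64_t> upcastSelected(
    const std::vector<int32_t>& input,
    const std::vector<uint64_t>& bits,
    bool allSelected) {
  std::vector<int64_t> output(input.size(), 0);
  const int32_t* src = input.data();
  int64_t* dst = output.data();
  forEachSelected(bits, input.size(), allSelected, [&](std::size_t row) {
    dst[row] = static_cast<int64_t>(src[row]);
  });
  return output;
}
```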

MBkkt · 2025-11-12T16:27:53Z

@rui-mo Thanks, I understand now.

I think it would be useful to move this optimization into the lambda that CastExpr supplies to SelectivityVector::applyToSelected, but that's a separate task.

Yuhta · 2025-11-12T19:57:06Z

I think applyToSelected is already optimized (see https://github.com/facebookincubator/velox/pull/10301/files#diff-3d8a3c26f9d059ef07d7c31f08b2c605c7e748206493887da6023d2a085f00b8), the code in https://github.com/facebookincubator/velox/blame/4056e41e6efadd20622e92e1b04162d59276d63c/velox/connectors/hive/HivePartitionFunction.cpp#L119-L123 is outdated and could be removed.

The only thing needed is to avoid try-catch when it is not necessary.

rui-mo · 2025-11-13T07:05:51Z

@MBkkt @Yuhta Thanks for sharing your insights. I ran a few experiments to investigate where the performance improvement comes from. For CAST(integer AS bigint), the original performance is as follows:

numeric_upcast##cast_integer_nullable_as_bigint            1.74min     9.55m
numeric_upcast##cast_integer_as_bigint                     1.38min    12.10m

After removing the try-catch from CastExpr::applyToSelectedNoThrowLocal and replacing callFollyTo with static_cast (bypassing overflow checks and error handling), I observed a slight performance gain, shown below.

numeric_upcast##cast_integer_nullable_as_bigint            1.66min    10.03m
numeric_upcast##cast_integer_as_bigint                     1.26min    13.22m

I further removed the try-catch block from CastExpr::applyCastKernel, which resulted in the following performance.

numeric_upcast##cast_integer_nullable_as_bigint            1.33min    12.51m
numeric_upcast##cast_integer_as_bigint                      58.44s    17.11m

I then used perf record to identify the performance hotspot and found the results below, which show that the valueAt and set functions are the main time consumers.

In summary, the performance gains are likely attributable to:

1) Eliminating try-catch blocks when error handling is unnecessary.
2) Improved auto-vectorization and lower function call overhead after replacing valueAt and set with direct access.
3) Avoiding overflow checks.
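
To make the contrast in the experiments above concrete, here is a small standalone sketch of the two code shapes. Nothing in it is Velox or folly code: `checkedCast` is a hypothetical stand-in for a range-checked conversion that may throw, `castWithErrorHandling` mimics a generic per-row path that wraps each conversion in try-catch, and `castUnchecked` is the upcast-only path. For an upcast the two produce identical results, so the checks and the exception machinery are pure overhead.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical range-checked conversion between signed integer types.
template <typename To, typename From>
To checkedCast(From value) {
  static_assert(std::is_integral_v<From> && std::is_signed_v<From>);
  static_assert(std::is_integral_v<To> && std::is_signed_v<To>);
  if (value < std::numeric_limits<To>::min() ||
      value > std::numeric_limits<To>::max()) {
    throw std::overflow_error("value out of range for target type");
  }
  return static_cast<To>(value);
}

// Generic shape: every row is converted inside try-catch and failures are
// recorded per row. Returns the number of failed rows.
template <typename To, typename From>
std::size_t castWithErrorHandling(
    const std::vector<From>& in,
    std::vector<To>& out,
    std::vector<bool>& failed) {
  out.assign(in.size(), To{});
  failed.assign(in.size(), false);
  std::size_t numFailed = 0;
  for (std::size_t i = 0; i < in.size(); ++i) {
    try {
      out[i] = checkedCast<To>(in[i]);
    } catch (const std::overflow_error&) {
      failed[i] = true;
      ++numFailed;
    }
  }
  return numFailed;
}

// Upcast-only shape: the target is strictly wider, so no check can fail.
template <typename To, typename From>
void castUnchecked(const std::vector<From>& in, std::vector<To>& out) {
  static_assert(sizeof(From) < sizeof(To), "upcast only");
  out.resize(in.size());
  const From* src = in.data();
  To* dst = out.data();
  for (std::size_t i = 0; i < in.size(); ++i) {
    dst[i] = static_cast<To>(src[i]);
  }
}
```

The narrowing direction (for example bigint to integer) still needs the checked shape, which is why the fast path is limited to upcasts.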

MBkkt

Overall looks good to me.

I think a more general approach to avoiding the "try catch" and similar overhead can be developed separately.

MBkkt · 2025-11-13T15:15:11Z

+      isIntegralType(fromType) && isBasicNumericType(toType) &&
+      ((fromType->cppSizeInBytes() < toType->cppSizeInBytes()) ||
+       (fromType == INTEGER() && toType == REAL()) ||
+       (fromType == BIGINT() && toType == REAL()) ||
+       (fromType == BIGINT() && toType == DOUBLE()) ||
+       (fromType == HUGEINT() && toType == REAL()) ||
+       (fromType == HUGEINT() && toType == DOUBLE()))) {


real to double also should be ok?

Nice catch, thanks. I added its support as well as tests and benchmarks.
I opened #15506 for the discussion of avoiding "try catch" in cast.
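
The eligibility condition discussed in this thread can be modeled with a small standalone predicate. This is a sketch, not the code in CastExpr.cpp: the `Kind` enum, `sizeInBytes`, `isIntegral` and `isFastUpcast` are hypothetical names, and the byte sizes are the usual C++ representation sizes (an assumption about what `cppSizeInBytes()` returns). It shows why a pure size comparison is not enough: integer to real has equal sizes, and bigint or hugeint sources are at least as wide as their floating-point targets, so those pairs are listed explicitly, along with real to double.

```cpp
#include <cstddef>

// Hypothetical type tags for the basic numeric types in the discussion.
enum class Kind {
  kTinyint,
  kSmallint,
  kInteger,
  kBigint,
  kHugeint,
  kReal,
  kDouble
};

// Assumed in-memory sizes, in bytes.
constexpr std::size_t sizeInBytes(Kind kind) {
  switch (kind) {
    case Kind::kTinyint:
      return 1;
    case Kind::kSmallint:
      return 2;
    case Kind::kInteger:
      return 4;
    case Kind::kBigint:
      return 8;
    case Kind::kHugeint:
      return 16;
    case Kind::kReal:
      return 4;
    case Kind::kDouble:
      return 8;
  }
  return 0;
}

constexpr bool isIntegral(Kind kind) {
  return kind != Kind::kReal && kind != Kind::kDouble;
}

// True when from -> to is one of the upcasts that can skip error handling.
constexpr bool isFastUpcast(Kind from, Kind to) {
  if (from == Kind::kReal) {
    return to == Kind::kDouble;
  }
  if (!isIntegral(from)) {
    return false;
  }
  if (sizeInBytes(from) < sizeInBytes(to)) {
    return true;
  }
  // Integral sources that are as wide as, or wider than, the
  // floating-point target.
  return (from == Kind::kInteger && to == Kind::kReal) ||
      (from == Kind::kBigint && (to == Kind::kReal || to == Kind::kDouble)) ||
      (from == Kind::kHugeint && (to == Kind::kReal || to == Kind::kDouble));
}
```

Because the predicate is `constexpr`, the table of supported pairs can be checked at compile time, for example with `static_assert(isFastUpcast(Kind::kInteger, Kind::kBigint));`.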

Summary:
When the row size is large (e.g., around 300,000,000), casting from a narrower integer type to a wider one, such as cast(integer as bigint), can become time-consuming.

This PR optimizes the numeric upcast by performing the cast directly on the raw values within loops, and drops the try-catch used for potential error handling. Since upcasts guarantee that the source value fits within the target type, overflow handling is unnecessary in this case.

The performance gains are likely attributable to:
1) Eliminating try-catch blocks when error handling is unnecessary.
2) Improved auto-vectorization and lower function call overhead after replacing `valueAt` and `set` with direct access.
3) Avoiding overflow checks.

Optimized conversions include:
```
CAST(tinyint AS smallint)
CAST(tinyint AS integer)
CAST(tinyint AS bigint)
CAST(tinyint AS real)
CAST(tinyint AS double)
CAST(smallint AS integer)
CAST(smallint AS bigint)
CAST(smallint AS real)
CAST(smallint AS double)
CAST(integer AS bigint)
CAST(integer AS real)
CAST(integer AS double)
CAST(bigint AS real)
CAST(bigint AS double)
CAST(real AS double)
```

Before:
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
numeric_upcast##cast_tinyint_nullable_as_smalli             57.89s    17.27m
numeric_upcast##cast_tinyint_as_smallint                   1.09min    15.32m
numeric_upcast##cast_tinyint_nullable_as_intege             59.07s    16.93m
numeric_upcast##cast_tinyint_as_integer                    1.09min    15.25m
numeric_upcast##cast_tinyint_nullable_as_bigint            1.03min    16.12m
numeric_upcast##cast_tinyint_as_bigint                     1.13min    14.71m
numeric_upcast##cast_tinyint_nullable_as_real              1.87min     8.90m
numeric_upcast##cast_tinyint_as_real                       2.16min     7.71m
numeric_upcast##cast_tinyint_nullable_as_double            1.79min     9.29m
numeric_upcast##cast_tinyint_as_double                     2.06min     8.10m
numeric_upcast##cast_smallint_nullable_as_integ             59.30s    16.86m
numeric_upcast##cast_smallint_as_integer                   1.11min    15.01m
numeric_upcast##cast_smallint_nullable_as_bigin            1.02min    16.29m
numeric_upcast##cast_smallint_as_bigint                    1.14min    14.59m
numeric_upcast##cast_smallint_nullable_as_real             1.99min     8.37m
numeric_upcast##cast_smallint_as_real                      2.29min     7.26m
numeric_upcast##cast_smallint_nullable_as_doubl            1.80min     9.28m
numeric_upcast##cast_smallint_as_double                    2.03min     8.23m
numeric_upcast##cast_integer_nullable_as_bigint            1.03min    16.24m
numeric_upcast##cast_integer_as_bigint                     1.12min    14.89m
numeric_upcast##cast_integer_nullable_as_real              1.40min    11.88m
numeric_upcast##cast_integer_as_real                       1.64min    10.15m
numeric_upcast##cast_integer_nullable_as_double            1.44min    11.56m
numeric_upcast##cast_integer_as_double                     1.65min    10.09m
numeric_upcast##cast_bigint_nullable_as_real               1.41min    11.78m
numeric_upcast##cast_bigint_as_real                        1.65min    10.12m
numeric_upcast##cast_bigint_nullable_as_double             1.46min    11.44m
numeric_upcast##cast_bigint_as_double                      1.65min    10.09m
numeric_upcast##cast_real_nullable_as_double               1.43min    11.64m
numeric_upcast##cast_real_as_double                        1.69min     9.85m
----------------------------------------------------------------------------
```

After:
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
numeric_upcast##cast_tinyint_nullable_as_smalli             15.12s    66.12m
numeric_upcast##cast_tinyint_as_smallint                  931.11ms      1.07
numeric_upcast##cast_tinyint_nullable_as_intege             16.61s    60.22m
numeric_upcast##cast_tinyint_as_integer                      2.21s   451.83m
numeric_upcast##cast_tinyint_nullable_as_bigint             19.33s    51.73m
numeric_upcast##cast_tinyint_as_bigint                       4.32s   231.37m
numeric_upcast##cast_tinyint_nullable_as_real               16.50s    60.62m
numeric_upcast##cast_tinyint_as_real                         2.83s   353.33m
numeric_upcast##cast_tinyint_nullable_as_double             19.13s    52.26m
numeric_upcast##cast_tinyint_as_double                       8.07s   123.97m
numeric_upcast##cast_smallint_nullable_as_integ             18.96s    52.76m
numeric_upcast##cast_smallint_as_integer                     1.78s   561.26m
numeric_upcast##cast_smallint_nullable_as_bigin             18.72s    53.41m
numeric_upcast##cast_smallint_as_bigint                      4.38s   228.11m
numeric_upcast##cast_smallint_nullable_as_real              16.34s    61.19m
numeric_upcast##cast_smallint_as_real                        1.71s   583.54m
numeric_upcast##cast_smallint_nullable_as_doubl             18.65s    53.62m
numeric_upcast##cast_smallint_as_double                      2.85s   351.36m
numeric_upcast##cast_integer_nullable_as_bigint             18.43s    54.26m
numeric_upcast##cast_integer_as_bigint                       3.47s   288.11m
numeric_upcast##cast_integer_nullable_as_real               16.58s    60.32m
numeric_upcast##cast_integer_as_real                         2.41s   414.34m
numeric_upcast##cast_integer_nullable_as_double             18.53s    53.97m
numeric_upcast##cast_integer_as_double                       3.34s   299.25m
numeric_upcast##cast_bigint_nullable_as_real                16.82s    59.45m
numeric_upcast##cast_bigint_as_real                          4.23s   236.48m
numeric_upcast##cast_bigint_nullable_as_double              19.15s    52.22m
numeric_upcast##cast_bigint_as_double                        4.56s   219.08m
numeric_upcast##cast_real_nullable_as_double                18.35s    54.50m
numeric_upcast##cast_real_as_double                          3.43s   291.53m
----------------------------------------------------------------------------
```

Replace: #15458

Pull Request resolved: #16967

Reviewed By: peterenescu

Differential Revision: D99139839

Pulled By: bikramSingh91

fbshipit-source-id: ffb57fcd6bcab15e32b72deba2f53c2aea8ba102

rui-mo requested a review from majetideepak as a code owner November 11, 2025 08:17

meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 11, 2025

rui-mo changed the title ~~misc: Optimize integral upcast~~ misc: Optimize integral upcast when all selected Nov 11, 2025

rui-mo force-pushed the wip_int_as_bigint branch from bc58ca2 to bd6771f Compare November 11, 2025 09:12

MBkkt reviewed Nov 11, 2025

Comment thread velox/expression/CastExpr.cpp Outdated

MBkkt reviewed Nov 11, 2025

Comment thread velox/benchmarks/basic/IntegralUpcastBenchmark.cpp Outdated

MBkkt reviewed Nov 11, 2025

Comment thread velox/expression/CastExpr.cpp Outdated

MBkkt reviewed Nov 11, 2025

Comment thread velox/expression/CastExpr.cpp Outdated

rui-mo force-pushed the wip_int_as_bigint branch from bd6771f to 0eca183 Compare November 12, 2025 07:48

rui-mo changed the title ~~misc: Optimize integral upcast when all selected~~ misc: Optimize basic numeric upcast Nov 12, 2025

rui-mo force-pushed the wip_int_as_bigint branch from 0eca183 to 4133b2c Compare November 12, 2025 10:06

jinchengchenghh reviewed Nov 12, 2025

Comment thread velox/expression/CastExpr.cpp

rui-mo force-pushed the wip_int_as_bigint branch from 4133b2c to a1b248a Compare November 12, 2025 14:56

MBkkt reviewed Nov 12, 2025

Comment thread velox/expression/CastExpr.cpp Outdated

MBkkt reviewed Nov 12, 2025

Comment thread velox/expression/CastExpr.cpp Outdated

MBkkt reviewed Nov 12, 2025

rui-mo force-pushed the wip_int_as_bigint branch from a1b248a to 6ad6ade Compare November 13, 2025 12:43

jinchengchenghh reviewed Nov 13, 2025

Comment thread velox/expression/CastExpr.cpp Outdated

rui-mo force-pushed the wip_int_as_bigint branch 2 times, most recently from 600fb40 to 04e7cbb Compare November 13, 2025 15:11

MBkkt reviewed Nov 13, 2025

jinchengchenghh approved these changes Nov 13, 2025

rui-mo force-pushed the wip_int_as_bigint branch from 04e7cbb to 4068c77 Compare November 14, 2025 10:32

rui-mo mentioned this pull request Nov 14, 2025

Use status in cast to allow reporting error without throwing #15506

Open

jinchengchenghh reviewed Nov 14, 2025

Comment thread velox/expression/CastExpr.cpp

rui-mo force-pushed the wip_int_as_bigint branch from 4068c77 to 3b02501 Compare November 19, 2025 12:55

rui-mo force-pushed the wip_int_as_bigint branch from 3b02501 to 8f7b574 Compare January 21, 2026 09:54

rui-mo force-pushed the wip_int_as_bigint branch 2 times, most recently from bb061ed to 0c1a034 Compare February 10, 2026 09:57

rui-mo mentioned this pull request Feb 10, 2026

[VL] useful Velox PRs not merged into upstream apache/gluten#11585

Open

rui-mo changed the title ~~misc: Optimize basic numeric upcast~~ perf: Optimize basic numeric upcast Feb 10, 2026

Optimize numeric upcast

d5754a2

rui-mo force-pushed the wip_int_as_bigint branch from 0c1a034 to d5754a2 Compare February 18, 2026 10:51

rui-mo closed this by deleting the head repository Mar 30, 2026

rui-mo mentioned this pull request Mar 30, 2026

perf: Optimize basic numeric upcast #16967

Closed
