test(native): Add comprehensive tests for Iceberg bucket function by Joe-Abraham · Pull Request #26581 · prestodb/presto

Joe-Abraham · 2025-11-11T11:00:51Z

Description

Updated the Iceberg bucket scalar function’s signature and return type to align with native execution semantics and Spark.
Introduced regression test cases for both presto and prestissimo.

Motivation and Context

Contributor checklist

Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
Documented new properties (with its default value), SQL syntax, functions, or other functionality.
If release notes are required, they follow the release notes guidelines.
Adequate tests were added if applicable.
CI passed.
If adding new dependencies, verified they have an OpenSSF Scorecard score of 5.0 or higher (or obtained explicit TSC approval for lower scores).

Release Notes

== NO RELEASE NOTE ==

sourcery-ai · 2025-11-26T08:23:31Z

Reviewer's Guide

Adjusts the Iceberg bucket scalar function signature and return type to match native execution semantics and adds comprehensive Java and native tests for various types and edge cases.

Class diagram for updated IcebergBucketFunction scalar bucket overloads

classDiagram
    class IcebergBucketFunction {
        <<final>>
        - IcebergBucketFunction()
        + bucketInteger(numberOfBuckets: int, value: long) long
        + bucketVarchar(numberOfBuckets: int, value: Slice) long
        + bucketVarbinary(numberOfBuckets: int, value: Slice) long
        + bucketDate(numberOfBuckets: int, value: long) long
        + bucketTimestamp(numberOfBuckets: int, value: long) long
        + bucketTimestampWithTimeZone(numberOfBuckets: int, value: long) long
    }

    class Bucket {
        <<static>>
        + bucketShortDecimal(numPrecision: long, numScale: long, numberOfBuckets: int, value: long) long
        + bucketLongDecimal(numPrecision: long, numScale: long, numberOfBuckets: int, value: Slice) long
    }

    IcebergBucketFunction *-- Bucket

File-Level Changes

Change	Details	Files
Changed Iceberg `bucket` scalar function signatures to take `number_of_buckets` as the first argument and return INTEGER instead of BIGINT, aligning with Iceberg/native expectations.	Reordered parameters of all `bucket` overloads so `numberOfBuckets` is the first argument and the value to bucket is second. Updated `@SqlType` annotations for all overloads from BIGINT to INTEGER to reflect the new return type. Added explicit casts from the Iceberg `Transforms.bucket` result to long while ensuring the logical result is an int-sized value. Updated decimal overloads in the nested `Bucket` class to use the new argument order and INTEGER return type.	`presto-iceberg/src/main/java/com/facebook/presto/iceberg/function/IcebergBucketFunction.java`
Updated existing JVM scalar function tests to use the new `bucket` signature and result type.	Switched test SQL invocations so the bucket count argument precedes the value argument. Changed expected return type assertions from BIGINT to INTEGER and cast helper method results accordingly. Adjusted calls to decimal helper methods to match the new parameter order and INTEGER return type.	`presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergScalarFunctions.java`
Added native execution regression tests for Iceberg `bucket` covering type matrix, edge values, consistency, and use in queries over tables.	Introduced a new native test class wired with Java and native Iceberg `QueryRunner`s using the default Iceberg storage format. Added coverage for bucket on tinyint/smallint/int/bigint, short and long decimals, varchar/varbinary, date, timestamp, and timestamp with time zone, including negative, zero, and boundary values. Added idempotence/consistency checks to ensure repeated bucket calls with the same inputs yield identical results. Added queries applying `bucket` to various test tables/columns to exercise integration in realistic query shapes and to compare native vs JVM results via `assertQuery`.	`presto-native-execution/src/test/java/com/facebook/presto/nativeworker/iceberg/TestBucketFunction.java`

Tips and commands

Interacting with Sourcery

Trigger a new review: Comment @sourcery-ai review on the pull request.
Continue discussions: Reply directly to Sourcery's review comments.
Generate a GitHub issue from a review comment: Ask Sourcery to create an
issue from a review comment by replying to it. You can also reply to a
review comment with @sourcery-ai issue to create an issue from it.
Generate a pull request title: Write @sourcery-ai anywhere in the pull
request title to generate a title at any time. You can also comment
@sourcery-ai title on the pull request to (re-)generate the title at any time.
Generate a pull request summary: Write @sourcery-ai summary anywhere in
the pull request body to generate a PR summary at any time exactly where you
want it. You can also comment @sourcery-ai summary on the pull request to
(re-)generate the summary at any time.
Generate reviewer's guide: Comment @sourcery-ai guide on the pull
request to (re-)generate the reviewer's guide at any time.
Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
pull request to resolve all Sourcery comments. Useful if you've already
addressed all the comments and don't want to see them anymore.
Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
request to dismiss all existing Sourcery reviews. Especially useful if you
want to start fresh with a new review - don't forget to comment
@sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

Enable or disable review features such as the Sourcery-generated pull request
summary, the reviewer's guide, and others.
Change the review language.
Add, remove or edit custom review instructions.
Adjust other review settings.

Getting Help

Contact our support team for questions or feedback.
Visit our documentation for detailed guides and information.
Keep in touch with the Sourcery team by following us on X/Twitter, LinkedIn or GitHub.

sourcery-ai

Hey there - I've reviewed your changes and found some issues that need to be addressed.

The updated test calls for bucketShortDecimal and bucketLongDecimal appear to have precision/scale parameters swapped (e.g., bucketShortDecimal(12, 5, 12, 45643) for DECIMAL(5,2)), which no longer match the DECIMAL(p, s) types used in the SQL and likely need to be reordered to (p, s, numberOfBuckets, value).
Now that the SQL type of the bucket functions is INTEGER, consider changing the Java return type to int (or avoiding unnecessary (long) casts) to better reflect the underlying type and avoid confusion about the actual range of values.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- The updated test calls for `bucketShortDecimal` and `bucketLongDecimal` appear to have precision/scale parameters swapped (e.g., `bucketShortDecimal(12, 5, 12, 45643)` for `DECIMAL(5,2)`), which no longer match the `DECIMAL(p, s)` types used in the SQL and likely need to be reordered to `(p, s, numberOfBuckets, value)`.
- Now that the SQL type of the bucket functions is `INTEGER`, consider changing the Java return type to `int` (or avoiding unnecessary `(long)` casts) to better reflect the underlying type and avoid confusion about the actual range of values.

## Individual Comments

### Comment 1
<location> `presto-iceberg/src/main/java/com/facebook/presto/iceberg/function/IcebergBucketFunction.java:38` </location>
<code_context>
-    @SqlType(StandardTypes.BIGINT)
-    public static long bucketInteger(@SqlType(StandardTypes.BIGINT) long value, @SqlType(StandardTypes.INTEGER) long numberOfBuckets)
+    @SqlType(StandardTypes.INTEGER)
+    public static long bucketInteger(@SqlType(StandardTypes.INTEGER) long numberOfBuckets, @SqlType(StandardTypes.BIGINT) long value)
     {
-        return Transforms.bucket((int) numberOfBuckets)
</code_context>

<issue_to_address>
**issue (bug_risk):** Swapping parameter order is a breaking change for existing SQL calls to `bucket`.

Changing the signature from `bucket(value, numberOfBuckets)` to `bucket(numberOfBuckets, value)` plus the new `@SqlType(StandardTypes.INTEGER)` on the first parameter makes the SQL signature `(integer, <type>)` instead of `(<type>, integer)`. Existing queries like `bucket(col, 16)` will no longer resolve because that overload no longer exists. If this break is intentional, it needs to be treated and communicated as a SQL-level breaking change; otherwise, keep the original parameter order and adjust only the return type if needed.
</issue_to_address>

### Comment 2
<location> `presto-iceberg/src/main/java/com/facebook/presto/iceberg/function/IcebergBucketFunction.java:37-46` </location>
<code_context>
     @ScalarFunction("bucket")
-    @SqlType(StandardTypes.BIGINT)
-    public static long bucketVarchar(@SqlType(StandardTypes.VARCHAR) Slice value, @SqlType(StandardTypes.INTEGER) long numberOfBuckets)
+    @SqlType(StandardTypes.INTEGER)
+    public static long bucketVarchar(@SqlType(StandardTypes.INTEGER) long numberOfBuckets, @SqlType(StandardTypes.VARCHAR) Slice value)
     {
         return (long) Transforms.bucket((int) numberOfBuckets)
</code_context>

<issue_to_address>
**issue (bug_risk):** Changing the SQL return type from BIGINT to INTEGER alters the observable contract of `bucket`.

This SQL-level type change can break existing queries, views, or integrations that rely on `bucket` returning BIGINT (e.g., joins, casts, or UDFs typed as BIGINT). If the goal is to match Iceberg’s bucket spec, we still need a clear compatibility/migration plan; otherwise, keep the SQL return type as BIGINT and only narrow it where we can guarantee safety.
</issue_to_address>

### Comment 3
<location> `presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergScalarFunctions.java:70` </location>
<code_context>
+        functionAssertions.assertFunction(catalogSchema + ".bucket(4, cast(1950 as smallint))", INTEGER, (int) bucketInteger(4, 1950));
+        functionAssertions.assertFunction(catalogSchema + ".bucket(5, cast(2375645 as int))", INTEGER, (int) bucketInteger(5, 2375645));
+        functionAssertions.assertFunction(catalogSchema + ".bucket(6, cast(2779099983928392323 as BIGINT))", INTEGER, (int) bucketInteger(6, 2779099983928392323L));
+        functionAssertions.assertFunction(catalogSchema + ".bucket(12, cast(456.43 as DECIMAL(5,2)))", INTEGER, (int) bucketShortDecimal(12, 5, 12, 45643));
+        functionAssertions.assertFunction(catalogSchema + ".bucket(12, cast('12345678901234567890.1234567890' as DECIMAL(30,10)))", INTEGER, (int) bucketLongDecimal(10, 30, 12, encodeScaledValue(new BigDecimal("12345678901234567890.1234567890"))));

</code_context>

<issue_to_address>
**issue (testing):** Decimal test passes bucket and precision/scale arguments to `bucketShortDecimal` in the wrong order, so it no longer matches the SQL call being tested.

`bucketShortDecimal` is now `(numPrecision, numScale, numberOfBuckets, value)`, but the test calls `bucketShortDecimal(12, 5, 12, 45643)`. For `DECIMAL(5,2)` with `numberOfBuckets = 12`, this should be `(5, 2, 12, 45643)`. As written, the test asserts against the wrong configuration. Please update the helper call to use the correct argument order and values so it matches the SQL expression.
</issue_to_address>

### Comment 4
<location> `presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergScalarFunctions.java:71` </location>
<code_context>
+        functionAssertions.assertFunction(catalogSchema + ".bucket(5, cast(2375645 as int))", INTEGER, (int) bucketInteger(5, 2375645));
+        functionAssertions.assertFunction(catalogSchema + ".bucket(6, cast(2779099983928392323 as BIGINT))", INTEGER, (int) bucketInteger(6, 2779099983928392323L));
+        functionAssertions.assertFunction(catalogSchema + ".bucket(12, cast(456.43 as DECIMAL(5,2)))", INTEGER, (int) bucketShortDecimal(12, 5, 12, 45643));
+        functionAssertions.assertFunction(catalogSchema + ".bucket(12, cast('12345678901234567890.1234567890' as DECIMAL(30,10)))", INTEGER, (int) bucketLongDecimal(10, 30, 12, encodeScaledValue(new BigDecimal("12345678901234567890.1234567890"))));

-        functionAssertions.assertFunction(catalogSchema + ".bucket(cast('nasdbsdnsdms' as varchar), 7)", BIGINT, bucketVarchar(utf8Slice("nasdbsdnsdms"), 7));
</code_context>

<issue_to_address>
**issue (testing):** Long-decimal test also swaps precision and scale when calling `bucketLongDecimal`, so the expected value may not reflect the actual SQL behavior.

In the `DECIMAL(30,10)` test, `bucketLongDecimal` expects `(numPrecision, numScale, numberOfBuckets, value)`, but it’s called as `bucketLongDecimal(10, 30, 12, ...)` while the SQL uses `DECIMAL(30,10)`. This means precision and scale are swapped, so the test no longer matches the SQL contract for `bucket`. Please correct the call to `bucketLongDecimal(30, 10, 12, ...)` so the helper aligns with the SQL expression.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

sourcery-ai · 2025-11-26T08:23:32Z

-    @SqlType(StandardTypes.BIGINT)
-    public static long bucketInteger(@SqlType(StandardTypes.BIGINT) long value, @SqlType(StandardTypes.INTEGER) long numberOfBuckets)
+    @SqlType(StandardTypes.INTEGER)
+    public static long bucketInteger(@SqlType(StandardTypes.INTEGER) long numberOfBuckets, @SqlType(StandardTypes.BIGINT) long value)


issue (bug_risk): Swapping parameter order is a breaking change for existing SQL calls to bucket.

Changing the signature from bucket(value, numberOfBuckets) to bucket(numberOfBuckets, value) plus the new @SqlType(StandardTypes.INTEGER) on the first parameter makes the SQL signature (integer, <type>) instead of (<type>, integer). Existing queries like bucket(col, 16) will no longer resolve because that overload no longer exists. If this break is intentional, it needs to be treated and communicated as a SQL-level breaking change; otherwise, keep the original parameter order and adjust only the return type if needed.

I agree with sourcery-ai. This is a breaking change.
The existing supported DDL when creating partition table is

partitioning=ARRAY['bucket(c_int,10)', 'bucket(c_bigint,13)'])

I think we should follow this pattern.

hantangwangd

@Joe-Abraham thanks for this change. Following Iceberg's specifications and Spark's syntax, I agree with you that using bucket(n, value) is the correct way because, as defined by specification, n is used to parameterize the bucket transform as bucket_n, rather than simply serving as a function argument. But Trino appears to use truncate(value, n).

Besides, considering our existing syntax for specifying partition transforms when creating an Iceberg table—bucket(value, n)/truncate(value, n)—I'm wondering whether we should apply the same adjustment as well. This would be a breaking change from a user-facing SQL syntax perspective. What are your thoughts on this? @tdcmeehan @Joe-Abraham

PingLiuPing · 2026-01-07T10:17:08Z

+import static com.facebook.presto.nativeworker.PrestoNativeQueryRunnerUtils.javaIcebergQueryRunnerBuilder;
+import static com.facebook.presto.nativeworker.PrestoNativeQueryRunnerUtils.nativeIcebergQueryRunnerBuilder;
+
+public class TestBucketFunction


What's the purpose of these testcases?
It seems it just verifying the existence of iceberg.system.bucket, should we verify the bucket result too?

When running query with native query runner, which bucket function is been called, the Java version or native version?

PingLiuPing · 2026-01-07T10:20:10Z

-    @SqlType(StandardTypes.BIGINT)
-    public static long bucketInteger(@SqlType(StandardTypes.BIGINT) long value, @SqlType(StandardTypes.INTEGER) long numberOfBuckets)
+    @SqlType(StandardTypes.INTEGER)
+    public static long bucketInteger(@SqlType(StandardTypes.INTEGER) long numberOfBuckets, @SqlType(StandardTypes.BIGINT) long value)


I agree with sourcery-ai. This is a breaking change.
The existing supported DDL when creating partition table is

partitioning=ARRAY['bucket(c_int,10)', 'bucket(c_bigint,13)'])

I think we should follow this pattern.

prestodb-ci added the from:IBM PR from IBM label Nov 11, 2025

Joe-Abraham force-pushed the bucket branch from 4aed053 to d658271 Compare November 11, 2025 11:04

Joe-Abraham changed the title ~~feat(iceberg): Update bucket function to return INTEGER type and add …~~ feat(native): Register Iceberg bucket and regression Nov 11, 2025

Joe-Abraham force-pushed the bucket branch from d658271 to f7e7ccc Compare November 11, 2025 14:07

czentgr mentioned this pull request Nov 12, 2025

chore(ci): Advance velox #26584

Merged

Joe-Abraham force-pushed the bucket branch 3 times, most recently from 41d5180 to fe7c2b2 Compare November 26, 2025 08:09

Joe-Abraham changed the title ~~feat(native): Register Iceberg bucket and regression~~ test(native): Add comprehensive tests for Iceberg bucket function Nov 26, 2025

Joe-Abraham force-pushed the bucket branch from fe7c2b2 to 1a0600e Compare November 26, 2025 08:14

prestodb deleted a comment from sourcery-ai Bot Nov 26, 2025

Joe-Abraham marked this pull request as ready for review November 26, 2025 08:22

Joe-Abraham requested review from a team, ZacBlanco and hantangwangd as code owners November 26, 2025 08:22

prestodb-ci requested review from a team, jkhaliqi and pdabre12 and removed request for a team November 26, 2025 08:22

Joe-Abraham requested a review from nmahadevuni November 26, 2025 08:23

sourcery-ai Bot reviewed Nov 26, 2025

View reviewed changes

Joe-Abraham linked an issue Nov 26, 2025 that may be closed by this pull request

[native] Register Iceberg bucket scalar function in system schema #26558

Open

test(native): Add comprehensive tests for Iceberg bucket function

22fa727

Joe-Abraham force-pushed the bucket branch from 1a0600e to 22fa727 Compare November 28, 2025 06:45

jkhaliqi approved these changes Dec 5, 2025

View reviewed changes

hantangwangd reviewed Dec 13, 2025

View reviewed changes

PingLiuPing reviewed Jan 7, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test(native): Add comprehensive tests for Iceberg bucket function#26581

test(native): Add comprehensive tests for Iceberg bucket function#26581
Joe-Abraham wants to merge 1 commit into
prestodb:masterfrom
Joe-Abraham:bucket

Joe-Abraham commented Nov 11, 2025 •

edited

Loading

Uh oh!

sourcery-ai Bot commented Nov 26, 2025

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

sourcery-ai Bot left a comment

Uh oh!

sourcery-ai Bot Nov 26, 2025

Uh oh!

PingLiuPing Jan 7, 2026

Uh oh!

Uh oh!

Uh oh!

hantangwangd left a comment

Uh oh!

PingLiuPing Jan 7, 2026

Uh oh!

PingLiuPing Jan 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

Joe-Abraham commented Nov 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Motivation and Context

Contributor checklist

Release Notes

Uh oh!

sourcery-ai Bot commented Nov 26, 2025

Reviewer's Guide

Class diagram for updated IcebergBucketFunction scalar bucket overloads

File-Level Changes

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

sourcery-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

sourcery-ai Bot Nov 26, 2025

Choose a reason for hiding this comment

Uh oh!

PingLiuPing Jan 7, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

hantangwangd left a comment

Choose a reason for hiding this comment

Uh oh!

PingLiuPing Jan 7, 2026

Choose a reason for hiding this comment

Uh oh!

PingLiuPing Jan 7, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Joe-Abraham commented Nov 11, 2025 •

edited

Loading