Skip to content

chore: Integrate upstream Presto release-0.297-edge10 and adapt CLP connector.#147

Merged
20001020ycx merged 3 commits into
y-scope:release-0.297-edge-10-clp-connectorfrom
20001020ycx:2026-02-23-0.297-presto-integration
Mar 2, 2026
Merged

chore: Integrate upstream Presto release-0.297-edge10 and adapt CLP connector.#147
20001020ycx merged 3 commits into
y-scope:release-0.297-edge-10-clp-connectorfrom
20001020ycx:2026-02-23-0.297-presto-integration

Conversation

@20001020ycx

@20001020ycx 20001020ycx commented Feb 23, 2026

Copy link
Copy Markdown

Description

This PR updates Presto from 0.293 to upstream release-0.297-edge10 and adapts the CLP connector to remain compatible. release-0.297-edge10 is chosen because it provides 1) compatible fmt version with latest log surgeon and clp library which are the dependencies of the velox, and 2) it contains feature commits where we need for plugin our connectors, namely prestodb#24330 and prestodb#26257.

  • Commit 1 — pulls all changes from upstream prestodb/presto branch release-0.297-edge10 at commit d1aa0c30da8d53d4305315aeed0b8b0e6c99e5d5, with conflict resolution to retain our CLP-related files.

  • Commit 2 — adapts the CLP connector to the Presto 0.297 interface:

    • Extract ClpPrestoToVeloxConnector into its own dedicated file following the 0.297 pattern where the monolithic PrestoToVeloxConnector.cpp was split per-connector; the implementation is unchanged from its previous location in that file.
    • Update presto-clp/pom.xml version and fix airlift dependency to match 0.297.
    • Update MetadataManager constructor call to include the new MaterializedViewPropertyManager parameter added in 0.297.
  • Commit 3 — updates CI workflows and build configuration for 0.297 compatibility:

    • Drop Java 8 from all workflow matrices because we now target JDK 17.
    • presto-common/pom.xml: Remove --add-exports=java.base/sun.misc=ALL-UNNAMED because on JDK 17 it causes a fatal split-package error (sun.misc is exported by both java.base and jdk.unsupported). Later upstream patch with the same fix.
    • Add make velox-submodule step to e2e test jobs because these jobs run on a fresh checkout and need the velox submodule initialised to locate native library paths.
    • Restore format-check/header-check/format-fix/header-fix Makefile targets because upstream 0.297 removed them.
    • Update 0.297 refs across workflow files to release-0.297-edge-10-clp-connector, as it will be our new main branch.
    • Limit e2e tests to TestPrestoNativeClpGeneralQueries because the new upstream CTE tests introduced in 0.297-edge10 require plugins not present in our test environment.
    • clp-connector-unit-tests job: The existing maven-checks job uses -DskipTests so CLP unit tests were never run in CI. This ensures CI runs CLP unit test.
    • Remove non-CLP test jobs

Note for reviewers: Please only review 2nd and 3rd commit. The 2nd commit is relevant to the CLP Presto connector and warrants review, and the 3rd commit is fixing the breaking changes from ci workflow. The 1st commit is a merge commit from the upstream Presto open-source repository.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • All unit test passed
  • End to end test
    • Query: SELECT CLP_GET_JSON_STRING() from clp.default.default limit 100
    • Set up: official clp package v0.9.0, Presto with the changes in this PR, Velox with integration of upstream Presto's 0.297, and minio as the storage.
    • The result is returned as expected.

@coderabbitai

coderabbitai Bot commented Feb 23, 2026

Copy link
Copy Markdown

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@20001020ycx 20001020ycx force-pushed the 2026-02-23-0.297-presto-integration branch 3 times, most recently from f80bd10 to 201ee5c Compare February 24, 2026 21:24
@20001020ycx 20001020ycx force-pushed the 2026-02-23-0.297-presto-integration branch from 201ee5c to f4ba649 Compare February 25, 2026 19:13
Pull all changes from upstream prestodb/presto branch release-0.297-edge10
at commit d1aa0c3.

Velox submodule pointer is retained from our fork (already upgraded
separately on branch 2026-02-23-0.297-integration).
@20001020ycx 20001020ycx force-pushed the 2026-02-23-0.297-presto-integration branch 3 times, most recently from 43f962b to 9a9bbb6 Compare February 25, 2026 20:37

@gibber9809 gibber9809 left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks good, just have some questions about parts I didn't understand.

Besides those questions, I'm also curious about why the PrestoToVeloxConnector class is needed now.

Comment thread .github/workflows/maven-checks.yml Outdated
Comment thread presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java Outdated
@20001020ycx 20001020ycx force-pushed the 2026-02-23-0.297-presto-integration branch from 9a9bbb6 to 70d4b33 Compare February 25, 2026 21:02
@20001020ycx

Copy link
Copy Markdown
Author

Besides those questions, I'm also curious about why the PrestoToVeloxConnector class is needed now.

In 0.293, all connectors (Hive, Iceberg, Tpch, and CLP) had their subclass implementations in the same monolithic PrestoToVeloxConnector.cpp. PrestoToVeloxConnector was a single concrete class with one big function containing if/else logic for each connector.

In 0.297, upstream refactored it into an abstract base class with pure virtual methods (= 0). There's no longer a single function to add a branch to — each connector must implement its own subclass. That's why ClpPrestoToVeloxConnector is required now.

@20001020ycx 20001020ycx force-pushed the 2026-02-23-0.297-presto-integration branch from 70d4b33 to d53942c Compare February 27, 2026 16:28
gibber9809
gibber9809 previously approved these changes Feb 27, 2026

@gibber9809 gibber9809 left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving the changes in the second commit for adapting our code to 0.297.

@20001020ycx 20001020ycx force-pushed the 2026-02-23-0.297-presto-integration branch 13 times, most recently from 53cf846 to b7832df Compare March 1, 2026 05:22
@20001020ycx 20001020ycx force-pushed the 2026-02-23-0.297-presto-integration branch 3 times, most recently from 33096ca to 9d068e4 Compare March 1, 2026 17:27
@20001020ycx 20001020ycx changed the base branch from release-0.293-clp-connector to release-0.297-edge-10-clp-connector March 2, 2026 14:27
@20001020ycx 20001020ycx force-pushed the 2026-02-23-0.297-presto-integration branch 6 times, most recently from cae7d4b to 37c536e Compare March 2, 2026 15:57
@20001020ycx 20001020ycx requested a review from jackluo923 March 2, 2026 18:27
@20001020ycx 20001020ycx force-pushed the 2026-02-23-0.297-presto-integration branch 2 times, most recently from 2835eda to 2490fc9 Compare March 2, 2026 19:35
20001020ycx and others added 2 commits March 2, 2026 14:48
- Extract ClpPrestoToVeloxConnector into its own .h/.cpp files to match
  the per-connector split introduced in 0.297
- Update CMakeLists.txt and Registration.cpp to include the new file
- Update presto-clp pom.xml: bump version to 0.297, update groupId from
  com.facebook.presto to com.facebook.presto.clp
- Fix ClpSplitFilterProvider to use MetadataManager's new signature
  (rowExpression param added in 0.297)
- Fix ClpMetadataDbSetUp to use Paths.get() instead of new File()
  (required by modernizer-maven-plugin built-in rules)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Drop Java 8 from all workflow matrices (0.297 requires Java 11+);
  update maven-checks, tests, docs, prestocpp-linux-build-and-unit-test
- Update artifact paths and version strings from 0.293 to 0.297-edge10.1-SNAPSHOT
  in maven-checks
- Update all branch references from release-0.293-clp-connector to
  presto-0.293-clp-connector across all workflows
- Restore docker/Dockerfile to 0.293 structure (upstream added
  presto-function-server which is not built by our CI); update Java
  11 -> 17 to match build requirement
- Update .gitmodules for velox submodule
- Update velox submodule pointer to 0.297-compatible commit
- Remove --add-exports=java.base/sun.misc=ALL-UNNAMED from presto-common
  (causes split-package conflict on JDK 17 Corretto; sun.misc is
  accessible via jdk.unsupported without explicit export flags)
- Disable testMultipleIndependentPersistentCtes and
  testSimplePersistentCteMultipleUses: both are newly introduced in
  0.297-edge10 and fail due to missing native support (key_sampling_percent
  not implemented in Velox; $operator$hash_code not registered for BIGINT)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@20001020ycx 20001020ycx merged commit 8ab0c60 into y-scope:release-0.297-edge-10-clp-connector Mar 2, 2026
8 checks passed
@20001020ycx 20001020ycx deleted the 2026-02-23-0.297-presto-integration branch March 3, 2026 15:01
20001020ycx added a commit that referenced this pull request Mar 4, 2026
@junhaoliao

Copy link
Copy Markdown
Member

@coderabbitai this PR may have accidentally remove CA certs in the worker container images, which were added by #55 . As a result, S3 queries fail silently (queries report success without error / warning logs, but no row was returned). create an issue to track

@coderabbitai

coderabbitai Bot commented Mar 18, 2026

Copy link
Copy Markdown

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

Let me first look into PR #55 and the current state of the container image configurations before creating the issue.


🏁 Script executed:

#!/bin/bash
# Get details about PR `#55`
gh pr view 55 --repo y-scope/presto --json title,body,files

# Also get the current PR files to check for container image changes
gh pr view 147 --repo y-scope/presto --json files | jq '.files[] | select(.path | test("Dockerfile|docker|container|worker"; "i"))'

Length of output: 2971


🏁 Script executed:

#!/bin/bash
# Check current content of the dockerfile to see if ca-certificates is still present
cat presto-native-execution/scripts/dockerfiles/prestissimo-runtime.dockerfile

# Also check git log for this file to understand history
git log --oneline presto-native-execution/scripts/dockerfiles/prestissimo-runtime.dockerfile | head -10

Length of output: 3031


The issue is confirmed. The upstream merge in commit f3b5eea1 overwrote presto-native-execution/scripts/dockerfiles/prestissimo-runtime.dockerfile, removing the ca-certificates installation that was added by PR #55. The current final runtime stage has no package installations at all. Let me create the tracking issue now.


🏁 Script executed:

#!/bin/bash
gh issue create \
  --repo y-scope/presto \
  --title "bug: CA certificates missing from worker runtime image after upstream 0.297 merge (PR `#147`)" \
  --body "## Summary

PR `#147` (upstream \`release-0.297-edge10\` merge) accidentally overwrote \`presto-native-execution/scripts/dockerfiles/prestissimo-runtime.dockerfile\`, removing the \`ca-certificates\` package installation that was added in PR `#55`.

## Impact

Without \`ca-certificates\`, the Prestissimo worker cannot verify TLS certificates when accessing presigned S3 URLs. S3 queries fail silently — they report success with no error or warning logs, but return zero rows.

## Root Cause

The upstream merge commit (\`f3b5eea1\`) reset the Dockerfile to the upstream version, dropping the CLP-specific addition from commit \`b00627f0\` (PR `#55`).

## Steps to Reproduce

1. Build and run the worker image from the \`release-0.297-edge-10-clp-connector\` branch (PR `#147`).
2. Run a CLP query that reads from S3.
3. Query completes with no error, but returns no rows.

## Expected Fix

Re-apply the \`ca-certificates\` installation (and associated non-interactive apt/yum setup) to the final runtime stage of \`presto-native-execution/scripts/dockerfiles/prestissimo-runtime.dockerfile\`, as was done in PR `#55`.

## References

- PR that introduced the regression: `#147`
- PR that originally added \`ca-certificates\`: `#55`
- Reported by: `@junhaoliao`" \
  --assignee junhaoliao

Length of output: 44


@junhaoliao I've confirmed the regression and created the tracking issue: #153

Here's a summary of what I found:

  • The current presto-native-execution/scripts/dockerfiles/prestissimo-runtime.dockerfile final stage has no package installations at allca-certificates is completely absent.
  • The git log shows the upstream merge commit (f3b5eea1) directly overwrote the file, silently dropping the CLP-specific ca-certificates addition from PR #55 (b00627f0).
  • The fix is to re-apply the ca-certificates installation to the final runtime stage of the Dockerfile, as was done in PR #55.

🧠 Learnings used
Learnt from: wraymo
Repo: y-scope/presto PR: 64
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java:419-483
Timestamp: 2025-09-12T14:47:20.734Z
Learning: In Presto, SQL validation (including type compatibility checks) happens before reaching the connector optimizer, so invalid SQL with type mismatches will be rejected upstream rather than needing validation in the connector-specific optimization code.

Learnt from: jackluo923
Repo: y-scope/presto PR: 145
File: .github/workflows/integration-tests.yml:316-325
Timestamp: 2026-02-02T17:20:19.971Z
Learning: In the y-scope/presto repository, maintain consistency with upstream Presto's implementation patterns (not just formatting), including shell-based test discovery approaches in GitHub workflows, even when more robust alternatives exist, to minimize divergence and reduce risk of regressions when syncing with upstream changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants