Skip to content

chore: Add ca-certificates to the worker's runtime image to support reading signed S3 URLs in the CLP connector (fixes #56).#55

Merged
anlowee merged 3 commits into
y-scope:release-0.293-clp-connectorfrom
anlowee:xwei/add-ca-certificates
Aug 19, 2025
Merged

chore: Add ca-certificates to the worker's runtime image to support reading signed S3 URLs in the CLP connector (fixes #56).#55
anlowee merged 3 commits into
y-scope:release-0.293-clp-connectorfrom
anlowee:xwei/add-ca-certificates

Conversation

@anlowee

@anlowee anlowee commented Aug 18, 2025

Copy link
Copy Markdown

Description

CLP S3 support needs this to access the presign URL.

Change the source image in CLP's tools/deployment/presto-clp/docker-compose.yml from y-scope to anlowee and it works.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Download the image and E2E test w/ and w/o S3 CLP package.

Summary by CodeRabbit

  • Bug Fixes

    • Restored reliable access to HTTPS resources (including presigned S3 URLs) by bundling system CA certificates in the runtime image, preventing TLS verification failures.
  • Chores

    • Streamlined runtime image setup with non-interactive package installation (including timezone data) and cleanup to keep the image lean and predictable.

@coderabbitai

coderabbitai Bot commented Aug 18, 2025

Copy link
Copy Markdown

Walkthrough

Installs ca-certificates alongside tzdata in the runtime stage of prestissimo-runtime.dockerfile, consolidating apt update/install/cleanup into a single non-interactive RUN step and replacing a TODO comment with a NOTE explaining the need for ca-certificates and the tzdata workaround.

Changes

Cohort / File(s) Summary of changes
Runtime Dockerfile update
presto-native-execution/scripts/dockerfiles/prestissimo-runtime.dockerfile
Added ca-certificates to the apt install list alongside tzdata; replaced a TODO with a NOTE documenting why ca-certificates are required for signed S3 URL reads and tzdata is used as a workaround; consolidated apt-get update, install and rm -rf /var/lib/apt/lists/* into a single RUN command.

Sequence Diagram(s)

(omitted — changes are a simple Dockerfile package installation and do not affect runtime control flow)

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related issues

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.

✨ Finishing Touches
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@anlowee anlowee changed the title docker: Add ca-certificates dependency of the worker's runtime image. docker: Add ca-certificates dependency of the worker's runtime image to support CLP S3. Aug 18, 2025
@anlowee anlowee changed the title docker: Add ca-certificates dependency of the worker's runtime image to support CLP S3. chore: Add ca-certificates dependency of the worker's runtime image to support CLP S3. Aug 18, 2025

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 32e6b12 and f19f0bc.

📒 Files selected for processing (1)
  • presto-native-execution/scripts/dockerfiles/prestissimo-runtime.dockerfile (1 hunks)
🧰 Additional context used
🪛 Hadolint (2.12.0)
presto-native-execution/scripts/dockerfiles/prestissimo-runtime.dockerfile

[info] 50-50: Avoid additional packages by specifying --no-install-recommends

(DL3015)


[warning] 50-50: Pin versions in apt get install. Instead of apt-get install <package> use apt-get install <package>=<version>

(DL3008)

Comment on lines +49 to +53
# CLP S3 support currently needs ca-certificates package to access the presigned URL
RUN apt-get update && \
DEBIAN_FRONTEND=noninteractive apt-get install -y ca-certificates && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

If keeping a Debian/Ubuntu base, align with hadolint: use --no-install-recommends and consider pinning

Should you intentionally use a Debian/Ubuntu runtime image, incorporate minimal install flags and (optionally) version pinning to satisfy DL3015/DL3008 and reduce image size.

Minimal change:

-RUN apt-get update && \
-    DEBIAN_FRONTEND=noninteractive apt-get install -y ca-certificates && \
+RUN apt-get update && \
+    DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y ca-certificates && \
     apt-get clean && \
     rm -rf /var/lib/apt/lists/*

Optional: pin the package version if you need full reproducibility (at the cost of more maintenance):

# Example only; verify the exact version available in your base image repos.
DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y ca-certificates=20240203
🧰 Tools
🪛 Hadolint (2.12.0)

[info] 50-50: Avoid additional packages by specifying --no-install-recommends

(DL3015)


[warning] 50-50: Pin versions in apt get install. Instead of apt-get install <package> use apt-get install <package>=<version>

(DL3008)

🤖 Prompt for AI Agents
In presto-native-execution/scripts/dockerfiles/prestissimo-runtime.dockerfile
around lines 49-53, the apt-get install omits --no-install-recommends (and
optionally a version pin) which triggers hadolint DL3015/DL3008 and increases
image size; change the install line to use DEBIAN_FRONTEND=noninteractive
apt-get install --no-install-recommends -y ca-certificates and keep the existing
apt-get clean and rm -rf /var/lib/apt/lists/*, and if you require strict
reproducibility optionally pin the ca-certificates package to a specific known
version available in the base image repos.

🧹 Nitpick (assertive)

Reduce layers and duplicate metadata downloads by combining the two package steps

You have two separate package-install RUNs (tzdata and ca-certificates). Combining them into a single RUN reduces layers and avoids repeating metadata downloads.

For CentOS base (recommended for your default BASE_IMAGE):

RUN dnf -y install tzdata ca-certificates && \
    dnf clean all && \
    rm -rf /var/cache/dnf

If you intentionally use Debian/Ubuntu:

RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y tzdata ca-certificates && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
🧰 Tools
🪛 Hadolint (2.12.0)

[info] 50-50: Avoid additional packages by specifying --no-install-recommends

(DL3015)


[warning] 50-50: Pin versions in apt get install. Instead of apt-get install <package> use apt-get install <package>=<version>

(DL3008)

🤖 Prompt for AI Agents
In presto-native-execution/scripts/dockerfiles/prestissimo-runtime.dockerfile
around lines 49 to 53, there are separate RUN steps installing tzdata and
ca-certificates which duplicate package metadata downloads and create extra
layers; combine those installs into a single RUN so package lists are fetched
once and both packages are installed together, then clean caches in the same
layer (use apt-get update then DEBIAN_FRONTEND=noninteractive apt-get install
--no-install-recommends -y tzdata ca-certificates && apt-get clean && rm -rf
/var/lib/apt/lists/*), or if the base image is CentOS use a single dnf install
line and clean dnf cache in the same RUN.

💡 Verification agent

🧩 Analysis chain

Switch to dnf/microdnf: current apt-get step will fail with CentOS Stream 9 base

Your BASE_IMAGE defaults to CentOS Stream 9 (Line 14), which does not provide apt-get. This RUN will fail unless BASE_IMAGE is overridden to a Debian/Ubuntu image at build time.

If you intend to keep CentOS Stream 9 as the default, replace apt with dnf/microdnf. Suggested change within this hunk:

-RUN apt-get update && \
-    DEBIAN_FRONTEND=noninteractive apt-get install -y ca-certificates && \
-    apt-get clean && \
-    rm -rf /var/lib/apt/lists/*
+RUN dnf -y install ca-certificates && \
+    dnf clean all && \
+    rm -rf /var/cache/dnf

And for consistency, you should also update the earlier tzdata step (outside this hunk) to use dnf on CentOS:

# Replace Lines 44-47 with:
RUN dnf -y install tzdata && \
    dnf clean all && \
    rm -rf /var/cache/dnf

If, instead, CI/build scripts override BASE_IMAGE to a Debian/Ubuntu image, please confirm. You can run this script to search for such overrides and docker build args:


🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "Searching for BASE_IMAGE overrides and docker build invocations…"
rg -n -C3 -i 'BASE_IMAGE' || true
rg -n -C3 -i 'prestissimo-runtime\.dockerfile' || true
rg -n -C3 -i 'docker\s+build(x)?\b.*(prestissimo-runtime\.dockerfile|--build-arg\s+BASE_IMAGE)' || true

Length of output: 14741


Switch package manager to DNF for CentOS Stream 9 base
The default BASE_IMAGE=quay.io/centos/centos:stream9 in prestissimo-runtime.dockerfile doesn’t include apt-get, so the current install steps will fail in CI (Jenkins builds don’t override the base). Please update the two install blocks as follows:

• In scripts/dockerfiles/prestissimo-runtime.dockerfile (around line 49):

-RUN apt-get update && \
-    DEBIAN_FRONTEND=noninteractive apt-get install -y ca-certificates && \
-    apt-get clean && \
-    rm -rf /var/lib/apt/lists/*
+RUN dnf -y install ca-certificates && \
+    dnf clean all && \
+    rm -rf /var/cache/dnf

• And replace the earlier tzdata step (around line 44):

-RUN apt-get update && \
-    DEBIAN_FRONTEND=noninteractive apt-get install -y tzdata && \
-    apt-get clean && \
-    rm -rf /var/lib/apt/lists/*
+RUN dnf -y install tzdata && \
+    dnf clean all && \
+    rm -rf /var/cache/dnf

Note: Maven-plugin and Docker Compose builds explicitly override BASE_IMAGE=ubuntu:22.04, so apt-get remains valid there; this change ensures the CentOS Stream 9 default works too.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# CLP S3 support currently needs ca-certificates package to access the presigned URL
RUN apt-get update && \
DEBIAN_FRONTEND=noninteractive apt-get install -y ca-certificates && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
# Replace the tzdata install step with DNF
RUN dnf -y install tzdata && \
dnf clean all && \
rm -rf /var/cache/dnf
# CLP S3 support currently needs ca-certificates package to access the presigned URL
RUN dnf -y install ca-certificates && \
dnf clean all && \
rm -rf /var/cache/dnf
🧰 Tools
🪛 Hadolint (2.12.0)

[info] 50-50: Avoid additional packages by specifying --no-install-recommends

(DL3015)


[warning] 50-50: Pin versions in apt get install. Instead of apt-get install <package> use apt-get install <package>=<version>

(DL3008)

@kirkrodrigues kirkrodrigues left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also open an issue describing the problem? That way if we have similar issues in the future or we need to justify this solution to the upstream devs, we can look back at the issue.

apt-get clean && \
rm -rf /var/lib/apt/lists/*

# CLP S3 support currently needs ca-certificates package to access the presigned URL

@kirkrodrigues kirkrodrigues Aug 19, 2025

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about replacing this and the previous block with:

# NOTE:
# - We need `ca-certificates` to support reads from signed S3 URLs.
# - We need `tzdata` as a workaround temporary for https://github.com/prestodb/presto/issues/25531
RUN apt-get update \
    && DEBIAN_FRONTEND=noninteractive apt-get install -y \
    ca-certificates \
    tzdata \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

@anlowee

anlowee commented Aug 19, 2025

Copy link
Copy Markdown
Author

Can you also open an issue describing the problem? That way if we have similar issues in the future or we need to justify this solution to the upstream devs, we can look back at the issue.

Do you mean open an issue in OSS repo or our fork?

@kirkrodrigues

Copy link
Copy Markdown
Member

issue

I mean open an issue in our fork.

Eventually, when we try to PR the CLP connector to the OSS repo, we can refer back to the issue in our fork.

@anlowee anlowee requested a review from kirkrodrigues August 19, 2025 17:24

@kirkrodrigues kirkrodrigues left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the PR title, how about:

chore: Add `ca-certificates` to the worker's runtime image to support reading signed S3 URLs in the CLP connector.

Comment thread presto-native-execution/scripts/dockerfiles/prestissimo-runtime.dockerfile Outdated
@anlowee anlowee changed the title chore: Add ca-certificates dependency of the worker's runtime image to support CLP S3. chore: Add ca-certificates to the worker's runtime image to support reading signed S3 URLs in the CLP connector. Aug 19, 2025
…e.dockerfile

Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
@anlowee anlowee requested a review from kirkrodrigues August 19, 2025 18:05

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
presto-native-execution/scripts/dockerfiles/prestissimo-runtime.dockerfile (1)

45-50: Use dnf (or detect the package manager) — apt-get will fail with the CentOS Stream 9 default base

The default BASE_IMAGE is quay.io/centos/centos:stream9, which doesn’t provide apt-get. This RUN will fail unless the base is overridden to Debian/Ubuntu at build time. Make the step portable by detecting the package manager (apt vs dnf), or switch to dnf for the CentOS default.

Apply this diff to make the install step work across both Debian/Ubuntu and CentOS Stream 9:

-RUN apt-get update \
-    && DEBIAN_FRONTEND=noninteractive apt-get install -y \
-    ca-certificates \
-    tzdata \
-    && apt-get clean \
-    && rm -rf /var/lib/apt/lists/*
+RUN set -eux; \
+  if command -v apt-get >/dev/null 2>&1; then \
+    apt-get update; \
+    DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y \
+      ca-certificates \
+      tzdata; \
+    apt-get clean; \
+    rm -rf /var/lib/apt/lists/*; \
+  elif command -v dnf >/dev/null 2>&1; then \
+    dnf -y install --setopt=install_weak_deps=False \
+      ca-certificates \
+      tzdata; \
+    dnf clean all; \
+    rm -rf /var/cache/dnf; \
+  else \
+    echo "Unsupported base image (no apt-get or dnf)"; exit 1; \
+  fi

If you prefer to keep separate images per OS family, alternatively replace this block with a dnf-only install for the CentOS default and rely on a different Dockerfile when BASE_IMAGE is Ubuntu.

Run this script to verify whether CI/build tooling overrides BASE_IMAGE to a Debian/Ubuntu image (in which case apt-get would work today, but the CentOS default would still be broken):

#!/bin/bash
set -euo pipefail
echo "Searching for BASE_IMAGE overrides and docker build invocations…"
rg -n -C3 -i '\bBASE_IMAGE\b'
rg -n -C3 -i 'prestissimo-runtime\.dockerfile'
rg -n -C3 -i 'docker\s+build(x)?\b.*(prestissimo-runtime\.dockerfile|--build-arg\s+BASE_IMAGE)'
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between b306071 and ada065a.

📒 Files selected for processing (1)
  • presto-native-execution/scripts/dockerfiles/prestissimo-runtime.dockerfile (1 hunks)
🧰 Additional context used
🪛 Hadolint (2.12.0)
presto-native-execution/scripts/dockerfiles/prestissimo-runtime.dockerfile

[info] 45-45: Avoid additional packages by specifying --no-install-recommends

(DL3015)


[warning] 45-45: Pin versions in apt get install. Instead of apt-get install <package> use apt-get install <package>=<version>

(DL3008)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (17.0.13, :presto-main)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-plan-determinism)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
  • GitHub Check: test (17.0.13, :presto-main-base)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-general)
  • GitHub Check: test (8.0.442, :presto-main)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-plan-determinism)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: prestocpp-linux-build-for-test
  • GitHub Check: maven-checks (8.0.442)
  • GitHub Check: maven-checks (17.0.13)
  • GitHub Check: prestissimo-worker-images-build
🔇 Additional comments (1)
presto-native-execution/scripts/dockerfiles/prestissimo-runtime.dockerfile (1)

42-44: Good rationale in the NOTE block

Clear explanation of why ca-certificates and tzdata are present; thanks for capturing the Presto issue context.

Comment on lines +45 to +50
RUN apt-get update \
&& DEBIAN_FRONTEND=noninteractive apt-get install -y \
ca-certificates \
tzdata \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Hadolint: add --no-install-recommends (apt) and consider version pinning (optional)

  • For Debian/Ubuntu, prefer --no-install-recommends to reduce image size (DL3015). Included in the portable snippet above.
  • Optional: pin package versions for full reproducibility (DL3008). Only do this if you’re prepared to maintain version bumps; otherwise, leave unpinned.
🧰 Tools
🪛 Hadolint (2.12.0)

[info] 45-45: Avoid additional packages by specifying --no-install-recommends

(DL3015)


[warning] 45-45: Pin versions in apt get install. Instead of apt-get install <package> use apt-get install <package>=<version>

(DL3008)

🤖 Prompt for AI Agents
In presto-native-execution/scripts/dockerfiles/prestissimo-runtime.dockerfile
around lines 45 to 50, the apt-get install invocation should add
--no-install-recommends to avoid pulling recommended packages and reduce image
size; update the command to call apt-get install -y --no-install-recommends
ca-certificates tzdata while keeping DEBIAN_FRONTEND=noninteractive, apt-get
clean and removal of /var/lib/apt/lists/*, and optionally pin package versions
(e.g. ca-certificates=<version> tzdata=<version>) only if you are prepared to
maintain them for reproducibility.

@kirkrodrigues kirkrodrigues changed the title chore: Add ca-certificates to the worker's runtime image to support reading signed S3 URLs in the CLP connector. chore: Add ca-certificates to the worker's runtime image to support reading signed S3 URLs in the CLP connector (fixes #56). Aug 19, 2025

@kirkrodrigues kirkrodrigues left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the issue it fixes to the PR's title.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants