fix: Restore ca-certificates and tzdata in the worker runtime image (fixes #153). #154

Merged: junhaoliao merged 1 commit into y-scope:release-0.297-edge-10-clp-connector from junhaoliao:add-ca-cert on Mar 18, 2026
@junhaoliao junhaoliao commented Mar 18, 2026

Description

Problem

PR #147 (upstream release-0.297-edge10 merge) overwrote
prestissimo-runtime.dockerfile, dropping the ca-certificates and tzdata
packages originally added in PR #55. Without ca-certificates, the Prestissimo
worker cannot verify TLS certificates when accessing presigned S3 URLs — queries
complete with no error but return zero rows. Without tzdata, the worker hits
the timezone issue described in
prestodb#25531.

Root cause

The CI workflow builds the runtime image with BASE_IMAGE=ubuntu:22.04.

The bare ubuntu:22.04 image does not ship with ca-certificates or
tzdata, so the runtime image had no CA trust store (/etc/ssl/certs/ was
empty) and no timezone data. The Velox/libcurl S3 client looks for certs at
/etc/ssl/certs/ and silently fails TLS verification when the directory is
missing.

Fix

Auto-detect apt-get at build time and install ca-certificates and tzdata
if present. CentOS Stream 9 already bundles both, so only Debian/Ubuntu base
images need the explicit install. This avoids coupling to the OSNAME ARG,
which can be overridden by CI or docker build --build-arg independently of
BASE_IMAGE.
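The detection described above could look roughly like the following shell sketch, written as the body of a `RUN` step in the final stage. It is not the PR's actual diff: the function name `install_runtime_packages` and the `APT_LISTS_DIR` override are hypothetical, introduced only so the logic can be exercised outside a container build.

```shell
# Hypothetical helper illustrating the apt-get auto-detection; not the PR's actual diff.
install_runtime_packages() {
    # APT_LISTS_DIR is an assumed override for illustration; apt's default is /var/lib/apt/lists.
    lists_dir="${APT_LISTS_DIR:-/var/lib/apt/lists}"
    if command -v apt-get >/dev/null 2>&1; then
        # Debian/Ubuntu base image: install the CA trust store and timezone data,
        # skip recommended packages, then drop the package lists to keep the layer small.
        apt-get update &&
            DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
                ca-certificates tzdata &&
            rm -rf "${lists_dir:?}"/*
    else
        # No apt-get (e.g. CentOS Stream 9): the base image is expected to bundle both.
        echo "apt-get not found; skipping ca-certificates/tzdata install"
    fi
}
```

Keying the branch on the presence of `apt-get` rather than on `OSNAME` means the install follows whatever `BASE_IMAGE` actually contains, even when the two build args disagree.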

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

1. Confirm the published image is missing ca-certificates and tzdata

Task: Verify that the existing worker image (clp-v0.10.0) has no SSL
certificate or timezone files, confirming the regression.

Command:

docker run --rm --entrypoint sh \
  ghcr.io/y-scope/presto/prestissimo-worker:clp-v0.10.0 \
  -c "find / -maxdepth 4 -type f 2>/dev/null" | grep -E 'ssl|cert|pki|zoneinfo'

Output:

(no output)

Explanation: The published image has zero SSL certificate files and no
timezone data. This confirms the regression from PR #147.

2. Build the full runtime image and verify both packages are installed

Task: Build the full prestissimo-runtime image using the same build args
as CI and confirm both ca-certificates and tzdata are present.

Command:

docker build \
  --build-arg BASE_IMAGE=ubuntu:22.04 \
  --build-arg DEPENDENCY_IMAGE=ghcr.io/y-scope/presto/prestissimo-worker-dev-env:dev \
  --build-arg "EXTRA_CMAKE_FLAGS=-DPRESTO_ENABLE_TESTING=OFF -DPRESTO_ENABLE_PARQUET=ON \
-DPRESTO_ENABLE_S3=ON -DTREAT_WARNINGS_AS_ERRORS=0" \
  --build-arg NUM_THREADS=32 \
  --build-arg OSNAME=ubuntu \
  -f ./presto-native-execution/scripts/dockerfiles/prestissimo-runtime.dockerfile \
  -t prestissimo-runtime:local \
  ./presto-native-execution

Verification:

$ docker run --rm --entrypoint="" prestissimo-runtime:local dpkg -l ca-certificates tzdata
||/ Name            Version                Architecture Description
+++-===============-======================-============-=======================================
ii  ca-certificates 20240203~22.04.1       all          Common CA certificates
ii  tzdata          2025b-0ubuntu0.22.04.1 all          time zone and daylight-saving time data

$ docker run --rm --entrypoint="" prestissimo-runtime:local \
    sh -c "find / -maxdepth 4 -type f 2>/dev/null" | grep -E 'ssl|cert|pki|zoneinfo'
/etc/ca-certificates.conf
/etc/ssl/openssl.cnf
/etc/ssl/certs/ca-certificates.crt
/usr/share/zoneinfo/MST
/usr/share/zoneinfo/CET
...

Explanation: The full multi-stage build completes successfully — stage 1
compiles presto_server from source using the dependency image, and stage 2
produces the runtime image with ca-certificates and tzdata installed. Both
packages are confirmed present via dpkg -l, and the actual certificate
(/etc/ssl/certs/ca-certificates.crt) and timezone files (/usr/share/zoneinfo/)
are present on disk.
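As a more targeted alternative to the `find | grep` check, the two paths shown in the verification output can be tested directly. The sketch below is illustrative only: `check_runtime_files` is a hypothetical helper that is not part of this PR, and its optional root-prefix argument is an assumption added so it can be pointed at an extracted filesystem as well as a live one.

```shell
# Hypothetical helper: checks a root filesystem for the files the fix is meant to provide.
# The two paths are the ones shown in the verification output above.
check_runtime_files() {
    root="${1:-}"   # optional prefix; empty means the live filesystem
    status=0
    if [ ! -s "${root}/etc/ssl/certs/ca-certificates.crt" ]; then
        echo "missing or empty: /etc/ssl/certs/ca-certificates.crt"
        status=1
    fi
    if [ ! -d "${root}/usr/share/zoneinfo" ]; then
        echo "missing: /usr/share/zoneinfo"
        status=1
    fi
    # Non-zero exit status signals that at least one of the two is absent.
    return "$status"
}
```

Run inside the container (for example through the same `docker run --rm --entrypoint sh ... -c` pattern used in step 1), a non-zero exit status would flag the regression without depending on `find` depth limits or `grep` patterns.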

3. E2E Testing Infra

Check out y-scope/clp#2004 and apply the patch below:

Index: tools/deployment/package-helm/set-up-multi-dedicated-test.sh
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/tools/deployment/package-helm/set-up-multi-dedicated-test.sh b/tools/deployment/package-helm/set-up-multi-dedicated-test.sh
--- a/tools/deployment/package-helm/set-up-multi-dedicated-test.sh	(revision 8c5f6d87f0467faa5de2d4e1d1f60a641178e4db)
+++ b/tools/deployment/package-helm/set-up-multi-dedicated-test.sh	(date 1773875348496)
@@ -114,6 +114,9 @@
     path: ${shared_data_dir}/streams
 EOF
 
+echo "Loading prestissimo-runtime:local into kind cluster..."
+kind load docker-image prestissimo-runtime:local --name "${CLUSTER_NAME}"
+
 echo "Installing Helm chart..."
 helm uninstall test --ignore-not-found
 sleep 2
@@ -129,6 +132,9 @@
     --set "reducer.scheduling.nodeSelector.yscope\.io/nodeType=query" \
     --set "prestoWorker.replicas=${PRESTO_WORKER_REPLICAS}" \
     --set "prestoWorker.scheduling.nodeSelector.yscope\.io/nodeType=presto" \
+    --set "image.prestoWorker.repository=prestissimo-runtime" \
+    --set "image.prestoWorker.tag=local" \
+    --set "image.prestoWorker.pullPolicy=Never" \
     $(get_presto_helm_args) \
     $(get_image_helm_args "${CLUSTER_NAME}" "${CLP_PACKAGE_IMAGE}")

Then configure tools/deployment/package-helm/values.yaml:

  # Where archives should be output to
  archive_output:
    storage:
      type: "s3"
      s3_config:
        aws_authentication:
          type: "credentials"
          credentials:
            access_key_id: "<redacted>"
            secret_access_key: "<redacted>"
        region_code: "<redacted>"
        bucket: "<redacted>"
        key_prefix: "archives/"

  stream_output:
    storage:
      type: "s3"
      s3_config:
        aws_authentication:
          type: "credentials"
          credentials:
            access_key_id: "<redacted>"
            secret_access_key: "<redacted>"
        region_code: "<redacted>"
        bucket: "<redacted>"
        key_prefix: "streams/"

Then run:

cd ~/workspace/6-clp/tools/deployment/package-helm
./set-up-multi-dedicated-test.sh --presto

Observed output:

...
All jobs completed and services are ready.

Create a compression job in the web UI at http://localhost:30000/ingest using /home/junhao/samples/postgresql.jsonl, then perform a search at http://localhost:30000/search.

The search returned results.

Summary by CodeRabbit

  • Bug Fixes
    • Strengthened AWS S3 integration reliability by ensuring CA certificate validation is available in runtime environments
    • Enhanced timezone data support in production deployments for improved timestamp accuracy

coderabbitai Bot commented Mar 18, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: f7aff828-e54c-4ea3-b2fb-f19e1dc85cd1

📥 Commits

Reviewing files that changed from the base of the PR and between ba677bc and b35779f.

📒 Files selected for processing (1)
  • presto-native-execution/scripts/dockerfiles/prestissimo-runtime.dockerfile

📝 Walkthrough

The change adds a conditional block to the prestissimo-runtime Dockerfile final stage that installs ca-certificates and tzdata packages via apt when apt-get is available. The block updates package lists, installs packages with minimal dependencies, and cleans up cached files.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Dockerfile Runtime Dependencies: presto-native-execution/scripts/dockerfiles/prestissimo-runtime.dockerfile | Adds conditional apt-based installation of ca-certificates and tzdata packages in the final build stage, with package list cleanup. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and specifically identifies the main change: restoring ca-certificates and tzdata packages in the worker runtime image, with a reference to the issue being fixed. |
| Description check | ✅ Passed | The PR description comprehensively addresses all required sections: detailed problem statement, root cause analysis, fix explanation, validation with specific test commands and results, and E2E testing confirmation. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

@junhaoliao junhaoliao marked this pull request as ready for review March 18, 2026 23:16