fix(presto-clp): Fix Presto S3 queries returning zero rows by updating worker image and correcting S3 endpoint URL format (fixes #2107, fixes #2108).#2109
Conversation
…g worker image and correcting S3 endpoint URL format (fixes y-scope#2107, fixes y-scope#2108).
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
WalkthroughPresto worker image default tag updated in the CLP docker-compose file. S3 endpoint resolution logic in the CLP init script was changed to remove bucket-based hostname derivation and the helper signature was adjusted accordingly. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
… resolution function
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tools/deployment/presto-clp/docker-compose.yaml`:
- Line 27: The image line exceeds the 100-column lint limit; update the image
declaration for prestissimo-worker by wrapping the long string so it doesn't
exceed 100 columns (modify the line containing image:
"ghcr.io/y-scope/presto/prestissimo-worker:${CLP_PRESTO_WORKER_IMAGE_TAG:-clp-v0.10.0-fix.1}").
Use a YAML-safe multiline approach (e.g., a folded block scalar or split the
value across lines) or shorten the default tag via the
CLP_PRESTO_WORKER_IMAGE_TAG default to keep the final rendered string within 100
columns while preserving the same image and variable substitution behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: af500ec1-6c39-4b88-80d3-c1916752b98b
📒 Files selected for processing (2)
tools/deployment/presto-clp/docker-compose.yamltools/deployment/presto-clp/scripts/init.py
…g worker image and correcting S3 endpoint URL format (fixes y-scope#2107, fixes y-scope#2108). (y-scope#2109)
Description
Presto queries against S3-backed CLP archives silently return zero rows due to two issues:
Missing CA certificates in the worker image (#2107): The Prestissimo worker
image
clp-v0.10.0was built withoutca-certificatesandtzdataafter an upstream merge(y-scope/presto#147) accidentally dropped the
packages originally added in
y-scope/presto#55. This causes silent TLS
verification failures against presigned S3 URLs. The Dockerfile fix has been merged upstream
(y-scope/presto#154) and a corrected image
published as
clp-v0.10.0-fix.1.Virtual-hosted style S3 endpoint URL:
_resolve_s3_endpoint_url()ininit.pyconstructsURLs like
https://<bucket>.s3.<region>.amazonaws.com, but the Prestissimo worker expectspath-style URLs (
https://s3.<region>.amazonaws.com). Since the bucket name is passedseparately via
clp.s3-bucket, the worker ends up doubling the bucket name in the requestpath, causing silent 404s. This bug has been present since S3 support was first added in
de378af0and carried through the refactor in feat(presto-clp): Add support for custom S3 endpoint URLs and buckets. #1917.This PR:
Updates the default Prestissimo worker image tag from
clp-v0.10.0toclp-v0.10.0-fix.1indocker-compose.yaml:https://github.com/y-scope/clp/blob/b27425ab35b1fdf4e39f80e2fca0f52d64ffa09b/tools/deployment/presto-clp/docker-compose.yaml#L27
Fixes
_resolve_s3_endpoint_url()ininit.pyto produce path-style S3 endpoint URLs,consistent with the Helm chart's
configmap.yaml:https://github.com/y-scope/clp/blob/f9acdbaacdd9afca5a460e4e1e8ba1e5f3d16cc9/tools/deployment/presto-clp/scripts/init.py#L314-L316
Checklist
breaking change.
Validation performed
Scenario 1: Verify CA certificates in new vs old worker image
Task: Confirm that the
clp-v0.10.0image is missing CA certificates and thatclp-v0.10.0-fix.1includes them.Command:
Output:
Explanation: The old image is missing
/etc/ssl/certs/ca-certificates.crt, which causes thePrestissimo worker's bundled
libcurlto silently fail TLS verification when accessing S3 presignedURLs. The new image includes the certificate bundle.
Scenario 2: Verify S3 endpoint URL fix
Task: Confirm that
init.pynow generates path-style S3 endpoint URLs instead ofvirtual-hosted style.
Command:
cd tools/deployment/presto-clp scripts/set-up-config.sh /path/to/clp-package grep S3_END_POINT .envOutput:
Explanation: The endpoint is now path-style (
https://s3.<region>.amazonaws.com) instead ofvirtual-hosted style (
https://<bucket>.s3.<region>.amazonaws.com). This matches the Helm chart'sconfigmap.yamland prevents the Prestissimo worker from doubling the bucket name in S3 requests.Scenario 3: End-to-end query via Presto CLI
Task: Verify that Presto queries return data when archives are stored on S3.
Command:
Output:
Then start Presto and query:
Command:
Output:
Explanation: The query returns 2,000,000 rows from the compressed PostgreSQL sample dataset,
confirming that the worker can successfully read archives from S3 with both the CA certificate fix
and the path-style endpoint fix applied.
Scenario 4: End-to-end query via CLP Web UI
Task: Verify that the CLP Web UI can execute Presto SQL queries and display results.
Steps:
http://localhost:4000in a browser.SELECT * FROM default) and click Run.The Web UI shows "Success - search job ... found 2000000 results" with a timestamp histogram
and a data table displaying the PostgreSQL log fields (
backend_type,error_severity,ps,query_id, etc.), confirming the full pipeline works end-to-end.Scenario 5: Custom S3 endpoint (MinIO) still works
Task: Verify that the endpoint URL fix does not break the custom
endpoint_urlpath used forS3-compatible storage like MinIO.
Setup:
docker run --rm --network host --entrypoint sh minio/mc -c \ "mc alias set local http://localhost:9000 minioadmin minioadmin && mc mb local/test-logs"clp-config.yamlwith MinIO as the archive storage:Command:
grep "S3_END_POINT\|S3_BUCKET" tools/deployment/presto-clp/.envOutput:
Explanation: When a custom
endpoint_urlis configured,_resolve_s3_endpoint_url()returnsit directly (the early-return on line 312), so the fix to the fallback path does not affect this
code path.
Command:
Output:
Command:
Output:
Explanation: Queries against MinIO-backed archives return 1,000,000 rows with correct data,
confirming that the custom endpoint path is unaffected by the fix.
Summary by CodeRabbit
Bug Fixes
Chores