Problem
The Presto Docker Compose project and the CLP Package Docker Compose project use different conventions for file mounting, which causes Presto queries to return zero rows.
Current Behavior
- Presto project: mounts the /path/to/var/data/archives directory (or /path/to/var/data/staged-archives when using S3 output) into the container at the same absolute path as on the host.
- CLP Package project: maps the /path/to/var/data/archives directory to /var/data/archives in the container, and /path/to/var/data/staged-archives to /var/data/staged-archives.
Root Cause
With the clp-s storage engine, when archives are compressed through the Package project, the metadata DB's clp-datasets table stores the archive_storage_directory field using container paths (e.g., /var/data/archives) instead of the corresponding host paths (e.g., /path/to/var/data/archives). The Presto container mounts nothing at /var/data/archives, so the Presto coordinator cannot locate the archives at the stored path.
Impact
Presto queries return zero rows because the coordinator cannot locate the archives at the paths recorded in the metadata DB.
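The following is a minimal Python sketch of the path mismatch. It models each project's archive mount as a host-to-container mapping. The mount tables, the helper names `to_container_path` and `visible_in_container`, and the placeholder host path are illustrative assumptions, not CLP code. The staged-archives mount is omitted for brevity, and "visible" only means that some mount covers the path, not that the files exist.

```python
from pathlib import PurePosixPath

# Placeholder host location of the archives directory (as written in this issue).
HOST_ARCHIVES = PurePosixPath("/path/to/var/data/archives")

# Hypothetical mount tables: host directory -> path seen inside the container.
# Package convention: the host directory is remapped to a fixed container path.
PACKAGE_MOUNTS = {HOST_ARCHIVES: PurePosixPath("/var/data/archives")}
# Presto convention: the host directory is mounted at the same absolute path.
PRESTO_MOUNTS = {HOST_ARCHIVES: HOST_ARCHIVES}


def to_container_path(host_path, mounts):
    """Translate a host path to the path a container sees, or None if unmounted."""
    for host_dir, container_dir in mounts.items():
        if host_path == host_dir or host_dir in host_path.parents:
            return container_dir / host_path.relative_to(host_dir)
    return None


def visible_in_container(container_path, mounts):
    """Return True if some mount target covers container_path inside the container."""
    return any(
        container_path == target or target in container_path.parents
        for target in mounts.values()
    )


# Compression runs in a Package container, so it records the path *it* sees.
stored_dir = to_container_path(HOST_ARCHIVES, PACKAGE_MOUNTS)
print(stored_dir)  # /var/data/archives

# The Presto coordinator resolves that stored path inside its own container.
print(visible_in_container(stored_dir, PRESTO_MOUNTS))  # False: nothing found

# Under the Presto (same-path) convention, the stored path would resolve.
same_path_dir = to_container_path(HOST_ARCHIVES, PRESTO_MOUNTS)
print(visible_in_container(same_path_dir, PRESTO_MOUNTS))  # True
```

The sketch shows that the failure depends only on which convention produced the stored path. A path recorded under the same-path convention resolves identically in both containers, and a remapped path does not.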
Historical Context
Before PR #1178, when the Package components were orchestrated via Python subprocess calls to the docker CLI, they followed the same mounting convention as the current Presto project, mounting each host directory at the same absolute path inside the container.
References