Request
One of our users has requested the ability to ingest logs from S3-compatible storage services. However, the current implementation of the CLP package assumes that all S3 interactions are directed to AWS's default S3 endpoint (amazonaws.com).
This limitation restricts integration with other S3-compatible storage providers, which may use custom or non-AWS endpoints. It would be nice to add support for ingesting from custom S3 endpoints.
Possible implementation
Currently, when the user calls compress-from-s3.sh, the following steps occur:
- The URL supplied by the user is parsed and validated. If the endpoint isn't amazonaws.com, an error is raised.
- The URL is converted into an S3InputConfig object, which stores the region, bucket, and key of the input.
- If the user requested a key-prefix ingestion, S3InputConfig is used to fetch all keys under that key prefix.
- In the compression worker, S3InputConfig is used to generate a list of virtual-hosted-style URLs, which is then passed to clp-s.
- clp-s deserializes the virtual-hosted-style URLs using the S3Url class.
- clp-s signs the URL based on the fields in the S3Url class.
- S3Url and the signature are serialized into another virtual-hosted-style URL.
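To make the first few steps concrete, here is a minimal Python sketch of the parse, validate, and regenerate flow. It is not the package's actual code: the S3InputConfig below is a hypothetical stand-in with assumed field names, parse_s3_url and to_virtual_hosted_url are illustrative helpers, and the sketch only handles virtual-hosted-style AWS URLs.

```python
import re
from dataclasses import dataclass


# Hypothetical stand-in for the package's S3InputConfig; field names are assumed.
@dataclass
class S3InputConfig:
    region: str
    bucket: str
    key: str


# Virtual-hosted-style URL: https://<bucket>.s3.<region>.amazonaws.com/<key>
_VIRTUAL_HOSTED_RE = re.compile(
    r"^https://(?P<bucket>[a-z0-9.\-]+)\.s3\.(?P<region>[a-z0-9\-]+)"
    r"\.amazonaws\.com/(?P<key>.*)$"
)


def parse_s3_url(url: str) -> S3InputConfig:
    """Parse and validate a URL, rejecting any non-amazonaws.com endpoint."""
    match = _VIRTUAL_HOSTED_RE.match(url)
    if match is None:
        raise ValueError(f"Unsupported S3 URL (endpoint must be amazonaws.com): {url}")
    return S3InputConfig(
        region=match["region"], bucket=match["bucket"], key=match["key"]
    )


def to_virtual_hosted_url(config: S3InputConfig, key: str) -> str:
    """Build the virtual-hosted-style URL that would be handed to clp-s."""
    return f"https://{config.bucket}.s3.{config.region}.amazonaws.com/{key}"
```

With this sketch, a URL such as https://minio.example.com/my-bucket/logs/app.log is rejected at the first step, which is the limitation this request is about.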
Required changes
To support custom S3-compatible endpoints, we propose the following changes:
- Add a new endpoint_url field to both the S3InputConfig and S3Url classes.
- Most S3-compatible storage services are accessible only via path-style URLs, so clp-s would have to send its requests using path-style URLs. As a solution, when the endpoint isn't amazonaws.com, both clp-s and the compression worker would switch to path-style URLs.
These changes will enable the integration of custom S3-compatible endpoints while maintaining compatibility with AWS's default S3 endpoint.
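A rough Python sketch of how the proposed endpoint_url field could drive the choice of URL style is shown below. Everything here is an assumption for illustration: the dataclass is a hypothetical stand-in for S3InputConfig, is_aws_endpoint and build_object_url are made-up helper names, and treating a missing endpoint_url as "use AWS" is one possible default rather than a decided design.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote, urlparse


# Hypothetical stand-in for S3InputConfig with the proposed endpoint_url field.
@dataclass
class S3InputConfig:
    region: str
    bucket: str
    key: str
    endpoint_url: Optional[str] = None  # assumed default: None means AWS


def is_aws_endpoint(endpoint_url: str) -> bool:
    """Return True if the endpoint's host is amazonaws.com or a subdomain of it."""
    host = urlparse(endpoint_url).hostname or ""
    return host == "amazonaws.com" or host.endswith(".amazonaws.com")


def build_object_url(config: S3InputConfig) -> str:
    """Use virtual-hosted style for AWS and path style for any other endpoint."""
    key = quote(config.key, safe="/")  # percent-encode the key, keep '/' separators
    if config.endpoint_url is None or is_aws_endpoint(config.endpoint_url):
        # Virtual-hosted style: the bucket is part of the hostname.
        return f"https://{config.bucket}.s3.{config.region}.amazonaws.com/{key}"
    # Path style: the bucket is the first path segment under the custom endpoint.
    return f"{config.endpoint_url.rstrip('/')}/{config.bucket}/{key}"
```

For example, a config with endpoint_url set to http://localhost:9000 would yield http://localhost:9000/my-bucket/logs/app.log, while a config without endpoint_url keeps today's virtual-hosted-style AWS URL.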
Path style urls