HTTP support for S3, GCS, and Azure. #2703
Conversation
Add support for HTTP endpoints for S3, Azure, and GCS. Allow users to specify HTTP in the endpoint, but default to HTTPS in all other scenarios to preserve existing behavior.

Extend the URL parser with a defaultType parameter to support either:
- explicitly specifying a protocol via .type and enforcing that protocol is used in the URL
- allowing the protocol to be parsed from the URL, but providing a default via .defaultType if no protocol is found in the URL

Add partial write handling in fdWrite to support non-blocking socket operations. The write loop now handles EAGAIN errors by waiting for the file descriptor to become writable, and continues writing the remaining bytes when write() returns fewer bytes than requested. This is required for HTTP, which may use non-blocking sockets but doesn't have built-in handling like the TLS client we are using for HTTPS. Also wrap the write() call, add a shim, and add additional logging for easier unit testing.
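As a rough illustration of the partial-write handling described above (this is a hypothetical sketch, not the patch's actual code; the name `writeAll` and the use of `poll()` are assumptions), the loop retries on `EAGAIN` by waiting for writability and continues after short writes:

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <string.h>
#include <unistd.h>

// Hypothetical helper: write all bytes to a possibly non-blocking fd,
// retrying on EAGAIN and continuing after short writes.
static ssize_t
writeAll(int fd, const unsigned char *buf, size_t size)
{
    size_t written = 0;

    while (written < size)
    {
        ssize_t result = write(fd, buf + written, size - written);

        if (result == -1)
        {
            // Non-blocking fd not ready: block until it is writable, then retry
            if (errno == EAGAIN || errno == EWOULDBLOCK)
            {
                struct pollfd pfd = {.fd = fd, .events = POLLOUT};
                poll(&pfd, 1, -1);
                continue;
            }

            return -1; // real error
        }

        // Short write: the loop continues with the remaining bytes
        written += (size_t)result;
    }

    return (ssize_t)written;
}
```

The TLS layer used for HTTPS already hides this retry logic, which is why plain HTTP over non-blocking sockets needs it added explicitly.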
|
I must say this is a very well thought out and complete patch. Kudos! I see a few minor things I'll want changed but overall I don't see any barriers to getting this committed. Should be no problem to get this in for the January release. |
|
Thank you! That's great, looking forward to having this included. Please let me know if you want me to address any of those minor issues; I didn't see any comments on the code yet, so I wasn't sure if you were planning to just fix them yourself before merging. |
|
I'll post a review soon once I have digested the patch a bit. |
|
Curious! Was this ever addressed in #1590 ? I'm here because we can't get http on a self-hosted minio server to play ball with pgbackrest. The link to the card in the old issue gives me a 404 🤷🏼♂️ |
|
@4n3w It wasn't addressed. That's what the current patch is about. There was also an earlier attempt that was abandoned by the author.
At some point GitHub forced a migration to their new project system, which broke all the links. |
dwsteele
left a comment
This looks pretty good overall but I think we need a few changes.
See comment I left about moving the helper function.
I also think that the S3/GCS/Azure tests should at least do a few operations with the test server using HTTP. Your tests show the configuration is correct, but I think it would still be better to have some actual operational tests.
|
Thanks for the review! Makes sense on testing some actual operations with HTTP, added a few basic cases (read, write, error), and happy to add some of the more complex cases too if you want. Also made the helper function static as suggested. |
- Rename fdWrite to ioFdWriteInternal for module naming consistency (lots of cascades from this)
- Minor formatting and code updates
- No need to start a new test -- combine into prior test
- Update comment
- Minor code updates
- Remove redundant tests (endpoint parsing is well covered)
- Move HTTP tests from HTTPS rather than duplicating tests
- Minor code updates
- Remove redundant tests (endpoint parsing is well covered)
- Move HTTP tests from HTTPS rather than duplicating tests [also reordered some tests in S3 and Azure to the original order of where they were copied from]
dwsteele
left a comment
This looks good to me now. See commits for my changes and reasons.
I know it looks like a lot but mostly it is renaming/reorganizing. The S3/GCS/Azure tests have a fair amount of churn so it is important to keep them as coherent as possible.
|
Sounds good, I see how that makes the S3/GCS/Azure tests much cleaner, thanks for making those changes |
|
Committed. Thank you! |
|
Awesome, thanks for including this! |
Add support for HTTP endpoints for S3, Azure, and GCS. Allow users to specify HTTP in the endpoint, but default to using HTTPS in all other scenarios to preserve existing behavior.
Extend the URL parser with a defaultType parameter to support either:
- explicitly specifying a protocol via .type and enforcing that protocol is used in the URL
- allowing the protocol to be parsed from the URL, but providing a default via .defaultType if no protocol is found in the URL

Add partial write handling in fdWrite to support non-blocking socket operations. The write loop now handles EAGAIN errors by waiting for the file descriptor to become writable, and continues writing the remaining bytes when write() returns fewer bytes than requested. This is required for HTTP, which may use non-blocking sockets but doesn't have built-in handling like the TLS client we are using for HTTPS. Also wrap the write() call, add a shim, and add additional logging for easier unit testing.
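The scheme-resolution rule described above can be sketched as follows (a minimal hypothetical illustration, not the patch's actual parser; the names `urlTypeResolve`, `urlTypeHttp`, and `urlTypeHttps` are assumptions):

```c
#include <assert.h>
#include <string.h>

typedef enum {urlTypeHttp, urlTypeHttps} UrlType;

// Hypothetical sketch: an explicit scheme in the URL wins; otherwise fall
// back to the caller-supplied default.
static UrlType
urlTypeResolve(const char *url, UrlType defaultType)
{
    if (strncmp(url, "http://", 7) == 0)
        return urlTypeHttp;

    if (strncmp(url, "https://", 8) == 0)
        return urlTypeHttps;

    // No scheme in the URL: use the default (HTTPS preserves prior behavior)
    return defaultType;
}
```

Passing urlTypeHttps as the default everywhere except where the user explicitly wrote http:// is what keeps existing configurations working unchanged.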
The main motivation for this is to allow pgBackRest to work natively with Ceph HTTP frontends. This was previously brought up in #2340 for a Ceph offshoot called MicroCeph, where it looks like they instead fixed the issue for that specific product on the Ceph side by adding HTTPS support in this PR. However, I think it would still be beneficial to have HTTP support so the problem is solved from both sides (and this would help my company's use case of backing up to our own private Ceph cluster that doesn't have HTTPS enabled). I also saw that there was some work done on this in #2587, but I wasn't sure if that was actively being worked on, so I wanted to propose another implementation and add support for Azure/Google Cloud.
Everything is passing when I run the CI checks on my end (unit tests, code coverage, code-format), and I did some manual testing backing up one of my dev databases; it seemed to be working well.
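For context, with this change an HTTP endpoint for a self-hosted S3-compatible server (e.g. MinIO or a Ceph RGW without TLS) would presumably be configured by putting the scheme directly in the endpoint option. This is a sketch only: the option names follow pgBackRest's existing S3 repository options, but the host, port, bucket, and credentials are placeholders, and whether a port is accepted inside the endpoint URL is an assumption.

```ini
[global]
repo1-type=s3
repo1-path=/pgbackrest
# http:// scheme now allowed in the endpoint; https:// remains the default
repo1-s3-endpoint=http://minio.internal:9000
repo1-s3-bucket=backups
repo1-s3-region=us-east-1
repo1-s3-uri-style=path
repo1-s3-key=EXAMPLE_KEY
repo1-s3-key-secret=EXAMPLE_SECRET
```

Any endpoint without an explicit scheme would continue to use HTTPS, so existing configurations are unaffected.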