-
-
Notifications
You must be signed in to change notification settings - Fork 7.1k
Description
I did this
I'm trying to use --aws-sigv4 to download S3 objects containing "=" in their key. Example data/asset_id=my-asset/dt=2024-05-22/data.parquet - this is called hive partitioning scheme and is popular in data lakes.
Here's how I call it:
BUCKET="lasca-prod"
KEY="data/asset_id=my-asset/dt=2024-05-22/data.parquet"
curl --aws-sigv4 "aws:amz:us-west-2:s3" \
--user "${AWS_ACCESS_KEY_ID}:${AWS_SECRET_ACCESS_KEY}" \
-H "x-amz-security-token: ${AWS_SESSION_TOKEN}" \
"https://s3.us-west-2.amazonaws.com/$BUCKET/$KEY"Curl 8.8 returns the following error (formatted for readability):
<?xml version="1.0" encoding="UTF-8"?>
<Error>
<Code>SignatureDoesNotMatch</Code>
<Message>The request signature we calculated does not match the signature you provided. Check your key and signing method.</Message>
<AWSAccessKeyId>snip</AWSAccessKeyId>
<StringToSign>AWS4-HMAC-SHA256
20240523T061547Z
20240523/us-west-2/s3/aws4_request
ce199afb9ea57628628a3d3254934e1ee480977872d06b3da40baa5ae4d2f9a7</StringToSign>
<SignatureProvided>snip</SignatureProvided>
<StringToSignBytes>snip</StringToSignBytes>
<CanonicalRequest>GET
/lasca-prod/data/asset_id%3Dmy-asset/dt%3D2024-05-22/data.parquet
host:s3.us-west-2.amazonaws.com
x-amz-content-sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
x-amz-date:20240523T061547Z
x-amz-security-token:snip
host;x-amz-content-sha256;x-amz-date;x-amz-security-token
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855</CanonicalRequest>
...
</Error>Note that the path in CanonicalRequest is supposed to be quoted ("=" changed to "%3D"), whereas curl is not quoting it (confirmed by debug tracing). AWS docs say we need to apply their special URIEncode function to all the components of the canonical request.
I've created a basic patch to see if this is the actual problem and after applying it, the signature is accepted by AWS S3:
diff --git a/lib/http_aws_sigv4.c b/lib/http_aws_sigv4.c
index 98cc033..18567b4 100644
--- a/lib/http_aws_sigv4.c
+++ b/lib/http_aws_sigv4.c
@@ -423,6 +423,19 @@ static int compare_func(const void *a, const void *b)
#define MAX_QUERYPAIRS 64
+static CURLcode canon_path(struct Curl_easy *data, const char *path, struct dynbuf *dq)
+{
+ CURLcode result = CURLE_OK;
+ for (const char *p = path; *p && !result; p++) {
+ if (*p == '=') {
+ result = Curl_dyn_addn(dq, "%3D", 3);
+ } else {
+ result = Curl_dyn_addn(dq, p, 1);
+ }
+ }
+ return result;
+}
+
static CURLcode canon_query(struct Curl_easy *data,
const char *query, struct dynbuf *dq)
{
@@ -540,6 +553,7 @@ CURLcode Curl_output_aws_sigv4(struct Curl_easy *data, bool proxy)
struct dynbuf canonical_headers;
struct dynbuf signed_headers;
struct dynbuf canonical_query;
+ struct dynbuf canonical_path;
char *date_header = NULL;
Curl_HttpReq httpreq;
const char *method = NULL;
@@ -569,6 +583,7 @@ CURLcode Curl_output_aws_sigv4(struct Curl_easy *data, bool proxy)
/* we init those buffers here, so goto fail will free initialized dynbuf */
Curl_dyn_init(&canonical_headers, CURL_MAX_HTTP_HEADER);
Curl_dyn_init(&canonical_query, CURL_MAX_HTTP_HEADER);
+ Curl_dyn_init(&canonical_path, CURL_MAX_HTTP_HEADER);
Curl_dyn_init(&signed_headers, CURL_MAX_HTTP_HEADER);
/*
@@ -696,6 +711,10 @@ CURLcode Curl_output_aws_sigv4(struct Curl_easy *data, bool proxy)
date[sizeof(date) - 1] = 0;
result = canon_query(data, data->state.up.query, &canonical_query);
+ if(result)
+ goto fail;
+
+ result = canon_path(data, data->state.up.path, &canonical_path);
if(result)
goto fail;
result = CURLE_OUT_OF_MEMORY;
@@ -708,7 +727,7 @@ CURLcode Curl_output_aws_sigv4(struct Curl_easy *data, bool proxy)
"%s\n" /* SignedHeaders */
"%.*s", /* HashedRequestPayload in hex */
method,
- data->state.up.path,
+ Curl_dyn_ptr(&canonical_path),
Curl_dyn_ptr(&canonical_query) ?
Curl_dyn_ptr(&canonical_query) : "",
Curl_dyn_ptr(&canonical_headers),This seems similar to a known problem 16.1 aws-sigv4 does not sign requests with * correctly. I see that there was a PR to fix it, but it's now dead.
My intent with creating this issue is 1) to say that the scope of the bug is not limited to "*", but applies to any URL with any characters except alphanumeric and -, ., _ and ~, 2) hopefully resurrect the push to fix it.
I expected the following
The command above should work as-is.
curl/libcurl version
curl 8.8.0 (x86_64-pc-linux-gnu) libcurl/8.8.0 OpenSSL/3.3.0 zlib/1.3.1 brotli/1.1.0 zstd/1.5.6 c-ares/1.28.1 libidn2/2.3.7 libpsl/0.21.5 libssh2/1.11.0 nghttp2/1.62.1 nghttp3/1.3.0
Release-Date: 2024-05-22
Protocols: dict file ftp ftps gopher gophers http https imap imaps ipfs ipns mqtt pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp ws wss
Features: alt-svc AsynchDNS brotli HSTS HTTP2 HTTP3 HTTPS-proxy IDN IPv6 Largefile libz NTLM PSL SSL threadsafe TLS-SRP TrackMemory UnixSockets zstd
operating system
Linux 5.15.0-1058-aws #64~20.04.1-Ubuntu SMP Tue Apr 9 11:12:27 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux