
aws-sigv4 does not work with URLs containing "=" and other special characters #13754

@ashtuchkin

I did this

I'm trying to use --aws-sigv4 to download S3 objects whose keys contain "=". An example key is data/asset_id=my-asset/dt=2024-05-22/data.parquet. This layout is known as the Hive partitioning scheme and is popular in data lakes.

Here's how I call it:

BUCKET="lasca-prod"
KEY="data/asset_id=my-asset/dt=2024-05-22/data.parquet"
curl --aws-sigv4 "aws:amz:us-west-2:s3" \
    --user "${AWS_ACCESS_KEY_ID}:${AWS_SECRET_ACCESS_KEY}" \
    -H "x-amz-security-token: ${AWS_SESSION_TOKEN}" \
    "https://s3.us-west-2.amazonaws.com/$BUCKET/$KEY"

curl 8.8.0 returns the following error (formatted for readability):

<?xml version="1.0" encoding="UTF-8"?>
<Error>
    <Code>SignatureDoesNotMatch</Code>
    <Message>The request signature we calculated does not match the signature you provided. Check your key and signing method.</Message>
    <AWSAccessKeyId>snip</AWSAccessKeyId>
    <StringToSign>AWS4-HMAC-SHA256
20240523T061547Z
20240523/us-west-2/s3/aws4_request
ce199afb9ea57628628a3d3254934e1ee480977872d06b3da40baa5ae4d2f9a7</StringToSign>
    <SignatureProvided>snip</SignatureProvided>
    <StringToSignBytes>snip</StringToSignBytes>
    <CanonicalRequest>GET
/lasca-prod/data/asset_id%3Dmy-asset/dt%3D2024-05-22/data.parquet

host:s3.us-west-2.amazonaws.com
x-amz-content-sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
x-amz-date:20240523T061547Z
x-amz-security-token:snip 

host;x-amz-content-sha256;x-amz-date;x-amz-security-token
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855</CanonicalRequest>
  ...
</Error>
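
For readers unfamiliar with the layout of the CanonicalRequest shown in the error, here is a minimal sketch of how its six parts are joined. The helper name build_canonical_request and its parameters are hypothetical and are not curl's code. It assumes every part has already been canonicalized by the caller and that each canonical header line ends with a newline.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of curl): join the already-canonicalized
   parts of a SigV4 canonical request, in the order seen in the error above.
   canonical_headers must end every header line with '\n'; the extra '\n'
   added after it produces the blank line before the signed-headers list.
   Returns the length written, or -1 if the buffer is too small. */
static int build_canonical_request(char *out, size_t outlen,
                                   const char *method,
                                   const char *canonical_path,
                                   const char *canonical_query,
                                   const char *canonical_headers,
                                   const char *signed_headers,
                                   const char *payload_hash_hex)
{
  int n;
  assert(out != NULL);
  n = snprintf(out, outlen,
               "%s\n"  /* HTTP method */
               "%s\n"  /* canonical path, the part affected by this issue */
               "%s\n"  /* canonical query string, may be empty */
               "%s\n"  /* canonical headers */
               "%s\n"  /* signed headers */
               "%s",   /* hashed request payload in hex */
               method, canonical_path, canonical_query,
               canonical_headers, signed_headers, payload_hash_hex);
  if(n < 0 || (size_t)n >= outlen)
    return -1;
  return n;
}
```

If the client and the server disagree on any single byte of the second line (here the path), the resulting hashes differ and the server answers with SignatureDoesNotMatch.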

Note that the path in the CanonicalRequest computed by AWS is percent-encoded ("=" becomes "%3D"), whereas curl signs the path without encoding it (confirmed by debug tracing). The AWS docs say that their special URIEncode function must be applied to the components of the canonical request.
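
To make the encoding rule concrete, here is a minimal standalone sketch of URIEncode as applied to a path. The function name aws_uri_encode_path is hypothetical and this is not curl's code. It assumes the input path is raw (not already percent-encoded), so an existing "%" would be encoded again as "%25". It leaves "/" untouched, which is the behaviour needed for S3 object key paths.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of curl): percent-encode a raw URL path.
   Unreserved characters (A-Z a-z 0-9 - . _ ~) and '/' are copied as-is;
   every other byte becomes %XX with uppercase hex digits.
   Returns 0 on success, -1 if the output buffer is too small. */
static int aws_uri_encode_path(const char *path, char *out, size_t outlen)
{
  static const char hex[] = "0123456789ABCDEF";
  const unsigned char *p;
  size_t o = 0;

  assert(path != NULL && out != NULL);
  if(outlen == 0)
    return -1;

  for(p = (const unsigned char *)path; *p; p++) {
    int unreserved = (*p >= 'A' && *p <= 'Z') || (*p >= 'a' && *p <= 'z') ||
                     (*p >= '0' && *p <= '9') ||
                     *p == '-' || *p == '.' || *p == '_' || *p == '~';
    if(unreserved || *p == '/') {
      if(o + 1 >= outlen)   /* keep one byte for the terminating NUL */
        return -1;
      out[o++] = (char)*p;
    }
    else {
      if(o + 3 >= outlen)   /* three bytes plus the terminating NUL */
        return -1;
      out[o++] = '%';
      out[o++] = hex[*p >> 4];
      out[o++] = hex[*p & 0x0F];
    }
  }
  out[o] = '\0';
  return 0;
}
```

Applied to the path from the example above, this yields the same string that appears in the CanonicalRequest returned by AWS, with each "=" replaced by "%3D".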

I wrote a basic patch to check whether this is the actual problem. With it applied, AWS S3 accepts the signature:

diff --git a/lib/http_aws_sigv4.c b/lib/http_aws_sigv4.c
index 98cc033..18567b4 100644
--- a/lib/http_aws_sigv4.c
+++ b/lib/http_aws_sigv4.c
@@ -423,6 +423,19 @@ static int compare_func(const void *a, const void *b)

 #define MAX_QUERYPAIRS 64

+static CURLcode canon_path(struct Curl_easy *data, const char *path, struct dynbuf *dq)
+{
+  CURLcode result = CURLE_OK;
+  for (const char *p = path; *p && !result; p++) {
+    if (*p == '=') {
+        result = Curl_dyn_addn(dq, "%3D", 3);
+    } else {
+        result = Curl_dyn_addn(dq, p, 1);
+    }
+  }
+  return result;
+}
+
 static CURLcode canon_query(struct Curl_easy *data,
                             const char *query, struct dynbuf *dq)
 {
@@ -540,6 +553,7 @@ CURLcode Curl_output_aws_sigv4(struct Curl_easy *data, bool proxy)
   struct dynbuf canonical_headers;
   struct dynbuf signed_headers;
   struct dynbuf canonical_query;
+  struct dynbuf canonical_path;
   char *date_header = NULL;
   Curl_HttpReq httpreq;
   const char *method = NULL;
@@ -569,6 +583,7 @@ CURLcode Curl_output_aws_sigv4(struct Curl_easy *data, bool proxy)
   /* we init those buffers here, so goto fail will free initialized dynbuf */
   Curl_dyn_init(&canonical_headers, CURL_MAX_HTTP_HEADER);
   Curl_dyn_init(&canonical_query, CURL_MAX_HTTP_HEADER);
+  Curl_dyn_init(&canonical_path, CURL_MAX_HTTP_HEADER);
   Curl_dyn_init(&signed_headers, CURL_MAX_HTTP_HEADER);

   /*
@@ -696,6 +711,10 @@ CURLcode Curl_output_aws_sigv4(struct Curl_easy *data, bool proxy)
   date[sizeof(date) - 1] = 0;

   result = canon_query(data, data->state.up.query, &canonical_query);
+  if(result)
+    goto fail;
+
+  result = canon_path(data, data->state.up.path, &canonical_path);
   if(result)
     goto fail;
   result = CURLE_OUT_OF_MEMORY;
@@ -708,7 +727,7 @@ CURLcode Curl_output_aws_sigv4(struct Curl_easy *data, bool proxy)
                   "%s\n" /* SignedHeaders */
                   "%.*s",  /* HashedRequestPayload in hex */
                   method,
-                  data->state.up.path,
+                  Curl_dyn_ptr(&canonical_path),
                   Curl_dyn_ptr(&canonical_query) ?
                   Curl_dyn_ptr(&canonical_query) : "",
                   Curl_dyn_ptr(&canonical_headers),

This seems similar to known problem 16.1, "aws-sigv4 does not sign requests with * correctly". I see that there was a PR to fix it, but it is now dead.

My intent in creating this issue is 1) to point out that the scope of the bug is not limited to "*" but covers any URL containing characters other than alphanumerics and "-", ".", "_" and "~", and 2) to hopefully resurrect the push to fix it.

I expected the following

The command above should work as-is.

curl/libcurl version

curl 8.8.0 (x86_64-pc-linux-gnu) libcurl/8.8.0 OpenSSL/3.3.0 zlib/1.3.1 brotli/1.1.0 zstd/1.5.6 c-ares/1.28.1 libidn2/2.3.7 libpsl/0.21.5 libssh2/1.11.0 nghttp2/1.62.1 nghttp3/1.3.0
Release-Date: 2024-05-22
Protocols: dict file ftp ftps gopher gophers http https imap imaps ipfs ipns mqtt pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp ws wss
Features: alt-svc AsynchDNS brotli HSTS HTTP2 HTTP3 HTTPS-proxy IDN IPv6 Largefile libz NTLM PSL SSL threadsafe TLS-SRP TrackMemory UnixSockets zstd

operating system

Linux 5.15.0-1058-aws #64~20.04.1-Ubuntu SMP Tue Apr 9 11:12:27 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
