aws_sigv4: URI encoding by outscale-mgo · Pull Request #7600 · curl/curl

outscale-mgo · 2021-08-20T13:22:56Z

It seems AWS signature V4 was lacking part of the algorith, AWSv4 should in theory URL Encode, the data before using it in the algorytme.

problem is that depending of whatever V4 signature is use for s3 or another API, the URL encoding is not the same.
https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-query-string-auth.html
https://docs.aws.amazon.com/general/latest/gr/sigv4-create-canonical-request.html

In one case the data is single encoded, and in the other case, it is double encode.

So the patch add a new option to curl, that enable the user to choose the number of encoding you want. (default is 0)

Here is the changes of this PR:

add --aws-sigv4-nb-encoding option, default to 0
add some static function in https_aws_sigv4 file .
uri_encore data->state.up.query and data->state.up.path , if the option is set.

some notes:

I didn't test, nor implement double encoding yet.
I couldn't use strlen_url in urlapi.c as the algorytm use for the encoding is slightly different
I'm not sure hostname trim is useful, maybe curl already do it when storing it in hostname ?
AWS documentation say nothing explicitly about = encoding, but say that each value and key of the query need to be encoded, while the = use to slipt the key from the value need to be left as it. as data->state.up.query contain all queries parameters as a single string, I need to determine which '=' are part of a value and which one a "true" equal

should fix: #7559 #9717

drTr0jan · 2021-09-02T07:48:00Z

Hi @outscale-mgo,

thx for awesome patch! Good job!

But there're some questions about it. How do you think about PR to you rep?

About hostname... It's not need to trim the hostname. Curl uses trimmed hostname variable. AWS recommends to trim hostname while using "Host:" header, but Curl generate 'Host:' header after using hostname variable.

outscale-mgo · 2021-09-02T11:55:49Z

Hi @outscale-mgo,

thx for awesome patch! Good job!

Hi @drTr0jan
Thanks for the feedback !

But there're some questions about it. How do you think about PR to you rep?

As I want to upstream my change I would prefer a review here over a PR on my rep, though I wouldn't say no to a PR that improve some of my patchs.

Also I will archive my repo once my modifications are upstream (and unarchive it if I need it again).

About hostname... It's not need to trim the hostname. Curl uses trimmed hostname variable. AWS recommends to trim hostname while using "Host:" header, but Curl generate 'Host:' header after using hostname variable.

Ok, I've just force push a new version that don't trim, thanks for the feedback again !

drTr0jan · 2021-09-02T12:43:37Z

lib/http_aws_sigv4.c

It's not need to parse a space character separately, Default case parses a space correctly.
Why do you parse it?

drTr0jan · 2021-09-02T12:43:56Z

lib/http_aws_sigv4.c

Why do you use the clause?

drTr0jan · 2021-09-02T12:45:34Z

tests/data/test1937

It's useful to verify multibyte characters also. For example: %TESTNUMBER/testapi/test+=+тест+++测试+
What do you think about it?

drTr0jan · 2021-09-03T11:18:36Z

tests/data/test1937

It's useful to verify 2-byte (not only 3-byte UTF-8 symbols) characters also. For example: GET /%TESTNUMBER/testapi/test===カールロックスкурл рокс!!!=== HTTP/1.1
What do you think about it?

sure why not,
Also I've try to write "curl rocks" in katakana (japanese), which seems to give "kaaru rokusu". but can also be translate as karl rocks also. and I guess "КарлРокс" mean karl rocks in cyrilics isn't it ?
Also google translate seems tu tell me that "керл Рокс" would be a better translation, is that right ?

Let it be "курл рокс" )

drTr0jan · 2021-09-03T11:22:00Z

lib/http_aws_sigv4.c

strlen_uri() returns the length of the given URI this function is a modified version of strlen_url in urlapi.c

drTr0jan · 2021-09-03T11:23:16Z

lib/http_aws_sigv4.c

strcpy_uri() copies a url to an output buffer and URL-encodes the URI accordingly this function is a modified version of strcpy_url in urlapi.c

It's "a URL". The leading U in URL is pronounced as "you" with a starting consonant sound, and thus "a URL" is correct.

@bagder, okay, thanks.

drTr0jan · 2021-09-03T11:25:04Z

lib/http_aws_sigv4.c

What do you think about renaming vars to: canonical_uri and canonical_query_str ?

bagder · 2021-09-06T11:57:23Z

lib/http_aws_sigv4.c

It is not possible to read these functions now without thinking: why does this file need to almost duplicate these functions?

How do they differ and why isn't tweaking the ones in urlapi.c better?

@outscale-mgo, I think, good idea is taking if(in_query) and default clauses out from case-statement.
case '&' clause should be executing always only in if(in_query) for "without thinking".

something like:

for(; *iptr; /* until zero byte */ iptr++) { if(in_query) { if(*iptr == '=') { if(!count_equal++) { msnprintf(optr, 4, "%%%02X", *iptr); optr += 3; } else *optr++=*iptr; continue; } else if (*iptr == '&') count_equal = 0; } switch(*iptr) { case '%': *optr++ = *iptr; if(iptr[1]) { iptr++; *optr++ = (char)TOUPPER(*iptr); if(*iptr != '%' && iptr[1]) { iptr++; *optr++ = (char)TOUPPER(*iptr); } } break; default: if(urlchar_needs_escaping(*iptr, in_query)) { msnprintf(optr, 4, "%%%02X", *iptr); optr += 3; } else *optr++=*iptr; break; } }

?

bagder · 2021-09-06T12:01:50Z

lib/http_aws_sigv4.c

I don't understand why it is curl's job to encode the equals signs like this. The user has provided a URL to use. Why would the user provide a wrong one for curl to re-encode like this?

https://docs.aws.amazon.com/AmazonS3/latest/API/images/sigV4-using-query-params.png
In this document AWS say that you need to encode every parameter and arguments of the query string. problem is that curl seems to contain the whole query string as a single, so I had to find a way to determine which part need to be encode, and which part of need to be left as it.

@bagder, AWS doesn't recommend using standard UriEncode functions provided by development platform may not work because of differences in implementation and related ambiguity in the underlying RFCs.

2 small details:
the equal is for example, if an user call something like https://api.region.provider/Call&Argument====&Argument2=2= in which case, if as I've understand correctly, all equal but the first after Argument and Argument2 must be encode. (but this is a corner case, which I'm not sure is even possible)

Also I could patch the functions in url API, but it would add a lot of if/else which might be ugly (for examples curl use lovercase for hexa and AWS require upper case), also as @drTr0jan said, AWS advise to reimplement the encoding.

That's borderline crazy since a URL is already URL encoded as otherwise it's not a URL. Does the spec also clearly say what bytes it needs to re-encode?

BTW, doesn't that also imply that you should first URL decode the query string so that you get it truly canonicalized? Otherwise existing URL-encoded characters like %41 might thwart this.

Now I think about it, I do think the AWS doc is about not yet encoded URL, (so not URL).
also the issue is about using "https://${ES_DOMAIN_ENDPOIN}/tempo_data-*/event/_mapping" instead of "https://${ES_DOMAIN_ENDPOIN}/tempo_data-%2A/event/_mapping".

though it might still be useful to add an option to force encoding ? (in which case we won't have to deal with already encoded URL)

I think the issue here is that they want to make the url canonical, and to do that you need to first URL decode and then URL encode because curl always operate on a URL as input and that means it is encoded. So a * could be passed in as a * or as %2a or as %2A. They're all fine according to the URL syntax. But they will then genrate different hashes and that's not what a user wants.

bagder · 2021-09-09T14:22:26Z

lib/http_aws_sigv4.c

This function doesn't "copy a URI" it (re-)encodes a query string according to the aws sig v4 rules.

How about instead call it encode_query() ?

bagder · 2021-09-09T14:23:36Z

lib/http_aws_sigv4.c

This function returns the new length of a query string, it's not really a "uri". strlen_encoded_querry() maybe?

bagder · 2021-09-09T14:25:55Z

lib/http_aws_sigv4.c

BTW, doesn't that also imply that you should first URL decode the query string so that you get it truly canonicalized? Otherwise existing URL-encoded characters like %41 might thwart this.

drTr0jan · 2021-09-20T13:49:29Z

lib/http_aws_sigv4.c

Why do you compare *iptr with '%'?

so if you have an encoded URL, containing '%2a' for example I can use '%2A' in the canonical header. I detect '%' so I can decode -> encode URL

What about for example %41 present in the original URL. Shouldn't that rather get decoded to A in a canonical URL?

@outscale-mgo, but you've detected '%' while case '%' yet, haven't you?

What about for example %41 present in the original URL. Shouldn't that rather get decoded to A in a canonical URL?

ok I'll do that

@outscale-mgo, but you've detected '%' while case '%' yet, haven't you?

yes, because %% is deferent from %01 which represent an ASCII character, anyway I'm going to change the algo to make a full decode -> encode.

drTr0jan · 2021-10-01T16:18:41Z

lib/curl_setup_once.h

Why do need this code?

good question, I've take inspiration from what is done in urlapi.c, where TOLOWER is use there.
TBH I don't know what's the advantage over using toupper, I've just assume it was there for a reason, maybe it was wrong, @bagder ?

You want to use Curl_raw_toupper() and not toupper() to avoid it being affected by locales.

drTr0jan · 2021-10-04T12:40:51Z

lib/http_aws_sigv4.c

For better understanding it can be like as:

for(ptr = (unsigned char *)url; *ptr; ptr++) { if (in_query) { switch (*ptr) { case '=': if(count_equal++) newlen += 2; newlen++; continue; case '&': count_equal = 0; } } if(urlchar_needs_escaping(*ptr, in_query)) newlen += 2; newlen++; }

in this case, if you go into the case '=': it will bug, because, newlen will be increment twice or 4 times.
so the code should be:

for(ptr = (unsigned char *)url; *ptr; ptr++) { if (in_query) { switch (*ptr) { case '=': if(count_equal++) newlen += 2; newlen++; continue; // replace break by continue here. case '&': count_equal = 0; } } if(urlchar_needs_escaping(*ptr, in_query)) newlen += 2; newlen++; }

TBH I don't think it simplify the code a lot, so I don't think I will change that.

Of course, continue should be there. Sorry for my typo.

sonatype-lift · 2021-11-05T11:01:07Z

lib/http_aws_sigv4.c

+
+      ascii_code = strtol(num_str, NULL, 16);
+      if(ascii_code) {
+        *optr++ = (char)ascii_code;


CWE 197 advisory: potential loss of information in cast
(at-me in a reply with help or ignore)

hello an idea for that ?
would an

if (ascii_code > 255) handle_errror()

be enough ?

char is often signed, so that typecast is not limiting the value to 0 - 255. And since you want an unsigned value, maybe strtoul() is actually what you want? The fact that char is also not certain to be signed or unsigned is also a reason to use it unsigned explicitly.

The code in escape.c does this:

hex = strtoul(hexstr, &ptr, 16); in = curlx_ultouc(hex); /* this long is never bigger than 255 anyway */

lgtm-com · 2021-11-08T17:15:38Z

This pull request introduces 1 alert when merging 13b5c43557ea6404b10c82260afd43c436d92b3e into 351b181 - view on LGTM.com

new alerts:

1 for Implicit function declaration

outscale-mgo · 2022-01-31T18:30:27Z

Ok, I'll try to do that.

So I've try to use only urlapi to manipulate the query string, it seems urlapi doesn't decode/encode the query yet. (but doesn't seems too hard to change either)
So if I had to put all url handling code in urlapi.c/h here's how I would do it: add 3 flags to urlapi set/get
#define CURLU_ENCODE_UPPERCASE (1<<12)  /* Encode URL using uppercase hex */
#define CURLU_ENCODE_DECODE_QUERY (1<<13)        /* enable the encoding/decoding of the query string */
#define CURLU_SORT_QUERY (1<<14)        /* Sort Query on set */
modify this line
#define CURLU_URLDECODE (1<<6)          /* URL decode on get */
to
#define CURLU_URLDECODE (1<<6)          /* URL decode on get, or on set on query if CURLU_ENCODE_DECODE_QUERY is set */
and then patch get/set methode so it handle sort, and encode/decode of query.
then in sigv4 code, In would use curl_url_set with CURLU_ENCODE_DECODE_QUERY | CURLU_SORT_QUERY | CURLU_URLDECODE to decode the query, so if I have some lowercased encoded url, I will decode it, and sort the query at the same time.
and then use curl_url_get with CURLU_ENCODE_DECODE_QUERY | CURLU_URLENCODE in order to decode query in a format that will always suit the signature process.
Is this okay for you ?

My bad it seems urlapi does handle query.

bagder · 2022-01-31T22:27:36Z

#define CURLU_ENCODE_UPPERCASE (1<<12) /* Encode URL using uppercase hex */

This one I think can probably be fine to provide publicly.

#define CURLU_ENCODE_DECODE_QUERY (1<<13) /* enable the encoding/decoding of the query string */

Why not use the decode flag when asking for or setting the query?

#define CURLU_SORT_QUERY (1<<14) /* Sort Query on set */

This doesn't feel like material for the public API. It's just too weird and specific.

#define CURLU_URLDECODE (1<<6) /* URL decode on get, or on set on query if CURLU_ENCODE_DECODE_QUERY is set */

I don't understand this change.

outscale-mgo · 2022-03-08T14:30:27Z

Hello, some update on this PR:

I did try to use urlapi but fail to do so (my test is here: https://github.com/outscale-dev/curl/commits/http_aws_sigv4_encoding_urlapi)

The major problem I encounter is that aws url encoding, encode characters as '/', '=' and some other reserved characters.
So in order to use urlapi I would have to recode partially strlen/cpy_url only for aws-sigv4 code, so I guess it's still better to keep the code in http_aws_sigv4.c.

Also, it seems that depending of the API use (see here: https://docs.aws.amazon.com/general/latest/gr/sigv4-create-canonical-request.html and here: https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-query-string-auth.html) the number of encoding that must be done is not the same. (double encoding on equal on some parameter and 1 on some other)

So I've add an option to specify the number of encoding that should be done, and with 0 encoding, none is done, but can still be done manually, which I'd like to keep, so if there is a bug in the encoding, curl user will still be able to do it themself.

Note that presently, only 0 and 1 are supported,.

I still need to write test, and improve the man, but as I didn't give feedback since some time, I've push my current work on this branch.

Edit: the docs and tests should be done now

Following that: https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-query-string-auth.html note that the documentation say nothing explicitely about '=' encoding, but say that each value and key of the query need to be encoded while the '=' use to slipt the key from the value need to be left as it. as data->state.up.query contain all queries parameters as a single string, I need to determine which '=' are part of a value and which one a "true" equal should fix: curl#7559 Signed-off-by: Matthias Gatto <matthias.gatto@outscale.com>

Signed-off-by: Matthias Gatto <matthias.gatto@outscale.com>

add new option aws-sigv4-nb-encoding which is here to tell curl to encode url in aws v4 signature Signed-off-by: Matthias Gatto <matthias.gatto@outscale.com>

Signed-off-by: Matthias Gatto <matthias.gatto@outscale.com>

bagder · 2022-08-29T15:40:20Z

Is this still work in progress? It has merge conflicts and I would like to review the code. Please also update the description of this PR so that it is very clear what the purpose of it is.

outscale-mgo · 2022-08-30T10:51:09Z

Is this still work in progress? It has merge conflicts and I would like to review the code. Please also update the description of this PR so that it is very clear what the purpose of it is.

hello, I've try to update the PR description, it it's purpose is more clear.
It's still work in progress but, "on hold", as it's very easy to work around the encoding problem by encoding the data manually.
I'll came back on this PR, once the header refactoring is merge.

I can rebase it in the meantime, if you want, but I would prefer to take more time, so I can add the double encoding support.

bagder · 2022-08-30T10:55:11Z

We can wait, sure!

bagder · 2023-07-27T21:46:15Z

This PR just died. Feel free to submit a new/reopen if work resumes!

jay added authentication HTTP labels Aug 21, 2021

outscale-mgo force-pushed the http_aws_sigv4_encoding branch from 522ba8b to 176d74f Compare August 23, 2021 13:18

outscale-mgo force-pushed the http_aws_sigv4_encoding branch from 176d74f to 9318d88 Compare September 2, 2021 11:52

drTr0jan reviewed Sep 2, 2021

View reviewed changes

outscale-mgo force-pushed the http_aws_sigv4_encoding branch 2 times, most recently from da99393 to 1e812fa Compare September 2, 2021 14:29

drTr0jan reviewed Sep 3, 2021

View reviewed changes

outscale-mgo force-pushed the http_aws_sigv4_encoding branch 2 times, most recently from f1b8c64 to cbd8841 Compare September 6, 2021 09:52

bagder reviewed Sep 6, 2021

View reviewed changes

bagder reviewed Sep 9, 2021

View reviewed changes

outscale-mgo force-pushed the http_aws_sigv4_encoding branch 2 times, most recently from 9d34c53 to 28c0865 Compare September 14, 2021 14:33

drTr0jan reviewed Sep 20, 2021

View reviewed changes

outscale-mgo force-pushed the http_aws_sigv4_encoding branch 2 times, most recently from f511c01 to 7025344 Compare September 27, 2021 15:17

drTr0jan reviewed Oct 1, 2021

View reviewed changes

drTr0jan reviewed Oct 4, 2021

View reviewed changes

outscale-mgo force-pushed the http_aws_sigv4_encoding branch from 7025344 to 0f55c65 Compare November 5, 2021 10:41

sonatype-lift bot reviewed Nov 5, 2021

View reviewed changes

outscale-mgo force-pushed the http_aws_sigv4_encoding branch from 0f55c65 to 13b5c43 Compare November 8, 2021 17:02

outscale-mgo force-pushed the http_aws_sigv4_encoding branch 2 times, most recently from 08180ac to 8591bdf Compare November 16, 2021 15:14

outscale-mgo reopened this Jan 31, 2022

outscale-mgo force-pushed the http_aws_sigv4_encoding branch 2 times, most recently from ee2e49b to fe3edad Compare January 31, 2022 15:18

outscale-mgo force-pushed the http_aws_sigv4_encoding branch 5 times, most recently from f948538 to 55be259 Compare March 11, 2022 14:46

outscale-mgo force-pushed the http_aws_sigv4_encoding branch from 55be259 to f630d83 Compare March 21, 2022 09:31

outscale-mgo added 7 commits March 21, 2022 10:34

http_aws_sigv4: decode url and rework encode

16b2ce9

Signed-off-by: Matthias Gatto <matthias.gatto@outscale.com>

http_aws_sigv4: add test for uri encoding

8ad9b7d

Signed-off-by: Matthias Gatto <matthias.gatto@outscale.com>

http_aws_sigv4: correctly compute empty param & sort for query

c66cf99

Signed-off-by: Matthias Gatto <matthias.gatto@outscale.com>

http_aws_sigv4: test query parameters

4d5adf3

Signed-off-by: Matthias Gatto <matthias.gatto@outscale.com>

http_aws_sigv4: new option aws-sigv4-nb-encoding

b0ece9d

add new option aws-sigv4-nb-encoding which is here to tell curl to encode url in aws v4 signature Signed-off-by: Matthias Gatto <matthias.gatto@outscale.com>

http_aws_sigv4: add test for nb encoding at 0

0933688

Signed-off-by: Matthias Gatto <matthias.gatto@outscale.com>

outscale-mgo force-pushed the http_aws_sigv4_encoding branch from f630d83 to 0933688 Compare March 21, 2022 09:35

bagder changed the title ~~aws signature V4 URI encoding~~ aws_sigv4: URI encoding Aug 29, 2022

jay added the on-hold label Oct 3, 2022

outscale-mgo mentioned this pull request Oct 17, 2022

--aws-sigv4 doesn't sort query string parameters before signing #9717

Closed

outscale-mgo mentioned this pull request Oct 27, 2022

aws_sigv4: Include and sign X-Amz-Content-Sha256. #8935

Closed

bagder closed this Jul 27, 2023

ashtuchkin mentioned this pull request May 23, 2024

aws-sigv4 does not work with URLs containing "=" and other special characters #13754

Closed

Uh oh!

Conversation

outscale-mgo commented Aug 20, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

drTr0jan commented Sep 2, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

outscale-mgo commented Sep 2, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

drTr0jan Sep 3, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

outscale-mgo Sep 3, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

drTr0jan Sep 3, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

outscale-mgo Sep 20, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

outscale-mgo Sep 9, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

bagder Sep 10, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

outscale-mgo Sep 21, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

outscale-mgo commented Aug 20, 2021 •

edited

Loading

drTr0jan commented Sep 2, 2021 •

edited

Loading

outscale-mgo commented Sep 2, 2021 •

edited

Loading

drTr0jan Sep 3, 2021 •

edited

Loading

outscale-mgo Sep 3, 2021 •

edited

Loading

drTr0jan Sep 3, 2021 •

edited

Loading

outscale-mgo Sep 20, 2021 •

edited

Loading

outscale-mgo Sep 9, 2021 •

edited

Loading

bagder Sep 10, 2021 •

edited

Loading

outscale-mgo Sep 21, 2021 •

edited

Loading

drTr0jan Oct 4, 2021 •

edited

Loading

outscale-mgo Oct 14, 2021 •

edited

Loading