Conversation
522ba8b to
176d74f
Compare
|
Hi @outscale-mgo, thx for awesome patch! Good job! But there're some questions about it. How do you think about PR to you rep? About hostname... It's not need to trim the hostname. Curl uses trimmed hostname variable. AWS recommends to trim hostname while using "Host:" header, but Curl generate 'Host:' header after using hostname variable. |
176d74f to
9318d88
Compare
Hi @drTr0jan
As I want to upstream my change I would prefer a review here over a PR on my rep, though I wouldn't say no to a PR that improve some of my patchs. Also I will archive my repo once my modifications are upstream (and unarchive it if I need it again).
Ok, I've just force push a new version that don't trim, thanks for the feedback again ! |
lib/http_aws_sigv4.c
Outdated
There was a problem hiding this comment.
It's not need to parse a space character separately, Default case parses a space correctly.
Why do you parse it?
lib/http_aws_sigv4.c
Outdated
tests/data/test1937
Outdated
There was a problem hiding this comment.
It's useful to verify multibyte characters also. For example: %TESTNUMBER/testapi/test+=+тест+++测试+
What do you think about it?
da99393 to
1e812fa
Compare
tests/data/test1937
Outdated
There was a problem hiding this comment.
It's useful to verify 2-byte (not only 3-byte UTF-8 symbols) characters also. For example: GET /%TESTNUMBER/testapi/test===カールロックスкурл рокс!!!=== HTTP/1.1
What do you think about it?
There was a problem hiding this comment.
sure why not,
Also I've try to write "curl rocks" in katakana (japanese), which seems to give "kaaru rokusu". but can also be translate as karl rocks also. and I guess "КарлРокс" mean karl rocks in cyrilics isn't it ?
Also google translate seems tu tell me that "керл Рокс" would be a better translation, is that right ?
lib/http_aws_sigv4.c
Outdated
There was a problem hiding this comment.
strlen_uri() returns the length of the given URI this function is a modified version of strlen_url in urlapi.c
lib/http_aws_sigv4.c
Outdated
There was a problem hiding this comment.
strcpy_uri() copies a url to an output buffer and URL-encodes the URI accordingly this function is a modified version of strcpy_url in urlapi.c
There was a problem hiding this comment.
It's "a URL". The leading U in URL is pronounced as "you" with a starting consonant sound, and thus "a URL" is correct.
lib/http_aws_sigv4.c
Outdated
There was a problem hiding this comment.
What do you think about renaming vars to: canonical_uri and canonical_query_str ?
f1b8c64 to
cbd8841
Compare
lib/http_aws_sigv4.c
Outdated
There was a problem hiding this comment.
It is not possible to read these functions now without thinking: why does this file need to almost duplicate these functions?
How do they differ and why isn't tweaking the ones in urlapi.c better?
There was a problem hiding this comment.
@outscale-mgo, I think, good idea is taking if(in_query) and default clauses out from case-statement.
case '&' clause should be executing always only in if(in_query) for "without thinking".
There was a problem hiding this comment.
something like:
for(;
*iptr; /* until zero byte */
iptr++) {
if(in_query) {
if(*iptr == '=') {
if(!count_equal++) {
msnprintf(optr, 4, "%%%02X", *iptr);
optr += 3;
} else
*optr++=*iptr;
continue;
} else if (*iptr == '&')
count_equal = 0;
}
switch(*iptr) {
case '%':
*optr++ = *iptr;
if(iptr[1]) {
iptr++;
*optr++ = (char)TOUPPER(*iptr);
if(*iptr != '%' && iptr[1]) {
iptr++;
*optr++ = (char)TOUPPER(*iptr);
}
}
break;
default:
if(urlchar_needs_escaping(*iptr, in_query)) {
msnprintf(optr, 4, "%%%02X", *iptr);
optr += 3;
}
else
*optr++=*iptr;
break;
}
}?
lib/http_aws_sigv4.c
Outdated
There was a problem hiding this comment.
I don't understand why it is curl's job to encode the equals signs like this. The user has provided a URL to use. Why would the user provide a wrong one for curl to re-encode like this?
There was a problem hiding this comment.
https://docs.aws.amazon.com/AmazonS3/latest/API/images/sigV4-using-query-params.png
In this document AWS say that you need to encode every parameter and arguments of the query string. problem is that curl seems to contain the whole query string as a single, so I had to find a way to determine which part need to be encode, and which part of need to be left as it.
There was a problem hiding this comment.
@bagder, AWS doesn't recommend using standard UriEncode functions provided by development platform may not work because of differences in implementation and related ambiguity in the underlying RFCs.
There was a problem hiding this comment.
2 small details:
the equal is for example, if an user call something like https://api.region.provider/Call&Argument====&Argument2=2= in which case, if as I've understand correctly, all equal but the first after Argument and Argument2 must be encode. (but this is a corner case, which I'm not sure is even possible)
Also I could patch the functions in url API, but it would add a lot of if/else which might be ugly (for examples curl use lovercase for hexa and AWS require upper case), also as @drTr0jan said, AWS advise to reimplement the encoding.
There was a problem hiding this comment.
That's borderline crazy since a URL is already URL encoded as otherwise it's not a URL. Does the spec also clearly say what bytes it needs to re-encode?
There was a problem hiding this comment.
BTW, doesn't that also imply that you should first URL decode the query string so that you get it truly canonicalized? Otherwise existing URL-encoded characters like %41 might thwart this.
There was a problem hiding this comment.
Now I think about it, I do think the AWS doc is about not yet encoded URL, (so not URL).
also the issue is about using "https://${ES_DOMAIN_ENDPOIN}/tempo_data-*/event/_mapping" instead of "https://${ES_DOMAIN_ENDPOIN}/tempo_data-%2A/event/_mapping".
There was a problem hiding this comment.
though it might still be useful to add an option to force encoding ? (in which case we won't have to deal with already encoded URL)
There was a problem hiding this comment.
I think the issue here is that they want to make the url canonical, and to do that you need to first URL decode and then URL encode because curl always operate on a URL as input and that means it is encoded. So a * could be passed in as a * or as %2a or as %2A. They're all fine according to the URL syntax. But they will then genrate different hashes and that's not what a user wants.
lib/http_aws_sigv4.c
Outdated
There was a problem hiding this comment.
This function doesn't "copy a URI" it (re-)encodes a query string according to the aws sig v4 rules.
How about instead call it encode_query() ?
lib/http_aws_sigv4.c
Outdated
There was a problem hiding this comment.
This function returns the new length of a query string, it's not really a "uri". strlen_encoded_querry() maybe?
lib/http_aws_sigv4.c
Outdated
There was a problem hiding this comment.
BTW, doesn't that also imply that you should first URL decode the query string so that you get it truly canonicalized? Otherwise existing URL-encoded characters like %41 might thwart this.
9d34c53 to
28c0865
Compare
lib/http_aws_sigv4.c
Outdated
There was a problem hiding this comment.
so if you have an encoded URL, containing '%2a' for example I can use '%2A' in the canonical header. I detect '%' so I can decode -> encode URL
There was a problem hiding this comment.
What about for example %41 present in the original URL. Shouldn't that rather get decoded to A in a canonical URL?
There was a problem hiding this comment.
@outscale-mgo, but you've detected '%' while case '%' yet, haven't you?
There was a problem hiding this comment.
What about for example
%41present in the original URL. Shouldn't that rather get decoded toAin a canonical URL?
ok I'll do that
There was a problem hiding this comment.
@outscale-mgo, but you've detected '%' while
case '%'yet, haven't you?
yes, because %% is deferent from %01 which represent an ASCII character, anyway I'm going to change the algo to make a full decode -> encode.
f511c01 to
7025344
Compare
lib/curl_setup_once.h
Outdated
There was a problem hiding this comment.
good question, I've take inspiration from what is done in urlapi.c, where TOLOWER is use there.
TBH I don't know what's the advantage over using toupper, I've just assume it was there for a reason, maybe it was wrong, @bagder ?
There was a problem hiding this comment.
You want to use Curl_raw_toupper() and not toupper() to avoid it being affected by locales.
lib/http_aws_sigv4.c
Outdated
There was a problem hiding this comment.
For better understanding it can be like as:
for(ptr = (unsigned char *)url; *ptr; ptr++) {
if (in_query) {
switch (*ptr) {
case '=':
if(count_equal++)
newlen += 2;
newlen++;
continue;
case '&':
count_equal = 0;
}
}
if(urlchar_needs_escaping(*ptr, in_query))
newlen += 2;
newlen++;
}There was a problem hiding this comment.
in this case, if you go into the case '=': it will bug, because, newlen will be increment twice or 4 times.
so the code should be:
for(ptr = (unsigned char *)url; *ptr; ptr++) {
if (in_query) {
switch (*ptr) {
case '=':
if(count_equal++)
newlen += 2;
newlen++;
continue; // replace break by continue here.
case '&':
count_equal = 0;
}
}
if(urlchar_needs_escaping(*ptr, in_query))
newlen += 2;
newlen++;
}TBH I don't think it simplify the code a lot, so I don't think I will change that.
There was a problem hiding this comment.
Of course, continue should be there. Sorry for my typo.
7025344 to
0f55c65
Compare
lib/http_aws_sigv4.c
Outdated
|
|
||
| ascii_code = strtol(num_str, NULL, 16); | ||
| if(ascii_code) { | ||
| *optr++ = (char)ascii_code; |
There was a problem hiding this comment.
CWE 197 advisory: potential loss of information in cast
(at-me in a reply with help or ignore)
There was a problem hiding this comment.
hello an idea for that ?
would an
if (ascii_code > 255)
handle_errror()
be enough ?
There was a problem hiding this comment.
char is often signed, so that typecast is not limiting the value to 0 - 255. And since you want an unsigned value, maybe strtoul() is actually what you want? The fact that char is also not certain to be signed or unsigned is also a reason to use it unsigned explicitly.
The code in escape.c does this:
hex = strtoul(hexstr, &ptr, 16);
in = curlx_ultouc(hex); /* this long is never bigger than 255 anyway */0f55c65 to
13b5c43
Compare
|
This pull request introduces 1 alert when merging 13b5c43557ea6404b10c82260afd43c436d92b3e into 351b181 - view on LGTM.com new alerts:
|
08180ac to
8591bdf
Compare
ee2e49b to
fe3edad
Compare
My bad it seems urlapi does handle query. |
This one I think can probably be fine to provide publicly.
Why not use the decode flag when asking for or setting the query?
This doesn't feel like material for the public API. It's just too weird and specific.
I don't understand this change. |
|
Hello, some update on this PR: I did try to use urlapi but fail to do so (my test is here: https://github.com/outscale-dev/curl/commits/http_aws_sigv4_encoding_urlapi) The major problem I encounter is that aws url encoding, encode characters as '/', '=' and some other reserved characters. Also, it seems that depending of the API use (see here: https://docs.aws.amazon.com/general/latest/gr/sigv4-create-canonical-request.html and here: https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-query-string-auth.html) the number of encoding that must be done is not the same. (double encoding on equal on some parameter and 1 on some other) So I've add an option to specify the number of encoding that should be done, and with 0 encoding, none is done, but can still be done manually, which I'd like to keep, so if there is a bug in the encoding, curl user will still be able to do it themself. Note that presently, only 0 and 1 are supported,. I still need to write test, and improve the man, but as I didn't give feedback since some time, I've push my current work on this branch. Edit: the docs and tests should be done now |
f948538 to
55be259
Compare
55be259 to
f630d83
Compare
Following that: https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-query-string-auth.html note that the documentation say nothing explicitely about '=' encoding, but say that each value and key of the query need to be encoded while the '=' use to slipt the key from the value need to be left as it. as data->state.up.query contain all queries parameters as a single string, I need to determine which '=' are part of a value and which one a "true" equal should fix: curl#7559 Signed-off-by: Matthias Gatto <matthias.gatto@outscale.com>
Signed-off-by: Matthias Gatto <matthias.gatto@outscale.com>
Signed-off-by: Matthias Gatto <matthias.gatto@outscale.com>
Signed-off-by: Matthias Gatto <matthias.gatto@outscale.com>
Signed-off-by: Matthias Gatto <matthias.gatto@outscale.com>
add new option aws-sigv4-nb-encoding which is here to tell curl to encode url in aws v4 signature Signed-off-by: Matthias Gatto <matthias.gatto@outscale.com>
Signed-off-by: Matthias Gatto <matthias.gatto@outscale.com>
f630d83 to
0933688
Compare
|
Is this still work in progress? It has merge conflicts and I would like to review the code. Please also update the description of this PR so that it is very clear what the purpose of it is. |
hello, I've try to update the PR description, it it's purpose is more clear. I can rebase it in the meantime, if you want, but I would prefer to take more time, so I can add the double encoding support. |
|
We can wait, sure! |
|
This PR just died. Feel free to submit a new/reopen if work resumes! |
It seems AWS signature V4 was lacking part of the algorith, AWSv4 should in theory URL Encode, the data before using it in the algorytme.
problem is that depending of whatever V4 signature is use for s3 or another API, the URL encoding is not the same.
https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-query-string-auth.html
https://docs.aws.amazon.com/general/latest/gr/sigv4-create-canonical-request.html
In one case the data is single encoded, and in the other case, it is double encode.
So the patch add a new option to curl, that enable the user to choose the number of encoding you want. (default is 0)
Here is the changes of this PR:
--aws-sigv4-nb-encodingoption, default to 0data->state.up.queryanddata->state.up.path, if the option is set.some notes:
hostname?=encoding, but say that each value and key of the query need to be encoded, while the=use to slipt the key from the value need to be left as it. asdata->state.up.querycontain all queries parameters as a single string, I need to determine which '=' are part of a value and which one a "true" equalshould fix: #7559 #9717