http: Add ability to merge slashes#7621
Conversation
Signed-off-by: Ruslan Nigmatullin <elessar@dropbox.com>
Signed-off-by: Ruslan Nigmatullin <elessar@dropbox.com>
|
/retest |
|
🔨 rebuilding |
|
/assign @jmarantz could you have a look? |
|
🙀 Error while processing event: |
Signed-off-by: Ruslan Nigmatullin <elessar@dropbox.com>
venilnoronha
left a comment
There was a problem hiding this comment.
Thanks for working on this.
api/envoy/config/filter/network/http_connection_manager/v2/http_connection_manager.proto
Outdated
Show resolved
Hide resolved
| is_valid_path = PathUtil::canonicalPath(*request_headers.Path()); | ||
| } | ||
| return true; | ||
| if (config.shouldMergeSlashes()) { |
There was a problem hiding this comment.
Why's the path validity checked on the original value i.e. prior to merging slashes? I'd add a comment explaining that.
There was a problem hiding this comment.
why bother to merge slashes if is_valid_path is false?
Signed-off-by: Ruslan Nigmatullin <elessar@dropbox.com>
Signed-off-by: Ruslan Nigmatullin <elessar@dropbox.com>
|
@venilnoronha Please have another look, and let me know if there is any outstanding feedback. |
Signed-off-by: Ruslan Nigmatullin <elessar@dropbox.com>
|
Can this change be merged, please? |
|
@euroelessar a maintainer has to review it. Hopefully @jmarantz can take a look soon. |
| is_valid_path = PathUtil::canonicalPath(*request_headers.Path()); | ||
| } | ||
| return true; | ||
| if (config.shouldMergeSlashes()) { |
There was a problem hiding this comment.
why bother to merge slashes if is_valid_path is false?
source/common/http/path_utility.cc
Outdated
| return true; | ||
| } | ||
|
|
||
| /* static */ |
There was a problem hiding this comment.
I don't think we use this style of commenting that a method is static in Envoy.
source/common/http/path_utility.cc
Outdated
| for (size_t i = 1; i < original_path.size(); ++i) { | ||
| if (original_path[i] == '/' && original_path[i - 1] == '/') { | ||
| has_adjacent_slashes = true; | ||
| break; |
There was a problem hiding this comment.
capture start-index here to avoid re-searching the same text.
Better yet, use more absl strings magic, to simplify the code
absl::string_view::size_type query_start = orignal_path.find('?');
absl::string_view query;
if (query_start != absl::string_view::npos) {
query = original_path.substr(query_start);
original_path = original_path.substr(0, query_start);
}
if (original_path.find("//") != absl::string_view::npos) {
path_header.value(absl::StrCat(absl::StrReplace(original_path, {{"//", "/"}}), query));
}
WDYT? I'm not 100% sure about all the corner cases like an input of "///" or "////" and what they should do.
There was a problem hiding this comment.
Unfortunately absl::StrReplace-based approach does not handle triple (or more) slashes, but I've simplified the rest of the code based on your recommendation.
| const std::vector<std::pair<std::string, std::string>> non_normal_pairs{ | ||
| {"", ""}, // empty | ||
| {"/a", "/a"}, // no-op | ||
| {"//a/b/c", "/a/b/c"}, // double / start |
There was a problem hiding this comment.
include some triple and quadruple slashes in your test suite.
| auto& path_header = pathHeaderEntry(path_pair.first); | ||
| PathUtil::mergeSlashes(path_header); | ||
| EXPECT_EQ(path_header.value().getStringView(), path_pair.second) | ||
| << "original path: " << path_pair.first; |
There was a problem hiding this comment.
WDYT of turning the body of the loop into a lambda or test-helper method std::string normalizePath(absl::string_view path) and then writing:
EXPECT_EQ("/a/b/c", normalizePath("//a/b/c"));
what you have is idiomatic for golang but most C++ tests I see look more like my suggestion. I'm OK with what you have; just an idea.
| EXPECT_EQ(path_header.value().getStringView(), path_pair.second) | ||
| << "original path: " << path_pair.first; | ||
| } | ||
| } |
There was a problem hiding this comment.
include a test method that also ensures that no slash-merging takes place when the option is off.
There was a problem hiding this comment.
It's covered by ConnectionManagerUtilityTest::MergeSlashesDefaultOff test
There was a problem hiding this comment.
I see, ok, that makes sense. I think handling of 3 or more slashes should be covered here, though, as this is the functional unit-test.
There was a problem hiding this comment.
Test for handling of 3 or more slashes was already added in the latest revision
|
Thanks for doing this, and thanks @venilnoronha for the first review. Just a few nits/cleanups and we should be good to go. |
source/common/http/path_utility.cc
Outdated
| } | ||
| if (!has_adjacent_slashes) { | ||
| // Only operate on path component in URL. | ||
| const size_t query_start = original_path.find('?'); |
There was a problem hiding this comment.
s/size_t/absl::string_view::size_type
source/common/http/path_utility.cc
Outdated
| // Only operate on path component in URL. | ||
| const size_t query_start = original_path.find('?'); | ||
| const auto path = original_path.substr(0, query_start); | ||
| const auto query = absl::ClippedSubstr(original_path, query_start); |
There was a problem hiding this comment.
Handling the npos case for query_start does not appear to be a documented use of substr etc.
There was a problem hiding this comment.
Point 24.4.2. of C++17 standard specifies that std::base_string_view::substr lets rlen be the smaller of n and size() - pos, which covers npos.
absl::string_view is API-compatible with std::string_view so I assume it inherits the spec behavior even if it's not called out in the documentation explicitly.
source/common/http/path_utility.cc
Outdated
| if (!has_adjacent_slashes) { | ||
| // Only operate on path component in URL. | ||
| const size_t query_start = original_path.find('?'); | ||
| const auto path = original_path.substr(0, query_start); |
There was a problem hiding this comment.
use explicit types in this context as type is not obvious from context unless you know string_view well.
source/common/http/path_utility.cc
Outdated
| simplified_path.reserve(original_path.size()); | ||
| for (size_t i = 0; i < original_path.size(); ++i) { | ||
| if (i > 0 && original_path[i] == '/' && original_path[i - 1] == '/') { | ||
| for (size_t i = 0; i < path.size(); ++i) { |
There was a problem hiding this comment.
I'd still prefer using the absl functions for this to avoid having to look carefully at string-hacking code for common patterns. But absl::Substitute is actually preferred for strings known at compile-time. Just call it in a loop till there are no more occurrences of "//" (change the 'if (path.find" to a while-loop, reversing the polarity of the condition.
There was a problem hiding this comment.
I thought about this further as my 'while' suggestion would be n^2 on a string with 100 consecutive slashes.
Let's do the initial scan for "//" via if, but then we can just use split/join to remove duplicate slashes. WDYT?
return absl::StrCat(absl::StrJoin(absl::StrSplit(original_path, "/"), absl::SkipEmpty), "/"), query);
There was a problem hiding this comment.
In this case we would also need to either special-case / prefix or assume that all paths are always absolute.
There was a problem hiding this comment.
Yeah also trailing slash, so cases to handle are:
/a//b
/a//b/
a//b
a//b/
It's still probably simpler than the character-at-a-time scan but maybe there's a better absl util to use?
There was a problem hiding this comment.
probably good to add those cases anyway to the test-suite regardless of the algo we use
There was a problem hiding this comment.
updated, as well as added extra tests
| << "original path: " << path_pair.first; | ||
| } | ||
| auto sanitized_path_value = path_header.value().getStringView(); | ||
| return std::string(sanitized_path_value.begin(), sanitized_path_value.end()); |
There was a problem hiding this comment.
return std::string(path_header.value().getStringView());
| EXPECT_EQ(path_header.value().getStringView(), path_pair.second) | ||
| << "original path: " << path_pair.first; | ||
| } | ||
| } |
There was a problem hiding this comment.
I see, ok, that makes sense. I think handling of 3 or more slashes should be covered here, though, as this is the functional unit-test.
| // Determines if adjacent slashes in the path are merged into one before any processing of | ||
| // requests by HTTP filters or routing. This affects the upstream *:path* header as well. Without | ||
| // setting this option incoming requests with path `//dir/file` will not match against route with | ||
| // `prefix` match set to `/dir`. Defaults to `false`. |
There was a problem hiding this comment.
can you point to the HTTP spec indicating that multiple slashes should be merged? I suspect such a spec for canonicalized URL paths exists -- same place where it discusses handling of "..".
This should referenced in the comments or maybe even in this doc.
There was a problem hiding this comment.
HTTP spec does not specify that slashes should be merged, moreover it explicitly states that behavior around slashes is resource-specific (and this is one of the reasons to make it optional, and not enable by default).
However it's useful in real world applications to provide a safe-guard against user errors, when we know for sure that the intended behavior is similar to filesystem one.
There was a problem hiding this comment.
Indeed it doesn't. It does discuss treatment of ".." which Envoy does, but this is AFAICT not a specified behavior. https://tools.ietf.org/html/rfc3986#section-6
WDYT of making this a filter rather than putting it in the core, since it's really an optional extension? I think it's possible for a filter to adjust the path. I don't feel too strongly either way, but with @mattklein123 we've been talking about building a bare-bones Envoy with only the core functionality and if this were a filter in test/extensions/filters/http/... it would be easier to compile it out. Not that it's that expensive, but it might make sense to keep the core in-spec and as small as possible.
There was a problem hiding this comment.
I don't have a strong preference here and can follow a maintainers' decision.
One thing to consider is that until #6602 is resolved invalidating routing decision is somehow expensive. (and we have to redo routing on a path change)
There was a problem hiding this comment.
IMO it's fine to have this type of thing be built in as an option. @alyssawilk @PiotrSikora any objection?
There was a problem hiding this comment.
My only concern is the slippery slope of too many knobs.
I'm going to claim this passes the two company test, as we elide at least initial multiple slashes ourselves and my guess is we can get away with eliding all but we should either make sure things pass the multi-user test or consider making them pluggable.
jmarantz
left a comment
There was a problem hiding this comment.
basically lgtm modulo the question about whether this should be a filter or core functionality, which I think can be part of the @envoyproxy/senior-maintainers review.
Thanks for all the updates :)
| * config: async data access for local and remote data source. | ||
| * config: changed the default value of :ref:`initial_fetch_timeout <envoy_api_field_core.ConfigSource.initial_fetch_timeout>` from 0s to 15s. This is a change in behaviour in the sense that Envoy will move to the next initialization phase, even if the first config is not delivered in 15s. Refer to :ref:`initialization process <arch_overview_initialization>` for more details. | ||
| * http: added the ability to reject HTTP/1.1 requests with invalid HTTP header values, using the runtime feature `envoy.reloadable_features.strict_header_validation`. | ||
| * http: added the ability to :ref:`merge adjacent slashes<envoy_api_field_config.filter.network.http_connection_manager.v2.HttpConnectionManager.merge_slashes>` in the path. |
There was a problem hiding this comment.
it's probably worth noting here that this canonicalization is not mentioned in any HTTP spec, but is offered as an opt-in convenience to upstreams.
mattklein123
left a comment
There was a problem hiding this comment.
LGTM on the general approach with 1 comment. Please also merge master to pick up the new CI config. Thank you!
/wait
| // requests by HTTP filters or routing. This affects the upstream *:path* header as well. Without | ||
| // setting this option incoming requests with path `//dir/file` will not match against route with | ||
| // `prefix` match set to `/dir`. Defaults to `false`. | ||
| google.protobuf.BoolValue merge_slashes = 33; |
There was a problem hiding this comment.
This can be a simple bool type as it defaults to false.
| // Determines if adjacent slashes in the path are merged into one before any processing of | ||
| // requests by HTTP filters or routing. This affects the upstream *:path* header as well. Without | ||
| // setting this option incoming requests with path `//dir/file` will not match against route with | ||
| // `prefix` match set to `/dir`. Defaults to `false`. |
There was a problem hiding this comment.
IMO it's fine to have this type of thing be built in as an option. @alyssawilk @PiotrSikora any objection?
Signed-off-by: Ruslan Nigmatullin <elessar@dropbox.com>
* use bool Signed-off-by: Ruslan Nigmatullin <elessar@dropbox.com>
Signed-off-by: Ruslan Nigmatullin <elessar@dropbox.com>
mattklein123
left a comment
There was a problem hiding this comment.
Thanks this LGTM but will wait until Monday when @alyssawilk is back in the office to see if she or @PiotrSikora have any comments. Nice work!
alyssawilk
left a comment
There was a problem hiding this comment.
Looks great! A few extra thoughts from my pass
| return true; | ||
| } | ||
|
|
||
| void PathUtil::mergeSlashes(HeaderEntry& path_header) { |
There was a problem hiding this comment.
I'm a bit twitchy about full qualified URLs here. Granted, Envoy currently handles fully qualified urls in the H1 codec by splitting them up, so if the incoming request is GET http://foo.com//bar we'll have :authority foo.com and :path /bar so this is fine today (we won't end up with http:/foo.com/bar).
It might be worth an assert that the path is relative, just in case someone does some lua or use defined headers trying to take advantage of Path == url in firstline and having their absolute URL messed up.
api/envoy/config/filter/network/http_connection_manager/v2/http_connection_manager.proto
Outdated
Show resolved
Hide resolved
| // Determines if adjacent slashes in the path are merged into one before any processing of | ||
| // requests by HTTP filters or routing. This affects the upstream *:path* header as well. Without | ||
| // setting this option incoming requests with path `//dir/file` will not match against route with | ||
| // `prefix` match set to `/dir`. Defaults to `false`. |
There was a problem hiding this comment.
My only concern is the slippery slope of too many knobs.
I'm going to claim this passes the two company test, as we elide at least initial multiple slashes ourselves and my guess is we can get away with eliding all but we should either make sure things pass the multi-user test or consider making them pluggable.
| } | ||
| const absl::string_view prefix = absl::StartsWith(path, "/") ? "/" : absl::string_view(); | ||
| const absl::string_view suffix = absl::EndsWith(path, "/") ? "/" : absl::string_view(); | ||
| path_header.value(absl::StrCat( |
There was a problem hiding this comment.
I assume this is O(n) if we get something whacky like a path which is 16k worth of / yeah?
There was a problem hiding this comment.
I don't see anything explicit in the absl documentation, but it's O(n) based on the code.
* update example in docs Signed-off-by: Ruslan Nigmatullin <elessar@dropbox.com>
16c08c4
|
/azp run envoy-macos |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Description: Add an optional ability to merge slashes in the request's path
Risk Level: LOW (opt-in)
Testing: unit tests
Docs Changes: inline protobuf documentation
Release Notes: updated
Fixes #7611