utility: modified string util class#2368
Conversation
Signed-off-by: Gabriel <gsagula@gmail.com>
…sites Signed-off-by: Gabriel <gsagula@gmail.com>
Signed-off-by: Gabriel <gsagula@gmail.com>
Signed-off-by: Gabriel <gsagula@gmail.com>
|
@jmarantz Let me know if that's what you had in mind ^^^ |
jmarantz
left a comment
There was a problem hiding this comment.
Generally looks great. Just a few style quibbles, except for one functional issue with !trim_whitespace in find().
source/common/common/utility.cc
Outdated
| } | ||
|
|
||
| absl::string_view StringUtil::ltrim(absl::string_view source) { | ||
| source.remove_prefix(std::min(source.find_first_not_of(" \t\f\v\n\r"), source.size())); |
There was a problem hiding this comment.
this is very concise but it kind of exploits the knowledge that npos is very large, which might not be obvious to the reader. Consider:
(a) factoring out your set of whitespace to a static const char[] declared above.
(b) writing an explicit conditional:
const absl::string_view::size_type pos = source.find_first_not_of(WhitespaceChars);
if (pos != absl::string_view::npos) {
source.remove_prefix(pos);
}
There was a problem hiding this comment.
Good call Joshua 👍
Unless removing all the white-spaces of a string that contains only white-spaces is expected, I think that was a bug. Same applies to rtrim I believe.
There was a problem hiding this comment.
Oh ... good point. Actually I'd expect this to clear a string that's got only whitespaces, so you need an else-clause here to make that explicit, and equivalent to your previous form which was correct because you happen to know that npos is large.
There was a problem hiding this comment.
Sounds good! Thanks Joshua, I'll push the fixes later this evening.
source/common/common/utility.cc
Outdated
| } | ||
|
|
||
| absl::string_view StringUtil::rtrim(absl::string_view source) { | ||
| source.remove_suffix(source.size() - source.find_last_not_of(" \t\f\v\n\r") - 1); |
There was a problem hiding this comment.
ditto, and you can share the factored out constant.
source/common/common/utility.cc
Outdated
|
|
||
| bool StringUtil::find(absl::string_view source, absl::string_view delimiters, | ||
| absl::string_view key_token, bool trim_whitespace) { | ||
| const std::vector<absl::string_view> tokens = splitToken(source, delimiters, !trim_whitespace); |
There was a problem hiding this comment.
Why !trim_whitespace? I could understand 'false' there give the comment you made in the other CL about lazily stripping whitespace, but if the caller of this passes in false, you'll then have splitToken strip whitespace, so that looks wrong.
And I think now that we are working with string_view, whitespace stripping is pretty fast; it's probably not worth doing it lazily.
There was a problem hiding this comment.
I thought that would reduce computations when trimming white spaces, but that's actually wrong and you are right, string_view is pretty fast. I will change that.
source/common/common/utility.cc
Outdated
| if (pos != absl::string_view::npos) { | ||
| source.remove_suffix(source.size() - pos); | ||
| } | ||
| return trim_whitespace ? trim(source) : source; |
There was a problem hiding this comment.
maybe just right-trim the source, since you are just cropping right? either way is fine.
There was a problem hiding this comment.
Rtrim seems more correct actually 👍
source/common/common/utility.h
Outdated
| * find("A=5; b", "=;", "A=5") . false | ||
| * find("A=5; b", "=;", "A") . true | ||
| * find("A=5; b", "=;", "b") . true | ||
| * find("", ".", "") . true (spliting "" with "." returns "" which contains "") |
There was a problem hiding this comment.
I still think this is a weird case, and wonder what you have in mind for defining the semantics this way.
There was a problem hiding this comment.
If the source cannot be split by any of the provided delimiters, absl::StrSplit(source, absl::ByAnyChar(delimiters)) returns a vector with one element, which contains the source itself. So, in the above doc find("", ".", "") returns true since ("" equals ""). I guess I should just make the example a bit more clear. Does that make sense?
There was a problem hiding this comment.
OK, sounds good. Let's stick with what you have.
| } | ||
|
|
||
| TEST(StringUtil, StringViewCropRight) { | ||
| EXPECT_EQ("hello", StringUtil::cropRight("hello; world\t\f\v\n\r", ";")); |
source/common/common/utility.cc
Outdated
|
|
||
| absl::string_view StringUtil::trim(absl::string_view source) { return ltrim(rtrim(source)); } | ||
|
|
||
| bool StringUtil::find(absl::string_view source, absl::string_view delimiters, |
There was a problem hiding this comment.
can you rename this findToken? I don't think the overload of find() makes sense.
mattklein123
left a comment
There was a problem hiding this comment.
Thanks this is super awesome. Quick drive by comment.
source/common/common/utility.h
Outdated
| bool keep_empty_string = false); | ||
|
|
||
| /** | ||
| * TODO(gsagula): remove this when call-sites have been replaced by splitToken(). |
There was a problem hiding this comment.
Since you are already doing changes to other code sights, is it possible to just delete this in this PR? I doubt there are that many callers and I'm guessing they are all trivial to convert. I've been meaning to do this also.
There was a problem hiding this comment.
I've already started looking at places where split can be replaced. It looks like that common/network/utility is a great candidate for these changes, but would be nice to also convert some of the functions to fully make use of string view. I would be happy to do that in another small PR as soon as I finish the filter, which I think is almost done I believe, if that's ok.
…ction Signed-off-by: Gabriel <gsagula@gmail.com>
Signed-off-by: Gabriel <gsagula@gmail.com>
Signed-off-by: Gabriel <gsagula@gmail.com>
|
@jmarantz @mattklein123 The old split has been removed, but I will need to look at the CI failure tomorrow. I probably misused string_view in somewhere.
|
jmarantz
left a comment
There was a problem hiding this comment.
This looks great. Thanks for making all these changes. These are all style-nits or nano-performance comments, except for the statically initialized std::string.
source/common/common/utility.cc
Outdated
| source.remove_suffix(source.size() - source.find_last_not_of(" \t\f\v\n\r") - 1); | ||
| const absl::string_view::size_type pos = source.find_last_not_of(WhitespaceChars); | ||
| if (pos != absl::string_view::npos) { | ||
| source.remove_suffix(source.size() - source.find_last_not_of(WhitespaceChars) - 1); |
There was a problem hiding this comment.
source.remove_suffix(source.size() - pos - 1);
source/common/common/utility.cc
Outdated
| .count() != 0; | ||
| } | ||
|
|
||
| const std::string StringUtil::WhitespaceChars{" \t\f\v\n\r"}; |
There was a problem hiding this comment.
const char[StringUtil::WhitespaceChars[] = " \t\f\v\n\r";
avoids the init problem.
There was a problem hiding this comment.
Sorry I keep getting it wrong 🤦♂️
source/common/common/utility.h
Outdated
| class StringUtil { | ||
| public: | ||
| /** | ||
| * White space chars. |
There was a problem hiding this comment.
s/White space/Whitespace/
not clear a comment is necessary at all though :)
source/common/common/utility.h
Outdated
| /** | ||
| * White space chars. | ||
| */ | ||
| static const std::string WhitespaceChars; |
There was a problem hiding this comment.
static const char WhitespaceChars[];
source/common/grpc/common.cc
Outdated
| } | ||
| *service = std::move(parts[0]); | ||
| *method = std::move(parts[1]); | ||
| *service = std::move(std::string(parts[0])); |
There was a problem hiding this comment.
You could also
service->assign(parts[0].data(), parts[0].size());
not sure which is better; either is fine with me. Just thought I'd point out the alternative form.
| } | ||
| entry.string_value_ += line + "\n"; | ||
| if (line == lines.back()) { | ||
| entry.string_value_.append(std::string(StringUtil::rtrim(line))); |
There was a problem hiding this comment.
Here I think it would be faster to append the bytes directly, rather than make a temporary string and then append that.
absl::string_view trimmed = StringUtil::rtrim(line);
entry.string_value_.append(trimmed.data(), trimmed.size());
test/common/common/utility_test.cc
Outdated
| } | ||
| } | ||
|
|
||
| TEST(StringUtil, WhitespaceChars) { EXPECT_EQ(" \t\f\v\n\r", StringUtil::WhitespaceChars); } |
There was a problem hiding this comment.
I'd instead write a 5-line test: one line for each character you want to test.
EXPECT_NE(nullptr, strchr(StringUtil::WhitespaceChars, ' '));
EXPECT_NE(nullptr, strchr(StringUtil::WhitespaceChars, '\t'));
this test wouldn't be dependent on the order of the whitespace characters, which is not important.
| for (const auto& alpn_protocol : alpn_protocols) { | ||
| common_tls_context.add_alpn_protocols(alpn_protocol); | ||
| for (auto alpn_protocol : | ||
| StringUtil::splitToken(json_tls_context.getString("alpn_protocols", ""), ",")) { |
There was a problem hiding this comment.
I have a guess as to the ASAN problem. Can you make a std::string temp for json_tls_context.getString("alpn_protocols", "") outside the for-loop?
The one thing you need to be sure of with string_view is that the backing-store for the chars outlives the uses of the string_view.
There was a problem hiding this comment.
I've done that yesterday, but I think it might be something else. Let me kick the CI off again. Things in my machine are very slow.
mattklein123
left a comment
There was a problem hiding this comment.
This is super awesome. Few random comments. Thank you!
source/common/common/utility.cc
Outdated
| absl::string_view StringUtil::rtrim(absl::string_view source) { | ||
| const absl::string_view::size_type pos = source.find_last_not_of(WhitespaceChars); | ||
| if (pos != absl::string_view::npos) { | ||
| source.remove_suffix(source.size() - source.find_last_not_of(WhitespaceChars) - 1); |
There was a problem hiding this comment.
nit: Does find_last_not_of need to be computed again? Can't you just use pos?
source/common/common/utility.h
Outdated
| class StringUtil { | ||
| public: | ||
| /** | ||
| * White space chars. |
source/common/grpc/common.cc
Outdated
| } | ||
| *service = std::move(parts[0]); | ||
| *method = std::move(parts[1]); | ||
| std::string service_str{parts[0]}; |
There was a problem hiding this comment.
nit: I would kill the local variables. You can assign a new string, or just assign directly into the output variables.
source/common/network/cidr_range.cc
Outdated
| // static | ||
| CidrRange CidrRange::create(const std::string& range) { | ||
| std::vector<std::string> parts = StringUtil::split(range, '/'); | ||
| auto parts = StringUtil::splitToken(range, "/"); |
source/common/network/cidr_range.cc
Outdated
| if (ptr->type() == Type::Ip) { | ||
| uint64_t length64; | ||
| if (StringUtil::atoul(parts[1].c_str(), length64, 10)) { | ||
| std::string part{parts[1]}; |
source/common/network/utility.cc
Outdated
| for (const std::string& s : ranges) { | ||
| std::stringstream ss(s); | ||
| void Utility::parsePortRangeList(absl::string_view string, std::list<PortRange>& list) { | ||
| auto ranges = StringUtil::splitToken(string, ","); |
| StringUtil::split(Filesystem::fileReadToEnd(full_path), "\n"); | ||
| for (const std::string& line : lines) { | ||
| if (!line.empty() && line.at(0) == '#') { | ||
| auto text_file{Filesystem::fileReadToEnd(full_path)}; |
Signed-off-by: Gabriel <gsagula@gmail.com>
Signed-off-by: Gabriel <gsagula@gmail.com>
mattklein123
left a comment
There was a problem hiding this comment.
Sorry few more random const stuff and 1 comment. Will let @jmarantz do the final +1. Thank you for working on this!
source/common/grpc/common.cc
Outdated
| return false; | ||
| } | ||
| std::vector<std::string> parts = StringUtil::split(path->value().c_str(), '/'); | ||
| auto parts = StringUtil::splitToken(path->value().c_str(), "/"); |
source/common/grpc/common.cc
Outdated
| } | ||
| *service = std::move(parts[0]); | ||
| *method = std::move(parts[1]); | ||
| *service = std::string(parts[0]); |
There was a problem hiding this comment.
Sorry I should have been more explicit, and compiler might optimize, but I would do service->assign here to avoid the temporary. Same in next line.
source/common/http/utility.cc
Outdated
| if (header.key() == Http::Headers::get().Cookie.get().c_str()) { | ||
| // Split the cookie header into individual cookies. | ||
| for (const std::string& s : StringUtil::split(std::string{header.value().c_str()}, ';')) { | ||
| for (auto s : StringUtil::splitToken(header.value().c_str(), ";")) { |
source/common/http/utility.cc
Outdated
| continue; | ||
| } | ||
| std::string k = s.substr(first_non_space, equals_index - first_non_space); | ||
| absl::string_view k = s.substr(first_non_space, equals_index - first_non_space); |
There was a problem hiding this comment.
Sorry for missing all these constants. Late night work :(
absl::string_view v might need to be modified, we can't make it a const:
if (v.size() >= 2 && v.back() == '"' && v[0] == '"') {
v = v.substr(1, v.size() - 2);
}
There was a problem hiding this comment.
Cool no problem, sorry for the invalid comment.
| uint32_t ret = 0; | ||
| std::vector<std::string> retry_on_list = StringUtil::split(config, ','); | ||
| for (const std::string& retry_on : retry_on_list) { | ||
| for (auto retry_on : StringUtil::splitToken(config, ",")) { |
| if (!line.empty() && line.at(0) == '#') { | ||
| const std::string text_file{Filesystem::fileReadToEnd(full_path)}; | ||
| const auto lines = StringUtil::splitToken(text_file, "\n"); | ||
| for (auto line : lines) { |
| } | ||
| entry.string_value_ += line + "\n"; | ||
| if (line == lines.back()) { | ||
| absl::string_view trimmed = StringUtil::rtrim(line); |
Signed-off-by: Gabriel <gsagula@gmail.com>
source/common/common/utility.cc
Outdated
|
|
||
| bool StringUtil::findToken(absl::string_view source, absl::string_view delimiters, | ||
| absl::string_view key_token, bool trim_whitespace) { | ||
| const std::vector<absl::string_view> tokens = splitToken(source, delimiters, false); |
There was a problem hiding this comment.
OK I'll let you make the final call, but IMO you could just do:
const std::vector<absl::string_view>tokens = splitToken(source, delimeters, trim_whitespace);
return std::find(tokens.begin(), tokens.end(), key_token) != tokens.end();
which changes the definition of ("", ",", "") to give you a 'false'. If you really like spelling out the code like this I can accept it, but anyone reading will wonder why this is so complicated.
| std::unique_ptr<lightstep::LightStepTracerOptions> opts(new lightstep::LightStepTracerOptions()); | ||
| opts->access_token = server.api().fileReadToEnd(json_config.getString("access_token_file")); | ||
| StringUtil::rtrim(opts->access_token); | ||
| opts->access_token = std::string( |
There was a problem hiding this comment.
save rtrim() result in a temp and then use opts->access_token.assign(...date(), ...size()) to avoid the temp.
test/test_common/utility.cc
Outdated
| size_t next_index; | ||
|
|
||
| if (split.empty()) { | ||
| ret.emplace_back(source); |
There was a problem hiding this comment.
if (keep_empty_string) {
ret.emplace_back(source);
}
test/test_common/utility.cc
Outdated
| } | ||
|
|
||
| last_index = next_index + split.size(); | ||
| } while (next_index != source.size()); |
There was a problem hiding this comment.
next_index < source.size()
it's hard for me to be sure that you won't shoot past source.size() given that you assign next_index to source.size() on line 177, and then ast last_index=next_index+split.size() above.
this makes me wonder how well this function is tested. Maybe you could redefine it in terms of absl::StrSplit and then convert the result to an array of std::string?
There was a problem hiding this comment.
This is a test function. Can't we just call the string_view version and then just convert the result into an array of strings?
There was a problem hiding this comment.
This is the old split function. I just moved it into common_test_lib for convenience when we are writing tests. I can try to call the string_view version and then transform the result.
Signed-off-by: Gabriel <gsagula@gmail.com>
Signed-off-by: Gabriel <gsagula@gmail.com>
jmarantz
left a comment
There was a problem hiding this comment.
one last cleanup if you feel like it.
Thanks for pushing this all the way through! These will be great and can help simplify some other code as well.
| } | ||
| } | ||
| return false; | ||
| } |
There was a problem hiding this comment.
Now that you are passing trim_whitespace through, you can just delete lines 99-106
There was a problem hiding this comment.
If we do that, split strings will NOT be trimmed and findToken will produce a completely different result.
auto s = StringUtil::splitToken("abc; type=text", ";=", "type", true);
s[0] == "abc", s[1] == " type", s[2] == "text"
Note that s[1] would still need to be trimmed in order to have an exactly match of word type. As I understand, the only difference between the functors SkipEmpty() and SkipWhitespace(), is that SkipWhitespace returns false if a string contains ONLY whitespaces.
I'm not sure that we really want that.
There was a problem hiding this comment.
OK that's fine then. I'm confused though: https://github.com/abseil/abseil-cpp/blob/be40fdf1a86f4956d2f8125a7b6bd6d34e133c2d/absl/strings/str_split.h#L358 looks like it has the expected behavior, no?
There was a problem hiding this comment.
If you look at the example that you just gave me, v[0] has leading and trailing whitespaces, which means that in order to match it with "a", both sides would have to be trimmed. Right?
v[0] == " a ", v[1] == "b"
There was a problem hiding this comment.
My apologies; you are absolutely correct. I totally misread the str_split.h.
There was a problem hiding this comment.
No worries Joshua! Thanks for the awesome code review!
|
No problem! Thank you and Matt for reviewing it. |
jmarantz
left a comment
There was a problem hiding this comment.
I'm guessing your mac CI test failed due to a flake unrelated to this PR (pasted below). Close and open the PR to re-run it.
[ RUN ] IpVersions/Http2UpstreamIntegrationTest.ManySimultaneousRequest/1
TestRandomGenerator running with seed 2010966488
test/integration/http2_upstream_integration_test.cc:274: Failure
Expected equality of these values:
"503"
responses[i]->headers().Status()->value().c_str()
Which is: "504"
test/integration/http2_upstream_integration_test.cc:270: Failure
Expected equality of these values:
"200"
responses[i]->headers().Status()->value().c_str()
Which is: "504"
test/integration/http2_upstream_integration_test.cc:271: Failure
Expected equality of these values:
response_bytes[i]
Which is: 439
responses[i]->body().length()
Which is: 24
|
I will just merge, don't worry about OSX failure. |
Adds a benchmarking for new string utility functions added in #2368 using the google microbenchmarking library (https://github.com/google/benchmark), which was added in #2350. This serves as an example and proof of concept on how to use the benchmarking library, and as a reference point for the approximate performance of some string-parsing functions. Risk Level: Very Low Testing: This is just a new speed-test, so running it on its own is sufficient to test it. Results: 2018-01-18 08:01:44 Run on (12 X 3800 MHz CPU s) CPU Caches: L1 Data 32K (x6) L1 Instruction 32K (x6) L2 Unified 256K (x6) L3 Unified 15360K (x1) ***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead. ------------------------------------------------------------------------------------- Benchmark Time CPU Iterations ------------------------------------------------------------------------------------- BM_RTrimStringView 11 ns 11 ns 54467092 BM_RTrimStringViewAlreadyTrimmed 9 ns 9 ns 77143046 BM_RTrimStringViewAlreadyTrimmedAndMakeString 24 ns 24 ns 29094696 BM_FindToken 165 ns 165 ns 4230849 BM_FindTokenValueNestedSplit 427 ns 427 ns 1653627 BM_FindTokenValueSearchForEqual 180 ns 180 ns 4046401 In practice I did not find this benchmark to be noisy on repeated runs. Signed-off-by: Joshua Marantz <jmarantz@google.com>
Automatic merge from submit-queue. Update envoy to 01ee725 **What this PR does / why we need it**: Update envoy to envoyproxy/envoy@01ee725 **Special notes for your reviewer**: Related Envoy changes: envoyproxy/envoy#2368 envoyproxy/envoy#2376 **Release note**: ```release-note None ```
Description:
This PR replaces the old rtrim static member function with a new implementation
that uses absl::string_view to reduce memory allocation and to increase performance. It also replaces all the call-sites for fully elimination of the older version of rtrim. The same PR adds few new member functions that follows the same design philosophy of the new rtrim. All the improvements made to StringUtil class helps to support #2348 and #2087.
Risk Level: High
Few changes where made to internal implementation of proxy_protocol.cc, runtime_impl.cc
and lightstep_httptracer.cc in order to eliminate the old rtrim function. Even though changes
are quite simple, it's important to pay close attention when reviewing this PR.
Testing: Unit and manual testing.