-
Notifications
You must be signed in to change notification settings - Fork 38.7k
Prepare string and net utils for future HTTP operations #34242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code Coverage & BenchmarksFor details see: https://corecheck.dev/bitcoin/bitcoin/pulls/34242. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please copy-paste ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
src/netbase.cpp
Outdated
| CService addr_bind; | ||
| struct sockaddr_storage sockaddr_bind; | ||
| socklen_t sockaddr_bind_len = sizeof(sockaddr_bind); | ||
| if (!sock.GetSockName((struct sockaddr*)&sockaddr_bind, &sockaddr_bind_len)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetSockName() currently forwards the raw getsockname() return value (0 on success), which forces callers to use an inverted check (!GetSockName(...)).
Can we normalize it so it returns true on success?
bool Sock::GetSockName(sockaddr* name, socklen_t* name_len) const
{
return getsockname(m_socket, name, name_len) == 0;
}
It would make the code slightly more readable and maintainable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with you, but this might be better addressed in a separate PR. The entire Sock class wraps POSIX socket management syscalls that return 0 on success (connect, bind, listen, close, etc...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea behind the Sock methods is that they are mockable versions of the OS functions. Zero learning curve for newcomers (assuming familiarity with the OS interface). Sock::Foo(x, y, z) is equivalent to calling the OS's foo(x, y, z).
inverted check
the function returns an integer error code, not a boolean. A check like if (foo() == 0) looks better (less "inverted").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Thanks @vasild. It would be nice to have more documentation on the return codes then. And +1 on the == 0 suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line gets "modernized" in the next commit, including a comment and == 0, so I think it's cool for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"returns an integer error code" -- oh, I meant this:
https://www.man7.org/linux/man-pages/man2/getsockname.2.html#RETURN_VALUE
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and errno
is set to indicate the error.
or the check could be if (foo() != -1).
|
ACK ea993ff I reviewed the same 6 commits as part of #32061 and all my feedback from there has been addressed. |
vasild
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK ea993ff
src/netbase.cpp
Outdated
| sockaddr_storage storage; | ||
| socklen_t len = sizeof(storage); | ||
|
|
||
| auto sa = static_cast<sockaddr*>(static_cast<void*>(&storage)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The inner cast is unnecessary?
| auto sa = static_cast<sockaddr*>(static_cast<void*>(&storage)); | |
| auto sa = static_cast<sockaddr*>(&storage); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works with a reinterpret_cast because sockaddr_storage and sockaddr are technically unrelated types even though they overlap... I'll clean it up, thanks. I think I had static_cast there before because of hints from Sonarcloud on corecheck.dev.
src/util/string.cpp
Outdated
| // that means the line we are currently reading is too long, and we throw. | ||
| if (count > max_read) throw std::runtime_error("max_read exceeded by LineReader"); | ||
| } | ||
| const std::string_view untrimmed_line(reinterpret_cast<const char *>(std::to_address(line_start)), count); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| const std::string_view untrimmed_line(reinterpret_cast<const char *>(std::to_address(line_start)), count); | |
| const std::string_view untrimmed_line(reinterpret_cast<const char*>(std::to_address(line_start)), count); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
taken, thanks
| std::string FormatRFC1123DateTime(int64_t time) | ||
| { | ||
| if (time < -62167219200 || 253402300799 < time) { | ||
| // RFC7231 mandates 4-digit year, so only support years 0 to 9999 | ||
| return ""; | ||
| } | ||
| const std::chrono::sys_seconds secs{std::chrono::seconds{time}}; | ||
| const auto days{std::chrono::floor<std::chrono::days>(secs)}; | ||
| const auto w{days.time_since_epoch().count() % 7}; // will be in the range [-6, 6] | ||
| std::string_view weekday{weekdays.at(w >= 0 ? w : w + 7)}; | ||
| const std::chrono::year_month_day ymd{days}; | ||
| std::string_view month{months.at(unsigned{ymd.month()} - 1)}; | ||
| const std::chrono::hh_mm_ss hms{secs - days}; | ||
| // examples: Mon, 27 Jul 2009 12:28:53 GMT | ||
| // Fri, 31 May 2024 19:18:04 GMT | ||
| return strprintf("%03s, %02u %03s %04i %02i:%02i:%02i GMT", weekday, unsigned{ymd.day()}, month, signed{ymd.year()}, hms.hours().count(), hms.minutes().count(), hms.seconds().count()); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think C++20 has a short version of this (no extra arrays needed):
std::string FormatRFC1123DateTime(int64_t time)
{
if (time < -62167219200 || 253402300799 < time) {
// RFC7231 mandates 4-digit year, so only support years 0 to 9999
return "";
}
const std::chrono::sys_seconds secs{std::chrono::seconds{time}};
// Example: "Mon, 27 Jul 2009 12:28:53 GMT"
return std::format("{:%a, %d %b %Y %H:%M:%S} GMT", secs);
}maybe you already tried it and it doesn't work on all our supported platforms?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be locale-independent. I think it is:
https://en.cppreference.com/w/cpp/chrono/system_clock/formatter.html#Format_specification
the default "C" locale if L is not present in the format specification
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/bitcoin/bitcoin/actions/runs/20963499370/job/60247105398?pr=34242#step:11:1480
/home/admin/actions-runner/_work/_temp/src/util/time.cpp:129:17: error: no member named 'format' in namespace 'std'
129 | return std::format("{:%a, %d %b %Y %H:%M:%S} GMT", secs);
| ~~~~~^
😢
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, will revert on next push.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
too good to be true
furszy
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK ea993ff
src/util/string.cpp
Outdated
| while (it != end) { | ||
| // Read a character from the incoming buffer and increment the iterator | ||
| auto c = static_cast<char>(*it); | ||
| ++it; | ||
| ++count; | ||
| // If the character we just consumed was \n, the line is terminated | ||
| if (c == '\n') break; | ||
| // If we are at the end of the incoming buffer, the line is terminated | ||
| if (it == end) break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In ea993ff:
Seems you can remove this last line. You already do a while (it != end)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Youre right thanks, its gone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that it is incremented inside the body of the loop, after the check in while (it != end). So after the increment it could be equal to end.
const std::vector<std::byte> input{StringToBuffer("ab")};
LineReader reader(input, /*max_read=*/1);
std::optional<std::string> line{reader.ReadLine()};Before this change the above snippet would have set line to ab and after the change it throws std::runtime_error: max_read exceeded by LineReader. I guess the new behavior is the correct? Maybe worth adding such a test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @vasild I added this as a test case, and am scratching my head how it slipped through. Removing the line is not only cleaner code, it is a bug fix. LineReader should not return a line longer than max_read
src/util/string.cpp
Outdated
| // If the character we just consumed was \n, the line is terminated | ||
| if (c == '\n') break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tiny nit:
could make the termination character customizable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going to leave this for now, next dev who needs a custom EOL character can add it in the future ;-)
The function logic is moved-only from net.cpp to netbase.cpp and redeclared (as non-static) in netbase.h
Replace the C-style casting with C++ reinterpret_cast
https://httpwg.org/specs/rfc9110.html#rfc.section.5.1 Field names in HTTP headers are case-insensitive. These structs will be used in the headers map to search by key. In libevent field names are also converted to lowercase for comparison: evhttp_find_header() evutil_ascii_strcasecmp() EVUTIL_TOLOWER_()
HTTP 1.1 responses require a timestamp header with a format specified (currently) by: https://datatracker.ietf.org/doc/html/rfc9110#section-5.6.7 This specific format is defined in RFC1123: https://www.rfc-editor.org/rfc/rfc1123#page-55 The libevent implementation can be referenced in evutil_time.c evutil_date_rfc1123()
|
🚧 At least one of the CI tasks failed. HintsTry to run the tests locally, according to the documentation. However, a CI failure may still
Leave a comment here, if you need help tracking down a confusing failure. |
vasild
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 1d24816
|
switching to draft for today: Im going to see if we can just use |
|
Ok did some learning about Here's my reasoning (up for debate, @furszy ): Raw data is copied as
|
|
re-ACK 1d24816 |
| * @returns a string of the expected length. | ||
| * @throws a std::runtime_error if there is not enough data in the buffer. | ||
| */ | ||
| std::string ReadLength(size_t len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tiny nit:
Would just call this Read(size_t len).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm gonna push back on this, I think in a class called "Line Reader" it's important to distinguish explicitly between reading a line, and reading a number of bytes. A method called Read() seems ambiguous to me.
src/util/string.h
Outdated
| /** | ||
| * Returns remaining size of bytes in buffer | ||
| */ | ||
| size_t Left() const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nano nit:
Maybe Left() is not the best term. Could use Remaining()?. But at this point it doesn't matter much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok sure, taking this. 🙏
| std::optional<std::string> LineReader::ReadLine() | ||
| { | ||
| if (it == end) { | ||
| return std::nullopt; | ||
| } | ||
|
|
||
| auto line_start = it; | ||
| size_t count = 0; | ||
| while (it != end) { | ||
| // Read a character from the incoming buffer and increment the iterator | ||
| auto c = static_cast<char>(*it); | ||
| ++it; | ||
| ++count; | ||
| // If the character we just consumed was \n, the line is terminated. | ||
| // The \n itself does not count against max_read. | ||
| if (c == '\n') break; | ||
| // If the character we just consumed gives us a line length greater | ||
| // than max_read, and we are not at the end of the line (or buffer) yet, | ||
| // that means the line we are currently reading is too long, and we throw. | ||
| if (count > max_read) throw std::runtime_error("max_read exceeded by LineReader"); | ||
| } | ||
| const std::string_view untrimmed_line(reinterpret_cast<const char*>(std::to_address(line_start)), count); | ||
| const std::string_view line = TrimStringView(untrimmed_line); // delete trailing \r and/or \n | ||
| return std::string(line); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we advance the class member iterator it while scanning the span, throwing a max_read exception drops the bytes already read. We should advance it only at the very end to avoid loosing them (a test for this case would be good).
Also, small extra nit: what if there only char is a \n? The returned string will be empty?
As a separate topic, could use std::memchr to find the first instance of \n which would avoid the loop (haven't fully tried it but seems to be an option as well - dropping it in case you want to investigate it, all good if not).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch thanks. I now reset the iterator just before throwing. Tested by catching the "exceeded" error then using a more granular operation (ReadLength) to get the remainder of bytes from the buffer. The test fails on the original code because ReadLength returned bytes form whatever point it threw the error instead of from the last point of success.
Also added test cases for what makes the most sense to me when using the tool:
- If the buffer is only
"\n"then you get one empty""line,nulloptafter that - If the buffer is completely empty
""you getnullopt(not an error)
re: memchr, I played with that a bit, then I also tried out std::find and felt like neither approach produced clean code without also dropping a little efficiency. So I still think the long implementation here is best for our precise use-case.
This is a helper struct to parse HTTP messages from data in buffers from sockets. HTTP messages begin with headers which are CRLF-terminated lines (\n or \r\n) followed by an arbitrary amount of body data. Whitespace is trimmed from the field lines but not the body. https://httpwg.org/specs/rfc9110.html#rfc.section.5
pinheadmz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| std::optional<std::string> LineReader::ReadLine() | ||
| { | ||
| if (it == end) { | ||
| return std::nullopt; | ||
| } | ||
|
|
||
| auto line_start = it; | ||
| size_t count = 0; | ||
| while (it != end) { | ||
| // Read a character from the incoming buffer and increment the iterator | ||
| auto c = static_cast<char>(*it); | ||
| ++it; | ||
| ++count; | ||
| // If the character we just consumed was \n, the line is terminated. | ||
| // The \n itself does not count against max_read. | ||
| if (c == '\n') break; | ||
| // If the character we just consumed gives us a line length greater | ||
| // than max_read, and we are not at the end of the line (or buffer) yet, | ||
| // that means the line we are currently reading is too long, and we throw. | ||
| if (count > max_read) throw std::runtime_error("max_read exceeded by LineReader"); | ||
| } | ||
| const std::string_view untrimmed_line(reinterpret_cast<const char*>(std::to_address(line_start)), count); | ||
| const std::string_view line = TrimStringView(untrimmed_line); // delete trailing \r and/or \n | ||
| return std::string(line); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch thanks. I now reset the iterator just before throwing. Tested by catching the "exceeded" error then using a more granular operation (ReadLength) to get the remainder of bytes from the buffer. The test fails on the original code because ReadLength returned bytes form whatever point it threw the error instead of from the last point of success.
Also added test cases for what makes the most sense to me when using the tool:
- If the buffer is only
"\n"then you get one empty""line,nulloptafter that - If the buffer is completely empty
""you getnullopt(not an error)
re: memchr, I played with that a bit, then I also tried out std::find and felt like neither approach produced clean code without also dropping a little efficiency. So I still think the long implementation here is best for our precise use-case.
| * @returns a string of the expected length. | ||
| * @throws a std::runtime_error if there is not enough data in the buffer. | ||
| */ | ||
| std::string ReadLength(size_t len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm gonna push back on this, I think in a class called "Line Reader" it's important to distinguish explicitly between reading a line, and reading a number of bytes. A method called Read() seems ambiguous to me.
src/util/string.h
Outdated
| /** | ||
| * Returns remaining size of bytes in buffer | ||
| */ | ||
| size_t Left() const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok sure, taking this. 🙏
furszy
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK 9b3612b
| const size_t max_read; | ||
| std::span<const std::byte>::iterator it; | ||
|
|
||
| explicit LineReader(std::span<const std::byte> buffer, size_t max_read); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
As we can read pass max_read limit if max_read + 1 is a jump line, I would rename the arg and member to max_output_length.
|
re-ACK 9b3612b |
This is a component of removing libevent as a dependency of the project. It is the first six commits of #32061 and provides a string-parsing utility (
LineReader) that is also consumed by #34158.These are the functions that are added / updated for HTTP and Torcontrol:
GetBindAddress(): Given a socket, provides the bound address as a CService. Currently used by p2p but moved fromnettonetbaseso other modules can call it.ToIntegral(): Already used to parse numbers from strings, added new argumentbase = 10so it can also be used to parse hexadecimal integers. HTTP chunked transfer-encoding uses hex-encoded integers to specify payload size: https://datatracker.ietf.org/doc/html/rfc7230.html#section-4.1AsciiCaseInsensitivecomparators: Needed to store HTTP headers in anunordered_map. Headers are key-value pairs that are parsed with case-insensitive keys: https://httpwg.org/specs/rfc9110.html#rfc.section.5.1FormatRFC1123DateTime(): The required datetime format for HTTP headers (e.g.Fri, 31 May 2024 19:18:04 GMT)LineReader: Fields in HTTP requests are newline-terminated. This struct is given an input buffer and provides methods to read lines as strings.