Skip to content

Conversation

@pinheadmz
Copy link
Member

This is a component of removing libevent as a dependency of the project. It is the first six commits of #32061 and provides a string-parsing utility (LineReader) that is also consumed by #34158.

These are the functions that are added / updated for HTTP and Torcontrol:

  • GetBindAddress(): Given a socket, provides the bound address as a CService. Currently used by p2p but moved from net to netbase so other modules can call it.
  • ToIntegral(): Already used to parse numbers from strings, added new argument base = 10 so it can also be used to parse hexadecimal integers. HTTP chunked transfer-encoding uses hex-encoded integers to specify payload size: https://datatracker.ietf.org/doc/html/rfc7230.html#section-4.1
  • AsciiCaseInsensitive comparators: Needed to store HTTP headers in an unordered_map. Headers are key-value pairs that are parsed with case-insensitive keys: https://httpwg.org/specs/rfc9110.html#rfc.section.5.1
  • FormatRFC1123DateTime(): The required datetime format for HTTP headers (e.g. Fri, 31 May 2024 19:18:04 GMT)
  • LineReader: Fields in HTTP requests are newline-terminated. This struct is given an input buffer and provides methods to read lines as strings.

@DrahtBot
Copy link
Contributor

DrahtBot commented Jan 9, 2026

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/34242.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK furszy, fjahr
Stale ACK vasild

If your review is incorrectly listed, please copy-paste <!--meta-tag:bot-skip--> into the comment that the bot should ignore.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #34158 (torcontrol: Remove libevent usage by fjahr)
  • #32061 (Replace libevent with our own HTTP and socket-handling implementation by pinheadmz)
  • #31672 (rpc: add cpu_load to getpeerinfo by vasild)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

src/netbase.cpp Outdated
CService addr_bind;
struct sockaddr_storage sockaddr_bind;
socklen_t sockaddr_bind_len = sizeof(sockaddr_bind);
if (!sock.GetSockName((struct sockaddr*)&sockaddr_bind, &sockaddr_bind_len)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetSockName() currently forwards the raw getsockname() return value (0 on success), which forces callers to use an inverted check (!GetSockName(...)).

Can we normalize it so it returns true on success?

bool Sock::GetSockName(sockaddr* name, socklen_t* name_len) const
{
    return getsockname(m_socket, name, name_len) == 0;
}

It would make the code slightly more readable and maintainable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you, but this might be better addressed in a separate PR. The entire Sock class wraps POSIX socket management syscalls that return 0 on success (connect, bind, listen, close, etc...)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea behind the Sock methods is that they are mockable versions of the OS functions. Zero learning curve for newcomers (assuming familiarity with the OS interface). Sock::Foo(x, y, z) is equivalent to calling the OS's foo(x, y, z).

inverted check

the function returns an integer error code, not a boolean. A check like if (foo() == 0) looks better (less "inverted").

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Thanks @vasild. It would be nice to have more documentation on the return codes then. And +1 on the == 0 suggestion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line gets "modernized" in the next commit, including a comment and == 0, so I think it's cool for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"returns an integer error code" -- oh, I meant this:

https://www.man7.org/linux/man-pages/man2/getsockname.2.html#RETURN_VALUE

RETURN VALUE
On success, zero is returned. On error, -1 is returned, and errno
is set to indicate the error.

or the check could be if (foo() != -1).

@fjahr
Copy link
Contributor

fjahr commented Jan 10, 2026

ACK ea993ff

I reviewed the same 6 commits as part of #32061 and all my feedback from there has been addressed.

Copy link
Contributor

@vasild vasild left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK ea993ff

src/netbase.cpp Outdated
sockaddr_storage storage;
socklen_t len = sizeof(storage);

auto sa = static_cast<sockaddr*>(static_cast<void*>(&storage));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inner cast is unnecessary?

Suggested change
auto sa = static_cast<sockaddr*>(static_cast<void*>(&storage));
auto sa = static_cast<sockaddr*>(&storage);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works with a reinterpret_cast because sockaddr_storage and sockaddr are technically unrelated types even though they overlap... I'll clean it up, thanks. I think I had static_cast there before because of hints from Sonarcloud on corecheck.dev.

// that means the line we are currently reading is too long, and we throw.
if (count > max_read) throw std::runtime_error("max_read exceeded by LineReader");
}
const std::string_view untrimmed_line(reinterpret_cast<const char *>(std::to_address(line_start)), count);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const std::string_view untrimmed_line(reinterpret_cast<const char *>(std::to_address(line_start)), count);
const std::string_view untrimmed_line(reinterpret_cast<const char*>(std::to_address(line_start)), count);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

taken, thanks

Comment on lines 124 to 140
std::string FormatRFC1123DateTime(int64_t time)
{
if (time < -62167219200 || 253402300799 < time) {
// RFC7231 mandates 4-digit year, so only support years 0 to 9999
return "";
}
const std::chrono::sys_seconds secs{std::chrono::seconds{time}};
const auto days{std::chrono::floor<std::chrono::days>(secs)};
const auto w{days.time_since_epoch().count() % 7}; // will be in the range [-6, 6]
std::string_view weekday{weekdays.at(w >= 0 ? w : w + 7)};
const std::chrono::year_month_day ymd{days};
std::string_view month{months.at(unsigned{ymd.month()} - 1)};
const std::chrono::hh_mm_ss hms{secs - days};
// examples: Mon, 27 Jul 2009 12:28:53 GMT
// Fri, 31 May 2024 19:18:04 GMT
return strprintf("%03s, %02u %03s %04i %02i:%02i:%02i GMT", weekday, unsigned{ymd.day()}, month, signed{ymd.year()}, hms.hours().count(), hms.minutes().count(), hms.seconds().count());
}
Copy link
Member

@furszy furszy Jan 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think C++20 has a short version of this (no extra arrays needed):

std::string FormatRFC1123DateTime(int64_t time)
{
    if (time < -62167219200 || 253402300799 < time) {
        // RFC7231 mandates 4-digit year, so only support years 0 to 9999
        return "";
    }
    const std::chrono::sys_seconds secs{std::chrono::seconds{time}};
    // Example: "Mon, 27 Jul 2009 12:28:53 GMT"
    return std::format("{:%a, %d %b %Y %H:%M:%S} GMT", secs);
}

maybe you already tried it and it doesn't work on all our supported platforms?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be locale-independent. I think it is:

https://en.cppreference.com/w/cpp/chrono/system_clock/formatter.html#Format_specification

the default "C" locale if L is not present in the format specification

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/bitcoin/bitcoin/actions/runs/20963499370/job/60247105398?pr=34242#step:11:1480

/home/admin/actions-runner/_work/_temp/src/util/time.cpp:129:17: error: no member named 'format' in namespace 'std'
  129 |     return std::format("{:%a, %d %b %Y %H:%M:%S} GMT", secs);
      |            ~~~~~^

😢

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, will revert on next push.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

too good to be true

Copy link
Member

@furszy furszy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK ea993ff

Comment on lines 28 to 36
while (it != end) {
// Read a character from the incoming buffer and increment the iterator
auto c = static_cast<char>(*it);
++it;
++count;
// If the character we just consumed was \n, the line is terminated
if (c == '\n') break;
// If we are at the end of the incoming buffer, the line is terminated
if (it == end) break;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In ea993ff:

Seems you can remove this last line. You already do a while (it != end)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Youre right thanks, its gone.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that it is incremented inside the body of the loop, after the check in while (it != end). So after the increment it could be equal to end.

    const std::vector<std::byte> input{StringToBuffer("ab")};
    LineReader reader(input, /*max_read=*/1);
    std::optional<std::string> line{reader.ReadLine()};

Before this change the above snippet would have set line to ab and after the change it throws std::runtime_error: max_read exceeded by LineReader. I guess the new behavior is the correct? Maybe worth adding such a test.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @vasild I added this as a test case, and am scratching my head how it slipped through. Removing the line is not only cleaner code, it is a bug fix. LineReader should not return a line longer than max_read

Comment on lines 33 to 34
// If the character we just consumed was \n, the line is terminated
if (c == '\n') break;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tiny nit:
could make the termination character customizable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to leave this for now, next dev who needs a custom EOL character can add it in the future ;-)

pinheadmz and others added 4 commits January 12, 2026 15:31
The function logic is moved-only from net.cpp to netbase.cpp
and redeclared (as non-static) in netbase.h
Replace the C-style casting with C++ reinterpret_cast
https://httpwg.org/specs/rfc9110.html#rfc.section.5.1
Field names in HTTP headers are case-insensitive. These
structs will be used in the headers map to search by key.

In libevent field names are also converted to lowercase for comparison:
  evhttp_find_header()
  evutil_ascii_strcasecmp()
  EVUTIL_TOLOWER_()
HTTP 1.1 responses require a timestamp header with a format
specified (currently) by:
https://datatracker.ietf.org/doc/html/rfc9110#section-5.6.7

This specific format is defined in RFC1123:
https://www.rfc-editor.org/rfc/rfc1123#page-55

The libevent implementation can be referenced in evutil_time.c
evutil_date_rfc1123()
@DrahtBot
Copy link
Contributor

🚧 At least one of the CI tasks failed.
Task macOS-cross to arm64: https://github.com/bitcoin/bitcoin/actions/runs/20963499370/job/60247105377
LLM reason (✨ experimental): Compilation failed: std::format is unavailable in the used standard library (missing C++20 std::format support).

Hints

Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:

  • Possibly due to a silent merge conflict (the changes in this pull request being
    incompatible with the current code in the target branch). If so, make sure to rebase on the latest
    commit of the target branch.

  • A sanitizer issue, which can only be found by compiling with the sanitizer and running the
    affected test.

  • An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

Copy link
Contributor

@vasild vasild left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 1d24816

@DrahtBot DrahtBot requested review from fjahr and furszy January 14, 2026 07:55
@pinheadmz
Copy link
Member Author

switching to draft for today: Im going to see if we can just use std::istringstream and if so, remove the LineReader commit

@pinheadmz pinheadmz marked this pull request as draft January 14, 2026 14:51
@pinheadmz
Copy link
Member Author

Ok did some learning about std::istringstream and I think the current implementation is better, and not reinventing a wheel from the standard library. Will convert the PR to ready for review.

Here's my reasoning (up for debate, @furszy ):

Raw data is copied as std::byte from the network socket into the receive buffer. LineReader is my boundary between the raw data and text-based HTTP stuff, it operates on a span of that data and returns valid, allocated strings. It only allocates if the line is <= max_read, and only once, after reading from the span.

std::istringstream can't operate on std::byte (only on strings) so that would either require one big copy in the beginning, or moving the "boundary" lower where we could read as char from the socket. Even if I made that change, std::istringstream::getline() appends one character at a time as it reads the input, expanding the output string. Only when getline() is done would it be possible to check the line against max_read.

@pinheadmz pinheadmz marked this pull request as ready for review January 14, 2026 15:38
@fjahr
Copy link
Contributor

fjahr commented Jan 15, 2026

re-ACK 1d24816

* @returns a string of the expected length.
* @throws a std::runtime_error if there is not enough data in the buffer.
*/
std::string ReadLength(size_t len);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tiny nit:
Would just call this Read(size_t len).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm gonna push back on this, I think in a class called "Line Reader" it's important to distinguish explicitly between reading a line, and reading a number of bytes. A method called Read() seems ambiguous to me.

Comment on lines 290 to 293
/**
* Returns remaining size of bytes in buffer
*/
size_t Left() const;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nano nit:
Maybe Left() is not the best term. Could use Remaining()?. But at this point it doesn't matter much.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok sure, taking this. 🙏

Comment on lines 20 to 48
std::optional<std::string> LineReader::ReadLine()
{
if (it == end) {
return std::nullopt;
}

auto line_start = it;
size_t count = 0;
while (it != end) {
// Read a character from the incoming buffer and increment the iterator
auto c = static_cast<char>(*it);
++it;
++count;
// If the character we just consumed was \n, the line is terminated.
// The \n itself does not count against max_read.
if (c == '\n') break;
// If the character we just consumed gives us a line length greater
// than max_read, and we are not at the end of the line (or buffer) yet,
// that means the line we are currently reading is too long, and we throw.
if (count > max_read) throw std::runtime_error("max_read exceeded by LineReader");
}
const std::string_view untrimmed_line(reinterpret_cast<const char*>(std::to_address(line_start)), count);
const std::string_view line = TrimStringView(untrimmed_line); // delete trailing \r and/or \n
return std::string(line);
}
Copy link
Member

@furszy furszy Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we advance the class member iterator it while scanning the span, throwing a max_read exception drops the bytes already read. We should advance it only at the very end to avoid loosing them (a test for this case would be good).

Also, small extra nit: what if there only char is a \n? The returned string will be empty?

As a separate topic, could use std::memchr to find the first instance of \n which would avoid the loop (haven't fully tried it but seems to be an option as well - dropping it in case you want to investigate it, all good if not).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch thanks. I now reset the iterator just before throwing. Tested by catching the "exceeded" error then using a more granular operation (ReadLength) to get the remainder of bytes from the buffer. The test fails on the original code because ReadLength returned bytes form whatever point it threw the error instead of from the last point of success.

Also added test cases for what makes the most sense to me when using the tool:

  • If the buffer is only "\n" then you get one empty "" line, nullopt after that
  • If the buffer is completely empty "" you get nullopt (not an error)

re: memchr, I played with that a bit, then I also tried out std::find and felt like neither approach produced clean code without also dropping a little efficiency. So I still think the long implementation here is best for our precise use-case.

@DrahtBot DrahtBot requested a review from furszy January 15, 2026 15:21
This is a helper struct to parse HTTP messages from data in buffers
from sockets. HTTP messages begin with headers which are
CRLF-terminated lines (\n or \r\n) followed by an arbitrary amount of
body data. Whitespace is trimmed from the field lines but not the body.

https://httpwg.org/specs/rfc9110.html#rfc.section.5
Copy link
Member Author

@pinheadmz pinheadmz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

push to 9b3612b

Address feedback from @furszy about LineReader, fix an edge case after an error is thrown and add more tests.

Comment on lines 20 to 48
std::optional<std::string> LineReader::ReadLine()
{
if (it == end) {
return std::nullopt;
}

auto line_start = it;
size_t count = 0;
while (it != end) {
// Read a character from the incoming buffer and increment the iterator
auto c = static_cast<char>(*it);
++it;
++count;
// If the character we just consumed was \n, the line is terminated.
// The \n itself does not count against max_read.
if (c == '\n') break;
// If the character we just consumed gives us a line length greater
// than max_read, and we are not at the end of the line (or buffer) yet,
// that means the line we are currently reading is too long, and we throw.
if (count > max_read) throw std::runtime_error("max_read exceeded by LineReader");
}
const std::string_view untrimmed_line(reinterpret_cast<const char*>(std::to_address(line_start)), count);
const std::string_view line = TrimStringView(untrimmed_line); // delete trailing \r and/or \n
return std::string(line);
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch thanks. I now reset the iterator just before throwing. Tested by catching the "exceeded" error then using a more granular operation (ReadLength) to get the remainder of bytes from the buffer. The test fails on the original code because ReadLength returned bytes form whatever point it threw the error instead of from the last point of success.

Also added test cases for what makes the most sense to me when using the tool:

  • If the buffer is only "\n" then you get one empty "" line, nullopt after that
  • If the buffer is completely empty "" you get nullopt (not an error)

re: memchr, I played with that a bit, then I also tried out std::find and felt like neither approach produced clean code without also dropping a little efficiency. So I still think the long implementation here is best for our precise use-case.

* @returns a string of the expected length.
* @throws a std::runtime_error if there is not enough data in the buffer.
*/
std::string ReadLength(size_t len);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm gonna push back on this, I think in a class called "Line Reader" it's important to distinguish explicitly between reading a line, and reading a number of bytes. A method called Read() seems ambiguous to me.

Comment on lines 290 to 293
/**
* Returns remaining size of bytes in buffer
*/
size_t Left() const;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok sure, taking this. 🙏

Copy link
Member

@furszy furszy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK 9b3612b

const size_t max_read;
std::span<const std::byte>::iterator it;

explicit LineReader(std::span<const std::byte> buffer, size_t max_read);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:
As we can read pass max_read limit if max_read + 1 is a jump line, I would rename the arg and member to max_output_length.

@DrahtBot DrahtBot requested a review from vasild January 15, 2026 19:58
@fjahr
Copy link
Contributor

fjahr commented Jan 15, 2026

re-ACK 9b3612b

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants