Prepare string and net utils for future HTTP operations #34242

pinheadmz · 2026-01-09T18:11:27Z

This is a component of removing libevent as a dependency of the project. It is the first six commits of #32061 and provides a string-parsing utility (LineReader) that is also consumed by #34158.

These are the functions that are added / updated for HTTP and Torcontrol:

GetBindAddress(): Given a socket, provides the bound address as a CService. Currently used by p2p but moved from net to netbase so other modules can call it.
ToIntegral(): Already used to parse numbers from strings, added new argument base = 10 so it can also be used to parse hexadecimal integers. HTTP chunked transfer-encoding uses hex-encoded integers to specify payload size: https://datatracker.ietf.org/doc/html/rfc7230.html#section-4.1
AsciiCaseInsensitive comparators: Needed to store HTTP headers in an unordered_map. Headers are key-value pairs that are parsed with case-insensitive keys: https://httpwg.org/specs/rfc9110.html#rfc.section.5.1
FormatRFC1123DateTime(): The required datetime format for HTTP headers (e.g. Fri, 31 May 2024 19:18:04 GMT)
LineReader: Fields in HTTP requests are newline-terminated. This struct is given an input buffer and provides methods to read lines as strings.

DrahtBot · 2026-01-09T18:11:36Z

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/34242.

Reviews

See the guideline for information on the review process.

Type	Reviewers
ACK	furszy, fjahr
Stale ACK	vasild

If your review is incorrectly listed, please copy-paste  into the comment that the bot should ignore.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#34158 (torcontrol: Remove libevent usage by fjahr)
#32061 (Replace libevent with our own HTTP and socket-handling implementation by pinheadmz)
#31672 (rpc: add cpu_load to getpeerinfo by vasild)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

furszy · 2026-01-09T18:22:59Z

src/netbase.cpp

+    CService addr_bind;
+    struct sockaddr_storage sockaddr_bind;
+    socklen_t sockaddr_bind_len = sizeof(sockaddr_bind);
+    if (!sock.GetSockName((struct sockaddr*)&sockaddr_bind, &sockaddr_bind_len)) {


GetSockName() currently forwards the raw getsockname() return value (0 on success), which forces callers to use an inverted check (!GetSockName(...)).

Can we normalize it so it returns true on success?

bool Sock::GetSockName(sockaddr* name, socklen_t* name_len) const { return getsockname(m_socket, name, name_len) == 0; }

It would make the code slightly more readable and maintainable.

I agree with you, but this might be better addressed in a separate PR. The entire Sock class wraps POSIX socket management syscalls that return 0 on success (connect, bind, listen, close, etc...)

The idea behind the Sock methods is that they are mockable versions of the OS functions. Zero learning curve for newcomers (assuming familiarity with the OS interface). Sock::Foo(x, y, z) is equivalent to calling the OS's foo(x, y, z).

inverted check

the function returns an integer error code, not a boolean. A check like if (foo() == 0) looks better (less "inverted").

I see. Thanks @vasild. It would be nice to have more documentation on the return codes then. And +1 on the == 0 suggestion.

This line gets "modernized" in the next commit, including a comment and == 0, so I think it's cool for now.

"returns an integer error code" -- oh, I meant this:

https://www.man7.org/linux/man-pages/man2/getsockname.2.html#RETURN_VALUE

RETURN VALUE
On success, zero is returned. On error, -1 is returned, and errno
is set to indicate the error.

or the check could be if (foo() != -1).

fjahr · 2026-01-10T14:32:33Z

ACK ea993ff

I reviewed the same 6 commits as part of #32061 and all my feedback from there has been addressed.

vasild

ACK ea993ff

vasild · 2026-01-12T12:48:50Z

src/netbase.cpp

+    sockaddr_storage storage;
+    socklen_t len = sizeof(storage);
+
+    auto sa = static_cast<sockaddr*>(static_cast<void*>(&storage));


The inner cast is unnecessary?

Suggested change

auto sa = static_cast<sockaddr*>(static_cast<void*>(&storage));

auto sa = static_cast<sockaddr*>(&storage);

This works with a reinterpret_cast because sockaddr_storage and sockaddr are technically unrelated types even though they overlap... I'll clean it up, thanks. I think I had static_cast there before because of hints from Sonarcloud on corecheck.dev.

vasild · 2026-01-12T13:03:35Z

src/util/string.cpp

+        // that means the line we are currently reading is too long, and we throw.
+        if (count > max_read) throw std::runtime_error("max_read exceeded by LineReader");
+    }
+    const std::string_view untrimmed_line(reinterpret_cast<const char *>(std::to_address(line_start)), count);


Suggested change

const std::string_view untrimmed_line(reinterpret_cast<const char *>(std::to_address(line_start)), count);

const std::string_view untrimmed_line(reinterpret_cast<const char*>(std::to_address(line_start)), count);

taken, thanks

furszy · 2026-01-12T18:32:06Z

src/util/time.cpp

+std::string FormatRFC1123DateTime(int64_t time)
+{
+    if (time < -62167219200 || 253402300799 < time) {
+        // RFC7231 mandates 4-digit year, so only support years 0 to 9999
+        return "";
+    }
+    const std::chrono::sys_seconds secs{std::chrono::seconds{time}};
+    const auto days{std::chrono::floor<std::chrono::days>(secs)};
+    const auto w{days.time_since_epoch().count() % 7}; // will be in the range [-6, 6]
+    std::string_view weekday{weekdays.at(w >= 0 ? w : w + 7)};
+    const std::chrono::year_month_day ymd{days};
+    std::string_view month{months.at(unsigned{ymd.month()} - 1)};
+    const std::chrono::hh_mm_ss hms{secs - days};
+    // examples: Mon, 27 Jul 2009 12:28:53 GMT
+    //           Fri, 31 May 2024 19:18:04 GMT
+    return strprintf("%03s, %02u %03s %04i %02i:%02i:%02i GMT", weekday, unsigned{ymd.day()}, month, signed{ymd.year()}, hms.hours().count(), hms.minutes().count(), hms.seconds().count());
+}


nit: I think C++20 has a short version of this (no extra arrays needed):

std::string FormatRFC1123DateTime(int64_t time) { if (time < -62167219200 || 253402300799 < time) { // RFC7231 mandates 4-digit year, so only support years 0 to 9999 return ""; } const std::chrono::sys_seconds secs{std::chrono::seconds{time}}; // Example: "Mon, 27 Jul 2009 12:28:53 GMT" return std::format("{:%a, %d %b %Y %H:%M:%S} GMT", secs); }

maybe you already tried it and it doesn't work on all our supported platforms?

This should be locale-independent. I think it is:

https://en.cppreference.com/w/cpp/chrono/system_clock/formatter.html#Format_specification

the default "C" locale if L is not present in the format specification

https://github.com/bitcoin/bitcoin/actions/runs/20963499370/job/60247105398?pr=34242#step:11:1480

/home/admin/actions-runner/_work/_temp/src/util/time.cpp:129:17: error: no member named 'format' in namespace 'std' 129 | return std::format("{:%a, %d %b %Y %H:%M:%S} GMT", secs); | ~~~~~^

😢

Indeed, will revert on next push.

too good to be true

furszy

utACK ea993ff

furszy · 2026-01-12T19:06:58Z

src/util/string.cpp

+    while (it != end) {
+        // Read a character from the incoming buffer and increment the iterator
+        auto c = static_cast<char>(*it);
+        ++it;
+        ++count;
+        // If the character we just consumed was \n, the line is terminated
+        if (c == '\n') break;
+        // If we are at the end of the incoming buffer, the line is terminated
+        if (it == end) break;


In ea993ff:

Seems you can remove this last line. You already do a while (it != end)

Youre right thanks, its gone.

Note that it is incremented inside the body of the loop, after the check in while (it != end). So after the increment it could be equal to end.

const std::vector<std::byte> input{StringToBuffer("ab")}; LineReader reader(input, /*max_read=*/1); std::optional<std::string> line{reader.ReadLine()};

Before this change the above snippet would have set line to ab and after the change it throws std::runtime_error: max_read exceeded by LineReader. I guess the new behavior is the correct? Maybe worth adding such a test.

Thanks @vasild I added this as a test case, and am scratching my head how it slipped through. Removing the line is not only cleaner code, it is a bug fix. LineReader should not return a line longer than max_read

furszy · 2026-01-12T19:09:30Z

src/util/string.cpp

+        // If the character we just consumed was \n, the line is terminated
+        if (c == '\n') break;


tiny nit:
could make the termination character customizable.

Going to leave this for now, next dev who needs a custom EOL character can add it in the future ;-)

The function logic is moved-only from net.cpp to netbase.cpp and redeclared (as non-static) in netbase.h

Replace the C-style casting with C++ reinterpret_cast

https://httpwg.org/specs/rfc9110.html#rfc.section.5.1 Field names in HTTP headers are case-insensitive. These structs will be used in the headers map to search by key. In libevent field names are also converted to lowercase for comparison: evhttp_find_header() evutil_ascii_strcasecmp() EVUTIL_TOLOWER_()

HTTP 1.1 responses require a timestamp header with a format specified (currently) by: https://datatracker.ietf.org/doc/html/rfc9110#section-5.6.7 This specific format is defined in RFC1123: https://www.rfc-editor.org/rfc/rfc1123#page-55 The libevent implementation can be referenced in evutil_time.c evutil_date_rfc1123()

DrahtBot · 2026-01-13T18:22:51Z

🚧 At least one of the CI tasks failed.
_{Task macOS-cross to arm64: https://github.com/bitcoin/bitcoin/actions/runs/20963499370/job/60247105377}
_{LLM reason (✨ experimental): Compilation failed: std::format is unavailable in the used standard library (missing C++20 std::format support).}

Hints

Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the
affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

src/test/util_string_tests.cpp

vasild

ACK 1d24816

pinheadmz · 2026-01-14T14:50:59Z

switching to draft for today: Im going to see if we can just use std::istringstream and if so, remove the LineReader commit

pinheadmz · 2026-01-14T15:38:55Z

Ok did some learning about std::istringstream and I think the current implementation is better, and not reinventing a wheel from the standard library. Will convert the PR to ready for review.

Here's my reasoning (up for debate, @furszy ):

Raw data is copied as std::byte from the network socket into the receive buffer. LineReader is my boundary between the raw data and text-based HTTP stuff, it operates on a span of that data and returns valid, allocated strings. It only allocates if the line is <= max_read, and only once, after reading from the span.

std::istringstream can't operate on std::byte (only on strings) so that would either require one big copy in the beginning, or moving the "boundary" lower where we could read as char from the socket. Even if I made that change, std::istringstream::getline() appends one character at a time as it reads the input, expanding the output string. Only when getline() is done would it be possible to check the line against max_read.

fjahr · 2026-01-15T14:03:29Z

re-ACK 1d24816

furszy · 2026-01-15T14:59:12Z

src/util/string.h

+     * @returns a string of the expected length.
+     * @throws a std::runtime_error if there is not enough data in the buffer.
+     */
+    std::string ReadLength(size_t len);


tiny nit:
Would just call this Read(size_t len).

I'm gonna push back on this, I think in a class called "Line Reader" it's important to distinguish explicitly between reading a line, and reading a number of bytes. A method called Read() seems ambiguous to me.

furszy · 2026-01-15T15:00:01Z

src/util/string.h

+    /**
+     * Returns remaining size of bytes in buffer
+     */
+    size_t Left() const;


nano nit:
Maybe Left() is not the best term. Could use Remaining()?. But at this point it doesn't matter much.

Ok sure, taking this. 🙏

furszy · 2026-01-15T15:21:39Z

src/util/string.cpp

+std::optional<std::string> LineReader::ReadLine()
+{
+    if (it == end) {
+        return std::nullopt;
+    }
+
+    auto line_start = it;
+    size_t count = 0;
+    while (it != end) {
+        // Read a character from the incoming buffer and increment the iterator
+        auto c = static_cast<char>(*it);
+        ++it;
+        ++count;
+        // If the character we just consumed was \n, the line is terminated.
+        // The \n itself does not count against max_read.
+        if (c == '\n') break;
+        // If the character we just consumed gives us a line length greater
+        // than max_read, and we are not at the end of the line (or buffer) yet,
+        // that means the line we are currently reading is too long, and we throw.
+        if (count > max_read) throw std::runtime_error("max_read exceeded by LineReader");
+    }
+    const std::string_view untrimmed_line(reinterpret_cast<const char*>(std::to_address(line_start)), count);
+    const std::string_view line = TrimStringView(untrimmed_line); // delete trailing \r and/or \n
+    return std::string(line);
+}


Since we advance the class member iterator it while scanning the span, throwing a max_read exception drops the bytes already read. We should advance it only at the very end to avoid loosing them (a test for this case would be good).

Also, small extra nit: what if there only char is a \n? The returned string will be empty?

As a separate topic, could use std::memchr to find the first instance of \n which would avoid the loop (haven't fully tried it but seems to be an option as well - dropping it in case you want to investigate it, all good if not).

Great catch thanks. I now reset the iterator just before throwing. Tested by catching the "exceeded" error then using a more granular operation (ReadLength) to get the remainder of bytes from the buffer. The test fails on the original code because ReadLength returned bytes form whatever point it threw the error instead of from the last point of success.

Also added test cases for what makes the most sense to me when using the tool:

If the buffer is only "\n" then you get one empty "" line, nullopt after that

If the buffer is completely empty "" you get nullopt (not an error)

re: memchr, I played with that a bit, then I also tried out std::find and felt like neither approach produced clean code without also dropping a little efficiency. So I still think the long implementation here is best for our precise use-case.

This is a helper struct to parse HTTP messages from data in buffers from sockets. HTTP messages begin with headers which are CRLF-terminated lines (\n or \r\n) followed by an arbitrary amount of body data. Whitespace is trimmed from the field lines but not the body. https://httpwg.org/specs/rfc9110.html#rfc.section.5

pinheadmz

push to 9b3612b

Address feedback from @furszy about LineReader, fix an edge case after an error is thrown and add more tests.

pinheadmz · 2026-01-15T18:37:23Z

src/util/string.cpp

+std::optional<std::string> LineReader::ReadLine()
+{
+    if (it == end) {
+        return std::nullopt;
+    }
+
+    auto line_start = it;
+    size_t count = 0;
+    while (it != end) {
+        // Read a character from the incoming buffer and increment the iterator
+        auto c = static_cast<char>(*it);
+        ++it;
+        ++count;
+        // If the character we just consumed was \n, the line is terminated.
+        // The \n itself does not count against max_read.
+        if (c == '\n') break;
+        // If the character we just consumed gives us a line length greater
+        // than max_read, and we are not at the end of the line (or buffer) yet,
+        // that means the line we are currently reading is too long, and we throw.
+        if (count > max_read) throw std::runtime_error("max_read exceeded by LineReader");
+    }
+    const std::string_view untrimmed_line(reinterpret_cast<const char*>(std::to_address(line_start)), count);
+    const std::string_view line = TrimStringView(untrimmed_line); // delete trailing \r and/or \n
+    return std::string(line);
+}


Great catch thanks. I now reset the iterator just before throwing. Tested by catching the "exceeded" error then using a more granular operation (ReadLength) to get the remainder of bytes from the buffer. The test fails on the original code because ReadLength returned bytes form whatever point it threw the error instead of from the last point of success.

Also added test cases for what makes the most sense to me when using the tool:

If the buffer is only "\n" then you get one empty "" line, nullopt after that

If the buffer is completely empty "" you get nullopt (not an error)

re: memchr, I played with that a bit, then I also tried out std::find and felt like neither approach produced clean code without also dropping a little efficiency. So I still think the long implementation here is best for our precise use-case.

pinheadmz · 2026-01-15T18:39:03Z

src/util/string.h

+     * @returns a string of the expected length.
+     * @throws a std::runtime_error if there is not enough data in the buffer.
+     */
+    std::string ReadLength(size_t len);


I'm gonna push back on this, I think in a class called "Line Reader" it's important to distinguish explicitly between reading a line, and reading a number of bytes. A method called Read() seems ambiguous to me.

pinheadmz · 2026-01-15T18:41:08Z

src/util/string.h

+    /**
+     * Returns remaining size of bytes in buffer
+     */
+    size_t Left() const;


Ok sure, taking this. 🙏

furszy

utACK 9b3612b

furszy · 2026-01-15T19:57:21Z

src/util/string.h

+    const size_t max_read;
+    std::span<const std::byte>::iterator it;
+
+    explicit LineReader(std::span<const std::byte> buffer, size_t max_read);


nit:
As we can read pass max_read limit if max_read + 1 is a jump line, I would rename the arg and member to max_output_length.

fjahr · 2026-01-15T21:41:53Z

re-ACK 9b3612b

This was referenced Jan 9, 2026

Replace libevent with our own HTTP and socket-handling implementation #32061

Open

Remove libevent as a dependency (HTTP / cli / torcontrol) #31194

Open

furszy reviewed Jan 9, 2026

View reviewed changes

This was referenced Jan 10, 2026

torcontrol: Remove libevent usage #34158

Open

rpc: add cpu_load to getpeerinfo #31672

Open

vasild approved these changes Jan 12, 2026

View reviewed changes

furszy reviewed Jan 12, 2026

View reviewed changes

pinheadmz and others added 4 commits January 12, 2026 15:31

Make GetBindAddress() callable from outside net.cpp

a0ca851

The function logic is moved-only from net.cpp to netbase.cpp and redeclared (as non-static) in netbase.h

Modernize GetBindAddress()

e93778d

Replace the C-style casting with C++ reinterpret_cast

string: add base argument for ToIntegral to operate on hexadecimal

0717e88

pinheadmz force-pushed the http-utils branch from ea993ff to d35b582 Compare January 13, 2026 16:01

DrahtBot added the CI failed label Jan 13, 2026

pinheadmz force-pushed the http-utils branch from d35b582 to 1d24816 Compare January 13, 2026 18:42

furszy reviewed Jan 13, 2026

View reviewed changes

src/test/util_string_tests.cpp Show resolved Hide resolved

DrahtBot removed the CI failed label Jan 13, 2026

vasild approved these changes Jan 14, 2026

View reviewed changes

DrahtBot requested review from fjahr and furszy January 14, 2026 07:55

pinheadmz marked this pull request as draft January 14, 2026 14:51

pinheadmz marked this pull request as ready for review January 14, 2026 15:38

furszy reviewed Jan 15, 2026

View reviewed changes

DrahtBot requested a review from furszy January 15, 2026 15:21

pinheadmz force-pushed the http-utils branch from 1d24816 to 9b3612b Compare January 15, 2026 19:11

pinheadmz commented Jan 15, 2026

View reviewed changes

furszy reviewed Jan 15, 2026

View reviewed changes

DrahtBot requested a review from vasild January 15, 2026 19:58

	auto sa = static_cast<sockaddr>(static_cast<void>(&storage));
	auto sa = static_cast<sockaddr*>(&storage);

	const std::string_view untrimmed_line(reinterpret_cast<const char *>(std::to_address(line_start)), count);
	const std::string_view untrimmed_line(reinterpret_cast<const char*>(std::to_address(line_start)), count);

		// If the character we just consumed was \n, the line is terminated
		if (c == '\n') break;

Prepare string and net utils for future HTTP operations #34242

Are you sure you want to change the base?

Prepare string and net utils for future HTTP operations #34242

Conversation

pinheadmz commented Jan 9, 2026

Uh oh!

DrahtBot commented Jan 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Coverage & Benchmarks

Reviews

Conflicts

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

fjahr commented Jan 10, 2026

Uh oh!

vasild left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

furszy Jan 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

furszy left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

DrahtBot commented Jan 13, 2026

Uh oh!

Uh oh!

vasild left a comment

Choose a reason for hiding this comment

Uh oh!

pinheadmz commented Jan 14, 2026

Uh oh!

pinheadmz commented Jan 14, 2026

Uh oh!

fjahr commented Jan 15, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

DrahtBot commented Jan 9, 2026 •

edited

Loading

furszy Jan 12, 2026 •

edited

Loading

furszy Jan 15, 2026 •

edited

Loading