Skip to content

Conversation

@dergoegge
Copy link
Member

@dergoegge dergoegge commented Jul 9, 2024

This adds a fuzzing harness dedicated to the version handshake. To avoid determinism issues, the harness creates necessary components each iteration (addrman, peerman, etc). A harness like this would have easily caught https://bitcoincore.org/en/2024/07/03/disclose-timestamp-overflow/.

As a performance optimization, this PR includes a change to PeerManager to lazily initialize various filters (to avoid large unnecessary memory allocations each iteration).

@DrahtBot
Copy link
Contributor

DrahtBot commented Jul 9, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK marcofleon, brunoerg, glozow, mzumsande
Concept ACK maflcko, kevkevinpal, morehouse, edilmedeiros

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #30110 (refactor: TxDownloadManager by glozow)
  • #28031 (Package Relay 1/3: Introduce TxDownloadManager and improve orphan-handling by glozow)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@DrahtBot DrahtBot added the Tests label Jul 9, 2024
@maflcko
Copy link
Member

maflcko commented Jul 9, 2024

Concept ACK, but would be good to explain how this is different from the existing approach, introduced in "fuzz: version handshake" #20370, which was a replacement for "fuzz: Version handshake" #20097 .

Now that the timedata bug is fixed, and the module and related globals removed completely, it is a bit hard(er) to re-introduce it for testing, I guess. IIRC I did check it at some point and found that the existing fuzz targets would act as a regression test.

@dergoegge
Copy link
Member Author

would be good to explain how this is different from the existing approach, introduced in "fuzz: version handshake" #20370, which was a replacement for "fuzz: Version handshake" #20097 .

The main difference is that this harness focuses on the version handshake, where a modification to the existing process_messages harness would further widen the space of reachable code. It could make sense to have both but I would expect the focused harness to be better/faster at finding bugs.

IIRC I did check it at some point and found that the existing fuzz targets would act as a regression test.

Happy to take that out of the op, the primary goal is to have fuzzing coverage for the version handshake (non-existent currently), which would be able to find such bugs.

@maflcko
Copy link
Member

maflcko commented Jul 9, 2024

Happy to take that out of the op, the primary goal is to have fuzzing coverage for the version handshake (non-existent currently), which would be able to find such bugs.

Are you sure it is completely non-existent? I am sure I did fuzz the version handshake at some point in time and the current coverage report (https://maflcko.github.io/b-c-cov/fuzz.coverage/src/net_processing.cpp.gcov.html) seems to agree. Maybe I am missing something.

No objection in any case. I am just trying to understand better what exactly this improves. Maybe you can show a coverage report that shows higher coverage compared to before or how a faked introduced bug is found faster compared to before.

Concept ACK in any case.

@dergoegge
Copy link
Member Author

Are you sure it is completely non-existent?

The coverage we do have comes from the process_messages harness, but the harness doesn't fuzz the handshake it only makes sure that the handshake completes, so that the other messages can be fuzzed.

I'll generate a coverage report to show the difference

@brunoerg
Copy link
Contributor

Concept ACK

@kevkevinpal
Copy link
Contributor

The coverage we do have comes from the process_messages harness, but the harness doesn't fuzz the handshake it only makes sure that the handshake completes, so that the other messages can be fuzzed.

Concept ACK 75fc22f
I think it makes sense to fuzz the handshake itself

Maybe I'm just not too familiar with fuzz testing but this seems very similar to process_messages.
Would it make sense to modify process_messages to also test the handshake instead of creating another fuzz target?

@maflcko
Copy link
Member

maflcko commented Jul 12, 2024

Are you sure it is completely non-existent?

The coverage we do have comes from the process_messages harness, but the harness doesn't fuzz the handshake it only makes sure that the handshake completes, so that the other messages can be fuzzed.

Sorry for the wrong comments above. Not sure why I hallucinated that this was already fuzzed. I guess I remembered seeing the timedata warnings printed in the fuzz runner and somehow assumed this part of the code is already fully fuzzed.

However, I guess the timedata fuzz target explicitly avoided the abs64 UB call (

// Divide by 2 to avoid signed integer overflow in .median()
). And the process_messages one as well:
int64_t{}, // dummy time

Copy link

@morehouse morehouse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

@dergoegge dergoegge force-pushed the 2024-07-fuzz-handshake branch 2 times, most recently from d94db4f to 2a017fd Compare July 29, 2024 08:56
@dergoegge
Copy link
Member Author

@maflcko @brunoerg review ping :)

@dergoegge dergoegge force-pushed the 2024-07-fuzz-handshake branch from 2a017fd to c670fe2 Compare July 30, 2024 12:37
@dergoegge
Copy link
Member Author

Addressed #30413 (comment).

Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for not reviewing this earlier.

I think the title seems to imply that this is a test-only change, but you are also lazy-initializing some fields in the peer manager. What about changing the title to p2p: Lazy init some bloom filters; fuzz version handshake.

Left some nits. Feel free to ignore.

@dergoegge dergoegge changed the title fuzz: Version handshake p2p: Lazy init some bloom filters; fuzz version handshake Jul 30, 2024
@dergoegge dergoegge force-pushed the 2024-07-fuzz-handshake branch from c670fe2 to f166694 Compare July 31, 2024 12:12
@dergoegge
Copy link
Member Author

@maflcko Took all your nits.

-BEGIN VERIFY SCRIPT-
sed -i 's/m_recent_confirmed_transactions/m_lazy_recent_confirmed_transactions/g' $(git grep -l 'm_recent_confirmed_transactions')
sed -i 's/m_recent_rejects_reconsiderable/m_lazy_recent_rejects_reconsiderable/g' $(git grep -l 'm_recent_rejects_reconsiderable')
sed -i 's/m_recent_rejects/m_lazy_recent_rejects/g' $(git grep -l 'm_recent_rejects')
-END VERIFY SCRIPT-
@dergoegge dergoegge force-pushed the 2024-07-fuzz-handshake branch from f166694 to afd237b Compare July 31, 2024 12:27
Copy link
Contributor

@edilmedeiros edilmedeiros left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

I'm curious why the filters should be made lazy for this to work.

@dergoegge
Copy link
Member Author

I'm curious why the filters should be made lazy for this to work.

It's not required but it makes the harness more performant, i.e. we can run more testcases each second (see #30413 (comment)).

Copy link
Contributor

@marcofleon marcofleon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested ACK afd237b. I compared the coverage of net_processing from this harness to the process_message and process_messages harnesses to see the differences. This target hits more specific parts of the version handshake. The stability looks good as well, at about 94%.

@edilmedeiros
Copy link
Contributor

edilmedeiros commented Jul 31, 2024

I'm new to this kind of test, so just to be sure this is not a specific quirk of my setup (Apple M3 Pro Sonoma 14.5 compiled with clang-18 installed through macports).

I'm getting some runtime errors at the beginning of the test. After that it seems to run smoothly.

Compiled with ./configure --enable-fuzz --with-sanitizers=fuzzer,address,undefined CC=clang-mp-18 CXX=clang++-mp-18 --with-boost=/opt/local/libexec/boost/1.78.

Running with FUZZ=p2p_handshake src/test/fuzz/fuzz

❯ FUZZ=p2p_handshake src/test/fuzz/fuzz
fuzz(21234,0x201680c00) malloc: nano zone abandoned due to inability to reserve vm space.
/opt/local/libexec/llvm-18/bin/../include/c++/v1/variant:495:12: runtime error: call to function decltype(auto) std::__1::__variant_detail::__visitation::__base::__dispatcher<0ul, 0ul>::__dispatch[abi:ne180100]<void std::__1::__variant_detail::__ctor<std::__1::__variant_detail::__traits<RPCArg::Optional, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, UniValue>>::__generic_construct[abi:ne180100]<std::__1::__variant_detail::__move_constructor<std::__1::__variant_detail::__traits<RPCArg::Optional, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, UniValue>, (std::__1::__variant_detail::_Trait)1>>(std::__1::__variant_detail::__ctor<std::__1::__variant_detail::__traits<RPCArg::Optional, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, UniValue>>&, std::__1::__variant_detail::__move_constructor<std::__1::__variant_detail::__traits<RPCArg::Optional, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, UniValue>, (std::__1::__variant_detail::_Trait)1>&&)::'lambda'(std::__1::__variant_detail::__move_constructor<std::__1::__variant_detail::__traits<RPCArg::Optional, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, UniValue>, (std::__1::__variant_detail::_Trait)1>&, auto&&)&&, std::__1::__variant_detail::__base<(std::__1::__variant_detail::_Trait)1, RPCArg::Optional, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, UniValue>&, std::__1::__variant_detail::__base<(std::__1::__variant_detail::_Trait)1, RPCArg::Optional, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, UniValue>&&>(std::__1::__variant_detail::__move_constructor<std::__1::__variant_detail::__traits<RPCArg::Optional, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, UniValue>, (std::__1::__variant_detail::_Trait)1>, std::__1::__variant_detail::__base<(std::__1::__variant_detail::_Trait)1, RPCArg::Optional, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, UniValue>&, std::__1::__variant_detail::__base<(std::__1::__variant_detail::_Trait)1, RPCArg::Optional, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, UniValue>&&) through pointer to incorrect function type 'void (*)((lambda at /opt/local/libexec/llvm-18/bin/../include/c++/v1/variant:814:11) &&, std::__variant_detail::__base<std::__variant_detail::_Trait::_Available, RPCArg::Optional, std::string, UniValue> &, std::__variant_detail::__base<std::__variant_detail::_Trait::_Available, RPCArg::Optional, std::string, UniValue> &&)'
variant:532: note: decltype(auto) std::__1::__variant_detail::__visitation::__base::__dispatcher<0ul, 0ul>::__dispatch[abi:ne180100]<void std::__1::__variant_detail::__ctor<std::__1::__variant_detail::__traits<RPCArg::Optional, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, UniValue>>::__generic_construct[abi:ne180100]<std::__1::__variant_detail::__move_constructor<std::__1::__variant_detail::__traits<RPCArg::Optional, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, UniValue>, (std::__1::__variant_detail::_Trait)1>>(std::__1::__variant_detail::__ctor<std::__1::__variant_detail::__traits<RPCArg::Optional, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, UniValue>>&, std::__1::__variant_detail::__move_constructor<std::__1::__variant_detail::__traits<RPCArg::Optional, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, UniValue>, (std::__1::__variant_detail::_Trait)1>&&)::'lambda'(std::__1::__variant_detail::__move_constructor<std::__1::__variant_detail::__traits<RPCArg::Optional, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, UniValue>, (std::__1::__variant_detail::_Trait)1>&, auto&&)&&, std::__1::__variant_detail::__base<(std::__1::__variant_detail::_Trait)1, RPCArg::Optional, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, UniValue>&, std::__1::__variant_detail::__base<(std::__1::__variant_detail::_Trait)1, RPCArg::Optional, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, UniValue>&&>(std::__1::__variant_detail::__move_constructor<std::__1::__variant_detail::__traits<RPCArg::Optional, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, UniValue>, (std::__1::__variant_detail::_Trait)1>, std::__1::__variant_detail::__base<(std::__1::__variant_detail::_Trait)1, RPCArg::Optional, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, UniValue>&, std::__1::__variant_detail::__base<(std::__1::__variant_detail::_Trait)1, RPCArg::Optional, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, UniValue>&&) defined here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /opt/local/libexec/llvm-18/bin/../include/c++/v1/variant:495:12
rpc/server.h:106:15: runtime error: call to function getblockchaininfo() through pointer to incorrect function type 'RPCHelpMan (*)()'
blockchain.cpp:1269: note: getblockchaininfo() defined here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior rpc/server.h:106:15
rpc/server.h:108:15: runtime error: call to function getblockchaininfo() through pointer to incorrect function type 'RPCHelpMan (*)()'
blockchain.cpp:1269: note: getblockchaininfo() defined here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior rpc/server.h:108:15
INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 277630637
INFO: Loaded 1 modules   (1220045 inline 8-bit counters): 1220045 [0x103dc9e10, 0x103ef3bdd),
INFO: Loaded 1 PC tables (1220045 PCs): 1220045 [0x103ef3be0,0x1051918b0),
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
INFO: A corpus is not provided, starting from an empty corpus
#2	INITED cov: 1784 ft: 1784 corp: 1/1b exec/s: 0 rss: 131Mb
#4	NEW    cov: 1787 ft: 1787 corp: 2/3b lim: 4 exec/s: 0 rss: 132Mb L: 2/2 MS: 2 CopyPart-CrossOver-
#5	NEW    cov: 1787 ft: 1790 corp: 3/7b lim: 4 exec/s: 0 rss: 133Mb L: 4/4 MS: 1 CopyPart-
#8	NEW    cov: 1787 ft: 1793 corp: 4/10b lim: 4 exec/s: 0 rss: 134Mb L: 3/4 MS: 3 CopyPart-ChangeByte-CrossOver-
#9	NEW    cov: 1798 ft: 2467 corp: 5/13b lim: 4 exec/s: 0 rss: 134Mb L: 3/4 MS: 1 ChangeByte-
#19	NEW    cov: 1798 ft: 3487 corp: 6/16b lim: 4 exec/s: 0 rss: 139Mb L: 3/4 MS: 5 EraseBytes-ChangeByte-ShuffleBytes-EraseBytes-InsertByte-
#117	REDUCE cov: 1798 ft: 3487 corp: 6/14b lim: 4 exec/s: 0 rss: 179Mb L: 1/4 MS: 3 CrossOver-CrossOver-EraseBytes-
#127	REDUCE cov: 1798 ft: 3487 corp: 6/13b lim: 4 exec/s: 0 rss: 184Mb L: 2/4 MS: 5 ChangeBit-CrossOver-EraseBytes-ShuffleBytes-ChangeBit-
#339	NEW    cov: 1798 ft: 3490 corp: 7/18b lim: 6 exec/s: 0 rss: 272Mb L: 5/5 MS: 2 CopyPart-InsertByte-
#420	REDUCE cov: 1798 ft: 3490 corp: 7/17b lim: 6 exec/s: 0 rss: 305Mb L: 1/5 MS: 1 EraseBytes-
#961	NEW    cov: 1801 ft: 3495 corp: 8/27b lim: 11 exec/s: 0 rss: 453Mb L: 10/10 MS: 1 InsertRepeatedBytes-
...

@brunoerg
Copy link
Contributor

ACK afd237b

Left it running overnight (slight changes since it), coverage and performance looks great.

Copy link
Member

@glozow glozow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK afd237b lazy blooms look ok

@@ -0,0 +1,107 @@
// Copyright (c) 2020-present The Bitcoin Core developers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Copyright (c) 2020-present The Bitcoin Core developers
// Copyright (c) 2024-present The Bitcoin Core developers

Copy link
Contributor

@mzumsande mzumsande left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review ACK afd237b

The lazy initialization seem like a good thing in itself because it might also save a little bit of memory during IBD when we don't need the filters yet - m_lazy_recent_rejects and m_lazy_recent_rejects_reconsiderable should only be initialized after getting out of IBD.

Maybe, as a follow-up, it could make sense to skip early when in IBD in

bitcoin/src/net_processing.cpp

Lines 4227 to 4233 in 66e82dc

const bool fAlreadyHave = AlreadyHaveTx(gtxid, /*include_reconsiderable=*/true);
LogPrint(BCLog::NET, "got inv: %s %s peer=%d\n", inv.ToString(), fAlreadyHave ? "have" : "new", pfrom.GetId());
AddKnownTx(*peer, inv.hash);
if (!fAlreadyHave && !m_chainman.IsInitialBlockDownload()) {
AddTxAnnouncement(pfrom, gtxid, current_time);
}
so that a stray tx-inv received during IBD doesn't result in the filter being initialized in AlreadyHaveTx (when we wouldn't call AddTxAnnouncement() for it anyway)

@dergoegge
Copy link
Member Author

@edilmedeiros I've seen people have similar issues on a Mac, I don't think these are real issues but rather false positives.

You can configure without the address and/or undefined sanitizer and that should get rid of the errors, but this obviously is not a great solution if you want to actually use those sanitizers. A second option could be to try to use our ubsan suppressions (https://github.com/bitcoin/bitcoin/blob/master/test/sanitizer_suppressions/ubsan) and add the errors you are seeing to it. See the fuzz test runner for how to tell ubsan to use the suppressions file:

'UBSAN_OPTIONS':

@hebasto
Copy link
Member

hebasto commented Aug 5, 2024

Ported to the CMake-based build system in hebasto#304.

@bitcoin bitcoin locked and limited conversation to collaborators Aug 5, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.