Skip to content

Conversation

@0xB10C
Copy link
Contributor

@0xB10C 0xB10C commented Aug 12, 2022

This adds five new tracepoints with documentation and tests for network connections:

  • established connections with net:inbound_connection and net:outbound_connection
  • closed connections (both closed by us or by the peer) with net:closed_connnection
  • inbound connections that we choose to evict with net:evicted_inbound_connection
  • connections that are misbehaving and punished with net:misbehaving_connection

I've been using these tracepoints for a few months now to monitor connection lifetimes, re-connection frequency by IP and netgroup, misbehavior, peer discouragement, and eviction and more. Together with the two existing P2P message tracepoints they allow for a good overview of local P2P network activity. Also sort-of addresses #22006 (comment).

I've been back and forth on which arguments to include. For example, net:evicted_connection could also include some of the eviction metrics like e.g. last_block_time, min_ping_time, ... but I've left them out for now. If wanted, this can be added here or in a follow-up. I've tried to minimize a potential performance impact by measuring executed instructions with gdb where possible (method described here). I don't think a few hundred extra instructions are too crucial, as connection opens/closes aren't too frequent (compared to e.g. P2P messages). Note: e.g. CreateNodeFromAcceptedSocket() usually executes between 80k and 90k instructions for each new inbound connection.

tracepoint instructions
net:inbound_connection 390 ins
net:outbound_connection between 700 and 1000 ins
net:closed_connnection 473 ins
net:evicted_inbound_connection not measured; likely similar to net:closed_connnection
net:misbehaving_connection not measured

Also added a bpftrace (tested with v0.14.1) log_p2p_connections.bt example script that produces output similar to:

Attaching 6 probes...
Logging opened, closed, misbehaving, and evicted P2P connections
OUTBOUND conn to 127.0.0.1:15287: id=0, type=block-relay-only, network=0, total_out=1
INBOUND conn from 127.0.0.1:45324: id=1, type=inbound, network=0, total_in=1
MISBEHAVING conn id=1, message='getdata message size = 50001'
CLOSED conn to 127.0.0.1:15287: id=0, type=block-relay-only, network=0, established=1231006505
EVICTED conn to 127.0.0.1:45324: id=1, type=inbound, network=0, established=1612312312

@jb55
Copy link
Contributor

jb55 commented Aug 12, 2022 via email

@portlandhodl
Copy link
Contributor

portlandhodl commented Aug 18, 2022

Just reporting in that everything is functional. Console log looks correct.

Tests Passed
image

@DrahtBot
Copy link
Contributor

DrahtBot commented Sep 13, 2022

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/25832.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK laanwj, vasild, sipa
Concept ACK dergoegge, i-am-yuvi
Stale ACK jb55, virtu, maflcko

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #30988 (Split CConnman by vasild)
  • #30381 ([WIP] net: return result from addnode RPC by willcl-ark)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@0xB10C
Copy link
Contributor Author

0xB10C commented Nov 16, 2022

Note to self: might be worth checking if we can also pass AS information and not only the net_group.

@0xB10C
Copy link
Contributor Author

0xB10C commented Jan 20, 2023

Note to self: might be worth checking if we can also pass AS information and not only the net_group.

This is possible by accessing the CNodeStats, but don't think we should do that in the tracepoint. We have the IP and can lookup the AS in a tracing scirpt, if necessary.


Fixed up the linter complains and rebased. This is ready for further review.

@dergoegge
Copy link
Member

Concept ACK

@0xB10C 0xB10C force-pushed the 2022-05-connection-tracepoints branch from 8528781 to ac438b1 Compare March 21, 2023 09:16
@0xB10C 0xB10C force-pushed the 2022-05-connection-tracepoints branch from ac438b1 to 6de4fc7 Compare March 21, 2023 11:36
@0xB10C
Copy link
Contributor Author

0xB10C commented Mar 21, 2023

The linter started complaining about "E275 missing whitespace after keyword". Fixed this up in the individual commits.

ac438b1 -> 6de4fc7 (2022-05-connection-tracepoints.pr.1 -> 2022-05-connection-tracepoints.pr.2; compare)

@virtu
Copy link
Contributor

virtu commented Mar 24, 2023

ACK 6de4fc7

Code reviewed tracepoints, tests, and demo script. Tracepoint placement looks good, so do tests. Successfully ran functional tests and the demo script.

Some points:

  • Any reason why only inbound_connection and outbound_connection explicitly cast all arguments in the demo script?
  • I think some of the format specifiers in the demo script are lightly off, leading to outputs with negative netgroup values on my system. I believe it should be %lld for int64 and %llu for uint64.
  • As brought up previously, it would be nice to switch from nKeyedNetGroup to netgroup ids that are consistent over time

@DrahtBot DrahtBot requested a review from jb55 March 24, 2023 13:44
@fanquake fanquake merged commit a43f08c into bitcoin:master Feb 5, 2025
18 checks passed
@0xB10C 0xB10C deleted the 2022-05-connection-tracepoints branch February 5, 2025 16:42
@0xB10C
Copy link
Contributor Author

0xB10C commented Feb 5, 2025

Using BIP155 network IDs is a good idea! Planning to address #25832 (comment) in a followup.

@vasild
Copy link
Contributor

vasild commented Feb 6, 2025

CNetAddr::GetBIP155Network() might be useful.

@0xB10C
Copy link
Contributor Author

0xB10C commented Feb 7, 2025

I implemented both the BIP155 NetworkID approach (0xB10C@72133f3) and the Network as string approach from #25832 (comment) (0xB10C@39102d4).

The BIP155 approach is a bit more involved with making GetBIP155Network() public, introducing a ConnectedThroughBIP155Network() helper, and passing 0 (or something else?) for NET_INTERNAL as there is no BIP155 equivalent. With this approach, NET_UNROUTABLE isn't exposed - e.g. 127.0.0.1 will get the BIP155 NetworkID for IPv4.

The network name as string approach exposes NET_UNROUTABLE as not_publicly_routable and NET_INTERNAL as internal. We have these in -netinfo too.

I have a slight preference on BIP155 NetworkIDs over keeping-as-is or network name as string.

@laanwj
Copy link
Member

laanwj commented Feb 9, 2025

I have a slight preference on BIP155 NetworkIDs over keeping-as-is or network name as string.

Agree. Strings are sensitive to typos, and require extra effort for matching when doing statistics. i think even the current enumeration is better than that. BIP155 is a good suggestion, but as you have to make up non-standard values, and handle NET_UNROUTABLE differently i'm not entirely sure.

The purpose of tracing is to gain insight into internal state so logging the actual values makes more sense than thinking up a new convention. The internal enumeration is likely stable enough to warrant being a (semi-)stable interface.

@sipa
Copy link
Member

sipa commented Feb 9, 2025

The BIP155 approach is a bit more involved with making GetBIP155Network() public, introducing a ConnectedThroughBIP155Network() helper, and passing 0 (or something else?) for NET_INTERNAL as there is no BIP155 equivalent.

Given that BIP155 network IDs are uint8_t's, I think it suffices to pick a value outside of that range for unreachable. For example, 0x100, or 0xffffffff for a uint32_t, or -1 for a signed integer type.

Using 0 works too of course, and I guess it won't get assigned in future extensions, but using something outside of the range might be even more future-proof.

@0xB10C
Copy link
Contributor Author

0xB10C commented Feb 12, 2025

Agree. Strings are sensitive to typos, and require extra effort for matching when doing statistics. i think even the current enumeration is better than that. BIP155 is a good suggestion, but as you have to make up non-standard values, and handle NET_UNROUTABLE differently i'm not entirely sure.
The purpose of tracing is to gain insight into internal state so logging the actual values makes more sense than thinking up a new convention. The internal enumeration is likely stable enough to warrant being a (semi-)stable interface.

I tend to agree. A few toughs:

Unless there are strong opinions on needing a follow up, I'll plan to leave this as is.

for closed_connection in closed_connections:
assert closed_connection.conn.id > 0
assert_equal("inbound", closed_connection.conn.conn_type.decode('utf-8'))
assert_equal(0, closed_connection.conn.network)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should have been

- assert_equal(0, closed_connection.conn.network)
+ assert_equal(NETWORK_TYPE_UNROUTABLE, closed_connection.conn.network)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.