feat: basic holepunch metrics by Frando · Pull Request #3748 · n0-computer/iroh

Frando · 2025-12-08T13:35:11Z

Description

This adds holepunching metrics, a few are because they are easy and give some insight. The main metrics are:

num_conns_opened: an existing metric counting the number of connections opened.
num_conns_direct: a new metric counting the number of connections which have ever been direct.

Together those do not give a direct count of the number of successful holepunches. But does give an accurate count of number of failed holepunches.

The holepunch_attempts metric is much more of an internal value that does not give direct information.

There are two more metrics:

paths_direct: number of paths that are direct
paths_relay: number of paths that are relayed

If paths_direct_total < paths_relay_total then you have some failed holepunching. The number will not be as accurate as using num_conns_opened_total and num_conns_direct_total however and is also more indicative of the total number of paths involved.

Fixes #3695

Breaking Changes

Some metrics are removed, but these have not been released.

Notes & open questions

One could also argue that you could count the "total" and "direct" remotes. This gives you a less-inflated number on how successful hoplepunching is. Though we really give an indication of the number of failed holepunches instead, so the over-inflation doesn't happen really.

The main reason I consider per-remote counting worse however is that we forget all state about a remote when a RemoteStateActor is stopped. This can then later come back and again result in wrong numbers.

Change checklist

Self-review.
Documentation updates following the style guide, if relevant.
Tests if relevant.
All breaking changes documented.
- List all breaking changes in the above "Breaking Changes" section.

github-actions · 2025-12-08T13:37:46Z

Documentation for this PR has been generated and is available at: https://n0-computer.github.io/iroh/pr/3748/docs/iroh/

Last updated: 2026-01-08T15:51:41Z

flub · 2025-12-10T16:52:26Z

Help, I find this very hard to think about. What do we really want to measure here? I don't have a good answer to this so find commenting on the PR hard. @Arqu maybe has input on what we really need to have?

Things that are easy to measure:

number of times we start holepuching
number of times we explicitly open a path without holepunching (e.g. because it was holepunched in another connection).
number of times a path was opened for any reason (e.g. on the server side or because of holepunching or because of explicit open_path).
number of times we abandon a path.
number of times a path is abandoned.

Things that are hard(er) to measure:

number of times a path is opened from nat traversal (essentially shown in this PR currently though you also would have to be sure it wasn't from an explict open_path call which is not easy right now)
number of successful holepunches
number of failed holepunches

My inclination is to concentrate on the easy ones, but do you learn anything useful from those? You can't do any maths to tell you how good your holepunching is I think, so what's the point even? This goes back to, what are we trying to measure?

flub · 2025-12-10T14:58:20Z

iroh/src/magicsock/metrics.rs

-    /// The number of NAT traversal attempts initiated.
-    pub nat_traversal: Counter,
+    /// The number of remote endpoints for which of NAT traversal was initiated.
+    pub remote_holepunch_attempts: Counter,


The "remote_" prefix makes me think this was initiated by the remote.

Essentially the metrics you added are:

remotes connected to ("remotes_connection_established"?)

remotes holepunched ("remotes_connection_direct"?)

IIUC. This could be useful I guess. Not sure, it's also a bit weird (see separate comment).

flub · 2025-12-10T16:57:43Z

iroh/src/magicsock/remote_map/remote_state.rs

+                        && hp.remote_candidates.contains(ip_addr)
+                    {
+                        self.has_holepunched = true;
+                        self.metrics.remote_holepunch_success.inc();


One issue with this impl is that this metric would only be incremented on the client, and not the server. But that's a bit counter-intuitive from the user point of view.

flub · 2025-12-10T17:01:38Z

So I think the original intention was that you could have two metrics like:

no of connections that were on relay and had holepunch attempts
no of connections that moved from relay to direct after holepunching

And now you can do maths on them because you can compute the number of connections that did not holepunch and have a holepunching rate. But these metrics fall under the "hard" metrics 😞 I'm not sure how to do them.

Arqu · 2025-12-16T09:39:36Z

I dropped the ball a bit here. Think @flub is on the right path here.
I guess a number of things are here in play. Main question is what and for whom we're measuring things here. Iroh side metrics should generally serve the user and should be concerned with the performance of "their" node respectively.

That said I think it's fine that this is client centric and whether my endpoint managed to holepunch. Still might be slightly counter intuitive as it basically tells you more about the state of the other side network config. Ideally we would be aware of a hole punching success towards ourselves too.

The 2nd bit and more crucially, I don't think this should work to establish a network wide holepunch rate. You can fumble some maths and see some grouped averages or distributions between clients and their holepunch rates but nothing definitive. I intend to do a follow up PR that does a similar thing but from the Relay PoV which should help make more informed guesstimates on the network level. Still pretty hard though as we are operating with very little info.

For this PR I'd focus on getting the "easy" ones @flub listed as they give a lot of flexibility to do math and explore the state. However none of these counters I would limit to a 0-1 state but just let them increment naturally on what we deem the relevant event. It's easy enough to do the subtractions to get the actual state from it.

And count a few other fun things for fun.

github-actions · 2026-01-07T18:46:52Z

Netsim report & logs for this PR have been generated and is available at: LOGS
This report will remain available for 3 days.

Last updated for commit: ce45168

Frando · 2026-01-08T09:29:06Z

LGTM, let's merge this! (Can't GitHub approve because it's my PR)

We can iterate further on this until 1.0 but this gives us a baseline to try things out, so let's get this in.

flub

(channelling @Frando 's LGTM)

Frando force-pushed the Frando/mp-metrics-hp branch from eff1c4b to acbc048 Compare December 8, 2025 13:35

n0bot bot added this to iroh Dec 8, 2025

github-project-automation bot moved this to 🏗 In progress in iroh Dec 8, 2025

Frando force-pushed the Frando/mp-metrics-hp branch from ca85a9f to f6c5e49 Compare December 9, 2025 11:33

Frando marked this pull request as ready for review December 9, 2025 13:57

Frando changed the title ~~draft: holepunch metrics~~ feat: holepunch metrics Dec 9, 2025

Frando changed the title ~~feat: holepunch metrics~~ feat: basic holepunch metrics Dec 9, 2025

Base automatically changed from Frando/mp-metrics-basics to feat-multipath December 9, 2025 21:38

Frando added 5 commits December 10, 2025 13:18

draft: holepunch metrics

bc64b28

chore: spelling

952fe36

refactor: clamp at 1 per remote

1d00820

chore: clippy

47ce844

chore: fmt

fed03ef

Frando force-pushed the Frando/mp-metrics-hp branch from 0a4ba70 to fed03ef Compare December 10, 2025 12:18

cleanup and add validation to test

f245aed

Frando requested a review from flub December 10, 2025 12:23

flub reviewed Dec 10, 2025

View reviewed changes

flub added this to the iroh: v1.0.0-rc.0 milestone Dec 22, 2025

flub modified the milestones: iroh: v1.0.0-rc.0, iroh: v0.96 Jan 7, 2026

flub self-assigned this Jan 7, 2026

flub added 3 commits January 7, 2026 19:35

Count number of connections that have been direct

302895c

And count a few other fun things for fun.

Merge branch 'main' into Frando/mp-metrics-hp

59932da

clippy

b17d763

flub changed the base branch from feat-multipath to main January 7, 2026 18:42

flub requested a review from Arqu January 7, 2026 19:08

flub moved this from 🏗 In progress to 👀 In review in iroh Jan 8, 2026

Merge branch 'main' into Frando/mp-metrics-hp

1f5d210

flub approved these changes Jan 8, 2026

View reviewed changes

Merge branch 'main' into Frando/mp-metrics-hp

2246d40

ramfox added this pull request to the merge queue Jan 8, 2026

Merged via the queue into main with commit 37bda14 Jan 8, 2026
28 of 29 checks passed

github-project-automation bot moved this from 👀 In review to ✅ Done in iroh Jan 8, 2026

flub deleted the Frando/mp-metrics-hp branch January 8, 2026 18:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: basic holepunch metrics#3748

feat: basic holepunch metrics#3748
ramfox merged 11 commits intomainfrom
Frando/mp-metrics-hp

Frando commented Dec 8, 2025 •

edited by flub

Loading

Uh oh!

github-actions bot commented Dec 8, 2025 •

edited

Loading

Uh oh!

flub commented Dec 10, 2025 •

edited

Loading

Uh oh!

flub Dec 10, 2025

Uh oh!

flub Dec 10, 2025

Uh oh!

flub commented Dec 10, 2025 •

edited

Loading

Uh oh!

Arqu commented Dec 16, 2025

Uh oh!

github-actions bot commented Jan 7, 2026 •

edited

Loading

Uh oh!

Frando commented Jan 8, 2026

Uh oh!

flub left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

Frando commented Dec 8, 2025 • edited by flub Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Breaking Changes

Notes & open questions

Change checklist

Uh oh!

github-actions bot commented Dec 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

flub commented Dec 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

flub Dec 10, 2025

Choose a reason for hiding this comment

Uh oh!

flub Dec 10, 2025

Choose a reason for hiding this comment

Uh oh!

flub commented Dec 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Arqu commented Dec 16, 2025

Uh oh!

github-actions bot commented Jan 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Frando commented Jan 8, 2026

Uh oh!

flub left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Frando commented Dec 8, 2025 •

edited by flub

Loading

github-actions bot commented Dec 8, 2025 •

edited

Loading

flub commented Dec 10, 2025 •

edited

Loading

flub commented Dec 10, 2025 •

edited

Loading

github-actions bot commented Jan 7, 2026 •

edited

Loading