Skip to content

feat(iroh)!: Retain stats for closed and abandoned paths in the path watcher#3899

Merged
Frando merged 34 commits intomainfrom
Frando/pathwatch
Feb 20, 2026
Merged

feat(iroh)!: Retain stats for closed and abandoned paths in the path watcher#3899
Frando merged 34 commits intomainfrom
Frando/pathwatch

Conversation

@Frando
Copy link
Copy Markdown
Member

@Frando Frando commented Jan 29, 2026

Description

Improves the path watcher for connections to reliably allow accessing the stats of abandoned paths.

  • Abandoned paths are not removed from the value of the watchable for a connection's paths, as long as there are active watchers (subscribers)
  • Only once all watchers are dropped, we drop the abandoned paths from the watchable's value on the next update
  • This means that if you keep a PathWatcher alive for the duration of a connection, it will contain all paths the connection ever used.
  • We use the changes from feat: Retain final path stats if a Path is alive, add WeakPathHandle noq#386 and keep a WeakPathHandle for all paths in the watchable's value. This prevents the path stats to be dropped within a quinn connection even if the path is abandoned.
  • We do no longer store path stats anywhere in iroh directly, but only ever access them from quinn. Due to keeping the WeakPathHandle, we can ensure that the stats are always available as long as the connection hasn't been dropped.

The combination of all this gives us a reliable way to access final path stats for all paths used in a connection, as long as you keep a reference to the connection around, which is quite straightforward to do and document.

Breaking Changes

  • Connection::paths and ConnectionInfo::paths now return a PathWatcher (which still implements n0_watcher::Watcher but now also is a named struct)
  • PathInfo::stats and PathInfo::rtt now return Option. They return None if the underlying connection has been droped.

Notes & open questions

I spend quite some time going back-and-forth over different approaches. Very open to other ideas, but I'm a bit out of further ideas currently and this is the best I could come up with so far.

Change checklist

  • Self-review.
  • Documentation updates following the style guide, if relevant.
  • Tests if relevant.
  • All breaking changes documented.
    • List all breaking changes in the above "Breaking Changes" section.
    • Open an issue or PR on any number0 repos that are affected by this breaking change. Give guidance on how the updates should be handled or do the actual updates themselves. The major ones are:

@Frando Frando force-pushed the Frando/pathwatch branch 2 times, most recently from 6bf73fc to ed10e46 Compare January 29, 2026 14:51
@n0bot n0bot bot added this to iroh Jan 29, 2026
@github-project-automation github-project-automation bot moved this to 🚑 Needs Triage in iroh Jan 29, 2026
@Frando Frando marked this pull request as draft January 29, 2026 15:43
@github-actions
Copy link
Copy Markdown

github-actions bot commented Jan 30, 2026

Documentation for this PR has been generated and is available at: https://n0-computer.github.io/iroh/pr/3899/docs/iroh/

Last updated: 2026-02-20T14:13:44Z

@Frando Frando changed the title wip: path watcher feat(iroh): Allow to access stats for closed and abandoned paths in the path watcher Feb 3, 2026
@Frando Frando changed the title feat(iroh): Allow to access stats for closed and abandoned paths in the path watcher feat(iroh): Retain stats for closed and abandoned paths in the path watcher Feb 3, 2026
@Frando Frando marked this pull request as ready for review February 3, 2026 10:04
@dignifiedquire dignifiedquire moved this from 🚑 Needs Triage to 🏗 In progress in iroh Feb 3, 2026
Copy link
Copy Markdown
Contributor

@flub flub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM (despite my sadness is discord), only the path state naming needs to be sorted. I know not everywhere uses the right terminology yet, e.g. the RemoteStateActor does also use some somewhat lose terms. We'll update things bit by bit to get to the consistent naming.

}
}

impl Iterator for PathInfoListIntoIter {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I PathInfoListIntoIter is the right name, this is the iterator itself is it not? PathInfoListIter would make more sense to me. It seems SmallVec uses the same naming though, so maybe I'm just wrong.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I guess the IntoIter naming in e.g. SmallVec comes about when you have both an owned iterator (returned from into_iter) and a non-owned iterator (returned from _iter). The latter we also have but as unnames impl Iterator type. So for us there's no need to have the convoluted name, agreed. Renamed to PathInfoListIter.

Comment on lines +184 to +185
is_closed: bool,
is_abandoned: bool,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do these values mean? is_abandoned is never even read and only set?

FWIW the great path state bikeshed resulted in having states:

  • abandoned
  • discarded

Going from any prior state to abandoned is "closing" (a verb), a rather loose term.

Looking at where you call set_closed and set_abandoned, I think that:

  • is_abandoned -> is_discarded
  • is_closed -> is_abandoned

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked this again, and I think a single flag is_abandoned is enough. The difference whether a path is discarded or not does not matter. It was a leftover, I think, from when the discarded state meant we would have to track stats differently, but this is no longer the case now that a WeakPathHandle keeps the path stats available in the quinn Connection.
So with the latest commit, there's now only a single is_abandoned state being tracked. Nice.

}

/// Returns `true` if this path is closed.
pub fn is_closed(&self) -> bool {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This public method name I find reasonable, despite my complaints about the internal naming. For someone without any multipath knowledge this can be helpful and the distinction between abandoned and discarded is not much help to the user.

I would however add a longer description saying that this really returns true as soon as a path has been abandoned by the local endpoint, and remains true forever after. So that folks who do know more about multipath can make sense of it.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment. Thought about renaming to is_abandoned but maybe for the public API is_closed is better indeed, left it for now.


/// Updates the watchable's value through a closure.
///
/// After the update is performed, and if there are currently no watchers, data for abandoned paths
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, this is where is_abandoned is read. This reaffirms to me that is_discarded is the better name here.

@matheus23 matheus23 removed their request for review February 12, 2026 10:10
@Frando Frando requested a review from flub February 12, 2026 10:37
@Frando Frando changed the title feat(iroh): Retain stats for closed and abandoned paths in the path watcher feat(iroh)!: Retain stats for closed and abandoned paths in the path watcher Feb 16, 2026
Comment on lines -1491 to -1503
// Verify that the path watch streams close.
assert_eq!(
tokio::time::timeout(Duration::from_secs(1), paths_client.next())
.await
.unwrap(),
None
);
assert_eq!(
tokio::time::timeout(Duration::from_secs(1), paths_server.next())
.await
.unwrap(),
None
);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you try if adding a tokio::time::pause() ahead of this call makes it possible to reduce the timeout durations to 1 microsecond or sth like that?


fn watch_conn_type(
paths_watcher: impl Watcher<Value = PathInfoList> + Send + Unpin + 'static,
conn: Connection,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be &Connection?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

Comment on lines +1024 to +1030
/// A [`PathWatcher`] does not keep the [`Connection`] itself alive. If all references to
/// a connection are dropped, the [`PathWatcher`] will start to return an error when
/// updating. Its last value may still be used - note however that accessing
/// stats for a path via [`PathInfo::stats`] returns `None` if all references to a
/// [`Connection`] have been dropped. To reliably access path stats when a connection closes,
/// wait for [`Connection::closed`] and then call [`Connection::paths`] and directly
/// iterate over the path stats while the [`Connection`] struct is still in scope.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused with this. Didn't we also discuss making Connection::closed so that it would not keep the connection alive?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we discussed that, but as of now Connection::closed is lifetime-bound onto &Connection so it keeps the connection alive. Connection::on_closed does not.

The final path stats are stored in the quinn::Connection, so if all iroh::Connection structs are dropped by the the time the PathWatcher processes the last event, thenPathInfo::stats returns None. If a Connection is still in scope, it is guaranteed to return Some(PathStats).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have an issue that tracks removing all these duplicate on-closing things yet? Does making this change make it harder to de-duplicate these methods?

/cc @matheus23

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I filed #3959

I think the example works just as well with on_closed, since it doesn't consume the conn in scope and the connection is only dropped at the end of the scope.

&mut self,
handle: WeakConnectionHandle,
tx: oneshot::Sender<PathsWatcher>,
tx: oneshot::Sender<PathWatchable>,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it slightly odd to send away the watchable instead of the watcher, because that allows the remote to change the values. And design-wise that's not something we want to permit here I think.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The methods on PathWatchable to update the value are pub(super) so are not accessible from the connection, only from the RemoteStateActor.
The upside of doing it like this is that we can check if there's any subscribers, and not store abandoned paths in the watcher value if there are none. This is not easily possible, I think, if we add a watcher to the connection.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed the update methods are not visible to the users, but I made this comment for our own code structure and maintainability. We make plenty of mistakes ourselves and it is easy to forget the implications when we later have a PR that does something somewhere just because it was possible.

The upside of doing it like this is that we can check if there's any subscribers, and not store abandoned paths in the watcher value if there are none. This is not easily possible, I think, if we add a watcher to the connection.

Can you point to the function doing this optimisation right now so I can get an idea of the impact and the tradeoff being made here?

Copy link
Copy Markdown
Member Author

@Frando Frando Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be here:

if !self.paths.has_watchers() {

And a note on visibility - PathWatchable is never exposed to users, the pub(super) is to prevent ourselves from calling anything but watch from outside of the remote_state module. We could add an abstraction for this to n0-watcher potentially. If we'd keep the n0_watcher::Direct in the connection, we'd also needlessly store the watcher's value (which is an allocation if more than 4 paths) with every clone of the connection. The watcher always includes a clone of the value at the time of creation or last get. So even though the watcher might be entirely unused, it would store the path list with every clone of the Connection. And we can't Arc it either, because you need mut access to update the value, thus Connection::paths needs to returned an un-arced Direct.

This all led me to towards the design which stores a Watchable and not a Direct in the connection.

We could add a new abstraction for this in n0-watcher. I previously proposed a WeakWatcher (n0-computer/n0-watcher#31) but since the peek method was added to the Watcher trait this is not easily possible anymore. We could add a LazyWatcher or such, that internally contains a Watchable but only exposes the watch method; basically moving the visibility restriction from the remote_state::path_watcher::PathWatchable into a new type from n0-watcher. Not sure if it's worth it though.


impl Drop for ConnectionState {
fn drop(&mut self) {
self.path_watchable.close();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is needing this explicit close a side-effect of cloning the entire watchable that is sent to the AddConnection's oneshot sender? Nothing really wrong with it, but it feels a bit footgun-y

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. See comment above for why I opted to do it like this.

@ramfox ramfox added this to the iroh: v0.97 milestone Feb 17, 2026
@Frando
Copy link
Copy Markdown
Member Author

Frando commented Feb 18, 2026

On the remaining comments:

  • The PathWatchable, which contains a Watchable, is cloned into the connection so that a n0_watcher::Direct is only created if the user actually creates a PathWatcher via Connection::paths. This is cheaper and uses less memory then in the case where a user does not create a PathWatcher. With the current architecture, the existence of a watcher for the watchable decides if all abandoned paths are kept or not (if a watcher exists they are kept, if not, then not). This is why I wanted a design where a watcher is only created if actually used, and not within each clone of a Connection. I think it would be wasteful to always collect all abandoned paths, but also it should be possible. I think the current design is a good compromise here.

  • Note that even though the PathWatchable is cloned onto the Connection, the methods to set the value are not available, because they are pub(super), thus only accessible from the RemoteStateActor. The Connection can only call the pub(crate) fn watch to create a new watcher.

  • I don't think we can get rid of PathInfo::stats returning an Option, unless we make the PathWatcher contain a clone of the connection. We can do that, it would be a different tradeoff. If we did, the existence of a PathWatcher would prevent close-on-drop. Whereas in the current design, the existence of a PathWatcher does not guarantee that PathInfo::stats returns Some, unless a Connection is alive, then it is guaranteed. I tried to document this, if it isn't clear enough I can refine again.

  • The last point is the reason why I wrote in the docs that if you want guaranteed path stats at the end of a connection, the recommended pattern is:

let paths = conn.paths();
// do whatever
conn.closed().await;
for path in paths.get() {
    let stats = path.stats().expect("safe because `conn` is still in scope");
}

I don't love it, but I think it is acceptable, and at least clearly documentable.

All that said, I would be in favor of merging this, as the current state on main is just broken.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Feb 19, 2026

Netsim report & logs for this PR have been generated and is available at: LOGS
This report will remain available for 3 days.

Last updated for commit: 66075c2

Copy link
Copy Markdown
Member

@matheus23 matheus23 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would still like that improvement to the test with pause(), but otherwise this looks good to me now 👍

Of course there are still some open comments with flub.

@Frando Frando enabled auto-merge February 20, 2026 12:37
@matheus23
Copy link
Copy Markdown
Member

@Frando this needs a doc reference fix still ✌️

@Frando Frando added this pull request to the merge queue Feb 20, 2026
Merged via the queue into main with commit b11e707 Feb 20, 2026
28 of 29 checks passed
@github-project-automation github-project-automation bot moved this from 🏗 In progress to ✅ Done in iroh Feb 20, 2026
@matheus23 matheus23 deleted the Frando/pathwatch branch February 20, 2026 17:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: ✅ Done

Development

Successfully merging this pull request may close these issues.

5 participants