feat(iroh)!: Retain stats for closed and abandoned paths in the path watcher#3899
feat(iroh)!: Retain stats for closed and abandoned paths in the path watcher#3899
Conversation
6bf73fc to
ed10e46
Compare
ed10e46 to
b324c7a
Compare
|
Documentation for this PR has been generated and is available at: https://n0-computer.github.io/iroh/pr/3899/docs/iroh/ Last updated: 2026-02-20T14:13:44Z |
cec6756 to
6285cb7
Compare
flub
left a comment
There was a problem hiding this comment.
Generally LGTM (despite my sadness is discord), only the path state naming needs to be sorted. I know not everywhere uses the right terminology yet, e.g. the RemoteStateActor does also use some somewhat lose terms. We'll update things bit by bit to get to the consistent naming.
| } | ||
| } | ||
|
|
||
| impl Iterator for PathInfoListIntoIter { |
There was a problem hiding this comment.
I'm not sure I PathInfoListIntoIter is the right name, this is the iterator itself is it not? PathInfoListIter would make more sense to me. It seems SmallVec uses the same naming though, so maybe I'm just wrong.
There was a problem hiding this comment.
Yeah, I guess the IntoIter naming in e.g. SmallVec comes about when you have both an owned iterator (returned from into_iter) and a non-owned iterator (returned from _iter). The latter we also have but as unnames impl Iterator type. So for us there's no need to have the convoluted name, agreed. Renamed to PathInfoListIter.
| is_closed: bool, | ||
| is_abandoned: bool, |
There was a problem hiding this comment.
What do these values mean? is_abandoned is never even read and only set?
FWIW the great path state bikeshed resulted in having states:
- abandoned
- discarded
Going from any prior state to abandoned is "closing" (a verb), a rather loose term.
Looking at where you call set_closed and set_abandoned, I think that:
is_abandoned->is_discardedis_closed->is_abandoned
There was a problem hiding this comment.
I checked this again, and I think a single flag is_abandoned is enough. The difference whether a path is discarded or not does not matter. It was a leftover, I think, from when the discarded state meant we would have to track stats differently, but this is no longer the case now that a WeakPathHandle keeps the path stats available in the quinn Connection.
So with the latest commit, there's now only a single is_abandoned state being tracked. Nice.
| } | ||
|
|
||
| /// Returns `true` if this path is closed. | ||
| pub fn is_closed(&self) -> bool { |
There was a problem hiding this comment.
This public method name I find reasonable, despite my complaints about the internal naming. For someone without any multipath knowledge this can be helpful and the distinction between abandoned and discarded is not much help to the user.
I would however add a longer description saying that this really returns true as soon as a path has been abandoned by the local endpoint, and remains true forever after. So that folks who do know more about multipath can make sense of it.
There was a problem hiding this comment.
I added a comment. Thought about renaming to is_abandoned but maybe for the public API is_closed is better indeed, left it for now.
|
|
||
| /// Updates the watchable's value through a closure. | ||
| /// | ||
| /// After the update is performed, and if there are currently no watchers, data for abandoned paths |
There was a problem hiding this comment.
ah, this is where is_abandoned is read. This reaffirms to me that is_discarded is the better name here.
| // Verify that the path watch streams close. | ||
| assert_eq!( | ||
| tokio::time::timeout(Duration::from_secs(1), paths_client.next()) | ||
| .await | ||
| .unwrap(), | ||
| None | ||
| ); | ||
| assert_eq!( | ||
| tokio::time::timeout(Duration::from_secs(1), paths_server.next()) | ||
| .await | ||
| .unwrap(), | ||
| None | ||
| ); |
There was a problem hiding this comment.
Can you try if adding a tokio::time::pause() ahead of this call makes it possible to reduce the timeout durations to 1 microsecond or sth like that?
iroh/examples/transfer.rs
Outdated
|
|
||
| fn watch_conn_type( | ||
| paths_watcher: impl Watcher<Value = PathInfoList> + Send + Unpin + 'static, | ||
| conn: Connection, |
| /// A [`PathWatcher`] does not keep the [`Connection`] itself alive. If all references to | ||
| /// a connection are dropped, the [`PathWatcher`] will start to return an error when | ||
| /// updating. Its last value may still be used - note however that accessing | ||
| /// stats for a path via [`PathInfo::stats`] returns `None` if all references to a | ||
| /// [`Connection`] have been dropped. To reliably access path stats when a connection closes, | ||
| /// wait for [`Connection::closed`] and then call [`Connection::paths`] and directly | ||
| /// iterate over the path stats while the [`Connection`] struct is still in scope. |
There was a problem hiding this comment.
I'm confused with this. Didn't we also discuss making Connection::closed so that it would not keep the connection alive?
There was a problem hiding this comment.
Yes, we discussed that, but as of now Connection::closed is lifetime-bound onto &Connection so it keeps the connection alive. Connection::on_closed does not.
The final path stats are stored in the quinn::Connection, so if all iroh::Connection structs are dropped by the the time the PathWatcher processes the last event, thenPathInfo::stats returns None. If a Connection is still in scope, it is guaranteed to return Some(PathStats).
There was a problem hiding this comment.
Do we have an issue that tracks removing all these duplicate on-closing things yet? Does making this change make it harder to de-duplicate these methods?
/cc @matheus23
There was a problem hiding this comment.
I filed #3959
I think the example works just as well with on_closed, since it doesn't consume the conn in scope and the connection is only dropped at the end of the scope.
| &mut self, | ||
| handle: WeakConnectionHandle, | ||
| tx: oneshot::Sender<PathsWatcher>, | ||
| tx: oneshot::Sender<PathWatchable>, |
There was a problem hiding this comment.
I find it slightly odd to send away the watchable instead of the watcher, because that allows the remote to change the values. And design-wise that's not something we want to permit here I think.
There was a problem hiding this comment.
The methods on PathWatchable to update the value are pub(super) so are not accessible from the connection, only from the RemoteStateActor.
The upside of doing it like this is that we can check if there's any subscribers, and not store abandoned paths in the watcher value if there are none. This is not easily possible, I think, if we add a watcher to the connection.
There was a problem hiding this comment.
Indeed the update methods are not visible to the users, but I made this comment for our own code structure and maintainability. We make plenty of mistakes ourselves and it is easy to forget the implications when we later have a PR that does something somewhere just because it was possible.
The upside of doing it like this is that we can check if there's any subscribers, and not store abandoned paths in the watcher value if there are none. This is not easily possible, I think, if we add a watcher to the connection.
Can you point to the function doing this optimisation right now so I can get an idea of the impact and the tradeoff being made here?
There was a problem hiding this comment.
That would be here:
And a note on visibility - PathWatchable is never exposed to users, the pub(super) is to prevent ourselves from calling anything but watch from outside of the remote_state module. We could add an abstraction for this to n0-watcher potentially. If we'd keep the n0_watcher::Direct in the connection, we'd also needlessly store the watcher's value (which is an allocation if more than 4 paths) with every clone of the connection. The watcher always includes a clone of the value at the time of creation or last get. So even though the watcher might be entirely unused, it would store the path list with every clone of the Connection. And we can't Arc it either, because you need mut access to update the value, thus Connection::paths needs to returned an un-arced Direct.
This all led me to towards the design which stores a Watchable and not a Direct in the connection.
We could add a new abstraction for this in n0-watcher. I previously proposed a WeakWatcher (n0-computer/n0-watcher#31) but since the peek method was added to the Watcher trait this is not easily possible anymore. We could add a LazyWatcher or such, that internally contains a Watchable but only exposes the watch method; basically moving the visibility restriction from the remote_state::path_watcher::PathWatchable into a new type from n0-watcher. Not sure if it's worth it though.
|
|
||
| impl Drop for ConnectionState { | ||
| fn drop(&mut self) { | ||
| self.path_watchable.close(); |
There was a problem hiding this comment.
Is needing this explicit close a side-effect of cloning the entire watchable that is sent to the AddConnection's oneshot sender? Nothing really wrong with it, but it feels a bit footgun-y
There was a problem hiding this comment.
Yes. See comment above for why I opted to do it like this.
|
On the remaining comments:
let paths = conn.paths();
// do whatever
conn.closed().await;
for path in paths.get() {
let stats = path.stats().expect("safe because `conn` is still in scope");
}I don't love it, but I think it is acceptable, and at least clearly documentable. All that said, I would be in favor of merging this, as the current state on |
bdfa85f to
77cad01
Compare
matheus23
left a comment
There was a problem hiding this comment.
Would still like that improvement to the test with pause(), but otherwise this looks good to me now 👍
Of course there are still some open comments with flub.
|
@Frando this needs a doc reference fix still ✌️ |
Description
Improves the path watcher for connections to reliably allow accessing the stats of abandoned paths.
PathWatcheralive for the duration of a connection, it will contain all paths the connection ever used.Pathis alive, addWeakPathHandlenoq#386 and keep aWeakPathHandlefor all paths in the watchable's value. This prevents the path stats to be dropped within a quinn connection even if the path is abandoned.WeakPathHandle, we can ensure that the stats are always available as long as the connection hasn't been dropped.The combination of all this gives us a reliable way to access final path stats for all paths used in a connection, as long as you keep a reference to the connection around, which is quite straightforward to do and document.
Breaking Changes
Connection::pathsandConnectionInfo::pathsnow return aPathWatcher(which still implementsn0_watcher::Watcherbut now also is a named struct)PathInfo::statsandPathInfo::rttnow returnOption. They returnNoneif the underlying connection has been droped.Notes & open questions
I spend quite some time going back-and-forth over different approaches. Very open to other ideas, but I'm a bit out of further ideas currently and this is the best I could come up with so far.
Change checklist
quic-rpciroh-gossipiroh-blobsdumbpipesendme