Skip to content

fix: stop relay actor on endpoint close#3601

Merged
Frando merged 2 commits intomainfrom
Frando/shutdown-relay
Nov 4, 2025
Merged

fix: stop relay actor on endpoint close#3601
Frando merged 2 commits intomainfrom
Frando/shutdown-relay

Conversation

@Frando
Copy link
Copy Markdown
Member

@Frando Frando commented Nov 4, 2025

Description

Since the transport abstraction was introduced, we no longer shut down the relay actor when calling Endpoint::close. The relay actor is currently only stopped when the last clone of the iroh::Endpoint is dropped. This is surprising and wasteful. This PR fixes this so that the relay actor is stopped when calling Endpoint::close.

Background: Since introducing the transport abstraction, the relay actor handle is now owned by Transports which is owned by MagicUdpSocket which is owned by the quinn::Endpoint as its AsyncUdpSocket. The AsyncUdpSocket trait has no notification when the endpoint is closing, thus currently the relay transport (and the relay actor, which it owns) is not notified at all when the endpoint is shutting down. Therefore, it keeps running (and potentially reconnecting to the relay or receiving messages) until it is stopped when the last clone of the endpoint is dropped and thus the AbortOnDropHandle owning the relay actor tasks is dropped, finally stopping the relay actor tasks.

This PR exposes an existing cancellation token within the relay actor to the magicsocket, allowing it to cancel the relay actors when calling Endpoint::close.

Breaking Changes

Notes & open questions

I tested this by looking at logs after closing an endpoint. Not sure if there is a good way to test this. I'd vote to merge this even without a test because it fixes an issue, unless someone has a good idea on how to test this.

Change checklist

  • Self-review.
  • Documentation updates following the style guide, if relevant.
  • Tests if relevant.
  • All breaking changes documented.
    • List all breaking changes in the above "Breaking Changes" section.
    • Open an issue or PR on any number0 repos that are affected by this breaking change. Give guidance on how the updates should be handled or do the actual updates themselves. The major ones are:

@Frando Frando marked this pull request as ready for review November 4, 2025 09:11
@github-actions
Copy link
Copy Markdown

github-actions bot commented Nov 4, 2025

Documentation for this PR has been generated and is available at: https://n0-computer.github.io/iroh/pr/3601/docs/iroh/

Last updated: 2025-11-04T09:46:33Z

@github-actions
Copy link
Copy Markdown

github-actions bot commented Nov 4, 2025

Netsim report & logs for this PR have been generated and is available at: LOGS
This report will remain available for 3 days.

Last updated for commit: c7f8774

@Frando Frando requested review from dignifiedquire and flub November 4, 2025 09:21
return;
}
self.msock.closing.store(true, Ordering::Relaxed);
self.msock.cancel_transports.cancel();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of a new token, would this be done better as a child token from the actor_token that already exists on the line below?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, sounds good. Did that.

Copy link
Copy Markdown
Member

@matheus23 matheus23 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I probably agree having a single close_token instead of two separate cancel_transport and actor_tokens is better, but IMO that's just a nit.

@Frando
Copy link
Copy Markdown
Member Author

Frando commented Nov 4, 2025

Changed to reuse the existing cancel token.

@Frando Frando added this pull request to the merge queue Nov 4, 2025
Merged via the queue into main with commit 30c23e8 Nov 4, 2025
29 checks passed
@flub flub deleted the Frando/shutdown-relay branch November 4, 2025 13:19
github-merge-queue bot pushed a commit that referenced this pull request Nov 6, 2025
## Description

Since we merged #3601 we sometimes get ugly (but harmless) error log
messages when the endpoint is shutting down:
```
2025-11-06T11:28:26.721826Z ERROR ep-client:client: iroh::magicsock::transports::relay: relay_recv_channel closed
2025-11-06T11:28:26.721884Z ERROR ep-client:client: iroh_quinn::endpoint: I/O error: connection closed
```
This fixes it: We already have a check in the transports to return
`Pending` without polling the actual transports once
`Magicsock::is_closed` is `true`. This PR changes it to instead check
`Magicsock::is_closing`. The latter is set **after**
`Endpoint::wait_idle` completed, but before cancelling the actor and
transport tasks. Whereas `is_closed` is set *after* the actor and
transport tasks are fully shutdown. When `Endpoint::wait_idle`
completes, we do not expect the quinn endpoint to do anything
meaningful, so it is fine to stop polling the transports at this time,
and thus not get errors from the transports into quinn if they fail
(because they are shutting down).

## Breaking Changes

<!-- Optional, if there are any breaking changes document them,
including how to migrate older code. -->

## Notes & open questions

<!-- Any notes, remarks or open questions you have to make about the PR.
-->

## Change checklist
<!-- Remove any that are not relevant. -->
- [ ] Self-review.
- [ ] Documentation updates following the [style
guide](https://rust-lang.github.io/rfcs/1574-more-api-documentation-conventions.html#appendix-a-full-conventions-text),
if relevant.
- [ ] Tests if relevant.
- [ ] All breaking changes documented.
- [ ] List all breaking changes in the above "Breaking Changes" section.
- [ ] Open an issue or PR on any number0 repos that are affected by this
breaking change. Give guidance on how the updates should be handled or
do the actual updates themselves. The major ones are:
    - [ ] [`quic-rpc`](https://github.com/n0-computer/quic-rpc)
    - [ ] [`iroh-gossip`](https://github.com/n0-computer/iroh-gossip)
    - [ ] [`iroh-blobs`](https://github.com/n0-computer/iroh-blobs)
    - [ ] [`dumbpipe`](https://github.com/n0-computer/dumbpipe)
    - [ ] [`sendme`](https://github.com/n0-computer/sendme)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants