Skip to content

feat: dropping the endpoint ungracefully should log and error, but still clean up resources immediately#3879

Merged
ramfox merged 20 commits intomainfrom
ramfox/dropped-endpoint
Feb 19, 2026
Merged

feat: dropping the endpoint ungracefully should log and error, but still clean up resources immediately#3879
ramfox merged 20 commits intomainfrom
ramfox/dropped-endpoint

Conversation

@ramfox
Copy link
Copy Markdown
Member

@ramfox ramfox commented Jan 22, 2026

Description

closes #3867

relies on #3924

Alot of finagaling needed to happen to ensure that when the Endpoint is dropped, it's underlying resources are cleaned up ASAP.

First, added a custom Runtime that wraps spawned tasks in a CancellationToken. This allows for both graceful shutdown (which does not use the cancel token, but instead waits for tasks to finish naturally) and ungraceful abort (which uses the cancel token to stop all running tasks)

Second, the quinn::Endpoint inside of the Handle needed to be wrapped in an Arc<RwLock<Option>>, so we could take the underlying quinn::Endpoint and drop it on Handle::close and the new Handle::abort method. This value should only be written once (during the close or abort), and so the read lock should never be in contention.

This was fixed in iroh-quinn, so no need to add a rw lock

Third, wrote a new Handle::abort method that is sync and does not wait for graceful clean up after cancelling tokens, before dropping the underlying quinn::Endpoint or aborting any runtime tasks.

Fourth, in order to ensure that the Endpoint is only "aborted" on the last clone of the endpoint, the meat of the Endpoint has been moved into an EndpointInner that is wrapped in an Arc. When the Arc is finally dropped, the Drop impl on EndpointInner is called, which calls Handle::abort, if the Endpoint was not gracefully closed before dropping.

The last major change is the work that needed to be done on Accept. The call to quinn::Endpoint::Accept takes a reference to the quinn::Endpoint, but since it is now behind a RwLock it can not longer be borrowed. This refactor turned the Accept into a BoxFuture, whose closure owns a clone of the quinn::Endpoint.

Finally, let's make sure that we drop any references to address lookup services on close.

NOTE: While I used claude to code-review/double check assumptions while working on this PR in general, the Accept changes were where I relied on AI the most to give me a "sensible" work around to the self-reference problem I was facing when making the RwLock refactors. I would love some feedback here if this is an acceptable solution.
No longer necessary since we do not wrap the quinn::Endpoint in a rwlock any more

Breaking Changes

New behavioral change! Then Endpoint does not do ANY best-effort to close connections gracefully before dropping. This makes it even more important to call Endpoint::close in all cases.

This change (that prevents the underlying quinn::Endpoint to do it's best to wait/send close frames before shutting down) has already highlighted in tests how much more brittle it is to drop the endpoint without calling close first.

Change checklist

  • Self-review.
  • Documentation updates following the style guide, if relevant.
  • Tests if relevant.

@ramfox ramfox marked this pull request as draft January 22, 2026 03:32
@ramfox ramfox changed the title feat: dropping the endpoint ungracefully should log and error, but still clean up resources immediately WIP feat: dropping the endpoint ungracefully should log and error, but still clean up resources immediately Jan 22, 2026
@github-actions
Copy link
Copy Markdown

github-actions bot commented Jan 22, 2026

Documentation for this PR has been generated and is available at: https://n0-computer.github.io/iroh/pr/3879/docs/iroh/

Last updated: 2026-02-19T18:23:46Z

@n0bot n0bot bot added this to iroh Jan 22, 2026
@github-project-automation github-project-automation bot moved this to 🚑 Needs Triage in iroh Jan 22, 2026
@ramfox ramfox removed the status in iroh Jan 22, 2026
@ramfox ramfox force-pushed the ramfox/dropped-endpoint branch from 64018a2 to af81061 Compare February 10, 2026 23:17
@ramfox ramfox changed the title WIP feat: dropping the endpoint ungracefully should log and error, but still clean up resources immediately feat: dropping the endpoint ungracefully should log and error, but still clean up resources immediately Feb 12, 2026
@ramfox ramfox changed the base branch from main to ramfox/endpoint-api February 12, 2026 02:14
@ramfox ramfox marked this pull request as ready for review February 12, 2026 02:14
@ramfox ramfox self-assigned this Feb 12, 2026
@ramfox ramfox added the c-iroh Functionality of the core iroh crate. label Feb 12, 2026
@ramfox ramfox moved this to 👀 In review in iroh Feb 12, 2026
@ramfox ramfox moved this from 👀 In review to 🏗 In progress in iroh Feb 12, 2026
@ramfox ramfox added this to the iroh: v0.97 milestone Feb 12, 2026
@matheus23
Copy link
Copy Markdown
Member

relies on #3879

This is a self-reference in the PR description?

@ramfox ramfox force-pushed the ramfox/endpoint-api branch from f5440f4 to c9aaa59 Compare February 12, 2026 18:15
@ramfox ramfox force-pushed the ramfox/dropped-endpoint branch from 353e3f0 to b298b42 Compare February 13, 2026 01:52
Base automatically changed from ramfox/endpoint-api to main February 13, 2026 08:34
ramfox pushed a commit that referenced this pull request Feb 13, 2026
## Description

Based on #3941 which is based on #3879

* Add `Endpoint::closed`, which returns a `CancellationToken` that is
cancelled once the endpoint is closing
* Document watcher behavior with regards to closing the endpoint
* Fix test

## Breaking Changes

<!-- Optional, if there are any breaking changes document them,
including how to migrate older code. -->

## Notes & open questions

* Instead of returning a `CancellationToken` we could make this an
`async fn`, however to me it looks quite useful to get a
`CancellationToken` and be able to use
`tokio::spawn(endpoint_closed.run_until_cancelled_owned(async move { ..
})))`
* The cancellation token is cancelled when `Endpoint::close` *starts*,
not when it *ends*. I wasn't sure what expected behavior is here. I
opted for start because this guarantees that the endpoint is fully
usable until then.
ramfox pushed a commit that referenced this pull request Feb 14, 2026
## Description

Based on #3879 
Depends on n0-computer/noq#426

With n0-computer/noq#426 the
quinn::EndpointDriver stops once the quinn::Endpoint is closed and all
connections are drained. Thus we don't need the `RwLock` around the
`quinn::Endpoint` anymore.

See [this
thread](#3879 (comment))
for details.

* Add `Endpoint::closed`, which returns a `CancellationToken` that is
cancelled once the endpoint is closing
* Document watcher behavior with regards to closing the endpoint
* Fix test

## Notes & open questions

* Instead of returning a `CancellationToken` we could make this an
`async fn`, however to me it looks quite useful to get a
`CancellationToken` and be able to use
`tokio::spawn(endpoint_closed.run_until_cancelled_owned(async move { ..
})))`
* The cancellation token is cancelled when `Endpoint::close` *starts*,
not when it *ends*. I wasn't sure what expected behavior is here. I
opted for start because this guarantees that the endpoint is fully
usable until then.
“ramfox” and others added 7 commits February 16, 2026 20:56
…cept` future if the future is ever polled
## Description

Based on #3879 
Depends on n0-computer/noq#426

With n0-computer/noq#426 the
quinn::EndpointDriver stops once the quinn::Endpoint is closed and all
connections are drained. Thus we don't need the `RwLock` around the
`quinn::Endpoint` anymore.

See [this
thread](#3879 (comment))
for details.

* Add `Endpoint::closed`, which returns a `CancellationToken` that is
cancelled once the endpoint is closing
* Document watcher behavior with regards to closing the endpoint
* Fix test

## Notes & open questions

* Instead of returning a `CancellationToken` we could make this an
`async fn`, however to me it looks quite useful to get a
`CancellationToken` and be able to use
`tokio::spawn(endpoint_closed.run_until_cancelled_owned(async move { ..
})))`
* The cancellation token is cancelled when `Endpoint::close` *starts*,
not when it *ends*. I wasn't sure what expected behavior is here. I
opted for start because this guarantees that the endpoint is fully
usable until then.
## Description

Based on #3879

This makes `Endpoint::closed` return a named future instead of
`CancellationToken` to not depend on `tokio_util` in our public API.
@ramfox ramfox force-pushed the ramfox/dropped-endpoint branch 4 times, most recently from a9c255f to 2b56e64 Compare February 17, 2026 03:05
@ramfox ramfox force-pushed the ramfox/dropped-endpoint branch from 2b56e64 to cd86c5f Compare February 17, 2026 03:05
@ramfox ramfox requested review from Frando and matheus23 February 17, 2026 18:41
Copy link
Copy Markdown
Member

@matheus23 matheus23 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM besides the nit

@ramfox ramfox merged commit 9cf417a into main Feb 19, 2026
25 of 29 checks passed
@ramfox ramfox deleted the ramfox/dropped-endpoint branch February 19, 2026 18:24
@github-project-automation github-project-automation bot moved this from 🏗 In progress to ✅ Done in iroh Feb 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

c-iroh Functionality of the core iroh crate.

Projects

Status: ✅ Done

Development

Successfully merging this pull request may close these issues.

bug: endpoint does not clean up resources swiftly when dropped ungracefully

5 participants