Skip to content

api: many Endpoint methods have unexpected behaviours after calling Endpoint::close #3905

@ramfox

Description

@ramfox

In my adventures investigating how best to handle #3867, it's become clear to me that we have some weird behaviours in some endpoint methods after close is called.

TBH, I'm not sure how to handle some of them, while others are obvious.

Here is a the full list of methods on the Endpoint and how they behave after calling Endpoint::close:

  • set_alpns -> sets new alpns on the quinn::Endpoint
  • insert_relay -> sets a new relay in the Socket successfully, but fails silently when updating the actor (since it is no longer running)
  • remove_relay -> removes the relay, but fails silently when updating the actor (since it is no longer running)
  • connect -> doesn't return an error saying "Endpoint is closed", it returns an "Internal consistency error" because the underlying actor is closed
  • accept -> returns None...the only sane response so far
  • addr -> returns an endpoint addr. This seems sane, you could want to know or record the last known addresses for this endpoint, even after close
  • watch_addr -> returns a watcher. Depending on how you use the watcher, this could make sense OR be weird for users. Eg, if you call get on the watcher, you receive the last known endpoint addr, but if you call stream, then it will return successfully once, and hang on any other calls to stream.next().await
  • net_report -> returns a watcher. Calling get on the watcher returns None, calling stream hangs when you iterate past the initial value (like watch_addr)
  • bound_sockets -> returns the addrs to the last bound sockets, this seems like a reasonable thing you may want to know/record after close
  • dns_resolver -> returns the dns resolver, this seems weird TBH
  • address_lookup -> returns the address_lookup. The AddressLookup system is not currently shutdown on endpoint.close, so calls to resolve and publish still work. This seems weird.
  • metrics -> seems reasonable to return these after calling close
  • remote_info -> returns remote_infos...it does not seem reasonable to me to return remote infos after the endpoint has closed, if the user wanted to collect these, they should use the hook
  • network_change -> fails silently
  • set_user_data_for_address_lookup -> updates the user_data on the socket and calls publish...we should not be doing any of that
  • create_server_config_builder -> returns the server config, possibly this is reasonable to return? Because we already cache this information on the endpoint, it doesn't cause odd behaviour in other places in the endpoint
  • id -> returns the endpoint id, this is fine
  • is_closed -> this is the point 😂
  • online -> hangs... but this is expected
  • secret_key -> returns the secret_key, this is reasonable to do after close
  • endpoint -> returns the underlying quinn::Endpoint...I think we should only return this if the iroh::Endpoint has not been closed. That way we can manually drop/cleanup the underlying quinn endpoint if we need to
  • socket -> same as the endpoint method, think we should only return this if the iroh::Endpoint is still open

Proposal

The methods are separated into a few different categories, from what I can see:

  1. methods that should error or return None if the endpoint is closed
  2. methods that should return a cache of the last known value
  3. methods that should likely just no nothing besides log that the endpoint is closed.

First, methods that should error/return None if the endpoint is closed:

should return errors

  • connect (but should be an EndpointClosed kind of error)
  • address_lookup
  • dns_resolve
    These are only used in tests, but should still return an error if the endpoint is closed
  • endpoint
  • socket

should return none

  • accept
  • remote_info
  • remove_relay
  • insert_relay

I would put watch_addr and net_report on this list, but I suspect if we were able to actually drop the underlying iroh::Socket, the Watchers that return from these methods would not hang any longer, in which case, it seems less "risky" to me to return these.

Second category are methods that should return a cache of the last known value:

  • addr
  • bound_sockets
  • metrics
  • create_server_config_builder
  • id
  • is_closed
  • secret_key
  • last_net_report (NEW "hidden" METHOD)

The third category are methods that should probably do nothing and just log an warning/debug that the endpoint is closed

  • set_user_data_for_address_lookup
  • set_alpns
  • network_change

Metadata

Metadata

Assignees

Labels

bugSomething isn't working

Type

Projects

Status

✅ Done

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions