Skip to content

Refactor model to handle multiple device connections #4207

@calmh

Description

@calmh

Currently we have a lot of structures keyed on the device ID, and assume that there will be exactly one or zero connections to a device. This is probably a design mistake, as we end up removing state for connection a when connection b is closed and things like this:

I don't particularly want to maintain more than one connection for more than a few seconds, but it could be good to be able to do so and have things actually work during this brief period of parallellity. Apart from the issues above, I think we could solve:

  • The connection switching is racy and slow because the existing connection needs to be dropped by the other side before the new one can be accepted

  • Timeout handling is suboptimal because we don't accept a new connection until the old one has been timed out

If we could accept more than one connection per device we could do more seamless switching, accept new connections and slowly garbage collect older ones. We might also fix the bugs mentioned above.

Some obvious complications that would need to be taken care of:

  • Duplicate index updates. We should probably send index updates on all connections, as we don't know which one is the best / alive, but the receiving side must be able to handle this. Maybe we already do and the update becomes a no-op, but it needs to be thought about.

  • Load balancing requests over multiple connections when there is more than one, taking into account that a connection might be dead and not yet timed out so we shouldn't throw too many requests at it if it's not responding.

  • Reaping worse-priority connections once better-priority ones are established.

The people limited by TCP latency could increase some tuning value connectionsPerDevice to allow multiple, persistent connections per device, if it anyway works technically with more than one.

(#4202 is related but different and I don't want the discussion on that one to bleed over to this ticket.)

As possible implementation steps, we could:

  1. Key everything on connection IDs instead and retain the rest of the logic as is. This should hopefully solve a couple of bugs while not bringing any other enhancements to the table. If we have more than one connection alive towards a given device, pick one at random and pretend there was only one. This is more or less what happens today but not explicitly.
  2. Somehow handle when there is more than one connection more gracefully than just picking one at random.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew features or improvements of some kind, as opposed to a problem (bug)frozen-due-to-ageIssues closed and untouched for a long time, together with being locked for discussion

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions