NetworkDB does not always reliably converge

### Description

In our setups, we keep running into Docker DNS resolution issues whenever we:

1. restart Docker nodes in quick succession
2. update Docker nodes (and therefore restart them in quick succession)
3. have network issues

At first we suspected the MTU settings of the networking control plane, but the issue has also occurred with an MTU that fits our setup (we used 1350 instead of the default 1500).

It seems that during these times, the gossip network of networkdb does not get synced up properly. We debugged this with the built-in debugging tooling of libnetwork (which is really helpful, btw) and found that there is no mechanism in dockerd that resyncs the `endpoint_table` of networkdb (or the `overlay_peer_table`, for that matter, but that does not seem to be as much of a problem). We double-checked this by going through the code of libnetwork/agent.go:

The only places that update anything in networkdb are calls to `addServiceInfoToCluster` (CreateEntry), `addDriverInfoToCluster` (CreateEntry), `deleteDriverInfoFromCluster` (DeleteEntry), `deleteServiceInfoFromCluster` (DeleteEntry), and `disableServiceInNetworkDB` (UpdateEntry).

While we might have missed things, here is our proposal: there should be an (opt-in) Docker daemon config option that enables a background job in the daemon which resyncs all the DNS entries to networkdb on a schedule. Design-wise I have not thought about it a lot, but I imagine it should be fine to run this every 1-5 minutes in most clusters. This way, whenever a DNS entry is out of sync, things should fix themselves within a somewhat acceptable time frame.
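
To make the proposal a bit more concrete, here is a rough Go sketch of what such a background job could look like. None of this is existing libnetwork code: `EntryStore`, `memStore`, `resyncOnce`, and `runResync` are hypothetical names made up for illustration, and the in-memory store only stands in for networkdb. The idea is that each node periodically compares the entries it owns locally with what the store holds and re-publishes anything that is missing or stale.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// EntryStore is a hypothetical, simplified stand-in for the part of
// networkdb this sketch needs. It is NOT the real NetworkDB API.
type EntryStore interface {
	Get(table, key string) (string, bool)
	Set(table, key, value string)
}

// memStore is an in-memory EntryStore used only for this illustration.
type memStore struct {
	mu   sync.Mutex
	data map[string]string
}

func newMemStore() *memStore {
	return &memStore{data: make(map[string]string)}
}

func (s *memStore) Get(table, key string) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	v, ok := s.data[table+"/"+key]
	return v, ok
}

func (s *memStore) Set(table, key, value string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[table+"/"+key] = value
}

// resyncOnce re-publishes every locally owned entry that is missing from
// the store or differs from the local value. It returns the number of
// entries it had to repair.
func resyncOnce(store EntryStore, table string, local map[string]string) int {
	repaired := 0
	for key, want := range local {
		if got, ok := store.Get(table, key); !ok || got != want {
			store.Set(table, key, want)
			repaired++
		}
	}
	return repaired
}

// runResync calls resyncOnce on every tick until ctx is cancelled.
// localEntries is a hypothetical callback returning the entries this
// node currently owns.
func runResync(ctx context.Context, interval time.Duration, store EntryStore,
	table string, localEntries func() map[string]string) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			resyncOnce(store, table, localEntries())
		}
	}
}

func main() {
	store := newMemStore()
	local := map[string]string{"ep1": "10.0.0.2", "ep2": "10.0.0.3"}

	// Simulate a store that lost the entry for ep2.
	store.Set("endpoint_table", "ep1", "10.0.0.2")

	fmt.Println("repaired:", resyncOnce(store, "endpoint_table", local))
}
```

In a real implementation the interval would presumably come from the opt-in daemon option, and the repair step would have to go through the existing CreateEntry/UpdateEntry paths rather than a plain `Set`. Running the pass a second time with nothing out of sync repairs nothing, so the job stays cheap in healthy clusters.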
