IPv6 only: changes to bridge driver and gateway Endpoint selection#48323
IPv6 only: changes to bridge driver and gateway Endpoint selection#48323thaJeztah merged 4 commits intomoby:masterfrom
Conversation
8c85c53 to
1f5fdfc
Compare
ed1f7ed to
1a3cf05
Compare
|
|
||
| // NoProxy6To4 disables proxying from an IPv6 host port to an IPv4-only | ||
| // container, when the default binding address is 0.0.0.0. | ||
| NoProxy6To4 = DriverPrivatePrefix + ".no_proxy_6to4" |
There was a problem hiding this comment.
If we come up with a more solid way of picking up endpoints that should expose ports, we'll probably drop this label. But, DriverPrivatePrefix is a red herring, as there's nothing taking care of discarding 'private' labels from user-supplied options. So, in fact, this new label is now part of the 'public API', and dropping it in a future release would be a breaking change.
We should at least complement this comment to warn that it's not meant to be used by end-users. Or, we can be stricter, and drop all 'private' labels from the set of user-defined options passed to libnetwork.
There was a problem hiding this comment.
I'll update the comment to ...
// NoProxy6To4 disables proxying from an IPv6 host port to an IPv4-only
// container, when the default binding address is 0.0.0.0. This label
// is intended for internal use, it may be removed in a future release.
libnetwork/default_gateway.go
Outdated
| } | ||
|
|
||
| // selectGatewayEndpoint is like getGatewayEndpoint, but selects only from | ||
| // endpoints. It can be called when the Sandbox mutex is already held. |
There was a problem hiding this comment.
But nothing is holding sb's mutex when it's called from getGatewayEndpoint. At best, some code paths call joinLeaveStart to protect against races, but that's not universal.
There was a problem hiding this comment.
I split selectGatewayEndpoint out of getGatewayEndpoint so that it could be called from clearNetworkResources with the mutex already held.
getGatewayEndpoint now calls selectGatewayEndpoint(sb.Endpoints()) ... and sb.Endpoints() acquires the mutex to make a copy of the slice.
Previously, getGatewayEndpoint did this, which is equivalent ...
func (sb *Sandbox) getGatewayEndpoint() *Endpoint {
for _, ep := range sb.Endpoints() {
...
So, if there's a path with a race, I don't think this change affects it.
There was a problem hiding this comment.
Oh, absolutely, I should have been more explicit! In terms of concurrency, your code is equivalent to what we had before.
But I think the comment
It can be called when the Sandbox mutex is already held.
is a bit misleading, because obvisouly that's no what's happening just above.
akerouanton
left a comment
There was a problem hiding this comment.
See my comment about selectGatewayEndpoint
| if c.Mtu, err = strconv.Atoi(value); err != nil { | ||
| return parseErr(label, value, err.Error()) | ||
| } | ||
| case netlabel.EnableIPv4: |
There was a problem hiding this comment.
I'm always surprised that we need to parse the same label twice. We should look at unifying fromLabels and parseNetworkOptions at some point.
libnetwork/default_gateway.go
Outdated
| } | ||
|
|
||
| // selectGatewayEndpoint is like getGatewayEndpoint, but selects only from | ||
| // endpoints. It can be called when the Sandbox mutex is already held. |
There was a problem hiding this comment.
Oh, absolutely, I should have been more explicit! In terms of concurrency, your code is equivalent to what we had before.
But I think the comment
It can be called when the Sandbox mutex is already held.
is a bit misleading, because obvisouly that's no what's happening just above.
| if err = d.ProgramExternalConnectivity(ctx, n.ID(), ep.ID(), sb.Labels()); err != nil { | ||
| labels := sb.Labels() | ||
| labels[netlabel.NoProxy6To4] = noProxy6To4After | ||
| if err := d.ProgramExternalConnectivity(ctx, n.ID(), ep.ID(), labels); err != nil { |
There was a problem hiding this comment.
Any reason not to use the newly introduced ep.programExtrernalConnectivity here?
There was a problem hiding this comment.
ep.programExternalConnectivity loads the driver and network from the store. It's used by sbLeave when the departing endpoint is a gateway, and some other Endpoint needs to take over the gateway-ness. That other endpoint may have a different driver, and will have a different network.
Here, during sbJoin, the driver and network are ep's - so they're already loaded and the driver call can be made directly.
| extN, err := extEp.getNetworkFromStore() | ||
| if err != nil { | ||
| return fmt.Errorf("failed to get network from store for revoking external connectivity during join: %v", err) | ||
| gwepAfter4, gwepAfter6 := sb.getGatewayEndpoint() |
There was a problem hiding this comment.
Heck, not critizing your code, but that level of Single Responsibility violation is super crazy! 🙈
We should really straighten that code up at some point.
| if ipamV4Data[0].Gateway != nil { | ||
| c.AddressIPv4 = types.GetIPNetCopy(ipamV4Data[0].Gateway) | ||
| } | ||
| if ipamV4Data[0].Gateway != nil { |
There was a problem hiding this comment.
I think you can't get an IPAM config with no gateway assigned
Lines 1622 to 1638 in 367c910
And falling back to the Pool address would be invalid as it's the 'network address'. Probably something we should clean up in a follow-up.
|
Rebased - no other changes. |
|
|
||
| var err error | ||
| ep.portMapping, err = n.addPortMappings(context.TODO(), ep.addr, ep.addrv6, cfg, n.config.DefaultBindingIP) | ||
| ep.portMapping, err = n.addPortMappings(context.TODO(), ep.addr, ep.addrv6, cfg, n.config.DefaultBindingIP, ep.extConnConfig.NoProxy6To4) |
There was a problem hiding this comment.
Unrelated to these changes, but I'm curious why this function checks for ep.extConnConfig.ExposedPorts or ep.extConnConfig.PortBindings to be nil if it doesn't use those values 🤔 (also curious why ExposedPorts is needed for Portbindings to be used);
moby/libnetwork/drivers/bridge/bridge_store.go
Lines 399 to 403 in 6ac445c
There was a problem hiding this comment.
Port bindings don't imply that ports should be exposed (I'm not sure why, but I don't think we can change it now). If ports aren't exposed, the binding just doesn't do anything.
So, if either field is missing, there's nothing to do.
Perhaps I could have changed it to check ep.portBindings instead, in 4f09af6 which is the change that removed direct use of those values. I think that'd probably be equivalent now (and would have been before that change) ... maybe something to look at in a follow-up.
libnetwork/endpoint.go
Outdated
| log.G(ctx).Debugf("Revoking %s external connectivity on endpoint %s (%s), NoProxy6To4:%v", | ||
| role, ep.Name(), ep.ID(), noProxy6To4Before) |
There was a problem hiding this comment.
Probably good to use the same structured logs fields for this one as used below (there's also some existing logs further up in this function that log ep.Name() but not as structured log).
Not blocking though (but probably good to make a pass at doing so); perhaps constructing a logger could work for this (logger := ....WithFields() or ctx := log.WithLogger(....)
There was a problem hiding this comment.
Ok - done, as a separate commit.
| return types.InvalidParameterErrorf("ipv4 pool is empty") | ||
| } | ||
| if config.EnableIPv6 && (len(ipV6Data) == 0 || ipV6Data[0].Pool.String() == "::/0") { | ||
| return types.InvalidParameterErrorf("ipv6 pool is empty") |
There was a problem hiding this comment.
🙈 looks like we're not consistent on casing; first error above uses IPv4 / IPv6, and these both use lowercase 😂
|
Doh! I think I caused a merge conflict in my linting PR; happy do do the rebase for you |
|
(CI is green, so happy to merge without waiting for it to fully run again after the merge conflict is resolved) |
Signed-off-by: Rob Murray <rob.murray@docker.com>
When the host has IPv6 support but the container doesn't, the default bind address is 0.0.0.0, and a port mapping does not include a host address - the default behaviour (of the bridge driver) is to run a docker-proxy to map from any IPv6 host address to the IPv4 address of a container's gateway endpoint. Driver option com.docker.network.driver.private.no_proxy_6to4 can now be used to disable that behaviour when configuring a Sandbox's external connectivity. Signed-off-by: Rob Murray <rob.murray@docker.com>
A dual-stack endpoint still has priority when selecting a gateway Endpoint for a Sandbox. But, now there are IPv6-only networks, it is possible to have a Sandbox with only IPv4-only and IPv6-only endpoints. This change means they are both gateway endpoints. Tell the network driver it mustn't proxy host-IPv6 to endpoint-IPv4 when there's an IPv6 gateway endpoint (which may belong to a different net driver). Update that when networks are connected/disconnected. Signed-off-by: Rob Murray <rob.murray@docker.com>
Signed-off-by: Rob Murray <rob.murray@docker.com>
No worries - it's rebased, I'll let CI run because there were quite a few commits (and a conflict in a test with the |
|
All green again! Merging; thank you! |
- What I did
EnableIPv4to the bridge driver.- How I did it
Notes on gateway endpoints ...
A gateway endpoint gets the default route and, for the bridge driver, port mappings from the host.
Before this change, a dual-stack endpoint was preferred over an IPv4-only endpoint - if there was an IPv6 endpoint, it was always dual-stack. So, there was no way for the IPv4 and IPv6 gateway endpoints to be different.
Now, a container with IPv4-only and IPv6-only endpoints needs two separate gateways. This is further complicated by the bridge driver's use of docker-proxy to map from host IPv6 addresses to an IPv4-only container, when the host address is unspecified in a port mapping.
If an IPv6 Endpoint is added to a Sandbox that only has an IPv4 endpoint, proxying from the host IPv6 address has to stop (so, the IPv4 gateway hasn't changed, but the driver still needs to be reconfigured). And vice-versa, if an IPv6-only endpoint is removed, leaving only IPv4-only endpoints, the host IPv6 proxy needs to be added.
Note that the new IPv6 endpoint may not belong to a bridge network, the bridge driver doesn't have the information it needs to deal with that ... proxying host-IPv6 to container-IPv4 is policy that needs to be handled by libnetwork. So, there's a new (private) driver option
netlabel.NoProxy6To4that can be passed to the driver's ProgramExternalConnectivity. Its value is calculated by libnetwork when an endpoint is added or removed, true if the IPv4 and IPv6 gateways are different. When its value changes, the IPv4 gateway is removed by RevokeExternalConnectivity, then re-added. So, ongoing connections will be interrupted and, if host ports are not specified in a port mapping, they'll change when the mapping is re-created.Before this change, when adding a dual-stack macvlan/ipvlan endpoint to a Sandbox that only has IPv4 endpoints, the dual-stack endpoint became the gateway. So, if the existing gateway was an IPv4-only bridge endpoint with port mappings from the host, those port mappings would be removed. With this change, that behaviour is preserved - and the same logic applies when an IPv6-only macvlan/ipvlan is added to a Sandbox with a bridge IPv4-only endpoint; port mappings from host-IPv6 are removed.
We could offer better control over the way gateway endpoints are selected, default routes, what happens when networks are added/removed, and host port mappings. We could extend the driver interface to add a way to change gateway settings. But, I've treated all that as out-of-scope here.
- How to verify it
New and updated tests.
- Description for the changelog