-
Notifications
You must be signed in to change notification settings - Fork 18.9k
Portmapper improvements, and options to disable NAT #47871
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
0770137 to
68c6534
Compare
68c6534 to
d594859
Compare
d594859 to
e066f01
Compare
| p.mutex.Lock() | ||
| defer p.mutex.Unlock() | ||
|
|
||
| if ip == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why remove this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's never called with ip == nil, and removing the check makes the code simpler. But, I didn't do the same for RequestPortInRange, so I can put it back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's never called with ip == nil
Oh, right! Then, maybe remove that in a follow-up commit / PR.
| // on live-restore we don't want to allocate different ports - so, remove the range | ||
| // from the operational data. | ||
| for _, pm := range ep.portMapping { | ||
| pm.HostPortEnd = pm.HostPort |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if this is really the right place to do that. Based on your comment, it seems we mix state and config in a struct meant to store only the end state. Maybe this should be done when ep.portMapping entries are created instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is dealing with old saved-state - it represents running config, the ports can't be a range, but that's how the old code stored it.
The new code will treat it as a range on live-restore - it won't know it's live-restore, it's just been given the struct as config as if it had come from the API.
So the old saved-state has to be fixed up ... where else could it be done?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is dealing with old saved-state
Oh, right 🙈 Maybe we should consider dropping this compatibility code after a few releases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a comment to say it can be removed when upgrade from 26.x is no longer supported.
| } else if b.HostIP.To4() != nil { | ||
| v = iptables.IPv4 | ||
| } else { | ||
| // The binding is between containerV4 and hostV6, handled by docker-proxy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since IPv4-mapped IPv6 addrs aren't properly unmapped, I think this condition doesn't hold true.
Overall I think this condition is a bit cursed since each branch doesn't test the same field. I would rewrite it to explicitly early return whenever b.IP and b.HostIp aren't of the same family -- but then net's lack of expressiveness kicks in (ie. you'd have to write a helper function).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, I'll take a look at the handling of IPv4-mapped IPv6.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
| if bnd.HostIP.To4() == nil { | ||
| portmapper = n.portMapperV6 | ||
| func (n *bridgeNetwork) reapplyPerPortIptables4() { | ||
| n.reapplyPerPortIptables(func(b *portBinding) bool { return b.IP.To4() != nil }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, that's contrived! Better use an arg that clearly states what address family should be restored, and then let it filter port bindings based on that.
We should in a follow-up if we can just restore everything in a single pass and call iptables.OnReloaded only once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Passing in a filter function is contrived? Better than a bool ipv4 or similar I think. But, can do that.
e61b9db to
026c263
Compare
corhere
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All this talk about IPv6-mapped-IPv4 addresses is making me nervous. There is no semantic difference between listen()ing or bind()ing on [::ffff:10.0.0.1]:8080 vs. 10.0.0.1:8080. Only IPv4 packets will be accepted/transmitted over the wire for such a socket. So why does the portmapper need to fuss so much about net.IP slice lengths?
| ipstr := ip.String() | ||
| if _, ok := p.ipMap[ipstr]; !ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you considered converting the implementation to use net/netip types internally, or would it be a bit much to tack onto this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it'd be better - but also yes, a bit too much for this PR.
| n.Lock() | ||
| defer n.Unlock() | ||
| if n.driver == nil { | ||
| return "" | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's with the locking and defensive coding? It looks a lot like n.driver is guaranteed to be non-nil and immutable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right - but every other 'bridgeNetwork method makes the same checks. A separate review/rework of all that could be good. But, for this change, it seems best to stick with tradition.
026c263 to
bc962e6
Compare
It needs to put rules into |
bc962e6 to
f6cbc15
Compare
8ce32a2 to
4c0e873
Compare
Add bridge driver options...
com.docker.network.bridge.gateway_mode_ipv4=<nat|routed>
com.docker.network.bridge.gateway_mode_ipv6=<nat|routed>
If set to "routed", no NAT or masquerade rules are set up for port
mappings.
When NAT is disabled, the mapping is shown in 'inspect' output with
no host port number. For example, for "-p 80" with NAT disabled for
IPv6 but not IPv4:
"80/tcp": [
{
"HostIp": "0.0.0.0",
"HostPort": "32768"
},
{
"HostIp": "::",
"HostPort": ""
}
Signed-off-by: Rob Murray <rob.murray@docker.com>
Without this change, if a port mapping did not specify a host address and the network was IPv6-enabled, the same port would be allocated for mappings from '0.0.0.0' and '::'. But, if the port mapping was specified with explicit addresses even, for example: -p 0.0.0.0:8080-8083:80 -p '[::]:8083-8080:80' This change looks for port mappings that only differ in the host IP address, and makes sure it allocates the same port for all of them. If it can't, it fails with an error. Signed-off-by: Rob Murray <rob.murray@docker.com>
It was added so that tests could replace it before it was picked up and used by a new network's PortMapper, so that tests were isolated from each other. Now the PortMapper is not used by the bridge driver, neither is driver's portAllocator. Instead of replacing the driver.portAllocator in tests, reset the singleton instance using its ReleaseAll(). Un-export portallocator.NewInstance, now the tests aren't using it. Signed-off-by: Rob Murray <rob.murray@docker.com>
Display a PortBinding in a format that's more like the one used in the CLI, but includes the container IP if known. Signed-off-by: Rob Murray <rob.murray@docker.com>
When bridge driver opt com.docker.network.bridge.gatway_mode_ipv[46] is set to "routed", there is no NAT. When there's no NAT, there's no meaning to the HostPort field in a port mapping (all the port mapping does is open the container's port), and the HostIP field is only used to determine the address family. So, check port bindings, and raise errors if fields are unexpectedly set when the mapping only applies to a gateway_mode=routed network. Zero-addresses are allowed, to say the mapping/open-port should be IPv4-only or IPv6-only, and host ports are not allowed. A mapping with no host address, so it applies to IPv4 and IPv6 when the default binding is 0.0.0.0, may include a host port if either uses NAT. The port number is ignored for the directly-routed family. Signed-off-by: Rob Murray <rob.murray@docker.com>
8c2e4ca to
01eecb6
Compare
- What I did
Allocate the same port for IPv4 and IPv6.
0.0.0.0and::in-p 80-p 0.0.0.0:8080-8083:80 -p '[::]:8080-8083:80'-p 0.0.0.0::80 -p '[::]::80'-p 233.252.0.2::80 -p '[2001:DB8::2]::80Add bridge driver option
com.docker.network.bridge.gateway_mode_ipv6=<nat|routed>(andcom.docker.network.bridge.gateway_mode_ipv4=<nat|routed>).routed, no NAT or masquerade iptables rules are set up.-p.inspectoutput, a non-NAT'd port mapping is shown with"HostPort": "".-p(HostConfig.PortBindings) options, a--expose(Config.ExposedPorts) for the container's port is still required to make the-pdo anything. On its own,--exposedoesn't result in the port being opened.-pfor a disabled-NAT network are ignored, the container's port is opened.- How I did it
- How to verify it
New/updated tests.
On a "CentOS 9 (stream)" VM, running firewalld, with addresses
192.168.175.43andfd48:70ba:de62:0:a00:27ff:fece:c637...fd12:3456::/64).firewall-cmd --reload, checked that the rules were re-created.'http://[fd12:3456::2]'andcurl 'http://192.168.175.43:8080'worked as expected.- Description for the changelog