Skip to content

Add local address autodetection on swarm init#28221

Merged
aaronlehmann merged 1 commit intomoby:masterfrom
aboch:ala
Nov 10, 2016
Merged

Add local address autodetection on swarm init#28221
aaronlehmann merged 1 commit intomoby:masterfrom
aboch:ala

Conversation

@aboch
Copy link
Contributor

@aboch aboch commented Nov 9, 2016

- What I did
Added local address auto detection during swarm init when --advertise-addr is a non local address.
It is the same best-effort auto detection logic which is being used when advertise address is not specified.

- How to verify it
Form a cluster on AWS with
swarm init --advertise-addr <this VM public IP> and
swarm join ... --advertise-addr <this VM public IP>

Create a service over an encrypted overlay network and make sure tasks can talk to other nodes' tasks. Run a container on the attachable network and make sure it can access the service.

Fixes #27647

- A picture of a cute animal (not mandatory but encouraged)
840756455839-cute-wallpapers-animals-nature-cat-animal-pictures-garden

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it correct to override listenHost like this? If the user asked to bind to 0.0.0.0, shouldn't we still do so? I would think only localAddr needs to be overridden.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case it is, because if the the user asked to bind to 0.0.0.0 explicitly or did not specify --listen-addr (in both cases listenHost==0.0.0.0 so true == net.ParseIP(localAddr).IsUnspecified()), we would fail the swarm init (because of the adv addr locality).

If listenHost==0.0.0.0, and --advertise-addr is not local, we currently ask the user to specify a --listen-addr, in other words a listenHost. Therefore here we are sparing the user from specifying a --listen-addr when we can find one by ourselves.

It is true we could just take care of the local address, and my first implementation was only doing that, but I felt it was safer, more predictable to fill in what currently we ask the user to fill, which is the --listen-addr, and keep current logic where the local address is the listen address (when available).

So that the addresses libnetwork receive are consistent either when --listen-addr is specified by user or when we need user to specify the --listen-addr but we can do it ourselves for ease of use.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the logic but it feels wrong to me to bind to an automatically detected IP even if the user explicitly passed --listen-addr 0.0.0.0:<port>.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understand. But even today user is not allowed to pass --listen-addr 0.0.0.0:<port> and --advertise-addr=<non local address> combination.
The change takes effect only when adv addr is non local and we need a specific local address for the ipsec tunnels.

I probably don't see it, but if that seems the proper change to be done, I can take care of filling in only the local address. Let me know (only thing I will only be able to take care of it late tonight only, at the park now). Thanks.

@aaronlehmann
Copy link

Nice! Thanks for working on this.

@vieux vieux added this to the 1.13.0 milestone Nov 9, 2016
@vieux
Copy link
Contributor

vieux commented Nov 10, 2016

ping @aboch @mrjana @mavenugo

@aboch aboch changed the title Add listen address autodetection on swarm init Add local address autodetection on swarm init Nov 10, 2016
@aboch
Copy link
Contributor Author

aboch commented Nov 10, 2016

@aaronlehmann Changed to only set the local address. Tested it on aws. PTAL when you get a chance.

@aaronlehmann
Copy link

LGTM

@mavenugo
Copy link
Contributor

LGTM

@aboch is it possible to write a integration-test for this ?

@aboch
Copy link
Contributor Author

aboch commented Nov 10, 2016

@mavenugo It is a bit tricky, because in order to test this scenario I need traffic to flow via two docker hosts (whatever their abstraction is in integ framework) with 1 to 1 NAT between their internal address and external address. So that I can create a 2 tasks service or attach two containers so that the ipsec tunnels are programmed, then finally run traffic among containers.

I would prefer this functionality to go in in time for 1.13 and investigate later if/how a meaningful test for this can be done.

@mavenugo
Copy link
Contributor

@aboch okay. Is it possible to add a test without checking for the actual data-path ? I mean, we could do docker swarm init --advertise-addr={ext-ip} and it must succeed without the need for --listen-addr purely from an UX standpoint at the minimum ?
(I dont think there is any API / UX to check for the chosen --listen-addr :( )...

@aboch
Copy link
Contributor Author

aboch commented Nov 10, 2016

@mavenugo Oh, I see, just to test the swarm init go through. That should be possible. Let me take a look.

@aboch
Copy link
Contributor Author

aboch commented Nov 10, 2016

@mavenugo Added the test. Verified it does not pass on master. Thanks.

Copy link
Contributor

@mavenugo mavenugo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@aboch
Copy link
Contributor Author

aboch commented Nov 10, 2016

When I run the integ test locally it works, it is able to detect a system address.

sudo TESTFLAGS="-check.f DockerSwarmSuite.TestSwarmInitExtAdvAddress" make -d test-integration-cli
...
INFO: Testing against a local daemon
PASS: docker_cli_swarm_test.go:89: DockerSwarmSuite.TestSwarmInitExtAdvAddress  2.218s
OK: 1 passed
PASS

But it fails on jenkins:

19:21:32 FAIL: docker_cli_swarm_test.go:89: DockerSwarmSuite.TestSwarmInitExtAdvAddress
19:21:32 
19:21:32 [d1e047f1f3cf9] waiting for daemon to start
19:21:32 [d1e047f1f3cf9] daemon started
19:21:32 docker_cli_swarm_test.go:93:
19:21:32     c.Assert(err, checker.IsNil, check.Commentf("out: %v", out))
19:21:32 ... value *exec.ExitError = &exec.ExitError{ProcessState:(*os.ProcessState)(0xc420569e00), Stderr:[]uint8(nil)} ("exit status 1")
19:21:32 ... out: Error response from daemon: must specify a listening address because the address to advertise is not recognized as a system address, and a system's IP address to use could not be uniquely identified

@aboch
Copy link
Contributor Author

aboch commented Nov 10, 2016

It looks like a leaked interface from a previous test is causing this (thanks @tonistiigi for leading me to the logs):

time="2016-11-10T19:19:16.821920630Z" level=debug msg="form data: {\"AdvertiseAddr\":\"192.168.250.250\",\"AutoLockManagers\":false,\"ForceNewCluster\":false,\"ListenAddr\":\"0.0.0.0:2377\",\"Spec\":{\"CAConfig\":{},\"Dispatcher\":{},\"EncryptionConfig\":{\"AutoLockManagers\":false},\"Orchestration\":{},\"Raft\":{\"ElectionTick\":0,\"HeartbeatTick\":0},\"TaskDefaults\":{}}}" 
time="2016-11-10T19:19:16.822767149Z" level=warning msg="Could not find a local address: could not choose an IP address to advertise since this system has multiple addresses on different interfaces (172.17.0.2 on eth0 and 172.18.0.1 on br-8798ff8194c8)" 
time="2016-11-10T19:19:16.822787214Z" level=error msg="Error initializing swarm: must specify a listening address because the address to advertise is not recognized as a system address, and a system's IP address to use could not be uniquely identified" 
time="2016-11-10T19:19:16.822798134Z" level=error msg="Handler for POST /v1.25/swarm/init returned error: must specify a listening address because the address to advertise is not recognized as a system address, and a system's IP address to use could not be uniquely identified" 

Given the test is not really adding much, I am suggesting to remove it for now.
I will add it later once I find the test which is not removing the bridge interface.

- when advertise-addr is not local and listen-addr is
  not specified

Signed-off-by: Alessandro Boch <aboch@docker.com>
@mavenugo
Copy link
Contributor

@aboch yes. agreed. pls do so.

@aaronlehmann aaronlehmann merged commit 5d1feb5 into moby:master Nov 10, 2016
@aboch aboch deleted the ala branch November 8, 2017 19:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants