Skip to content

Fixing a couple of network plugin life-cycle mgmt issues#29556

Merged
vieux merged 3 commits intomoby:masterfrom
mavenugo:refcount
Jan 3, 2017
Merged

Fixing a couple of network plugin life-cycle mgmt issues#29556
vieux merged 3 commits intomoby:masterfrom
mavenugo:refcount

Conversation

@mavenugo
Copy link
Contributor

@mavenugo mavenugo commented Dec 19, 2016

Fixes #29447 : Reference count is managed for network plugins using ACQUIRE and RELEASE mechanism in createNetwork and deleteNetwork.

Fixes #29471 : libnetwork remote driver init is modified to lookup active plugins and registered with drvRegistry

Fixes #29473 : Since plugin rm is not fully propagated to drvRegistry, we allowed overriding the drvRegistry map to cleanup stale plugin socket

@mavenugo
Copy link
Contributor Author

cc @anusha-ragunathan @tiborvass

@anusha-ragunathan
Copy link
Contributor

Please include 151d1cf in this PR, since it is refcount related. If you are not inclined to, I can make a separate pr for that.

@mavenugo
Copy link
Contributor Author

@anusha-ragunathan sure. picked up your change as well. With this we will also have docker plugin disable honor the refcount. Thanks. PTAL.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest making this logrus.WithError(err).WithFields(logrus.Fields{"mode": mode, "driver", driver}).Error("Error handling plugin refcount operation")

That way the error will not be hidden, and the mode and driver information will be captured in a more structured way.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Outside the scope of this PR: all caps declarations are unusual in Go and clash with the typical style. I opened #29564 to change this. This PR can be merged first and then I'll rebase afterwards.

@mavenugo
Copy link
Contributor Author

@anusha-ragunathan Inclusion of your patch causes TestPluginVolumeRemoveOnRestart (and possibly others ?) to fail. Hence I backed off your commit. Since your commit is addressing a different issue of docker plugin disable honoring the refcount, it can go in its own PR. Could you please handle that ?

@anusha-ragunathan
Copy link
Contributor

@mavenugo : I will open a new PR for moving refcount from remove to disable.

@anusha-ragunathan
Copy link
Contributor

@mavenugo : #29599 opened. PTAL

@yongtang
Copy link
Member

👍 I built the binary of the PR and verified that #29447 #29471 #29443 have all been addressed. The code change in the pull request looks good to me as well (not a maintainer).

Note: I only tested with network plugin, not IPAM plugin.

@anusha-ragunathan
Copy link
Contributor

DockerSwarmSuite.TestAPIServiceUpdatePort panic

10:16:28 goroutine 27751 [select, 5 minutes]:
10:16:28 net/http.(*persistConn).roundTrip(0xc420ae1100, 0xc420218360, 0x0, 0x0, 0x0)
10:16:28 	/usr/local/go/src/net/http/transport.go:1840 +0x93b
10:16:28 net/http.(*Transport).RoundTrip(0xc420b8c1e0, 0xc420b8c2d0, 0xc420b8c1e0, 0x0, 0x0)
10:16:28 	/usr/local/go/src/net/http/transport.go:380 +0x4ee
10:16:28 net/http.send(0xc420b8c2d0, 0x1bacc60, 0xc420b8c1e0, 0x0, 0x0, 0x0, 0x8, 0xc420495a78, 0x410318)
10:16:28 	/usr/local/go/src/net/http/client.go:256 +0x15f
10:16:28 net/http.(*Client).send(0xc420b83598, 0xc420b8c2d0, 0x0, 0x0, 0x0, 0xc420495a78, 0x0, 0x1)
10:16:28 	/usr/local/go/src/net/http/client.go:146 +0x102
10:16:28 net/http.(*Client).doFollowingRedirects(0xc420b83598, 0xc420b8c2d0, 0x1417908, 0x3, 0x1, 0x0)
10:16:28 	/usr/local/go/src/net/http/client.go:528 +0x5e5
10:16:28 net/http.(*Client).Do(0xc420b83598, 0xc420b8c2d0, 0x131dec6, 0x6, 0x0)
10:16:28 	/usr/local/go/src/net/http/client.go:184 +0x1ea
10:16:28 github.com/docker/docker/integration-cli/daemon.(*Daemon).StartWithLogFile(0xc421ab3e00, 0xc4200272b8, 0xc4219faf40, 0x2, 0x2, 0x0, 0x0)
10:16:28 	/go/src/github.com/docker/docker/integration-cli/daemon/daemon.go:303 +0xd9b
10:16:28 github.com/docker/docker/integration-cli/daemon.(*Daemon).StartWithError(0xc421ab3e00, 0xc4219faf40, 0x2, 0x2, 0x0, 0xc420393800)
10:16:28 	/go/src/github.com/docker/docker/integration-cli/daemon/daemon.go:209 +0x226
10:16:28 github.com/docker/docker/integration-cli/daemon.(*Daemon).Start(0xc421ab3e00, 0x1bb0b20, 0xc420adf1d0, 0xc4219faf40, 0x2, 0x2)
10:16:28 	/go/src/github.com/docker/docker/integration-cli/daemon/daemon.go:196 +0x53
10:16:28 github.com/docker/docker/integration-cli/daemon.(*Daemon).StartWithBusybox(0xc421ab3e00, 0x1bb0b20, 0xc420adf1d0, 0xc4219faf40, 0x2, 0x2)
10:16:28 	/go/src/github.com/docker/docker/integration-cli/daemon/daemon.go:325 +0x61
10:16:28 github.com/docker/docker/integration-cli.(*DockerSwarmSuite).AddDaemon(0xc42038d400, 0xc420adf1d0, 0x1bc0101, 0xc421ae09a8)
10:16:28 	/go/src/github.com/docker/docker/integration-cli/check_test.go:286 +0x255
10:16:28 github.com/docker/docker/integration-cli.(*DockerSwarmSuite).TestAPIServiceUpdatePort(0xc42038d400, 0xc420adf1d0)
10:16:28 	/go/src/github.com/docker/docker/integration-cli/docker_api_service_update_test.go:22 +0x57
10:16:28 reflect.Value.call(0x130ba80, 0xc42038d400, 0x1613, 0x131be85, 0x4, 0xc420b83f30, 0x1, 0x1, 0x1bff9e0, 0x1300020, ...)
10:16:28 	/usr/local/go/src/reflect/value.go:434 +0x5c8
10:16:28 reflect.Value.Call(0x130ba80, 0xc42038d400, 0x1613, 0xc420b83f30, 0x1, 0x1, 0xc420aabef0, 0xc420a3decf, 0x0)
10:16:28 	/usr/local/go/src/reflect/value.go:302 +0xa4
10:16:28 github.com/docker/docker/vendor/github.com/go-check/check.(*suiteRunner).forkTest.func1(0xc420adf1d0)
10:16:28 	/go/src/github.com/docker/docker/vendor/github.com/go-check/check/check.go:816 +0x6b8
10:16:28 github.com/docker/docker/vendor/github.com/go-check/check.(*suiteRunner).forkCall.func1(0xc420c59b90, 0xc420adf1d0, 0xc420218140)
10:16:28 	/go/src/github.com/docker/docker/vendor/github.com/go-check/check/check.go:672 +0x7c
10:16:28 created by github.com/docker/docker/vendor/github.com/go-check/check.(*suiteRunner).forkCall
10:16:28 	/go/src/github.com/docker/docker/vendor/github.com/go-check/check/check.go:673 +0x251

@mavenugo
Copy link
Contributor Author

@anusha-ragunathan the test is timing out waiting for AddDaemon call to return. This doesnt seem related to this change, but we never know. I will investigate further.
@vdemeester do you know what might be going on ?

@anusha-ragunathan
Copy link
Contributor

Daemon fails to start.

10:16:29 goroutine 27753 [syscall, 5 minutes]:
10:16:29 syscall.Syscall6(0xf7, 0x1, 0xb44, 0xc420c37100, 0x1000004, 0x0, 0x0, 0x7f37a835a8c0, 0xc4219fa020, 0x20)
10:16:29 	/usr/local/go/src/syscall/asm_linux_amd64.s:44 +0x5
10:16:29 os.(*Process).blockUntilWaitable(0xc421972810, 0xc42001a001, 0xc420a39e80, 0x43d48c)
10:16:29 	/usr/local/go/src/os/wait_waitid.go:28 +0xbc
10:16:29 os.(*Process).wait(0xc421972810, 0x130c9c0, 0xc420431310, 0xe13)
10:16:29 	/usr/local/go/src/os/exec_unix.go:22 +0xab
10:16:29 os.(*Process).Wait(0xc421972810, 0x1, 0x1, 0x86266f)
10:16:29 	/usr/local/go/src/os/doc.go:49 +0x2b
10:16:29 os/exec.(*Cmd).Wait(0xc420c31760, 0x16, 0xc420a39f98)   ---> daemon fails to start. 
10:16:29 	/usr/local/go/src/os/exec/exec.go:434 +0x6d
10:16:29 github.com/docker/docker/integration-cli/daemon.(*Daemon).StartWithLogFile.func1(0xc420b7c4e0, 0xc421ab3e00)
10:16:29 	/go/src/github.com/docker/docker/integration-cli/daemon/daemon.go:268 +0x36
10:16:29 created by github.com/docker/docker/integration-cli/daemon.(*Daemon).StartWithLogFile
10:16:29 	/go/src/github.com/docker/docker/integration-cli/daemon/daemon.go:271 +0xa2e

@tiborvass tiborvass added this to the 1.13.0 milestone Dec 20, 2016
@anusha-ragunathan
Copy link
Contributor

The panic is the same as #27287 (comment). GetAll on pluginv1 in swarm mode is buggy. Need to understand why.

@anusha-ragunathan
Copy link
Contributor

Confirmed that GetAll is blocking.

sync.runtime_notifyListWait(0xc420371990, 0x0)
        /usr/local/go/src/runtime/sema.go:267 +0x122
sync.(*Cond).Wait(0xc420371980)
        /usr/local/go/src/sync/cond.go:57 +0x80
github.com/docker/docker/pkg/plugins.(*Plugin).waitActive(0xc42039f4a0, 0xc42020c6b0, 0x4115b5)
        /go/src/github.com/docker/docker/pkg/plugins/plugins.go:159 +0x55
github.com/docker/docker/pkg/plugins.(*Plugin).implements(0xc42039f4a0, 0x1981b12, 0xa, 0x1)
        /go/src/github.com/docker/docker/pkg/plugins/plugins.go:166 +0x2f
github.com/docker/docker/pkg/plugins.GetAll(0x1981b12, 0xa, 0xa, 0xc4206f1920, 0x0, 0x1, 0x7fb99267d000)
        /go/src/github.com/docker/docker/pkg/plugins/plugins.go:295 +0x392
github.com/docker/docker/plugin/store.(*Store).GetAllByCap(0xc4202c8580, 0x1981b12, 0xa, 0x197cf95, 0x7, 0x2508c20, 0xc42044c000, 0xc4206f1904)
        /go/src/github.com/docker/docker/plugin/store/store.go:227 +0x9e
github.com/docker/docker/vendor/github.com/docker/libnetwork/ipams/remote.Init(0x24fb0a0, 0xc42021f840, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        /go/src/github.com/docker/docker/vendor/github.com/docker/libnetwork/ipams/remote/remote.go:53 +0x101
github.com/docker/docker/vendor/github.com/docker/libnetwork.initIPAMDrivers(0xc42021f840, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        /go/src/github.com/docker/docker/vendor/github.com/docker/libnetwork/drivers_ipam.go:17 +0xcd
github.com/docker/docker/vendor/github.com/docker/libnetwork.New(0xc42021f740, 0x8, 0x8, 0xc4202c8580, 0xc4201ce9c0, 0xc42021f740, 0x8)
        /go/src/github.com/docker/docker/vendor/github.com/docker/libnetwork/controller.go:208 +0x6e0
github.com/docker/docker/daemon.(*Daemon).initNetworkController(0xc4200183c0, 0xc42042b000, 0xc4201ce9c0, 0x0, 0xc4201ce9c0, 0x0, 0x0)
        /go/src/github.com/docker/docker/daemon/daemon_unix.go:661 +0xa7
github.com/docker/docker/daemon.(*Daemon).restore(0xc4200183c0, 0x24ea420, 0xc4200183c0)
        /go/src/github.com/docker/docker/daemon/daemon.go:272 +0x7e7
github.com/docker/docker/daemon.NewDaemon(0xc42042b000, 0x2508aa0, 0xc4204cd720, 0x24f9c60, 0xc420065520, 0x0, 0x0, 0x0)
        /go/src/github.com/docker/docker/daemon/daemon.go:678 +0x21bc
main.(*DaemonCli).start(0xc4204037d0, 0x0, 0x199ad21, 0x17, 0xc42042b000, 0xc42036a7d0, 0xc4203f2870, 0x0, 0x0)
        /go/src/github.com/docker/docker/cmd/dockerd/daemon.go:264 +0x10e8
main.runDaemon(0x0, 0x199ad21, 0x17, 0xc42042b000, 0xc42036a7d0, 0xc4203f2870, 0x10, 0x0)
        /go/src/github.com/docker/docker/cmd/dockerd/docker.go:86 +0xb2
main.newDaemonCommand.func1(0xc420344fc0, 0xc420456a00, 0x0, 0x10, 0x0, 0x0)
        /go/src/github.com/docker/docker/cmd/dockerd/docker.go:42 +0x71
github.com/docker/docker/vendor/github.com/spf13/cobra.(*Command).execute(0xc420344fc0, 0xc42000c370, 0x10, 0x11, 0xc420344fc0, 0xc42000c370)
        /go/src/github.com/docker/docker/vendor/github.com/spf13/cobra/command.go:646 +0x26d
github.com/docker/docker/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0xc420344fc0, 0x178a640, 0xc42036a701, 0xc4204cd1d0)
        /go/src/github.com/docker/docker/vendor/github.com/spf13/cobra/command.go:742 +0x377
github.com/docker/docker/vendor/github.com/spf13/cobra.(*Command).Execute(0xc420344fc0, 0xc4204cd1d0, 0xc4200160b8)
        /go/src/github.com/docker/docker/vendor/github.com/spf13/cobra/command.go:695 +0x2b
main.main()
        /go/src/github.com/docker/docker/cmd/dockerd/docker.go:106 +0xe2

@anusha-ragunathan
Copy link
Contributor

@mavenugo : Rootcaused the issue. Scan was scanning left over directories from previous tests. So it was waiting on activations from plugins that were not running anymore. Goes to show that we are not cleaning up properly after using pluginv1.

I've fixed it by cleaning up the Scan directories before calling the test. This is a pointed fix. The long term fix would be to add this in the TearDownTest and TearDownSuite sections of both DockerSuite and DockerSwarmSuite.

Fix available at
15c3e82

@mavenugo
Copy link
Contributor Author

mavenugo commented Dec 21, 2016

@anusha-ragunathan I agree. We need a better fix in the test infra to cleanup stale entries. I picked up your commit and started the build again. Lets see if this fixes the test issue.

@mavenugo
Copy link
Contributor Author

@anusha-ragunathan same failures now seen on other plugin tests. I guess we have to fix this in a more generic fashion rather than a point fix at this point :(
alternatively, we can introduce a V2 only GetAllByCap function (or a bool to prevent v1 plugins to be scanned). For the case that we are trying to address here, we really dont care about the v1 plugins.

@mavenugo
Copy link
Contributor Author

@anusha-ragunathan I added a temporary patch : 4b710fe to skip plugin-v1 from being scanned. There is only 1 place the GetAllByCap is used today and hence this is the least impactful change for now. Once the tests are done, we can decide on what can be done...

@mavenugo
Copy link
Contributor Author

Just as a data-point, skipping plugin-v1 scan using 4b710fe solves the problem. It may not be an ideal fix. But atleast we know that the test-case is failing because of scanning (and failing due to the absence of) v1-plugin

@mavenugo
Copy link
Contributor Author

@anusha-ragunathan another attempt and this time cleaning up a v1-plugin immediately after use in one of the swarm network tests (1024517). If this works, then we are good.

@anusha-ragunathan
Copy link
Contributor

The approach of getting v2 plugins only in this case, sounds okay to me. Design LGTM.

@mavenugo
Copy link
Contributor Author

@anusha-ragunathan updated with the new vendoring changes. fingers crossed on IT :)

@mavenugo
Copy link
Contributor Author

The CI failures in TestPluginInstallImage are unrelated and is failing for multiple PRs now.

@mavenugo
Copy link
Contributor Author

@anusha-ragunathan @tonistiigi @tiborvass could you PTAL ? the CI issues are unrelated.

@tonistiigi
Copy link
Member

SGTM

@mavenugo Needs rebase. Use tonistiigi/test-docker-netplugin for the test.

Signed-off-by: Madhu Venugopal <madhu@docker.com>
Signed-off-by: Madhu Venugopal <madhu@docker.com>
Signed-off-by: Madhu Venugopal <madhu@docker.com>
@mavenugo
Copy link
Contributor Author

@tonistiigi done. Its green now !

@tiborvass
Copy link
Contributor

LGTM

@mavenugo would you mind opening this PR against 1.13 branch as well ? Not sure about the vendoring part.

@mavenugo
Copy link
Contributor Author

mavenugo commented Jan 3, 2017

@tiborvass yes. I will once this is merged in the master.

@vieux
Copy link
Contributor

vieux commented Jan 3, 2017

LGTM

@vieux vieux merged commit 2ef6d80 into moby:master Jan 3, 2017
@vieux
Copy link
Contributor

vieux commented Jan 3, 2017

@mavenugo done

@mavenugo
Copy link
Contributor Author

mavenugo commented Jan 3, 2017

@vieux @tiborvass cherry-picked : #29858

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants