Fix various data races in pkg/aws/eni and pkg/ipam#11685
Merged
aanm merged 7 commits intocilium:masterfrom May 27, 2020
Merged
Conversation
|
Please set the appropriate release note label. |
Member
Author
Member
Author
|
retest-4.9 |
Member
Author
|
retest-4.19 |
Member
Author
|
Travis has passed thrice so far... Edit: 8 times... |
aanm
requested changes
May 27, 2020
Comment on lines
293
to
313
Member
There was a problem hiding this comment.
It's odd seeing a private method getting a mutext. Please add a comment in the function stating n.mutex must NOT be held.
Comment on lines
207
to
209
Member
There was a problem hiding this comment.
please use a defer
m.mutex.RLock()
defer m.mutex.RUnlock()
m.instances.ForeachInterface(instanceID, fn)
Member
|
@christarazi my requested changes won't require the CI to re-run again. Ping me once they are addressed so we can merge the PR without running the CI. |
This fixes data races found when running the unit-tests with `-race`.
The mutex is to protect access to `n.enis`.
```
WARNING: DATA RACE
Write at 0x00c0007b40e0 by goroutine 135:
github.com/cilium/cilium/pkg/aws/eni.(*Node).ResyncInterfacesAndIPs()
/home/chris/code/cilium/cilium/pkg/aws/eni/node.go:430 +0xfa
github.com/cilium/cilium/pkg/ipam.(*Node).recalculate()
/home/chris/code/cilium/cilium/pkg/ipam/node.go:352 +0xfb
github.com/cilium/cilium/pkg/ipam.(*NodeManager).resyncNode()
/home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:342 +0x7a
github.com/cilium/cilium/pkg/ipam.(*NodeManager).Resync.func1()
/home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:389 +0x88
Previous read at 0x00c0007b40e0 by goroutine 46:
github.com/cilium/cilium/pkg/aws/eni.(*Node).findNextIndex()
/home/chris/code/cilium/cilium/pkg/aws/eni/node.go:292 +0x86
github.com/cilium/cilium/pkg/aws/eni.(*Node).CreateInterface()
/home/chris/code/cilium/cilium/pkg/aws/eni/node.go:339 +0x584
github.com/cilium/cilium/pkg/ipam.(*Node).createInterface()
/home/chris/code/cilium/cilium/pkg/ipam/node.go:435 +0x290
github.com/cilium/cilium/pkg/ipam.(*Node).maintainIPPool()
/home/chris/code/cilium/cilium/pkg/ipam/node.go:628 +0x85d
github.com/cilium/cilium/pkg/ipam.(*Node).MaintainIPPool()
/home/chris/code/cilium/cilium/pkg/ipam/node.go:663 +0x82
github.com/cilium/cilium/pkg/ipam.(*NodeManager).Update.func1()
/home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:241 +0x8b
github.com/cilium/cilium/pkg/trigger.(*Trigger).waiter()
/home/chris/code/cilium/cilium/pkg/trigger/trigger.go:206 +0x4b9
Goroutine 135 (running) created at:
github.com/cilium/cilium/pkg/ipam.(*NodeManager).Resync()
/home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:388 +0x2c5
github.com/cilium/cilium/pkg/ipam.NewNodeManager.func1()
/home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:168 +0x101
github.com/cilium/cilium/pkg/trigger.(*Trigger).waiter()
/home/chris/code/cilium/cilium/pkg/trigger/trigger.go:206 +0x4b9
Goroutine 46 (running) created at:
github.com/cilium/cilium/pkg/trigger.NewTrigger()
/home/chris/code/cilium/cilium/pkg/trigger/trigger.go:129 +0x23d
github.com/cilium/cilium/pkg/ipam.(*NodeManager).Update()
/home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:236 +0x523
github.com/cilium/cilium/pkg/aws/eni.(*ENISuite).TestNodeManagerManyNodes()
/home/chris/code/cilium/cilium/pkg/aws/eni/node_manager_test.go:593 +0x80a
runtime.call32()
/usr/lib/go/src/runtime/asm_amd64.s:539 +0x3a
reflect.Value.Call()
/usr/lib/go/src/reflect/value.go:321 +0xd3
gopkg.in/check%2ev1.(*suiteRunner).forkTest.func1()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:781 +0xa0a
gopkg.in/check%2ev1.(*suiteRunner).forkCall.func1()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:675 +0xd9
```
```
WARNING: DATA RACE
Write at 0x00c000110060 by goroutine 94:
github.com/cilium/cilium/pkg/aws/eni.(*Node).ResyncInterfacesAndIPs()
/home/chris/code/cilium/cilium/pkg/aws/eni/node.go:439 +0xfa
github.com/cilium/cilium/pkg/ipam.(*Node).recalculate()
/home/chris/code/cilium/cilium/pkg/ipam/node.go:352 +0xfb
github.com/cilium/cilium/pkg/ipam.(*NodeManager).resyncNode()
/home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:342 +0x7a
github.com/cilium/cilium/pkg/ipam.(*NodeManager).Resync.func1()
/home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:389 +0x88
Previous read at 0x00c000110060 by goroutine 92:
github.com/cilium/cilium/pkg/aws/eni.(*Node).PrepareIPAllocation()
/home/chris/code/cilium/cilium/pkg/aws/eni/node.go:211 +0xefc
github.com/cilium/cilium/pkg/ipam.(*Node).determineMaintenanceAction()
/home/chris/code/cilium/cilium/pkg/ipam/node.go:542 +0x184
github.com/cilium/cilium/pkg/ipam.(*Node).maintainIPPool()
/home/chris/code/cilium/cilium/pkg/ipam/node.go:578 +0x53
github.com/cilium/cilium/pkg/ipam.(*Node).MaintainIPPool()
/home/chris/code/cilium/cilium/pkg/ipam/node.go:663 +0x82
github.com/cilium/cilium/pkg/ipam.(*NodeManager).Update.func1()
/home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:241 +0x8b
github.com/cilium/cilium/pkg/trigger.(*Trigger).waiter()
/home/chris/code/cilium/cilium/pkg/trigger/trigger.go:206 +0x4b9
Goroutine 94 (running) created at:
github.com/cilium/cilium/pkg/ipam.(*NodeManager).Resync()
/home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:388 +0x2c5
github.com/cilium/cilium/pkg/ipam.NewNodeManager.func1()
/home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:168 +0x101
github.com/cilium/cilium/pkg/trigger.(*Trigger).waiter()
/home/chris/code/cilium/cilium/pkg/trigger/trigger.go:206 +0x4b9
Goroutine 92 (running) created at:
github.com/cilium/cilium/pkg/trigger.NewTrigger()
/home/chris/code/cilium/cilium/pkg/trigger/trigger.go:129 +0x23d
github.com/cilium/cilium/pkg/ipam.(*NodeManager).Update()
/home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:236 +0x523
github.com/cilium/cilium/pkg/aws/eni.(*ENISuite).TestNodeManagerManyNodes()
/home/chris/code/cilium/cilium/pkg/aws/eni/node_manager_test.go:593 +0x80a
runtime.call32()
/usr/lib/go/src/runtime/asm_amd64.s:539 +0x3a
reflect.Value.Call()
/usr/lib/go/src/reflect/value.go:321 +0xd3
gopkg.in/check%2ev1.(*suiteRunner).forkTest.func1()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:781 +0xa0a
gopkg.in/check%2ev1.(*suiteRunner).forkCall.func1()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:675 +0xd9
```
Signed-off-by: Chris Tarazi <chris@isovalent.com>
This commit mostly reworks `eni.(*Node).ResyncInterfacesAndIPs()` as it
makes many unprotected accesses to shared resources. This change is to
reduce the scope of the mutexes to only when a shared resource is
accessed. It also refactors that code such that only the immediate
structure is responsible for locking / unlocking its mutex. This allows
a mixtures of read and write locks intended to minimize the impact on
performance as much as possible. This change also attempts to minimize
the nesting of mutex locks as much as possible, which can lead to
deadlocking or long hanging periods.
Not all the data race output from Golang is provided in this commit body
as there were many small, nested data races that were fixed along the
way. The main, important data race outputs are provided below.
All data races were found by running `-race` when running `make
unit-tests`.
```
WARNING: DATA RACE
Read at 0x00c00017c388 by goroutine 86:
github.com/cilium/cilium/pkg/aws/eni.(*Node).ResyncInterfacesAndIPs()
/home/chris/code/cilium/cilium/pkg/aws/eni/node.go:433 +0x1ca
github.com/cilium/cilium/pkg/ipam.(*Node).recalculate()
/home/chris/code/cilium/cilium/pkg/ipam/node.go:352 +0xfb
github.com/cilium/cilium/pkg/ipam.(*Node).MaintainIPPool()
/home/chris/code/cilium/cilium/pkg/ipam/node.go:669 +0xc1
github.com/cilium/cilium/pkg/ipam.(*NodeManager).Update.func1()
/home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:241 +0x8b
github.com/cilium/cilium/pkg/trigger.(*Trigger).waiter()
/home/chris/code/cilium/cilium/pkg/trigger/trigger.go:206 +0x4b9
Previous write at 0x00c00017c388 by goroutine 82:
github.com/cilium/cilium/pkg/aws/eni.(*InstancesManager).Resync()
/home/chris/code/cilium/cilium/pkg/aws/eni/instances.go:185 +0xf24
github.com/cilium/cilium/pkg/aws/eni.(*ENISuite).TestNodeManagerManyNodes()
/home/chris/code/cilium/cilium/pkg/aws/eni/node_manager_test.go:589 +0xb98
runtime.call32()
/usr/lib/go/src/runtime/asm_amd64.s:539 +0x3a
reflect.Value.Call()
/usr/lib/go/src/reflect/value.go:321 +0xd3
gopkg.in/check%2ev1.(*suiteRunner).forkTest.func1()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:781 +0xa0a
gopkg.in/check%2ev1.(*suiteRunner).forkCall.func1()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:675 +0xd9
Goroutine 86 (running) created at:
github.com/cilium/cilium/pkg/trigger.NewTrigger()
/home/chris/code/cilium/cilium/pkg/trigger/trigger.go:129 +0x23d
github.com/cilium/cilium/pkg/ipam.(*NodeManager).Update()
/home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:236 +0x523
github.com/cilium/cilium/pkg/aws/eni.(*ENISuite).TestNodeManagerManyNodes()
/home/chris/code/cilium/cilium/pkg/aws/eni/node_manager_test.go:593 +0x80a
runtime.call32()
/usr/lib/go/src/runtime/asm_amd64.s:539 +0x3a
reflect.Value.Call()
/usr/lib/go/src/reflect/value.go:321 +0xd3
gopkg.in/check%2ev1.(*suiteRunner).forkTest.func1()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:781 +0xa0a
gopkg.in/check%2ev1.(*suiteRunner).forkCall.func1()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:675 +0xd9
```
```
WARNING: DATA RACE
Write at 0x00c0001f3738 by goroutine 35:
github.com/cilium/cilium/pkg/aws/eni.(*InstancesManager).Resync()
/home/chris/code/cilium/cilium/pkg/aws/eni/instances.go:185 +0xf24
github.com/cilium/cilium/pkg/ipam.NewNodeManager.func1()
/home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:167 +0x8f
github.com/cilium/cilium/pkg/trigger.(*Trigger).waiter()
/home/chris/code/cilium/cilium/pkg/trigger/trigger.go:206 +0x4b9
Previous read at 0x00c0001f3738 by goroutine 39:
github.com/cilium/cilium/pkg/aws/eni.(*Node).PopulateStatusFields()
/home/chris/code/cilium/cilium/pkg/aws/eni/node.go:84 +0x11e
github.com/cilium/cilium/pkg/ipam.(*Node).syncToAPIServer()
/home/chris/code/cilium/cilium/pkg/ipam/node.go:701 +0x1ce
github.com/cilium/cilium/pkg/ipam.(*NodeManager).Update.func2()
/home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:256 +0x41
github.com/cilium/cilium/pkg/trigger.(*Trigger).waiter()
/home/chris/code/cilium/cilium/pkg/trigger/trigger.go:206 +0x4b9
Goroutine 35 (running) created at:
github.com/cilium/cilium/pkg/trigger.NewTrigger()
/home/chris/code/cilium/cilium/pkg/trigger/trigger.go:129 +0x23d
github.com/cilium/cilium/pkg/ipam.NewNodeManager()
/home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:162 +0x34f
github.com/cilium/cilium/pkg/aws/eni.(*ENISuite).TestGetNodeNames()
/home/chris/code/cilium/cilium/pkg/aws/eni/node_manager_test.go:71 +0x42f
runtime.call32()
/usr/lib/go/src/runtime/asm_amd64.s:539 +0x3a
reflect.Value.Call()
/usr/lib/go/src/reflect/value.go:321 +0xd3
gopkg.in/check%2ev1.(*suiteRunner).forkTest.func1()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:781 +0xa0a
gopkg.in/check%2ev1.(*suiteRunner).forkCall.func1()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:675 +0xd9
Goroutine 39 (running) created at:
github.com/cilium/cilium/pkg/trigger.NewTrigger()
/home/chris/code/cilium/cilium/pkg/trigger/trigger.go:129 +0x23d
github.com/cilium/cilium/pkg/ipam.(*NodeManager).Update()
/home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:251 +0x938
github.com/cilium/cilium/pkg/aws/eni.(*ENISuite).TestGetNodeNames()
/home/chris/code/cilium/cilium/pkg/aws/eni/node_manager_test.go:91 +0x1411
runtime.call32()
/usr/lib/go/src/runtime/asm_amd64.s:539 +0x3a
reflect.Value.Call()
/usr/lib/go/src/reflect/value.go:321 +0xd3
gopkg.in/check%2ev1.(*suiteRunner).forkTest.func1()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:781 +0xa0a
gopkg.in/check%2ev1.(*suiteRunner).forkCall.func1()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:675 +0xd9
```
```
WARNING: DATA RACE
Write at 0x00c0007981f8 by goroutine 81:
github.com/cilium/cilium/pkg/aws/eni.(*InstancesManager).Resync()
/home/chris/code/cilium/cilium/pkg/aws/eni/instances.go:185 +0xf24
github.com/cilium/cilium/pkg/ipam.NewNodeManager.func1()
/home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:167 +0x8f
github.com/cilium/cilium/pkg/trigger.(*Trigger).waiter()
/home/chris/code/cilium/cilium/pkg/trigger/trigger.go:206 +0x4b9
Previous read at 0x00c0007981f8 by goroutine 152:
github.com/cilium/cilium/pkg/aws/eni.(*Node).getSecurityGroupIDs()
/home/chris/code/cilium/cilium/pkg/aws/eni/node.go:248 +0x1f2
github.com/cilium/cilium/pkg/aws/eni.(*Node).CreateInterface()
/home/chris/code/cilium/cilium/pkg/aws/eni/node.go:332 +0x275
github.com/cilium/cilium/pkg/ipam.(*Node).createInterface()
/home/chris/code/cilium/cilium/pkg/ipam/node.go:435 +0x290
github.com/cilium/cilium/pkg/ipam.(*Node).maintainIPPool()
/home/chris/code/cilium/cilium/pkg/ipam/node.go:628 +0x85d
github.com/cilium/cilium/pkg/ipam.(*Node).MaintainIPPool()
/home/chris/code/cilium/cilium/pkg/ipam/node.go:663 +0x82
github.com/cilium/cilium/pkg/ipam.(*NodeManager).Update.func1()
/home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:241 +0x8b
github.com/cilium/cilium/pkg/trigger.(*Trigger).waiter()
/home/chris/code/cilium/cilium/pkg/trigger/trigger.go:206 +0x4b9
Goroutine 81 (running) created at:
github.com/cilium/cilium/pkg/trigger.NewTrigger()
/home/chris/code/cilium/cilium/pkg/trigger/trigger.go:129 +0x23d
github.com/cilium/cilium/pkg/ipam.NewNodeManager()
/home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:162 +0x34f
github.com/cilium/cilium/pkg/aws/eni.(*ENISuite).TestNodeManagerManyNodes()
/home/chris/code/cilium/cilium/pkg/aws/eni/node_manager_test.go:578 +0x62d
runtime.call32()
/usr/lib/go/src/runtime/asm_amd64.s:539 +0x3a
reflect.Value.Call()
/usr/lib/go/src/reflect/value.go:321 +0xd3
gopkg.in/check%2ev1.(*suiteRunner).forkTest.func1()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:781 +0xa0a
gopkg.in/check%2ev1.(*suiteRunner).forkCall.func1()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:675 +0xd9
Goroutine 152 (running) created at:
github.com/cilium/cilium/pkg/trigger.NewTrigger()
/home/chris/code/cilium/cilium/pkg/trigger/trigger.go:129 +0x23d
github.com/cilium/cilium/pkg/ipam.(*NodeManager).Update()
/home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:236 +0x523
github.com/cilium/cilium/pkg/aws/eni.(*ENISuite).TestNodeManagerManyNodes()
/home/chris/code/cilium/cilium/pkg/aws/eni/node_manager_test.go:593 +0x80a
runtime.call32()
/usr/lib/go/src/runtime/asm_amd64.s:539 +0x3a
reflect.Value.Call()
/usr/lib/go/src/reflect/value.go:321 +0xd3
gopkg.in/check%2ev1.(*suiteRunner).forkTest.func1()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:781 +0xa0a
gopkg.in/check%2ev1.(*suiteRunner).forkCall.func1()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:675 +0xd9
```
Signed-off-by: Chris Tarazi <chris@isovalent.com>
This data race occurs in SetRunningLocked() which has a comment stating
that it assumes that `Node` has been locked. That is not the case, and
this PR fixes that incorrect assumption by renaming the function to
SetRunning() and locking the mutex when writing to the shared resource.
This fixes a data race found when running the unit-tests with `-race`.
```
WARNING: DATA RACE
Write at 0x00c0002d1030 by goroutine 79:
github.com/cilium/cilium/pkg/ipam.(*Node).SetRunningLocked()
/home/chris/code/cilium/cilium/pkg/ipam/node.go:133 +0xc7
github.com/cilium/cilium/pkg/aws/eni.(*Node).errorInstanceNotRunning()
/home/chris/code/cilium/cilium/pkg/aws/eni/node.go:274 +0xfe
github.com/cilium/cilium/pkg/aws/eni.(*Node).CreateInterface()
/home/chris/code/cilium/cilium/pkg/aws/eni/node.go:385 +0xe9f
github.com/cilium/cilium/pkg/ipam.(*Node).createInterface()
/home/chris/code/cilium/cilium/pkg/ipam/node.go:435 +0x290
github.com/cilium/cilium/pkg/ipam.(*Node).maintainIPPool()
/home/chris/code/cilium/cilium/pkg/ipam/node.go:628 +0x85d
github.com/cilium/cilium/pkg/ipam.(*Node).MaintainIPPool()
/home/chris/code/cilium/cilium/pkg/ipam/node.go:663 +0x82
github.com/cilium/cilium/pkg/ipam.(*NodeManager).Update.func1()
/home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:241 +0x8b
github.com/cilium/cilium/pkg/trigger.(*Trigger).waiter()
/home/chris/code/cilium/cilium/pkg/trigger/trigger.go:206 +0x4b9
Previous read at 0x00c0002d1030 by goroutine 77:
github.com/cilium/cilium/pkg/ipam.(*Node).IsRunning()
/home/chris/code/cilium/cilium/pkg/ipam/node.go:125 +0x85
github.com/cilium/cilium/pkg/aws/eni.(*ENISuite).TestNodeManagerInstanceNotRunning.func1()
/home/chris/code/cilium/cilium/pkg/aws/eni/node_manager_test.go:655 +0x6d
github.com/cilium/cilium/pkg/testutils.WaitUntil()
/home/chris/code/cilium/cilium/pkg/testutils/condition.go:36 +0x97
github.com/cilium/cilium/pkg/aws/eni.(*ENISuite).TestNodeManagerInstanceNotRunning()
/home/chris/code/cilium/cilium/pkg/aws/eni/node_manager_test.go:653 +0xf22
runtime.call32()
/usr/lib/go/src/runtime/asm_amd64.s:539 +0x3a
reflect.Value.Call()
/usr/lib/go/src/reflect/value.go:321 +0xd3
gopkg.in/check%2ev1.(*suiteRunner).forkTest.func1()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:781 +0xa0a
gopkg.in/check%2ev1.(*suiteRunner).forkCall.func1()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:675 +0xd9
Goroutine 79 (running) created at:
github.com/cilium/cilium/pkg/trigger.NewTrigger()
/home/chris/code/cilium/cilium/pkg/trigger/trigger.go:129 +0x23d
github.com/cilium/cilium/pkg/ipam.(*NodeManager).Update()
/home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:236 +0x523
github.com/cilium/cilium/pkg/aws/eni.(*ENISuite).TestNodeManagerInstanceNotRunning()
/home/chris/code/cilium/cilium/pkg/aws/eni/node_manager_test.go:650 +0xed8
runtime.call32()
/usr/lib/go/src/runtime/asm_amd64.s:539 +0x3a
reflect.Value.Call()
/usr/lib/go/src/reflect/value.go:321 +0xd3
gopkg.in/check%2ev1.(*suiteRunner).forkTest.func1()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:781 +0xa0a
gopkg.in/check%2ev1.(*suiteRunner).forkCall.func1()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:675 +0xd9
Goroutine 77 (running) created at:
gopkg.in/check%2ev1.(*suiteRunner).forkCall()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:672 +0x44d
gopkg.in/check%2ev1.(*suiteRunner).forkTest()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:763 +0x1b9
gopkg.in/check%2ev1.(*suiteRunner).runTest()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:818 +0x228
gopkg.in/check%2ev1.(*suiteRunner).run()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:624 +0x1c7
gopkg.in/check%2ev1.Run()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/run.go:92 +0x5a
gopkg.in/check%2ev1.RunAll()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/run.go:84 +0x127
gopkg.in/check%2ev1.TestingT()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/run.go:72 +0x5f6
github.com/cilium/cilium/pkg/aws/eni.Test()
/home/chris/code/cilium/cilium/pkg/aws/eni/eni_test.go:26 +0x38
testing.tRunner()
/usr/lib/go/src/testing/testing.go:991 +0x1eb
```
Signed-off-by: Chris Tarazi <chris@isovalent.com>
This fixes a data race found when running the unit-tests with `-race`.
```
WARNING: DATA RACE
Write at 0x00c000067168 by goroutine 118:
github.com/cilium/cilium/pkg/aws/eni.(*Node).UpdatedNode()
/home/chris/code/cilium/cilium/pkg/aws/eni/node.go:69 +0x5f
github.com/cilium/cilium/pkg/ipam.(*Node).UpdatedResource()
/home/chris/code/cilium/cilium/pkg/ipam/node.go:319 +0x81
github.com/cilium/cilium/pkg/ipam.(*NodeManager).Update()
/home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:273 +0x14a
github.com/cilium/cilium/pkg/aws/eni.(*ENISuite).TestNodeManagerReleaseAddress()
/home/chris/code/cilium/cilium/pkg/aws/eni/node_manager_test.go:467 +0x101d
runtime.call32()
/usr/lib/go/src/runtime/asm_amd64.s:539 +0x3a
reflect.Value.Call()
/usr/lib/go/src/reflect/value.go:321 +0xd3
gopkg.in/check%2ev1.(*suiteRunner).forkTest.func1()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:781 +0xa0a
gopkg.in/check%2ev1.(*suiteRunner).forkCall.func1()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:675 +0xd9
Previous read at 0x00c000067168 by goroutine 324:
github.com/cilium/cilium/pkg/aws/eni.(*Node).getLimits()
/home/chris/code/cilium/cilium/pkg/aws/eni/node.go:101 +0x5a
github.com/cilium/cilium/pkg/aws/eni.(*Node).GetMaximumAllocatableIPv4()
/home/chris/code/cilium/cilium/pkg/aws/eni/node.go:488 +0x2e2
github.com/cilium/cilium/pkg/ipam.(*Node).getMaxAllocate()
/home/chris/code/cilium/cilium/pkg/ipam/node.go:213 +0x79
github.com/cilium/cilium/pkg/ipam.(*Node).recalculate()
/home/chris/code/cilium/cilium/pkg/ipam/node.go:368 +0x42a
github.com/cilium/cilium/pkg/ipam.(*NodeManager).resyncNode()
/home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:342 +0x7a
github.com/cilium/cilium/pkg/ipam.(*NodeManager).Resync.func1()
/home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:389 +0x88
Goroutine 118 (running) created at:
gopkg.in/check%2ev1.(*suiteRunner).forkCall()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:672 +0x44d
gopkg.in/check%2ev1.(*suiteRunner).forkTest()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:763 +0x1b9
gopkg.in/check%2ev1.(*suiteRunner).runTest()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:818 +0x228
gopkg.in/check%2ev1.(*suiteRunner).run()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:624 +0x1c7
gopkg.in/check%2ev1.Run()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/run.go:92 +0x5a
gopkg.in/check%2ev1.RunAll()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/run.go:84 +0x127
gopkg.in/check%2ev1.TestingT()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/run.go:72 +0x5f6
github.com/cilium/cilium/pkg/aws/eni.Test()
/home/chris/code/cilium/cilium/pkg/aws/eni/eni_test.go:26 +0x38
testing.tRunner()
/usr/lib/go/src/testing/testing.go:991 +0x1eb
Goroutine 324 (finished) created at:
github.com/cilium/cilium/pkg/ipam.(*NodeManager).Resync()
/home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:388 +0x2c5
github.com/cilium/cilium/pkg/ipam.NewNodeManager.func1()
/home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:168 +0x101
github.com/cilium/cilium/pkg/trigger.(*Trigger).waiter()
/home/chris/code/cilium/cilium/pkg/trigger/trigger.go:206 +0x4b9
```
Signed-off-by: Chris Tarazi <chris@isovalent.com>
This fixes a data race found when running the unit-tests with `-race`.
```
WARNING: DATA RACE
Write at 0x00c00288e358 by goroutine 256:
github.com/cilium/cilium/pkg/aws/eni.(*Node).ResyncInterfacesAndIPs()
/home/chris/code/cilium/cilium/pkg/aws/eni/node.go:448 +0x8e
github.com/cilium/cilium/pkg/ipam.(*Node).recalculate()
/home/chris/code/cilium/cilium/pkg/ipam/node.go:352 +0xfb
github.com/cilium/cilium/pkg/ipam.(*NodeManager).resyncNode()
/home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:347 +0x7a
github.com/cilium/cilium/pkg/ipam.(*NodeManager).Resync.func1()
/home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:394 +0x88
Previous read at 0x00c00288e358 by goroutine 416:
github.com/cilium/cilium/pkg/aws/eni.(*ENISuite).TestNodeManagerReleaseAddress()
/home/chris/code/cilium/cilium/pkg/aws/eni/node_manager_test.go:498 +0x1d71
runtime.call32()
/usr/lib/go/src/runtime/asm_amd64.s:539 +0x3a
reflect.Value.Call()
/usr/lib/go/src/reflect/value.go:321 +0xd3
gopkg.in/check%2ev1.(*suiteRunner).forkTest.func1()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:781 +0xa0a
gopkg.in/check%2ev1.(*suiteRunner).forkCall.func1()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:675 +0xd9
Goroutine 256 (running) created at:
github.com/cilium/cilium/pkg/ipam.(*NodeManager).Resync()
/home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:393 +0x2cf
github.com/cilium/cilium/pkg/ipam.NewNodeManager.func1()
/home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:168 +0x101
github.com/cilium/cilium/pkg/trigger.(*Trigger).waiter()
/home/chris/code/cilium/cilium/pkg/trigger/trigger.go:206 +0x4d1
Goroutine 416 (running) created at:
gopkg.in/check%2ev1.(*suiteRunner).forkCall()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:672 +0x44d
gopkg.in/check%2ev1.(*suiteRunner).forkTest()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:763 +0x1b9
gopkg.in/check%2ev1.(*suiteRunner).runTest()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:818 +0x228
gopkg.in/check%2ev1.(*suiteRunner).run()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:624 +0x1c7
gopkg.in/check%2ev1.Run()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/run.go:92 +0x5a
gopkg.in/check%2ev1.RunAll()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/run.go:84 +0x127
gopkg.in/check%2ev1.TestingT()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/run.go:72 +0x5f6
github.com/cilium/cilium/pkg/aws/eni.Test()
/home/chris/code/cilium/cilium/pkg/aws/eni/eni_test.go:26 +0x38
testing.tRunner()
/usr/lib/go/src/testing/testing.go:991 +0x1eb
```
Signed-off-by: Chris Tarazi <chris@isovalent.com>
This commit creates a locked version of `eni.(*Node).getLimits()`
because `eni.(*Node).GetMaximumAllocatableIPv4()` locked the mutex at
the start of the function. This prevents a recursive lock and a
potential deadlock.
This fixes a data race found when running the unit-tests with `-race`.
```
RNING: DATA RACE
Write at 0x00c00075c180 by goroutine 33:
github.com/cilium/cilium/pkg/aws/eni.(*Node).UpdatedNode()
/home/chris/code/cilium/cilium/pkg/aws/eni/node.go:71 +0x85
github.com/cilium/cilium/pkg/ipam.(*Node).UpdatedResource()
/home/chris/code/cilium/cilium/pkg/ipam/node.go:319 +0x81
github.com/cilium/cilium/pkg/ipam.(*NodeManager).Update()
/home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:278 +0x149
github.com/cilium/cilium/pkg/aws/eni.(*ENISuite).TestNodeManagerDefaultAllocation()
/home/chris/code/cilium/cilium/pkg/aws/eni/node_manager_test.go:271 +0xff1
runtime.call32()
/usr/lib/go/src/runtime/asm_amd64.s:539 +0x3a
reflect.Value.Call()
/usr/lib/go/src/reflect/value.go:321 +0xd3
gopkg.in/check%2ev1.(*suiteRunner).forkTest.func1()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:781 +0xa0a
gopkg.in/check%2ev1.(*suiteRunner).forkCall.func1()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:675 +0xd9
Previous read at 0x00c00075c180 by goroutine 77:
github.com/cilium/cilium/pkg/aws/eni.(*Node).GetMaximumAllocatableIPv4()
/home/chris/code/cilium/cilium/pkg/aws/eni/node.go:499 +0x7e8
github.com/cilium/cilium/pkg/ipam.(*Node).getMaxAllocate()
/home/chris/code/cilium/cilium/pkg/ipam/node.go:213 +0x79
github.com/cilium/cilium/pkg/ipam.(*Node).recalculate()
/home/chris/code/cilium/cilium/pkg/ipam/node.go:368 +0x42a
github.com/cilium/cilium/pkg/ipam.(*NodeManager).resyncNode()
/home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:347 +0x7a
github.com/cilium/cilium/pkg/ipam.(*NodeManager).Resync.func1()
/home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:394 +0x88
Goroutine 33 (running) created at:
gopkg.in/check%2ev1.(*suiteRunner).forkCall()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:672 +0x44d
gopkg.in/check%2ev1.(*suiteRunner).forkTest()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:763 +0x1b9
gopkg.in/check%2ev1.(*suiteRunner).runTest()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:818 +0x228
gopkg.in/check%2ev1.(*suiteRunner).run()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:624 +0x1c7
gopkg.in/check%2ev1.Run()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/run.go:92 +0x5a
gopkg.in/check%2ev1.RunAll()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/run.go:84 +0x127
gopkg.in/check%2ev1.TestingT()
/home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/run.go:72 +0x5f6
github.com/cilium/cilium/pkg/aws/eni.Test()
/home/chris/code/cilium/cilium/pkg/aws/eni/eni_test.go:26 +0x38
testing.tRunner()
/usr/lib/go/src/testing/testing.go:991 +0x1eb
Goroutine 77 (running) created at:
github.com/cilium/cilium/pkg/ipam.(*NodeManager).Resync()
/home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:393 +0x2cf
github.com/cilium/cilium/pkg/ipam.NewNodeManager.func1()
/home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:168 +0x101
github.com/cilium/cilium/pkg/trigger.(*Trigger).waiter()
/home/chris/code/cilium/cilium/pkg/trigger/trigger.go:206 +0x4d1
```
Signed-off-by: Chris Tarazi <chris@isovalent.com>
In this test suite, the `metricsapi` is global variable which is shared
among all the `Test*` functions. It is possible that it becomes polluted
over time during test execution.
This is an attempt to resolve the following:
```
FAIL: node_manager_test.go:563: ENISuite.TestNodeManagerManyNodes
node_manager_test.go:602:
c.Errorf("Node %s allocation mismatch. expected: %d allocated: %d", s.name, minAllocate, node.Stats().AvailableIPs)
... Error: Node node53 allocation mismatch. expected: 10 allocated: 18
node_manager_test.go:602:
c.Errorf("Node %s allocation mismatch. expected: %d allocated: %d", s.name, minAllocate, node.Stats().AvailableIPs)
... Error: Node node59 allocation mismatch. expected: 10 allocated: 18
node_manager_test.go:617:
c.Assert(metricsapi.AllocatedIPs("available"), check.Equals, numNodes*minAllocate)
... obtained int = 1016
... expected int = 1000
...
OOPS: 17 passed, 1 FAILED
--- FAIL: Test (2.60s)
FAIL
coverage: 4.3% of statements in ./...
FAIL github.com/cilium/cilium/pkg/aws/eni 2.684s
FAIL
Makefile:204: recipe for target 'unit-tests' failed
make: *** [unit-tests] Error 1
The command "./.travis/build.sh" exited with 2.
```
Signed-off-by: Chris Tarazi <chris@isovalent.com>
9d72371 to
270b597
Compare
Member
Author
|
@aanm Made the requested modifications. |
aanm
approved these changes
May 27, 2020
This was referenced May 27, 2020
sayboras
added a commit
to sayboras/cilium
that referenced
this pull request
Jun 1, 2020
This commit is to fix the recent changes after rebased. Related PR is cilium#11685. Basically, there was a check if Node is nil after calling mutex lock, which will cause nil pointer dereference. I just refactor the code to make sure underlying node is not nil before obtaining mutex lock. Relates to 5259ff7 Signed-off-by: Tam Mach <sayboras@yahoo.com>
aanm
pushed a commit
that referenced
this pull request
Jun 2, 2020
This commit is to fix the recent changes after rebased. Related PR is #11685. Basically, there was a check if Node is nil after calling mutex lock, which will cause nil pointer dereference. I just refactor the code to make sure underlying node is not nil before obtaining mutex lock. Relates to 5259ff7 Signed-off-by: Tam Mach <sayboras@yahoo.com>
tklauser
pushed a commit
that referenced
this pull request
Jun 3, 2020
[ upstream commit e8a7de6 ] This commit is to fix the recent changes after rebased. Related PR is #11685. Basically, there was a check if Node is nil after calling mutex lock, which will cause nil pointer dereference. I just refactor the code to make sure underlying node is not nil before obtaining mutex lock. Relates to 5259ff7 Signed-off-by: Tam Mach <sayboras@yahoo.com> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
aanm
pushed a commit
that referenced
this pull request
Jun 4, 2020
[ upstream commit e8a7de6 ] This commit is to fix the recent changes after rebased. Related PR is #11685. Basically, there was a check if Node is nil after calling mutex lock, which will cause nil pointer dereference. I just refactor the code to make sure underlying node is not nil before obtaining mutex lock. Relates to 5259ff7 Signed-off-by: Tam Mach <sayboras@yahoo.com> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR also contains a potential fix for the flakiness seen in
#11560. The commit for the fix adds a
setup and cleanup method for a metrics mock implementation used in the
ENISuite.
See commit msgs.
Related #11560
Related #10677