Skip to content

test(datarace): Fix datarace issue in spanstat.go#11615

Merged
borkmann merged 1 commit intocilium:masterfrom
sayboras:bugfix/span-unsafe
Jun 3, 2020
Merged

test(datarace): Fix datarace issue in spanstat.go#11615
borkmann merged 1 commit intocilium:masterfrom
sayboras:bugfix/span-unsafe

Conversation

@sayboras
Copy link
Copy Markdown
Member

As mentioned in #10991, successDuration was not safely access from
eventqueue

Add RWLock to make this struct thread-safe

Closes #10991

Signed-off-by: Tam Mach sayboras@yahoo.com

Fix datarace issue in spanstat.go

@maintainer-s-little-helper
Copy link
Copy Markdown

Please set the appropriate release note label.

@sayboras sayboras force-pushed the bugfix/span-unsafe branch 4 times, most recently from dd36ee4 to 7f791c7 Compare May 20, 2020 13:18
@joestringer
Copy link
Copy Markdown
Member

Hi @sayboras , do you need help or feedback on this issue/PR? It is currently marked as a release blocker for v1.8 so we will want to determine whether it's vital to merge for v1.8 and if so, help to get it merged.

@sayboras sayboras force-pushed the bugfix/span-unsafe branch 2 times, most recently from 267eb27 to 8aa4d50 Compare June 2, 2020 09:49
@sayboras
Copy link
Copy Markdown
Member Author

sayboras commented Jun 2, 2020

do you need help or feedback on this issue/PR? It is currently marked as a release blocker for v1.8 so we will want to determine whether it's vital to merge for v1.8 and if so, help to get it merged.

@joestringer hi, if you can help on this. I added lock for SpanStat struct, however, was having below copylocks issue.

To resolve this, I need to use pointer (e.g. lock *lock.RWMutex), but then the drawback is to have constructor to initialize values. This must be done for all instances of SpanStat struct.

Looking forward to hearing your input. Thanks

copylocks issue
$ golangci-lint run ./... --disable-all -E govet 
daemon/cmd/fqdn.go:408:164: copylocks: notifyOnDNSMsg passes lock by value: github.com/cilium/cilium/pkg/fqdn/dnsproxy.ProxyRequestContext contains github.com/cilium/cilium/pkg/spanstat.SpanStat contains github.com/cilium/cilium/pkg/lock.RWMutex (govet)
func (d *Daemon) notifyOnDNSMsg(lookupTime time.Time, ep *endpoint.Endpoint, epIPPort string, serverAddr string, msg *dns.Msg, protocol string, allowed bool, stat dnsproxy.ProxyRequestContext) error {
                                                                                                                                                                   ^
pkg/endpoint/policy.go:215:33: copylocks: literal copies lock value from stats.waitingForIdentityCache: github.com/cilium/cilium/pkg/spanstat.SpanStat contains github.com/cilium/cilium/pkg/lock.RWMutex (govet)
                "waitingForIdentityCache":    stats.waitingForIdentityCache,
                                              ^
pkg/endpoint/policy.go:216:33: copylocks: literal copies lock value from stats.waitingForPolicyRepository: github.com/cilium/cilium/pkg/spanstat.SpanStat contains github.com/cilium/cilium/pkg/lock.RWMutex (govet)
                "waitingForPolicyRepository": stats.waitingForPolicyRepository,
                                              ^
pkg/endpoint/policy.go:217:33: copylocks: literal copies lock value from stats.policyCalculation: github.com/cilium/cilium/pkg/spanstat.SpanStat contains github.com/cilium/cilium/pkg/lock.RWMutex (govet)
                "policyCalculation":          stats.policyCalculation,
                                              ^
pkg/fqdn/dnsproxy/proxy.go:373:77: copylocks: call of p.NotifyOnDNSMsg copies lock value: github.com/cilium/cilium/pkg/fqdn/dnsproxy.ProxyRequestContext contains github.com/cilium/cilium/pkg/spanstat.SpanStat contains github.com/cilium/cilium/pkg/lock.RWMutex (govet)
                p.NotifyOnDNSMsg(time.Now(), nil, epIPPort, "", request, protocol, false, stat)
                                                                                          ^
pkg/fqdn/dnsproxy/proxy.go:382:77: copylocks: call of p.NotifyOnDNSMsg copies lock value: github.com/cilium/cilium/pkg/fqdn/dnsproxy.ProxyRequestContext contains github.com/cilium/cilium/pkg/spanstat.SpanStat contains github.com/cilium/cilium/pkg/lock.RWMutex (govet)
                p.NotifyOnDNSMsg(time.Now(), nil, epIPPort, "", request, protocol, false, stat)
                                                                                          ^
pkg/fqdn/dnsproxy/proxy.go:394:90: copylocks: call of p.NotifyOnDNSMsg copies lock value: github.com/cilium/cilium/pkg/fqdn/dnsproxy.ProxyRequestContext contains github.com/cilium/cilium/pkg/spanstat.SpanStat contains github.com/cilium/cilium/pkg/lock.RWMutex (govet)
                p.NotifyOnDNSMsg(time.Now(), ep, epIPPort, targetServerAddr, request, protocol, false, stat)

@sayboras sayboras force-pushed the bugfix/span-unsafe branch from 8aa4d50 to 95c3b39 Compare June 2, 2020 09:54
@sayboras sayboras marked this pull request as ready for review June 2, 2020 10:13
@sayboras sayboras requested a review from a team June 2, 2020 10:13
@sayboras sayboras requested review from a team as code owners June 2, 2020 10:13
Copy link
Copy Markdown
Member

@aanm aanm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one small non-blocking nit

Comment thread pkg/spanstat/spanstat_test.go Outdated
@aanm aanm added kind/bug This is a bug in the Cilium logic. needs-backport/1.8 release-note/bug This PR fixes an issue in a previous release of Cilium. labels Jun 2, 2020
@sayboras sayboras force-pushed the bugfix/span-unsafe branch from 643502b to 58b8fbd Compare June 2, 2020 11:58
@pchaigno
Copy link
Copy Markdown
Member

pchaigno commented Jun 2, 2020

test-me-please

@coveralls
Copy link
Copy Markdown

coveralls commented Jun 2, 2020

Coverage Status

Coverage increased (+0.02%) to 36.922% when pulling db6872a on sayboras:bugfix/span-unsafe into 3da5bcf on cilium:master.

Copy link
Copy Markdown
Member

@christarazi christarazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Copy link
Copy Markdown
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, passing stats by reference everywhere makes sense and is a reasonable way to resolve the govet issue.

It'd be good to squash the two commits together just to ensure that we won't hit any issues in the build process in future in case we need to bisect the tree across this commit.

Comment thread pkg/spanstat/spanstat.go Outdated
Copy link
Copy Markdown
Member

@joestringer joestringer Jun 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, If you compose the SpanStat from lock.RWMutex directly (ie no field name), then these can be s.Lock() / defer s.Unlock() and you don't have to worry about whether to call it mutex or lock or so on :-)

FWIW we aren't consistently composing objects from locks this way across the codebase so I'm not fussed about which approach we take here.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, it's good to know. Composing really looks code look nicer. To make it easier for other reviewers, I will just leave as it is now. Will do it in another PR if required.

Copy link
Copy Markdown
Member

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sayboras I think the Runtime-4.9 failure is relevant here:

# github.com/cilium/cilium/pkg/fqdn/dnsproxy [github.com/cilium/cilium/pkg/fqdn/dnsproxy.test]
pkg/fqdn/dnsproxy/proxy_test.go:202:3: cannot use func literal (type func(time.Time, *endpoint.Endpoint, string, string, *dns.Msg, string, bool, ProxyRequestContext) error) as type NotifyOnDNSMsgFunc in argument to StartDNSProxy

You can reproduce by running privileged unit tests locally.

@sayboras
Copy link
Copy Markdown
Member Author

sayboras commented Jun 2, 2020

I think the Runtime-4.9 failure is relevant here:

# github.com/cilium/cilium/pkg/fqdn/dnsproxy [github.com/cilium/cilium/pkg/fqdn/dnsproxy.test]
pkg/fqdn/dnsproxy/proxy_test.go:202:3: cannot use func literal (type func(time.Time, *endpoint.Endpoint, string, string, *dns.Msg, string, bool, ProxyRequestContext) error) as type NotifyOnDNSMsgFunc in argument to StartDNSProxy

You can reproduce by running privileged unit tests locally.

Thanks for pointing out, it's my silly mistake. I didn't enable custom build tag in my Goland settings. :)

@sayboras sayboras force-pushed the bugfix/span-unsafe branch from 58b8fbd to c998fff Compare June 2, 2020 21:37
@sayboras sayboras requested a review from pchaigno June 2, 2020 21:37
As mentioned in cilium#10991, successDuration was not safely access from
eventqueue

Add RWLock to make this struct thread-safe
Using pointer for spanstat to avoid copylocks issue highlighted by govet

Closes cilium#10991

Signed-off-by: Tam Mach <sayboras@yahoo.com>
@sayboras sayboras force-pushed the bugfix/span-unsafe branch from c998fff to db6872a Compare June 2, 2020 22:00
@joestringer
Copy link
Copy Markdown
Member

test-me-please

@sayboras
Copy link
Copy Markdown
Member Author

sayboras commented Jun 3, 2020

@joestringer @pchaigno seems like all required jobs finished succesfully now, thanks for your inputs, and do let me know if anything else is required.

@christarazi christarazi added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 3, 2020
@borkmann borkmann merged commit 6bc5c74 into cilium:master Jun 3, 2020
@sayboras sayboras deleted the bugfix/span-unsafe branch June 3, 2020 08:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/bug This is a bug in the Cilium logic. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

DATA RACE: s.successDuration unsafely accessed

8 participants