-
Notifications
You must be signed in to change notification settings - Fork 28
{cmd/trace-agent,writer}: fix race conditions #495
Conversation
| totalTraceCount uint64 | ||
|
|
||
| mu sync.Mutex // guards lastFlush | ||
| lastFlush time.Time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is sampler.Sampler thread-safe? No errors thrown nor shown in the wild, just making sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's how I think a Start/Stop should look for correctness and no races:
func (w *Something) Start() {
go func() {
w.exitWG.Add(1)
w.Run()
w.exitWG.Done()
}
}
func (w *Something) Stop() {
close(exit)
w.exitWG.Wait()
}
This should solve any races you might have observed no? Feel free to replace the wait group with another channel. Any other option, I fear, introduces the chances for weird corner cases.
| // Add samples a trace and returns true if trace was sampled (should be kept), false otherwise | ||
| func (s *Sampler) Add(t processedTrace) bool { | ||
| s.totalTraceCount++ | ||
| atomic.AddUint64(&s.totalTraceCount, 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add can indeed be called from multiple routines simultaneously so 👍
Actually that's the original race right there. Calling |
AlexJF
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok LGTM.
|
Right. Although having moved all the waitgroup interactions to just |
|
Thanks for looking. I prefer to have a race-proof API, rather than an API which assumes that the user will not do anything dodgy, which they usually do 😄My sentiment is that a "wait group" should be used with a "group", which is not the case here. I'm open to simpler solutions, this one seems quite safe to me and we've experimented a lot with it within the Go tracer where we had problems of synchronous Start/Stop calls among other abusive behaviour. |
Fixes some (non-fatal) race conditions detected by the
-raceflag.