kvserver: plug a tracing span leak #61279
Force-pushed from 4668c3c to 0c032ac.
knz left a comment
I think this is right but I'll let Tobias confirm.
Reviewed 3 of 3 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif and @tbg)
pkg/testutils/testcluster/testcluster.go, line 145 at r1 (raw file):
}
var buf strings.Builder
buf.WriteString(fmt.Sprintf("unexpectedly found %d active spans:\n", len(sps)))
fmt.Fprintf(&buf, "unexpectedly ...")
tbg left a comment
Nice! I had basically written this code but failed to realize that the proposal buffer might be where the proposal is hiding.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif)
Force-pushed from 0c032ac to 94304d6.
irfansharif left a comment
Thanks!
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz)
pkg/testutils/testcluster/testcluster.go, line 145 at r1 (raw file):
Previously, knz (kena) wrote…
fmt.Fprintf(&buf, "unexpectedly ...")
Done.
Build failed.
Force-pushed from 94304d6 to 3aea858.
I think CI tripped over #61120? Dunno. bors r+
Build failed (retrying...)

Build succeeded.
63867: stop: break deadlock between Stopper.mu and Replica.mu r=irfansharif a=irfansharif

Fixes #63761. As of #61279, it was possible for us to deadlock due to inconsistent lock orderings between Stopper.mu and Replica.mu. We were previously holding onto Stopper.mu while executing all closers, including those that may acquire other locks. Because closers can be defined anywhere (and may consequently grab any in-scope lock), we should order Stopper.mu to come after all other locks in the system.

The closer added in #61279 iterated over all non-destroyed replicas, locking Replica.mu to check the replica's destroy status (in Store.VisitReplicas). This deadlocked with the lease acquisition code path, which first grabs an exclusive lock over Replica.mu (see InitOrJoinRequest) and then uses the stopper to kick off an async task to acquire the lease; the stopper internally locks Stopper.mu to check whether or not it was already stopped.

Release note: None

Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
Fixes #60677, removing a stop-gap introduced in #59992.
We were previously leaking "async consensus" spans, which was possible
when a given proposal was never flushed out of the replica's proposal
buffer. On server shut down, this buffered proposal was never finished,
and thus the embedded span never closed. We now add a closer to clean up
after ourselves.
Release justification: bug fixes and low-risk updates to new
functionality.
Release note: None
+cc @angelapwen / @knz / @erikgrinaker for pod-visibility.