tracing: remove panic in Finish#83625
Conversation
andreimatei
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @andreimatei, and @stevendanna)
pkg/util/tracing/crdbspan.go line 547 at r1 (raw file):
// We don't have space for this recording. Let's collect just the structured // records. for ci := range childRecording {
I think a better fix is to get rid of the assertion in the structured case below, and make that code deal with multiple spans too. That's how it was before #81079, and I'm not sure if there was a specific reason why it changed.
I think it's important for all this code to be fairly permissive in what recording it accepts. For example, as written, I think this code would crash if an RPC is made within a RecordingStructured span, and then the remote child is dynamically changed via \tracez to RecordingVerbose: a verbose, multi-span recording will come back and the client span will choke on it.
Please add the following test with this occassion:
// Test that a RecordingStructured parent does not panic when asked to ingest a
// remote verbose recording. Ingesting a recording of different type is unusual,
// since children are created with the parent's recording mode, but it can
// happen if the child's recording mode was changed dynamically.
func TestRemoteSpanWithDifferentRecordingMode(t *testing.T) {
tr := NewTracer()
s1 := tr.StartSpan("p", WithRecording(tracingpb.RecordingStructured))
s2 := tr.StartSpan("c", WithRemoteParentFromSpanMeta(s1.Meta()), WithRecording(tracingpb.RecordingVerbose))
s3 := tr.StartSpan("cc", WithParent(s2), WithRecording(tracingpb.RecordingVerbose))
s3.Finish()
r := s2.FinishAndGetConfiguredRecording()
require.NotPanic(s1.ImportRemoteRecording(r))
s1.Finish()
}
pkg/util/tracing/span_test.go line 408 at r1 (raw file):
} // TestRecordingFinishWithMaxSpans is a regression test to ensure that a verbose
let's rephrase this test as Aditya was suggesting, and then I don't have to nitpick about this comment.
Call it TestRecordingDowngradesToStructuredIfTooBig and check that you end up with a recording with just one or two spans or whatever, and all the structured events.
d768e25 to
b94c14f
Compare
stevendanna
left a comment
There was a problem hiding this comment.
TFTR!
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @adityamaru and @andreimatei)
pkg/util/tracing/crdbspan.go line 547 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
I think a better fix is to get rid of the assertion in the structured case below, and make that code deal with multiple spans too. That's how it was before #81079, and I'm not sure if there was a specific reason why it changed.
I think it's important for all this code to be fairly permissive in what recording it accepts. For example, as written, I think this code would crash if an RPC is made within a
RecordingStructuredspan, and then the remote child is dynamically changed via\traceztoRecordingVerbose: a verbose, multi-span recording will come back and the client span will choke on it.Please add the following test with this occassion:
// Test that a RecordingStructured parent does not panic when asked to ingest a // remote verbose recording. Ingesting a recording of different type is unusual, // since children are created with the parent's recording mode, but it can // happen if the child's recording mode was changed dynamically. func TestRemoteSpanWithDifferentRecordingMode(t *testing.T) { tr := NewTracer() s1 := tr.StartSpan("p", WithRecording(tracingpb.RecordingStructured)) s2 := tr.StartSpan("c", WithRemoteParentFromSpanMeta(s1.Meta()), WithRecording(tracingpb.RecordingVerbose)) s3 := tr.StartSpan("cc", WithParent(s2), WithRecording(tracingpb.RecordingVerbose)) s3.Finish() r := s2.FinishAndGetConfiguredRecording() require.NotPanic(s1.ImportRemoteRecording(r)) s1.Finish() }
Done.
pkg/util/tracing/span_test.go line 408 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
let's rephrase this test as Aditya was suggesting, and then I don't have to nitpick about this comment.
Call itTestRecordingDowngradesToStructuredIfTooBigand check that you end up with a recording with just one or two spans or whatever, and all the structured events.
Done. We don't get all of the structured events because of the size of the ring buffer, but I've checked that we get as many as possible given the size of the buffer.
andreimatei
left a comment
There was a problem hiding this comment.
but let's wait for Aditya to seewhat he says about that assertion. I think at some point he was relying on it, but maybe that changed.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @adityamaru, @andreimatei, and @stevendanna)
pkg/util/tracing/span_test.go line 460 at r2 (raw file):
s3.Finish() r := s2.FinishAndGetConfiguredRecording() require.NotPanics(t, func() { s1.ImportRemoteRecording(r) })
please also test that s1's recording only has one span in it
In cockroachdb#81079, we added an assertion that failed if a child recording of a RecordingStructured span had more than one span recording. However, this is problematic for couple of reasons: 1) The assertion was on a code path shared with RecordingVerbose spans; and, 2) A RecordingStructured span can have a RecordingVerbose child. The RecordingVerbose child is likely to have more than one span recording. As a result, we have seen roachtests failing with panic: RecordingStructured has 12 recordings; expected 1 Here, we remove the assertion. Release note: None
b94c14f to
c66abb3
Compare
|
bors r=adityamaru, |
|
Build failed (retrying...): |
|
Build succeeded: |
In #81079, we added an assertion that failed if a child recording of a
RecordingStructured span had more than one span recording. However,
this is problematic for couple of reasons:
The assertion was on a code path shared with RecordingVerbose
spans; and,
A RecordingStructured span can have a RecordingVerbose child. The
RecordingVerbose child is likely to have more than one span recording.
As a result, we have seen roachtests failing with
Here, we remove the assertion.
Fixes #83502
Release note: None