Skip to content

tracing: remove panic in Finish#83625

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
stevendanna:tracing-panic
Jul 1, 2022
Merged

tracing: remove panic in Finish#83625
craig[bot] merged 1 commit intocockroachdb:masterfrom
stevendanna:tracing-panic

Conversation

@stevendanna
Copy link
Copy Markdown
Collaborator

@stevendanna stevendanna commented Jun 29, 2022

In #81079, we added an assertion that failed if a child recording of a
RecordingStructured span had more than one span recording. However,
this is problematic for couple of reasons:

  1. The assertion was on a code path shared with RecordingVerbose
    spans; and,

  2. A RecordingStructured span can have a RecordingVerbose child. The
    RecordingVerbose child is likely to have more than one span recording.

As a result, we have seen roachtests failing with

panic: RecordingStructured has 12 recordings; expected 1

Here, we remove the assertion.

Fixes #83502

Release note: None

@stevendanna stevendanna requested a review from a team June 29, 2022 22:46
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@stevendanna stevendanna requested a review from adityamaru June 29, 2022 22:52
Copy link
Copy Markdown
Contributor

@adityamaru adityamaru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix 💯

Copy link
Copy Markdown
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @andreimatei, and @stevendanna)


pkg/util/tracing/crdbspan.go line 547 at r1 (raw file):

		// We don't have space for this recording. Let's collect just the structured
		// records.
		for ci := range childRecording {

I think a better fix is to get rid of the assertion in the structured case below, and make that code deal with multiple spans too. That's how it was before #81079, and I'm not sure if there was a specific reason why it changed.

I think it's important for all this code to be fairly permissive in what recording it accepts. For example, as written, I think this code would crash if an RPC is made within a RecordingStructured span, and then the remote child is dynamically changed via \tracez to RecordingVerbose: a verbose, multi-span recording will come back and the client span will choke on it.

Please add the following test with this occassion:

// Test that a RecordingStructured parent does not panic when asked to ingest a
// remote verbose recording. Ingesting a recording of different type is unusual,
// since children are created with the parent's recording mode, but it can
// happen if the child's recording mode was changed dynamically.
func TestRemoteSpanWithDifferentRecordingMode(t *testing.T) {
	tr := NewTracer()
	s1 := tr.StartSpan("p", WithRecording(tracingpb.RecordingStructured))
	s2 := tr.StartSpan("c", WithRemoteParentFromSpanMeta(s1.Meta()), WithRecording(tracingpb.RecordingVerbose))
	s3 := tr.StartSpan("cc", WithParent(s2), WithRecording(tracingpb.RecordingVerbose))
	s3.Finish()
	r := s2.FinishAndGetConfiguredRecording()
	require.NotPanic(s1.ImportRemoteRecording(r))
	s1.Finish()
}

pkg/util/tracing/span_test.go line 408 at r1 (raw file):

}

// TestRecordingFinishWithMaxSpans is a regression test to ensure that a verbose

let's rephrase this test as Aditya was suggesting, and then I don't have to nitpick about this comment.
Call it TestRecordingDowngradesToStructuredIfTooBig and check that you end up with a recording with just one or two spans or whatever, and all the structured events.

@stevendanna stevendanna changed the title tracing: fix panic in Finish for verbose recordings tracing: remove panic in Finish Jun 30, 2022
Copy link
Copy Markdown
Collaborator Author

@stevendanna stevendanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru and @andreimatei)


pkg/util/tracing/crdbspan.go line 547 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I think a better fix is to get rid of the assertion in the structured case below, and make that code deal with multiple spans too. That's how it was before #81079, and I'm not sure if there was a specific reason why it changed.

I think it's important for all this code to be fairly permissive in what recording it accepts. For example, as written, I think this code would crash if an RPC is made within a RecordingStructured span, and then the remote child is dynamically changed via \tracez to RecordingVerbose: a verbose, multi-span recording will come back and the client span will choke on it.

Please add the following test with this occassion:

// Test that a RecordingStructured parent does not panic when asked to ingest a
// remote verbose recording. Ingesting a recording of different type is unusual,
// since children are created with the parent's recording mode, but it can
// happen if the child's recording mode was changed dynamically.
func TestRemoteSpanWithDifferentRecordingMode(t *testing.T) {
	tr := NewTracer()
	s1 := tr.StartSpan("p", WithRecording(tracingpb.RecordingStructured))
	s2 := tr.StartSpan("c", WithRemoteParentFromSpanMeta(s1.Meta()), WithRecording(tracingpb.RecordingVerbose))
	s3 := tr.StartSpan("cc", WithParent(s2), WithRecording(tracingpb.RecordingVerbose))
	s3.Finish()
	r := s2.FinishAndGetConfiguredRecording()
	require.NotPanic(s1.ImportRemoteRecording(r))
	s1.Finish()
}

Done.


pkg/util/tracing/span_test.go line 408 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

let's rephrase this test as Aditya was suggesting, and then I don't have to nitpick about this comment.
Call it TestRecordingDowngradesToStructuredIfTooBig and check that you end up with a recording with just one or two spans or whatever, and all the structured events.

Done. We don't get all of the structured events because of the size of the ring buffer, but I've checked that we get as many as possible given the size of the buffer.

Copy link
Copy Markdown
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: but let's wait for Aditya to seewhat he says about that assertion. I think at some point he was relying on it, but maybe that changed.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @adityamaru, @andreimatei, and @stevendanna)


pkg/util/tracing/span_test.go line 460 at r2 (raw file):

	s3.Finish()
	r := s2.FinishAndGetConfiguredRecording()
	require.NotPanics(t, func() { s1.ImportRemoteRecording(r) })

please also test that s1's recording only has one span in it

In cockroachdb#81079, we added an assertion that failed if a child recording of a
RecordingStructured span had more than one span recording. However,
this is problematic for couple of reasons:

1) The assertion was on a code path shared with RecordingVerbose
   spans; and,

2) A RecordingStructured span can have a RecordingVerbose child. The
   RecordingVerbose child is likely to have more than one span recording.

As a result, we have seen roachtests failing with

    panic: RecordingStructured has 12 recordings; expected 1

Here, we remove the assertion.

Release note: None
@stevendanna
Copy link
Copy Markdown
Collaborator Author

bors r=adityamaru,

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jul 1, 2022

Build failed (retrying...):

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jul 1, 2022

Build succeeded:

@craig craig bot merged commit 78caa5c into cockroachdb:master Jul 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

roachtest: follower-reads/mixed-version/single-region failed

4 participants