Skip to content

*: don't mix Tracers in tests #73453

Merged
craig[bot] merged 7 commits intocockroachdb:masterfrom
andreimatei:tracing.tests-dont-mix-tracers
Dec 7, 2021
Merged

*: don't mix Tracers in tests #73453
craig[bot] merged 7 commits intocockroachdb:masterfrom
andreimatei:tracing.tests-dont-mix-tracers

Conversation

@andreimatei
Copy link
Copy Markdown
Contributor

Different tests are mixing Tracers within the same trace in various
ways - meaning that a child span is being created with a different
Tracer than the parent. This is not permitted - the Tracer catches it
and panics trying to create the child. The tests were getting away with
it because tracing is usually disabled (i.e. creating no-op spans
doesn't complain about this condition). But if tracing is turned on
(like I'm intending to do generally), these tests freak out.

This patch fixes the tests by ensuring that Tracers are not mixed like
this. One of the common ways in which tests mixing Tracers was by
initiating an operation using a Tracer that the test created (e.g.
through a Stopper), and then calling into a TestServer using that ctx).
In such cases, the solution is to use the Server's Tracer when
initiating the operation.

There is a discussion to be had about whether this practice of mixing
Tracers could be tolerated. The argument for not allowing it goes like
this:

  • child spans have links to the parent and vice-versa, and these links
    are used to move spans in and out of the active spans registry when
    they Finish(). For example, a parent will insert all its still-alive
    children into the registry on Finish(). The registries are per-Tracer,
    and things get confusing if a span is created by Tracer A but ends up in
    Tracer B's registry. So actually mixing Tracers in a trace is probably
    indeed a bad idea.

  • We could avoid mixing Tracers even if the code attempts to use two of
    them. For example, if one does:

    p := trA.StarSpan()
    c := trB.StartSpan(WithParent(p))

    we could imagine redirecting c's creation from trB to trA
    transparently. So far I've resisted doing so because, even though it'd
    technically work, it would become quite surprising/ambiguous which
    registry a span belongs to (e.g. one would reasonably expect c to be
    found in trB's registry, but it wouldn't).
    So far, I've attempted to avoid these surprises and disallow this
    pattern.

In production there is one Tracer per Server, so these issues are moot
(well, the issues come up in a DemoCluster, where there's multiple
Servers, but I think I've fixed those). For tests, I recognize that they
generally don't care about tracing at all because we haven't really made
background tracing useful for tests (but maybe one day...). So I'll keep
my ears open for pain and, if after this round of cleanup, the issue
keeps popping up, I'll consider adding support for mixing.

Release note: None

@andreimatei andreimatei requested a review from a team as a code owner December 3, 2021 21:17
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@andreimatei andreimatei force-pushed the tracing.tests-dont-mix-tracers branch from 1bed4fc to 0c70d2f Compare December 3, 2021 21:39
@andreimatei andreimatei force-pushed the tracing.tests-dont-mix-tracers branch from 0c70d2f to a19fdfe Compare December 4, 2021 17:11
@andreimatei andreimatei requested a review from a team as a code owner December 4, 2021 17:11
Copy link
Copy Markdown
Contributor

@abarganier abarganier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15)

Different tests are mixing Tracers within the same trace in various
ways - meaning that a child span is being created with a different
Tracer than the parent. This is not permitted - the Tracer catches it
and panics trying to create the child. The tests were getting away with
it because tracing is usually disabled (i.e. creating no-op spans
doesn't complain about this condition). But if tracing is turned on
(like I'm intending to do generally), these tests freak out.

This patch fixes the tests by ensuring that Tracers are not mixed like
this. One of the common ways in which tests mixing Tracers was by
initiating an operation using a Tracer that the test created (e.g.
through a Stopper), and then calling into a TestServer using that ctx).
In such cases, the solution is to use the Server's Tracer when
initiating the operation.

There is a discussion to be had about whether this practice of mixing
Tracers could be tolerated. The argument for not allowing it goes like
this:
- child spans have links to the parent and vice-versa, and these links
  are used to move spans in and out of the active spans registry when
  they Finish(). For example, a parent will insert all its still-alive
  children into the registry on Finish(). The registries are per-Tracer,
  and things get confusing if a span is created by Tracer A but ends up in
  Tracer B's registry. So actually mixing Tracers in a trace is probably
  indeed a bad idea.
- We could avoid mixing Tracers even if the code attempts to use two of
  them. For example, if one does:

  p := trA.StarSpan()
  c := trB.StartSpan(WithParent(p))

  we could imagine redirecting c's creation from trB to trA
  transparently. So far I've resisted doing so because, even though it'd
  technically work, it would become quite surprising/ambiguous which
  registry a span belongs to (e.g. one would reasonably expect c to be
  found in trB's registry, but it wouldn't).
  So far, I've attempted to avoid these surprises and disallow this
  pattern.

In production there is one Tracer per Server, so these issues are moot
(well, the issues come up in a DemoCluster, where there's multiple
Servers, but I think I've fixed those). For tests, I recognize that they
generally don't care about tracing at all because we haven't really made
background tracing useful for tests (but maybe one day...). So I'll keep
my ears open for pain and, if after this round of cleanup, the issue
keeps popping up, I'll consider adding support for mixing.

Release note: None
Different tests are mixing Tracers within the same trace in various
ways - meaning that a child span is being created with a different
Tracer than the parent. This is not permitted - the Tracer catches it
and panics trying to create the child. The tests were getting away with
it because tracing is usually disabled (i.e. creating no-op spans
doesn't complain about this condition). But if tracing is turned on
(like I'm intending to do generally), these tests freak out.

See the first commit in the PR for more context.

Release note: None
A couple of tests were using a cluster's stopper to operate on a
particular node. This is pretty funky; the node has its own stopper that
should be used to guard operations on the node. At the very least, this
mix of stoppers was causing two different tracers to be involved in the
respective operation, which might not be allowed in the future.

Release note: None
Different tests are mixing Tracers within the same trace in various
ways - meaning that a child span is being created with a different
Tracer than the parent. This is not permitted - the Tracer catches it
and panics trying to create the child. The tests were getting away with
it because tracing is usually disabled (i.e. creating no-op spans
doesn't complain about this condition). But if tracing is turned on
(like I'm intending to do generally), these tests freak out.

See the first commit in the PR for more context.

Release note: None
Different tests are mixing Tracers within the same trace in various
ways - meaning that a child span is being created with a different
Tracer than the parent. This is not permitted - the Tracer catches it
and panics trying to create the child. The tests were getting away with
it because tracing is usually disabled (i.e. creating no-op spans
doesn't complain about this condition). But if tracing is turned on
(like I'm intending to do generally), these tests freak out.

See the first commit in the PR for more context.

Release note: None
This test was creating a tracing span for its operations for no good
reason. I think years ago that span was used for debugging by setting up
a tracing integration for this test. I bet it hasn't been used since. In
fact, if tracing is enabled for this test, the test currently crashes
because that span causes two different Tracers to be used within the
same operation, which is currently not allowed. This patch gets rid of
this problematic span.

Release note: None
Before this patch, attempting to create a child with a different Tracer
than the parent would panic is the parent was a "real span", but it
wouldn't if the parent was a no-op. This happenstence was allowing tests
to get away with mixing Tracers, as tests always run with tracing turned
off. Now that all the tests that were mixing Tracers were hopefully
fixed, this patch makes it illegal (panic) to mix Tracers even when
tracingt is not enabled, for consistency of behavior.

Release note: None
@andreimatei andreimatei force-pushed the tracing.tests-dont-mix-tracers branch from a19fdfe to bdc36d4 Compare December 7, 2021 21:49
Copy link
Copy Markdown
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aayushshah15)

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Dec 7, 2021

Build succeeded:

@craig craig bot merged commit ca06f4f into cockroachdb:master Dec 7, 2021
@andreimatei andreimatei deleted the tracing.tests-dont-mix-tracers branch January 22, 2022 22:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants