Skip to content

[OTel C++] Fix race when adding and removing callbacks#37485

Closed
yashykt wants to merge 4 commits intogrpc:masterfrom
yashykt:FixRaceOTelGauge
Closed

[OTel C++] Fix race when adding and removing callbacks#37485
yashykt wants to merge 4 commits intogrpc:masterfrom
yashykt:FixRaceOTelGauge

Conversation

@yashykt
Copy link
Copy Markdown
Member

@yashykt yashykt commented Aug 15, 2024

Split off from #37425

We are adding and removing callbacks on the OpenTelemetry Async Instruments without synchronization. This opens us to races where we have an AddCallback and RemoveCallback operation happening at the same time. The correct result after these operations is to still have a callback registered with OpenTelemetry at the end, but the two operations could race and we could just decide to remove the OpenTelemetry callback.

The fix delays removing OpenTelemetry callbacks to plugin destruction time.

Copy link
Copy Markdown
Member

@yijiem yijiem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Yash!

Copy link
Copy Markdown
Member

@yijiem yijiem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Yash!

Comment on lines +2074 to +2083
MetricsCollectorThread new_new_collector{
this,
grpc_core::Duration::Milliseconds(100) * grpc_test_slowdown_factor(),
kIterations,
[&](const absl::flat_hash_map<
std::string,
std::vector<opentelemetry::sdk::metrics::PointDataAttributes>>&
data) { return false; }};
// We shouldn't get any new callbacks
EXPECT_THAT(new_new_collector.Stop(), ::testing::IsEmpty());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you may do:

{
  MetricsCollectorThread collector{
        this,
        grpc_core::Duration::Milliseconds(100) * grpc_test_slowdown_factor(),
        kIterations,
        [&](const absl::flat_hash_map<
            std::string,
            std::vector<opentelemetry::sdk::metrics::PointDataAttributes>>&
                data) {
          return false;
        }};
  EXPECT_THAT(collector.Stop(), ::testing::IsEmpty());
}

same below

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's not helping though. All it is doing is allowing us to use the same variable again. For tests, we should be aiming for readability rather than efficiency.

yashykt added a commit to yashykt/grpc that referenced this pull request Aug 16, 2024
Split off from grpc#37425

We are adding and removing callbacks on the OpenTelemetry Async Instruments without synchronization. This opens us to races where we have an AddCallback and RemoveCallback operation happening at the same time. The correct result after these operations is to still have a callback registered with OpenTelemetry at the end, but the two operations could race and we could just decide to remove the OpenTelemetry callback.

The fix delays removing OpenTelemetry callbacks to plugin destruction time.

Closes grpc#37485

COPYBARA_INTEGRATE_REVIEW=grpc#37485 from yashykt:FixRaceOTelGauge 016b0a4
PiperOrigin-RevId: 663492598
yashykt added a commit to yashykt/grpc that referenced this pull request Aug 16, 2024
Split off from grpc#37425

We are adding and removing callbacks on the OpenTelemetry Async Instruments without synchronization. This opens us to races where we have an AddCallback and RemoveCallback operation happening at the same time. The correct result after these operations is to still have a callback registered with OpenTelemetry at the end, but the two operations could race and we could just decide to remove the OpenTelemetry callback.

The fix delays removing OpenTelemetry callbacks to plugin destruction time.

Closes grpc#37485

COPYBARA_INTEGRATE_REVIEW=grpc#37485 from yashykt:FixRaceOTelGauge 016b0a4
PiperOrigin-RevId: 663492598
yashykt added a commit that referenced this pull request Aug 16, 2024
Backport #37485 to v1.65.x
Split off from #37425

We are adding and removing callbacks on the OpenTelemetry Async
Instruments without synchronization. This opens us to races where we
have an AddCallback and RemoveCallback operation happening at the same
time. The correct result after these operations is to still have a
callback registered with OpenTelemetry at the end, but the two
operations could race and we could just decide to remove the
OpenTelemetry callback.

The fix delays removing OpenTelemetry callbacks to plugin destruction
time.
yashykt added a commit that referenced this pull request Aug 16, 2024
Backport #37485 to v1.66
Split off from #37425

We are adding and removing callbacks on the OpenTelemetry Async
Instruments without synchronization. This opens us to races where we
have an AddCallback and RemoveCallback operation happening at the same
time. The correct result after these operations is to still have a
callback registered with OpenTelemetry at the end, but the two
operations could race and we could just decide to remove the
OpenTelemetry callback.

The fix delays removing OpenTelemetry callbacks to plugin destruction
time.
paulosjca pushed a commit to paulosjca/grpc that referenced this pull request Nov 25, 2024
Split off from grpc#37425

We are adding and removing callbacks on the OpenTelemetry Async Instruments without synchronization. This opens us to races where we have an AddCallback and RemoveCallback operation happening at the same time. The correct result after these operations is to still have a callback registered with OpenTelemetry at the end, but the two operations could race and we could just decide to remove the OpenTelemetry callback.

The fix delays removing OpenTelemetry callbacks to plugin destruction time.

Closes grpc#37485

COPYBARA_INTEGRATE_REVIEW=grpc#37485 from yashykt:FixRaceOTelGauge 016b0a4
PiperOrigin-RevId: 663492598
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants