Skip to content

fix(profiling): fix data race when accessing span for thread [backport 2.15]#11211

Merged
taegyunkim merged 3 commits into
2.15from
backport-11167-to-2.15
Oct 30, 2024
Merged

fix(profiling): fix data race when accessing span for thread [backport 2.15]#11211
taegyunkim merged 3 commits into
2.15from
backport-11167-to-2.15

Conversation

@nsrip-dd

@nsrip-dd nsrip-dd commented Oct 29, 2024

Copy link
Copy Markdown
Contributor

Backport 64b3374 from #11167 to 2.15.

The ThreadSpanLinks singleton holds the active span (if one exists) for
a given thread ID. The get_active_span_from_thread_id member function
returns a pointer to the active span for a thread. The link_span
member function sets the active span for a thread.
get_active_span_from_thread_id accesses the map of spans under a
mutex, but returns the pointer after releasing the mutex, meaning
link_span can modify the members of the Span while the caller of
get_active_span_from_thread_id is reading them.

Fix this by returning a copy of the Span. Use a std::optional to wrap
the return value of get_active_span_from_thread_id, rather than
returning a pointer. We want to tell whether or not there actually was a
span associated with the thread, but returning a pointer would require
us to heap allocate the copy of the Span.

I added a simplistic regression test which fails reliably without this
fix when built with the thread sanitizer enabled. Output like:

WARNING: ThreadSanitizer: data race (pid=2971510)
  Read of size 8 at 0x7b2000004080 by thread T2:
    #0 memcpy ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:823 (libtsan.so.0+0x42313)
    #1 memcpy ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:815 (libtsan.so.0+0x42313)
    #2 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) <null> (libstdc++.so.6+0x1432b4)
    #3 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > std::__invoke_impl<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (*)()>(std::__invoke_other, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (*&&)()) <null> (thread_span_links+0xe46e)
    #4 std::__invoke_result<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (*)()>::type std::__invoke<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (*)()>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (*&&)()) <null> (thread_span_links+0xe2fe)
    #5 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > std::thread::_Invoker<std::tuple<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (*)()> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) <null> (thread_span_links+0xe1cf)
    #6 std::thread::_Invoker<std::tuple<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (*)()> >::operator()() <null> (thread_span_links+0xe0f6)
    #7 std::thread::_State_impl<std::thread::_Invoker<std::tuple<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (*)()> > >::_M_run() <null> (thread_span_links+0xdf40)
    #8 <null> <null> (libstdc++.so.6+0xd6df3)

  Previous write of size 8 at 0x7b2000004080 by thread T1 (mutexes: write M47):
    #0 memcpy ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:823 (libtsan.so.0+0x42313)
    #1 memcpy ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:815 (libtsan.so.0+0x42313)
    #2 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocato
r<char> > const&) <null> (libstdc++.so.6+0x1432b4)
    #3 get() <null> (thread_span_links+0xb570)
    #4 void std::__invoke_impl<void, void (*)()>(std::__invoke_other, void (*&&)()) <null> (thread_span_links+0xe525)
    #5 std::__invoke_result<void (*)()>::type std::__invoke<void (*)()>(void (*&&)()) <null> (thread_span_links+0xe3b5)
    #6 void std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) <null> (thread_span_links+0xe242)
    #7 std::thread::_Invoker<std::tuple<void (*)()> >::operator()() <null> (thread_span_links+0xe158)
[ ... etc ... ]

(cherry picked from commit 64b3374)

Checklist

  • PR author has checked that all the criteria below are met
  • The PR description includes an overview of the change
  • The PR description articulates the motivation for the change
  • The change includes tests OR the PR description describes a testing strategy
  • The PR description notes risks associated with the change, if any
  • Newly-added code is easy to change
  • The change follows the library release note guidelines
  • The change includes or references documentation updates if necessary
  • Backport labels are set (if applicable)

Reviewer Checklist

  • Reviewer has checked that all the criteria below are met
  • Title is accurate
  • All changes are related to the pull request's stated goal
  • Avoids breaking API changes
  • Testing strategy adequately addresses listed risks
  • Newly-added code is easy to change
  • Release note makes sense to a user of the library
  • If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  • Backport labels are set in a manner that is consistent with the release branch maintenance policy

@nsrip-dd nsrip-dd requested review from a team as code owners October 29, 2024 16:46
@github-actions

Copy link
Copy Markdown
Contributor

CODEOWNERS have been resolved as:

ddtrace/internal/datadog/profiling/stack_v2/test/CMakeLists.txt         @DataDog/profiling-python
ddtrace/internal/datadog/profiling/stack_v2/test/thread_span_links.cpp  @DataDog/profiling-python
releasenotes/notes/profiling-fix-data-race-segfault-b14f59c06c6bf9ed.yaml  @DataDog/apm-python
ddtrace/internal/datadog/profiling/stack_v2/CMakeLists.txt              @DataDog/profiling-python
ddtrace/internal/datadog/profiling/stack_v2/include/thread_span_links.hpp  @DataDog/profiling-python
ddtrace/internal/datadog/profiling/stack_v2/src/stack_renderer.cpp      @DataDog/profiling-python
ddtrace/internal/datadog/profiling/stack_v2/src/thread_span_links.cpp   @DataDog/profiling-python

The ThreadSpanLinks singleton holds the active span (if one exists) for
a given thread ID. The `get_active_span_from_thread_id` member function
returns a pointer to the active span for a thread. The `link_span`
member function sets the active span for a thread.
`get_active_span_from_thread_id` accesses the map of spans under a
mutex, but returns the pointer after releasing the mutex, meaning
`link_span` can modify the members of the Span while the caller of
`get_active_span_from_thread_id` is reading them.

Fix this by returning a copy of the `Span`. Use a `std::optional` to wrap
the return value of `get_active_span_from_thread_id`, rather than
returning a pointer. We want to tell whether or not there actually was a
span associated with the thread, but returning a pointer would require
us to heap allocate the copy of the Span.

I added a simplistic regression test which fails reliably without this
fix when built with the thread sanitizer enabled. Output like:

```
WARNING: ThreadSanitizer: data race (pid=2971510)
  Read of size 8 at 0x7b2000004080 by thread T2:
    #0 memcpy ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:823 (libtsan.so.0+0x42313)
    #1 memcpy ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:815 (libtsan.so.0+0x42313)
    #2 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) <null> (libstdc++.so.6+0x1432b4)
    #3 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > std::__invoke_impl<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (*)()>(std::__invoke_other, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (*&&)()) <null> (thread_span_links+0xe46e)
    #4 std::__invoke_result<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (*)()>::type std::__invoke<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (*)()>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (*&&)()) <null> (thread_span_links+0xe2fe)
    #5 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > std::thread::_Invoker<std::tuple<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (*)()> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) <null> (thread_span_links+0xe1cf)
    #6 std::thread::_Invoker<std::tuple<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (*)()> >::operator()() <null> (thread_span_links+0xe0f6)
    #7 std::thread::_State_impl<std::thread::_Invoker<std::tuple<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (*)()> > >::_M_run() <null> (thread_span_links+0xdf40)
    #8 <null> <null> (libstdc++.so.6+0xd6df3)

  Previous write of size 8 at 0x7b2000004080 by thread T1 (mutexes: write M47):
    #0 memcpy ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:823 (libtsan.so.0+0x42313)
    #1 memcpy ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:815 (libtsan.so.0+0x42313)
    #2 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocato
r<char> > const&) <null> (libstdc++.so.6+0x1432b4)
    #3 get() <null> (thread_span_links+0xb570)
    #4 void std::__invoke_impl<void, void (*)()>(std::__invoke_other, void (*&&)()) <null> (thread_span_links+0xe525)
    #5 std::__invoke_result<void (*)()>::type std::__invoke<void (*)()>(void (*&&)()) <null> (thread_span_links+0xe3b5)
    #6 void std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) <null> (thread_span_links+0xe242)
    #7 std::thread::_Invoker<std::tuple<void (*)()> >::operator()() <null> (thread_span_links+0xe158)
[ ... etc ... ]
```

(cherry picked from commit 64b3374)
@nsrip-dd nsrip-dd force-pushed the backport-11167-to-2.15 branch from 3cc34de to 3e52d8d Compare October 29, 2024 17:21
@pr-commenter

pr-commenter Bot commented Oct 29, 2024

Copy link
Copy Markdown

Benchmarks

Benchmark execution time: 2024-10-30 16:07:27

Comparing candidate commit 3418b45 in PR branch backport-11167-to-2.15 with baseline commit 8a28a5d in branch 2.15.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 357 metrics, 53 unstable metrics.

@taegyunkim taegyunkim added the Profiling Continous Profling label Oct 30, 2024
@taegyunkim taegyunkim enabled auto-merge (squash) October 30, 2024 16:08
@taegyunkim taegyunkim merged commit 76ae70b into 2.15 Oct 30, 2024
@taegyunkim taegyunkim deleted the backport-11167-to-2.15 branch October 30, 2024 16:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Profiling Continous Profling

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants