fix(profiling): clear stale StackChunk::previous #17043

Merged
gh-worker-dd-mergequeue-cf854d[bot] merged 1 commit into main from dd/fix/profiling-stale-stack-chunk-previous
Mar 20, 2026

@KowalskiThomas KowalskiThomas commented Mar 20, 2026

Contributor

Description

This fixes a crash in Frame::read caused by stale previous StackChunk entries that persisted across thread iterations during stack sampling.

Root Cause

When the sampling thread samples more than one thread, it reuses the same global StackChunk for each thread's stack chain. StackChunk::update_with_depth recursively copies the linked list of _PyStackChunk structures.
However, when a remote stack chunk had no previous chunk, we did not clear the local previous pointer left over from the last update. This left stale StackChunk entries from previously sampled threads in the chain.
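
To make the failure mode concrete, here is a minimal, self-contained sketch of the reuse pattern. It is not the echion implementation: `RemoteChunk`, `LocalChunk`, the `clear_stale_previous` flag, and `chain_length_after_two_threads` are hypothetical names, and remote memory is simulated with ordinary in-process pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical stand-in for a remote _PyStackChunk (only the fields the sketch needs).
struct RemoteChunk {
    std::uintptr_t address;       // where the chunk lives in the sampled thread
    std::size_t size;             // number of bytes in the chunk
    const RemoteChunk* previous;  // older chunk in the same thread's chain, or nullptr
};

// Hypothetical local mirror that is reused for every sampled thread.
struct LocalChunk {
    std::uintptr_t origin = 0;
    std::size_t size = 0;
    std::unique_ptr<LocalChunk> previous;

    // Recursively mirror the remote chain into this (reused) object.
    void update(const RemoteChunk& remote, bool clear_stale_previous) {
        origin = remote.address;
        size = remote.size;
        if (remote.previous != nullptr) {
            if (!previous) previous = std::make_unique<LocalChunk>();
            previous->update(*remote.previous, clear_stale_previous);
        } else if (clear_stale_previous) {
            // The remote chain ends here, so drop whatever an earlier thread left behind.
            previous.reset();
        }
    }

    std::size_t chain_length() const {
        return 1 + (previous ? previous->chain_length() : 0);
    }
};

// Sample thread A (two chunks), then thread B (one chunk), with one shared LocalChunk.
// Returns the length of the local chain after the second update.
inline std::size_t chain_length_after_two_threads(bool clear_stale_previous) {
    RemoteChunk a_old{0x1000, 0x100, nullptr};
    RemoteChunk a_new{0x2000, 0x100, &a_old};
    RemoteChunk b_only{0x3000, 0x100, nullptr};

    LocalChunk shared;
    shared.update(a_new, clear_stale_previous);
    assert(shared.chain_length() == 2);
    shared.update(b_only, clear_stale_previous);
    return shared.chain_length();
}
```

In this sketch, `chain_length_after_two_threads(false)` still reports two chunks after sampling thread B, because thread A's older chunk stays attached, while `chain_length_after_two_threads(true)` reports one.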

When a subsequent thread's frame address happened to fall within the remote address range covered by a stale chunk, StackChunk::resolve would return a pointer into the stale local buffer. The stale data contained garbage field values, which led to invalid memory accesses.
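
The lookup side can be sketched the same way. The following is an illustrative model only, assuming a resolver that walks the chain and translates a remote address into the first local buffer whose range contains it; `Chunk`, its fields, and `make_chunk` are hypothetical and do not mirror the real StackChunk layout.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical local copy of one remote stack chunk.
struct Chunk {
    std::uintptr_t origin = 0;        // remote base address the copy was taken from
    std::vector<unsigned char> data;  // local copy of the remote bytes
    std::unique_ptr<Chunk> previous;  // older chunk in the chain, possibly stale

    // Translate a remote address into a pointer into the local copy.
    // Falls back to older chunks and returns nullptr when no range matches.
    const unsigned char* resolve(std::uintptr_t remote_addr) const {
        if (remote_addr >= origin && remote_addr - origin < data.size()) {
            return data.data() + (remote_addr - origin);
        }
        return previous ? previous->resolve(remote_addr) : nullptr;
    }
};

// Build a chunk of `size` bytes, all set to `fill`, copied from remote address `origin`.
inline Chunk make_chunk(std::uintptr_t origin, std::size_t size, unsigned char fill) {
    Chunk c;
    c.origin = origin;
    c.data.assign(size, fill);
    return c;
}
```

If a chunk copied from `0x3000` still has a leftover `previous` copied from `0x1000`, resolving `0x1008` succeeds and hands back bytes that belong to another thread's earlier sample. Once `previous` is reset, the same lookup returns `nullptr`, which the caller can treat as "address not in this thread's chunks".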

This is the same crash signature as #16519 (which fixed a race condition on copied_size) and #16631 (which added full-frame bounds checking). The stale previous chain was an additional vector for the same class of bug.

This is the crash we would see:

#0 0x00007fa8a3507fc6 Frame::read
#1 0x00007fa8a3508114 unwind_frame
#2 0x00007fa8a3509c68 ThreadInfo::unwind
#3 0x00007fa8a3509da2 ThreadInfo::sample
#4 0x00007fa8a350a02e std::_Function_handler<void (_ts*, ThreadInfo&), Datadog::Sampler::sampling_thread(unsigned long)::{lambda(InterpreterInfo&)#1}::operator()(InterpreterInfo&) const::{lambda(_ts*, ThreadInfo&)#1}>::_M_invoke
#5 0x00007fa8a350a2c0 for_each_thread
#6 0x00007fa8a350a382 std::_Function_handler<void (InterpreterInfo&), Datadog::Sampler::sampling_thread(unsigned long)::{lambda(InterpreterInfo&)#1}>::_M_invoke
#7 0x00007fa8a35072f9 for_each_interp
#8 0x00007fa8a350a6e7 Datadog::Sampler::sampling_thread
#9 0x00007fa8a350a853 call_sampling_thread

@datadog-prod-us1-3

datadog-prod-us1-3 Bot commented Mar 20, 2026

View session in Datadog

Bits Dev status: ✅ Done

Comment @DataDog to request changes

@datadog-prod-us1-4

Contributor

I can only run on private repositories.

@cit-pr-commenter-54b7da

Codeowners resolved as

ddtrace/internal/datadog/profiling/stack/src/echion/stack_chunk.cc      @DataDog/profiling-python
releasenotes/notes/profiling-fix-stale-stack-chunk-previous-7b1082f9f0ee4a47.yaml  @DataDog/apm-python

@KowalskiThomas KowalskiThomas changed the title fix(profiling): clear stale StackChunk previous fix(profiling): clear stale StackChunk previous Mar 20, 2026
@KowalskiThomas KowalskiThomas added Profiling Continous Profling identified-by:crashtracking Identified by Crash Tracking labels Mar 20, 2026
@KowalskiThomas KowalskiThomas changed the title fix(profiling): clear stale StackChunk previous fix(profiling): clear stale StackChunk::previous Mar 20, 2026
Co-authored-by: KowalskiThomas <14239160+KowalskiThomas@users.noreply.github.com>
@KowalskiThomas KowalskiThomas force-pushed the dd/fix/profiling-stale-stack-chunk-previous branch from 3054bc9 to 9e3afd8 Compare March 20, 2026 13:30
@KowalskiThomas KowalskiThomas marked this pull request as ready for review March 20, 2026 14:20
@KowalskiThomas KowalskiThomas requested review from a team as code owners March 20, 2026 14:20
@gh-worker-dd-mergequeue-cf854d gh-worker-dd-mergequeue-cf854d Bot merged commit ea99aa6 into main Mar 20, 2026
425 checks passed
@gh-worker-dd-mergequeue-cf854d gh-worker-dd-mergequeue-cf854d Bot deleted the dd/fix/profiling-stale-stack-chunk-previous branch March 20, 2026 15:47