Skip to content

chore(profiling): address issue found by fuzzer#16631

Merged
gh-worker-dd-mergequeue-cf854d[bot] merged 3 commits into
mainfrom
kowalski/chore-profiling-address-issue-found-by-fuzzer
Mar 10, 2026
Merged

chore(profiling): address issue found by fuzzer#16631
gh-worker-dd-mergequeue-cf854d[bot] merged 3 commits into
mainfrom
kowalski/chore-profiling-address-issue-found-by-fuzzer

Conversation

@KowalskiThomas

@KowalskiThomas KowalskiThomas commented Feb 24, 2026

Copy link
Copy Markdown
Contributor

Description

https://datadoghq.atlassian.net/browse/PROF-13838

This was reported by the fuzzer and although I don't think we've ever seen it happen in real life (according to crash stacks), it's still valid and worth fixing.

@cit-pr-commenter-54b7da

cit-pr-commenter-54b7da Bot commented Feb 24, 2026

Copy link
Copy Markdown

Codeowners resolved as

ddtrace/internal/datadog/profiling/stack/src/echion/stack_chunk.cc      @DataDog/profiling-python

@KowalskiThomas KowalskiThomas added changelog/no-changelog A changelog entry is not required for this PR. Profiling Continous Profling labels Feb 24, 2026
@KowalskiThomas KowalskiThomas force-pushed the kowalski/chore-profiling-address-issue-found-by-fuzzer branch from d6774d6 to 29d5ba4 Compare February 24, 2026 07:35
@datadog-datadog-prod-us1-2

This comment has been minimized.

@KowalskiThomas KowalskiThomas added the identified-by:fuzzing Bug / Regression identified by fuzzing label Feb 24, 2026

@taegyunkim taegyunkim left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering how this can happen, maybe there's a chance where we copy over a StackChunk that is of size x, but then it has been incremented as the program executes, and we get an address that is above x?

@KowalskiThomas KowalskiThomas marked this pull request as ready for review February 24, 2026 10:18
@KowalskiThomas KowalskiThomas requested a review from a team as a code owner February 24, 2026 10:18
@taegyunkim

Copy link
Copy Markdown
Contributor

https://gitlab.ddbuild.io/DataDog/apm-reliability/dd-trace-py/-/jobs/1450393487

the crash from test would need some attention

There are couple more suites that crashed.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 29d5ba4db2

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread ddtrace/internal/datadog/profiling/stack/echion/echion/stack_chunk.h Outdated
Comment thread ddtrace/internal/datadog/profiling/stack/src/echion/stack_chunk.cc Outdated
Comment thread ddtrace/internal/datadog/profiling/stack/src/echion/stack_chunk.cc Outdated
@KowalskiThomas KowalskiThomas force-pushed the kowalski/chore-profiling-address-issue-found-by-fuzzer branch from 6a66816 to 18d6d4d Compare March 10, 2026 09:48
@gh-worker-dd-mergequeue-cf854d gh-worker-dd-mergequeue-cf854d Bot merged commit 88f6b34 into main Mar 10, 2026
384 checks passed
@gh-worker-dd-mergequeue-cf854d gh-worker-dd-mergequeue-cf854d Bot deleted the kowalski/chore-profiling-address-issue-found-by-fuzzer branch March 10, 2026 11:17
gh-worker-dd-mergequeue-cf854d Bot pushed a commit that referenced this pull request Mar 20, 2026
<!-- dd-meta {"pullId":"4729b268-475f-44c8-bd80-994fac3bc0b8","source":"chat","resourceId":"ad20b0b3-bac2-4224-8b7b-5f42225c5803","workflowId":"94e37a14-9b8d-462d-aafd-dc7f0d9ddf36","codeChangeId":"94e37a14-9b8d-462d-aafd-dc7f0d9ddf36","sourceType":"chat"} -->
## Description

This fixes a crash happening in `Frame::read` caused by stale `previous` `StackChunk` entries persisting across thread iterations during stack sampling.

### Root Cause

When the Sampling Thread samples more than one Thread, it uses the same global `StackChunk` for each Thread's stack chain. `StackChunk::update_with_depth` recursively copies the linked list of `_PyStackChunk`'s.  
However, when a stack chunk has no previous chunk, we would not clear the old `previous` pointer. This left stale `StackChunk` entries from previously-sampled threads in the chain.

When a subsequent Thread's frame address happened to fall within the remote address range of a stale chunk's `origin`, `StackChunk::resolve` would return a pointer into the stale local buffer. The stale data contained garbage field values, which would result in invalid accesses. 

This is the same crash signature as #16519 (which fixed a race condition on `copied_size`) and #16631 (which added full-frame bounds checking). The stale `previous` chain was an additional vector for the same class of bug.

This is the crash we would see:

```
#0 0x00007fa8a3507fc6 Frame::read
#1 0x00007fa8a3508114 unwind_frame
#2 0x00007fa8a3509c68 ThreadInfo::unwind
#3 0x00007fa8a3509da2 ThreadInfo::sample
#4 0x00007fa8a350a02e std::_Function_handler<void (_ts*, ThreadInfo&), Datadog::Sampler::sampling_thread(unsigned long)::{lambda(InterpreterInfo&)#1}::operator()(InterpreterInfo&) const::{lambda(_ts*, ThreadInfo&)#1}>::_M_invoke
#5 0x00007fa8a350a2c0 for_each_thread
#6 0x00007fa8a350a382 std::_Function_handler<void (InterpreterInfo&), Datadog::Sampler::sampling_thread(unsigned long)::{lambda(InterpreterInfo&)#1}>::_M_invoke
#7 0x00007fa8a35072f9 for_each_interp
#8 0x00007fa8a350a6e7 Datadog::Sampler::sampling_thread
#9 0x00007fa8a350a853 call_sampling_thread
```


Co-authored-by: thomas.kowalski <thomas.kowalski@datadoghq.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog/no-changelog A changelog entry is not required for this PR. identified-by:fuzzing Bug / Regression identified by fuzzing Profiling Continous Profling

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants