feat(data-pipeline)!: add fork safety hooks and cancellation token for trace exporter FFI#2051
Conversation
🎉 All green!🧪 All tests passed 🎯 Code Coverage (details) 🔗 Commit SHA: 7d62c8a | Docs | Datadog PR Page | Give us feedback! |
Clippy Allow Annotation ReportComparing clippy allow annotations between branches:
Summary by Rule
Annotation Counts by File
Annotation Stats by Crate
About This ReportThis report tracks Clippy allow annotations for specific rules, showing how they've changed in this PR. Decreasing the number of these annotations generally improves code quality. |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #2051 +/- ##
==========================================
- Coverage 73.67% 73.57% -0.10%
==========================================
Files 472 472
Lines 78697 78737 +40
==========================================
- Hits 57978 57933 -45
- Misses 20719 20804 +85
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 22e92370e5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| tokio = { version = "1.23", features = ["macros"] } | ||
| tokio-util = "0.7.11" |
There was a problem hiding this comment.
Regenerate Cargo.lock for the new dependencies
Adding tokio/tokio-util to this crate without committing the corresponding root Cargo.lock update breaks locked builds: I checked cargo check -p libdd-data-pipeline-ffi --locked, and Cargo exits with the lock file /workspace/libdatadog/Cargo.lock needs to be updated. Any CI or consumer build using the repository lockfile will fail before compiling this crate until the lock entry for libdd-data-pipeline-ffi includes these new dependencies.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
edit: I think I was wrong, looks like a legit failure: https://github.com/DataDog/libdatadog/actions/runs/26578731968/job/78305509370?pr=2051
original comment:
This feels like a case of "AI doesn't know how "libdd-data-pipeline-ffi" is used.
This might information be worth writing down (maybe in https://github.com/DataDog/libdatadog/blob/22e92370e5d66c0efd6ac5fe949bd8b6514fd902/libdd-data-pipeline-ffi/README.md, or other place if this is common across multiple libdatadog/* crates).
| "TracerSpan" = "ddog_TracerSpan" | ||
| "TracerSpanFields" = "ddog_TracerSpanFields" | ||
| "TracerTraceChunks" = "ddog_TracerTraceChunks" | ||
| "Handle_TokioCancellationToken" = "ddog_TraceExporterCancelToken" |
There was a problem hiding this comment.
Declare the opaque cancellation token type in cbindgen
With only the handle rename here, cbindgen emits typedef ddog_CancellationToken ddog_TokioCancellationToken; in data-pipeline.h, but this header only includes common.h, which does not define ddog_CancellationToken; I regenerated the header and cc -fsyntax-only reports unknown type name 'ddog_CancellationToken'. C/C++ consumers that include the generated data-pipeline header will not compile when this new token type is present, so this needs the same opaque token rename/forward declaration pattern used by the profiling FFI header.
Useful? React with 👍 / 👎.
| /// | ||
| /// All clones of the same token observe the cancellation. If the token is | ||
| /// currently passed to [`ddog_trace_exporter_send_trace_chunks`], the | ||
| /// in-flight HTTP request will be aborted cooperatively. |
There was a problem hiding this comment.
What does cooperatively mean here?
| /// Cancel a cancellation token. | ||
| /// | ||
| /// All clones of the same token observe the cancellation. If the token is | ||
| /// currently passed to [`ddog_trace_exporter_send_trace_chunks`], the |
There was a problem hiding this comment.
| /// currently passed to [`ddog_trace_exporter_send_trace_chunks`], the | |
| /// currently help by [`ddog_trace_exporter_send_trace_chunks`], the |
Wouldn't "currently held" be more accurate than "currently passed" here, since "ddog_trace_exporter_send_trace_chunks" might have finished its work already, thus not holding the token anymore?
|
|
||
| /// Cancel a cancellation token. | ||
| /// | ||
| /// All clones of the same token observe the cancellation. If the token is |
There was a problem hiding this comment.
What if the token was not passed to ddog_trace_exporter_send_trace_chunks, but will soon, is a cancellation: scheduled to happen; ddog_trace_exporter_send_trace_chunks won't start because it's already cancelled; nothing?
It's good to document what happens when the token is not held yet.
And also, if the token is no longer held (aka ddog_trace_exporter_send_trace_chunks is done), which I assume means nothing happens on ddog_trace_exporter_cancel_token_cancel.
| /// # Safety | ||
| /// | ||
| /// * `token` must point to a valid [`Handle`] returned by | ||
| /// [`ddog_trace_exporter_cancel_token_new`]. |
There was a problem hiding this comment.
| /// # Safety | |
| /// | |
| /// * `token` must point to a valid [`Handle`] returned by | |
| /// [`ddog_trace_exporter_cancel_token_new`]. | |
| /// * `token` is the return of [`ddog_trace_exporter_cancel_token_new`]. |
I don't think we need to say that we should give this function a "valid" input.
There was a problem hiding this comment.
The same applies to ddog_trace_exporter_cancel_token_drop.
There was a problem hiding this comment.
There's more of these Safety comment blocks in down this file. All the ones in a similar "type-checked argument must be type-valid" is likely not needed.
|
|
||
| /// Free a cancellation token handle. | ||
| /// | ||
| /// After this call the pointer is invalid and must not be reused. |
There was a problem hiding this comment.
Not sure if this is useful additional information "After this call the pointer is invalid and must not be reused.".
Freed resources cannot be used and are likely invalid.
If we want to provide additional context here, the only thing I can think of is: what happens if you free it while ddog_trace_exporter_send_trace_chunks still owns it. But I feel like that's kind of a natural contract in Rust/native: freeing things in use == bad.
Not sure if we need an extra comment here.
| /// * `chunks` is consumed and must not be used after this call. | ||
| /// * If `response_out` is non-null it must point to valid writable memory for a | ||
| /// `Box<ExporterResponse>`. | ||
| /// * If `cancel` is non-null it must point to a valid cancellation token handle. |
There was a problem hiding this comment.
Similar comment to those in ddog_trace_exporter_cancel_token_cancel: is it even possible to pass an invalid cancellation token since the argument is typed *mut Handle<TokioCancellationToken>?
marcotc
left a comment
There was a problem hiding this comment.
Despite my other comments, the implementation looks good 👍
Add `ddog_trace_exporter_before_fork`, `_after_fork_in_parent`, and `_after_fork_in_child` that delegate to the underlying SharedRuntime. These allow C callers to coordinate the tokio runtime lifecycle around process forks.
Introduce `ddog_trace_exporter_cancel_token_new`, `_cancel`, and `_drop` for managing cancellation tokens. Thread the token into `ddog_trace_exporter_send_trace_chunks` via `tokio::select!` so callers can cooperatively abort in-flight HTTP requests.
22e9237 to
c5c339a
Compare
Artifact Size Benchmark Reportaarch64-alpine-linux-musl
aarch64-unknown-linux-gnu
libdatadog-x64-windows
libdatadog-x86-windows
x86_64-alpine-linux-musl
x86_64-unknown-linux-gnu
|
Add an optional cancellation token to the synchronous send_trace_chunks path. When the token is cancelled while a request is in flight, the send is aborted and an interrupted IO error is returned. This moves the cancellation logic into the exporter itself instead of relying on the experimental async send path, keeping the FFI layer a thin wrapper.
Replace the `Handle<CancellationToken>` cancel parameter and token functions with a plain `Option<&CancellationToken>`/`Box` ownership model. A null pointer now maps to `None` via the nullable-pointer optimization, with no conversion. The cbindgen config is updated to emit the token as an opaque `ddog_TraceExporterCancelToken`, which also removes the previously undefined `ddog_CancellationToken` reference from the generated header.
Remove the public `shared_runtime()` accessor and the trace-exporter fork hooks. Fork safety is now driven through the `SharedRuntime` the caller constructs (via libdd-shared-runtime-ffi) and injects into the builder, calling that runtime's own fork lifecycle functions directly. This keeps the runtime an implementation detail of the exporter and avoids duplicating the fork hooks across FFI surfaces.
Commit the Cargo.lock entry for the tokio-util dependency that backs the cancellation token type, drop the now-unused tokio dependency (the select-based cancellation moved into libdd-data-pipeline), and document in the README why tokio-util is a dependency.
Describe how cancellation cooperatively aborts an in-flight send, what happens when the token is not currently used by a send and when the send has already finished, and warn that cancelling may drop the trace chunks being sent. Drop the redundant Safety notes that restated guarantees already enforced by the type-checked arguments.
The trace-exporter redesign on this branch drives fork safety through the generic shared-runtime FFI (ddog_shared_runtime_new / _free / _before_fork / _after_fork_parent / _after_fork_child), but those symbols aren't shipped today: the builder compiles one aggregate cdylib from libdd-profiling-ffi's dependency graph, which doesn't include libdd-shared-runtime-ffi. Shipping it as a SEPARATE library was tried and fails at runtime with a SIGSEGV, because each library statically embeds its own copy of libdd_shared_runtime and a SharedRuntime handle created in one library cannot be driven from the other. Evidence and analysis: DataDog/libdatadog-rb#52 (comment) Co-locate the FFI in the single profiling library so there is exactly one copy of the libdd_shared_runtime crate at runtime. cargo unifies it with the copy already pulled in via data-pipeline. The change is additive and opt-in behind a new `shared-runtime` feature, so other consumers are unaffected unless enabled. - libdd-profiling-ffi: add optional libdd-shared-runtime-ffi dependency behind a `shared-runtime` feature, re-export its symbols, and propagate cbindgen. - builder: add a `shared-runtime` feature (enabled by default), push the corresponding libdd-profiling-ffi feature in release.rs, and add shared-runtime.h to the header dedup/copy in profiling.rs.
Co-authored-by: vianney <vianney.ruhlmann@datadoghq.com>
# Release proposal for libdd-data-pipeline and its dependencies This PR contains version bumps based on public API changes and commits since last release. ## libdd-trace-utils **Next version:** `8.0.0` **Semver bump:** `major` **Tag:** `libdd-trace-utils-v8.0.0` ### Commits - refactor(vecmap)!: avoid Clone bound from dedup (#2069) ## libdd-telemetry **Next version:** `5.0.1` **Semver bump:** `patch` **Tag:** `libdd-telemetry-v5.0.1` ### Commits - fix(libdd-telemetry): serialize Method::Other as "*" per OpenAPI spec (#1998) ## libdd-trace-obfuscation **Next version:** `4.0.0` **Semver bump:** `major` **Tag:** `libdd-trace-obfuscation-v4.0.0` ###⚠️ major bump forced due to: - `libdd-trace-utils`: ^5.0.0 → ^7.0.0 ### Commits - ci: bump msrv to 1.87.0 (#2017) ## libdd-trace-stats **Next version:** `5.0.0` **Semver bump:** `major` **Tag:** `libdd-trace-stats-v5.0.0` ###⚠️ major bump forced due to: - `libdd-trace-utils`: ^5.0.0 → ^7.0.0 ### Commits - revert!: add from_string to span text (#2011) (#2073) - fix(send_with_retry): follow max retries of the strategy (#2047) - ci: bump msrv to 1.87.0 (#2017) ## libdd-data-pipeline **Next version:** `6.0.0` **Semver bump:** `major` **Tag:** `libdd-data-pipeline-v6.0.0` ###⚠️ major bump forced due to: - `libdd-trace-utils`: ^5.0.0 → ^7.0.0 ### Commits - feat(data-pipeline)!: add fork safety hooks and cancellation token for trace exporter FFI (#2051) - revert!: add from_string to span text (#2011) (#2073) - feat(data-pipeline): move the async boundary up (#2064) - feat(trace-exporter): add fail-closed fallback to v04 (#2037) - fix(send_with_retry): follow max retries of the strategy (#2047) - refactor(trace-utils): replace use_v05_format bool and remove infallible expect (#1946) - ci: bump msrv to 1.87.0 (#2017) ## libdd-sampling (manually bumped as it wasn't included in the proposal) **Next version:** `4.0.0` **Semver bump:** `major` **Tag:** `libdd-sampling-v4.0.0` ###⚠️ major bump forced due to: - `libdd-trace-utils`: ^7.0.0 → ^8.0.0 --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: iunanua <18325288+iunanua@users.noreply.github.com> Co-authored-by: iunanua <igor.unanua@datadoghq.com>
What does this PR do?
Add fork safety and cooperative cancellation support to the trace exporter FFI:
Fork hooks: Expose
ddog_trace_exporter_before_fork,_after_fork_in_parent,and
_after_fork_in_childthat delegate to the underlyingSharedRuntime. Theseallow C callers to coordinate the tokio runtime lifecycle around process forks.
Cancellation token: Introduce
ddog_trace_exporter_cancel_token_new,_cancel,and
_dropfor managing cancellation tokens. Thread the token intoddog_trace_exporter_send_trace_chunksviatokio::select!so callers cancooperatively abort in-flight HTTP requests.
Motivation
FUP to #1952.
Ruby application servers (Puma, Unicorn, Passenger) fork worker processes. The tokio
runtime does not survive
fork()— its threads are not carried over. Without the forkhooks, the trace exporter is dead in child processes.
The existing
RUBY_UBF_IOunblock function on the dd-trace-rb side sends a signalthat cannot actually cancel an in-flight Rust HTTP request. The cancellation token
enables cooperative abort: when Ruby interrupts a thread (shutdown,
Thread#kill),the UBF cancels the token, which causes
tokio::select!to abort the send and returnpromptly.
Additional Notes
pub fn shared_runtime()accessor toTraceExportersince the field was private.Handle<CancellationToken>fromlibdd-common-ffi, samepattern as profiling-ffi, with a
ddog_TraceExporterCancelTokencbindgen rename toavoid symbol conflicts.
cancelparameter onsend_trace_chunksis nullable; passingNULLpreservesexisting behavior.
AI was used to accelerate implementation; all code was reviewed and understood.
How to test the change?
46 tests pass (40 existing + 6 new for fork hooks and cancellation token).