Add native trace exporter#5690
Conversation
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d8e863833d
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
d8e8638 to
6a597db
Compare
ivoanjo
left a comment
There was a problem hiding this comment.
I did a pass on the C code! Didn't have time to go through the tests
| const uint8_t *body_ptr = | ||
| ddog_trace_exporter_response_get_body(response, &body_len); |
There was a problem hiding this comment.
This looks like it could use a ddog_ByteSlice to avoid the awkward splitting between pointer and length.
| if (pending_exception) { | ||
| if (send_err != NULL) ddog_trace_exporter_error_free(send_err); | ||
| rb_jump_tag(pending_exception); | ||
| } | ||
|
|
||
| if (send_err != NULL) { | ||
| ddog_TraceExporterErrorCode code = send_err->code; | ||
| ddog_trace_exporter_error_free(send_err); |
There was a problem hiding this comment.
It looks like we only want the send_err for the code -- maybe doing that conversion inside send_chunks_without_gvl would simplify the logic here? (Nothing to free anymore...)
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d5fa393cc9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| /* 2. Convert scalars */ | ||
| ddog_CharSlice name_s = char_slice_from_ruby_string(rb_name); | ||
| ddog_CharSlice service_s = nullable_char_slice(rb_service); | ||
| ddog_CharSlice resource_s = char_slice_from_ruby_string(rb_resource); |
There was a problem hiding this comment.
Preserve nil resources when converting spans
If a span is created with resource: nil (the public Span initializer explicitly allows nil as a placeholder, and the existing Ruby serializer emits it), this conversion raises a TypeError before the batch can be sent because char_slice_from_ruby_string requires a String. The native exporter should treat @resource as nullable or otherwise mirror the current serialization path so valid spans do not fail export.
Useful? React with 👍 / 👎.
# What does this PR do? Add span-building and trace-sending FFI functions to `libdd-data-pipeline-ffi`, enabling language tracers to construct spans field-by-field through the C API instead of passing pre-serialized msgpack. New types: - `TracerSpan`: opaque handle wrapping `Span<BytesData>` - `TracerTraceChunks`: opaque handle wrapping `Vec<Vec<SpanBytes>>` New functions: - `ddog_tracer_span_new`, `_free`, `_set_meta`, `_set_metric` - `ddog_tracer_trace_chunks_new`, `_free`, `_begin_chunk`, `_push_span` - `ddog_trace_exporter_send_trace_chunks` # Motivation The Ruby tracer needs a native trace export path that bypasses Ruby-side msgpack serialization. This provides the FFI surface for it; the corresponding C extension in `dd-trace-rb` calls these functions. Continuation of the prototype in #1661, reworked to stay close to `main`: the generic `TraceExporter<H>` and `SharedRuntime` are untouched; this is purely additive. # Additional Notes Spans are consumed on push, chunks are consumed on send — single-ownership enforced at the API level. `libdd-trace-utils` is promoted from dev-dependency to regular dependency since the FFI crate now needs `SpanBytes` at build time. Sibling PR for Ruby is at: DataDog/dd-trace-rb#5690 # How to test the change? `cargo test -p libdd-data-pipeline-ffi --lib` — 39 tests, all passing. The existing `trace_exporter` tests are unaffected. Co-authored-by: edmund.kump <edmund.kump@datadoghq.com>
Introduce the `Datadog::Tracing::Transport::Native::TracerSpan` class backed by libdatadog's `ddog_TracerSpan` FFI. `TracerSpan._native_from_span(span)` reads instance variables directly from a `Datadog::Tracing::Span` (`@name`, `@service`, `@meta`, etc.) and constructs a Rust-heap-allocated span via the C FFI, including: - Scalar fields (name, service, resource, type, IDs, timestamps) - 128-bit trace ID split via `trace_id_t` struct into (low, high) 64-bit halves - String tags (meta) and numeric tags (metrics) via safe hash iteration that stashes errors instead of raising inside `rb_hash_foreach` This is the first part of the trace exporter C extension; `TraceExporter` and `Response` classes will follow.
Introduce the `Datadog::Tracing::Transport::Native::TraceExporter` class backed by libdatadog's `ddog_TraceExporter` FFI. `TraceExporter._native_new(url, tracer_version, language, ...)` builds a Rust `TraceExporter` through the config/build pattern: - Validates argument types upfront (before allocating Rust resources) - Creates a `ddog_TraceExporterConfig`, applies each setting via `SET_CONFIG` macro with cleanup-on-error - Builds the exporter and wraps it in Ruby `TypedData` The `TraceExporter` is freed via `ddog_trace_exporter_free` on GC. `Response` and `_native_send_traces` will follow.
Introduce `Datadog::Tracing::Transport::Native::Response` with the full set of predicate methods expected by the `Writer`: - `ok?`, `internal_error?`, `server_error?`, `client_error?`, `not_found?`, `unsupported?` - `trace_count` and `payload` (agent response body for sampling rate feedback) Two C helpers build responses from the native side: - `create_ok_response`: success with optional payload - `create_error_response`: classifies `ddog_TraceExporterErrorCode` into client/server/internal error categories These will be used by `_native_send_traces` in the next step.
Implement `TraceExporter#_native_send_traces(traces)` which takes an `Array<Array<Span>>`, converts each span to Rust via the C FFI, groups them into trace chunks, and sends them to the Datadog Agent. Key safety patterns: - `rb_ensure` guarantees Rust-allocated trace chunks are freed if a Ruby exception fires during span conversion - `rb_thread_call_without_gvl2` releases the GVL during the blocking network send so other Ruby threads can run - The "gvl2" variant is used (not plain `gvl`) to prevent automatic interrupt raising before Rust response/error objects are inspected and freed - A retry loop handles the case where an interrupt causes `gvl2` to return before the send function actually runs The agent's response body (containing `rate_by_service` for sampling feedback) is extracted and surfaced via `Response#payload`. This completes the C extension for the native trace transport.
Classes created with `rb_define_class_under` are attached to their parent module's constant table, making them reachable by GC. Marking them as GC roots with `rb_global_variable` is unnecessary.
Delegate to the existing helper instead of duplicating the `RSTRING_PTR`/`RSTRING_LEN` construction and type check inline.
Use a typed function pointer and a `static inline` function instead of a preprocessor macro. Same codegen, easier to debug and type-check.
Adopt the same pattern used by the profiling extension: a `process_pending_interruptions` callback paired with a `check_if_pending_exception` wrapper that returns the pending state as an integer instead of requiring manual `rb_protect` at the call site.
Replace the `ddog_TraceExporterResponse **response_out` double pointer with a `ddog_TraceExporterResponse *response` field and pass its address to the FFI call. This removes one level of indirection and keeps all send-related state within the args struct.
Use a descriptive variable name for the cast data pointer.
Move error handling into `send_chunks_without_gvl`: extract the error code enum and free the `ddog_TraceExporterError` before returning. After reacquiring the GVL only the plain `error_code` and `failed` flag remain, eliminating the need to free Rust-allocated error objects on every code path.
Accept keyword arguments via `rb_scan_args(argc, argv, "0:", ...)` instead of 9 positional VALUE parameters. This makes call sites self-documenting and prevents silent argument ordering mistakes. The Ruby call becomes: TraceExporter._native_new(url:, tracer_version:, language:, ...)
Define Response in Ruby with keyword-argument `initialize` and `attr_reader` accessors. The C extension loads the class at init time via `rb_require` and calls `Response.new(ok:, ...)` through `rb_funcallv_kw` instead of manually setting instance variables. This gives Ruby's JIT better visibility into the object shape and reduces the amount of C code.
Count entries skipped due to unexpected types during hash iteration and log a warning after the loop completes. Previously these entries were silently dropped, making it difficult to diagnose missing tags.
Replace two `rb_funcall` calls (`to_i` + `nsec`) with a single `rb_time_timespec` call that extracts the underlying `struct timespec` directly from the Ruby Time object. Zero method dispatch overhead.
Replace two `rb_funcall` calls (`&` and `>>`) with direct extraction via `rb_big_pack`, which copies the Bignum magnitude into a word buffer without method dispatch or temporary object allocation. A Fixnum fast path handles the common case of 64-bit trace IDs with a simple `FIX2LONG` cast.
Construct a `ddog_TracerSpanFields` struct with designated initializers and pass it by reference to `ddog_tracer_span_new`. This matches the updated libdatadog FFI signature and is self-documenting at the call site.
Hoist `span_count` before `ddog_tracer_trace_chunks_begin_chunk` and pass it as the capacity argument to pre-allocate the inner span vector.
The error path is unreachable in practice (the handle is always valid), but the return value must still be freed to avoid leaking the Rust-allocated error object.
d5fa393 to
bd141e3
Compare
vpellan
left a comment
There was a problem hiding this comment.
Approving so that you're not stuck with the sdk-capabilities review requirement as I'm on PTO for the next 3 weeks
What does this PR do?
Add the
Datadog::Tracing::Transport::NativeC extension module with three classes:TracerSpan: converts a RubySpanto a Rust-heap-allocatedddog_TracerSpanby reading instance variables directly (@name,@service,@meta, etc.)TraceExporter: wraps the Rustddog_TraceExporterwith config/build lifecycleResponse: ivar-backed response matching theWriter's expected interface (ok?,internal_error?,server_error?,client_error?,payload, etc.)TraceExporter#_native_send_traces(traces)takesArray<Array<Span>>, converts spans, groups them into trace chunks, and sends them to the Datadog Agent. GVL is released during the network call;rb_ensureguarantees Rust-allocated chunks are freed on exception.Motivation:
First step towards replacing the pure-Ruby msgpack +
Net::HTTPtrace export path with the Rust data pipeline. This provides the C extension; a Ruby transport class wiring it intoWriterwill follow.Companion to DataDog/libdatadog#1952 which adds the FFI surface this extension calls.
Change log entry
None (not yet wired into the tracer; no user-visible change).
Additional Notes:
Built on top of review feedback from the prototype PRs (#5422 / DataDog/libdatadog#1661):
trace_id_tstruct instead of two bareuint64_tout-paramshash_iter_ctxfor safe error stashing duringrb_hash_foreachrb_thread_call_without_gvl2(not plaingvl) to prevent interrupt-before-cleanupclient_error?/server_error?/internal_error?classification fromddog_TraceExporterErrorCodepayloadsurfaces the agent body for sampling rate feedbackAI was used to accelerate implementation; all code was reviewed and understood.
How to test the change?
48 RSpec examples under
spec/datadog/tracing/transport/native/:Requires
libdatadoggem built from the companion PR's branch (viagem "libdatadog", path: "../libdatadog-rb"in Gemfile).