feat: add OpenTelemetry distributed tracing support #929
Merged
Adds end-to-end OpenTelemetry (OTel) tracing to Sablier so every HTTP request, async instance start, and outbound provider call can be observed in tools like Jaeger, Tempo, or any OTLP-compatible backend.

## What's new

### Configuration

A new top-level `tracing` section is available in `sablier.yaml` and as CLI flags / environment variables:

```yaml
tracing:
  enabled: true
  exporterType: otlphttp   # otlphttp | stdout
  endpoint: http://localhost:4318
  serviceName: sablier
  samplingRate: 1.0        # 1.0 = sample every request
```

| Flag | Env var | Default |
|------|---------|---------|
| `--tracing.enabled` | `SABLIER_TRACING_ENABLED` | `false` |
| `--tracing.exporter-type` | `SABLIER_TRACING_EXPORTER_TYPE` | `otlphttp` |
| `--tracing.endpoint` | `SABLIER_TRACING_ENDPOINT` | `http://localhost:4318` |
| `--tracing.service-name` | `SABLIER_TRACING_SERVICE_NAME` | `sablier` |
| `--tracing.sampling-rate` | `SABLIER_TRACING_SAMPLING_RATE` | `1.0` |

### HTTP server instrumentation

Every incoming request receives an OTel span via the `otelgin` middleware. The trace ID and span ID are injected into the structured `slog` access log so log lines and traces can be correlated in a single query.

### Provider instrumentation

All outbound provider calls are wrapped with OTel transports:

- **Docker / Podman / Swarm**: uses the native `client.WithTraceProvider` option built into the moby client.
- **Kubernetes**: wraps the rest-client `RoundTripper` with `otelhttp.NewTransport`.
- **ProxmoxLXC**: same `otelhttp.NewTransport` wrapper, applied before TLS configuration.

### Async start trace propagation

When an incoming request triggers an async `InstanceStart` goroutine, the OTel span context from the request is propagated into the background context. Provider API calls in the goroutine appear as children of the original HTTP request's trace without being bound by its deadline or cancellation.
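
The propagation described above can be illustrated with the standard library alone. This is a minimal sketch, not the code in `pkg/sablier/instance_request.go`: `traceKey`, `detach`, and `startAsync` are hypothetical names, a plain string stands in for the OTel span context, and `context.WithoutCancel` (Go 1.21+) is only one way to get the behaviour. The PR may instead copy the span context onto a fresh background context.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// traceKey is a hypothetical context key standing in for the OTel span
// context, which is likewise carried as a context value.
type traceKey struct{}

// detach returns a context that keeps the values of parent (including the
// stand-in trace ID) but ignores the parent's cancellation and deadline.
func detach(parent context.Context) context.Context {
	return context.WithoutCancel(parent)
}

// startAsync simulates a request that ends before the background work runs.
// It reports the trace ID seen by the goroutine and whether the goroutine's
// context was cancelled.
func startAsync(traceID string) (seen string, cancelled bool) {
	reqCtx, cancel := context.WithTimeout(
		context.WithValue(context.Background(), traceKey{}, traceID),
		time.Second,
	)
	bg := detach(reqCtx)
	cancel() // the HTTP request finishes (or times out) here

	done := make(chan struct{})
	go func() {
		defer close(done)
		seen, _ = bg.Value(traceKey{}).(string)
		cancelled = bg.Err() != nil
	}()
	<-done
	return seen, cancelled
}

func main() {
	seen, cancelled := startAsync("4bf92f3577b34da6a3ce929d0e0e4736")
	fmt.Println("trace id in goroutine:", seen, "cancelled:", cancelled)
}
```

In this sketch the goroutine still sees the trace ID after the request context has been cancelled, which is the property the async start path relies on.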

When a subsequent request joins the same pending goroutine, a `sablier.instance.join_pending_start` span event is recorded on the joining request's span, referencing the original trace ID and span ID so the two requests can be correlated in the tracing backend.

### Webhook instrumentation

The webhook dispatcher HTTP client is wrapped with `otelhttp.NewTransport` so outbound webhook deliveries appear in the trace.

### gRPC log bridge

gRPC internal logs (channel state changes, resolver updates, etc.) are redirected through `slog` instead of being written directly to stderr. gRPC `Info`-level events are demoted to `slog.Debug` so they remain invisible at the default log level.

## Running the Jaeger example

```sh
cd examples/tracing/jaeger
make up                # starts sablier + whoami + Jaeger all-in-one
make request-blocking  # send a blocking request to warm up whoami
make traces            # open the Jaeger UI in your browser
```

## Files changed

- `pkg/config/tracing.go`: new `Tracing` config struct
- `pkg/tracing/setup.go`: TracerProvider bootstrap, exporters, propagators
- `pkg/tracing/grpclog.go`: gRPC→slog log bridge
- `pkg/tracing/setup_test.go`: unit tests for the tracing package
- `internal/server/server.go`: `otelgin` middleware on the HTTP router
- `internal/server/logging.go`: trace ID / span ID in access logs
- `internal/server/tracing_test.go`: integration tests for span creation
- `pkg/sabliercmd/provider.go`: OTel transport on all providers
- `pkg/sabliercmd/start.go`: tracing initialisation and shutdown
- `pkg/sabliercmd/root.go`: CLI flags and viper bindings
- `pkg/sablier/instance_request.go`: span-context propagation into goroutines
- `pkg/webhook/dispatcher.go`: traced HTTP client
- `examples/tracing/jaeger/`: Docker Compose example with Jaeger
- `docs/tracing.md`: full configuration and provider reference
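
The `otelhttp.NewTransport` wrapping used for the Kubernetes, ProxmoxLXC, and webhook clients is an instance of the `http.RoundTripper` decorator pattern. The sketch below shows only that pattern, using the standard library: `headerTransport`, `roundTripperFunc`, and `callThroughWrapper` are hypothetical names, the base transport is a fake that never touches the network, and the wrapper merely sets a W3C `traceparent` header. The real OTel transport also starts a client span and derives the header from the active span context.

```go
package main

import (
	"fmt"
	"net/http"
)

// roundTripperFunc adapts a function to http.RoundTripper (test helper).
type roundTripperFunc func(*http.Request) (*http.Response, error)

func (f roundTripperFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// headerTransport is a hypothetical stand-in for the transport returned by
// otelhttp.NewTransport: it decorates a base RoundTripper and adds a
// traceparent header to every outbound request.
type headerTransport struct {
	base        http.RoundTripper
	traceparent string
}

func (t headerTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	// A RoundTripper must not modify the caller's request, so use a clone.
	clone := req.Clone(req.Context())
	clone.Header.Set("traceparent", t.traceparent)
	return t.base.RoundTrip(clone)
}

// callThroughWrapper sends one request through the wrapped client and
// returns the traceparent header that reached the base transport.
func callThroughWrapper(traceparent string) (string, error) {
	var seen string
	base := roundTripperFunc(func(r *http.Request) (*http.Response, error) {
		seen = r.Header.Get("traceparent")
		return &http.Response{
			StatusCode: http.StatusOK,
			Header:     make(http.Header),
			Body:       http.NoBody,
			Request:    r,
		}, nil
	})

	client := &http.Client{Transport: headerTransport{base: base, traceparent: traceparent}}
	resp, err := client.Get("http://provider.invalid/") // never resolved: base is a fake
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	return seen, nil
}

func main() {
	got, err := callThroughWrapper("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	fmt.Println(got, err)
}
```

Because the wrapper sits on the transport rather than on each call site, every request made through the client is covered without changing the calling code.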
Test Results: ✅ All tests passed! 491 tests in 125.791s
Each provider (Docker, Swarm, Kubernetes, ProxmoxLXC) now starts a
named child span for InstanceStart and InstanceStop, inheriting the
trace context propagated from the HTTP request through the Sablier core.
Span attributes per provider:
- docker.instance.{start,stop}: instance, operation (start/unpause/
pause/stop/scale_mode.noop/scale_mode.apply_resources), cpu, memory
- swarm.instance.{start,stop}: instance, operation, replicas, cpu, memory
- kubernetes.instance.{start,stop}: instance, operation (scale/
scale_mode/scale_to_zero), replicas, cpu, memory
- proxmoxlxc.instance.{start,stop}: instance, proxmox.node, proxmox.vmid,
operation (start/stop/noop.already_running/noop.already_stopped)
Also fix test context matchers: InstanceInspect and InstanceStart now
receive a context enriched with the trace span value; tests that used
gomock.Eq(ctx) for those calls now use gomock.Any().
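
The matcher change follows from how derived contexts compare. As a rough illustration, assuming the equality matcher falls back to something like `reflect.DeepEqual` (an assumption about gomock's internals, not taken from this PR), a context enriched with a value is no longer equal to the one the test passed in. `spanKey` and `enrichedEqualsOriginal` are hypothetical names.

```go
package main

import (
	"context"
	"fmt"
	"reflect"
)

// spanKey is a hypothetical key standing in for the trace span value.
type spanKey struct{}

// enrichedEqualsOriginal reports whether a context that carries an extra
// value compares equal to the context it was derived from.
func enrichedEqualsOriginal() bool {
	original := context.Background()
	enriched := context.WithValue(original, spanKey{}, "span")
	return reflect.DeepEqual(original, enriched)
}

func main() {
	fmt.Println(enrichedEqualsOriginal()) // prints "false"
}
```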


## Summary

Adds end-to-end OpenTelemetry tracing to Sablier so every HTTP request, async instance start, and outbound provider call can be observed in a tool like Jaeger, Grafana Tempo, or any OTLP-compatible backend.

Tracing is opt-in and disabled by default, so existing deployments are unaffected.

Supersedes #740
Closes #428

## Configuration

Add a `tracing` section to your `sablier.yaml`. All fields are also available as CLI flags and environment variables:

| Flag | Env var | Default |
|------|---------|---------|
| `--tracing.enabled` | `SABLIER_TRACING_ENABLED` | `false` |
| `--tracing.exporter-type` | `SABLIER_TRACING_EXPORTER_TYPE` | `otlphttp` |
| `--tracing.endpoint` | `SABLIER_TRACING_ENDPOINT` | `http://localhost:4318` |
| `--tracing.service-name` | `SABLIER_TRACING_SERVICE_NAME` | `sablier` |
| `--tracing.sampling-rate` | `SABLIER_TRACING_SAMPLING_RATE` | `1.0` |

## What gets traced

### HTTP server

Every incoming request receives an OTel span via the `otelgin` middleware. The trace ID and span ID are injected into the structured `slog` access log so log lines and traces can be correlated in a single query.

### Provider API calls

All outbound provider calls are wrapped with OTel-instrumented transports:

- **Docker / Podman / Swarm**: `client.WithTraceProvider` built into the moby client
- **Kubernetes**: `otelhttp.NewTransport` wrapping the rest-client `RoundTripper`
- **ProxmoxLXC**: `otelhttp.NewTransport` wrapping the base transport (before TLS)

### Async instance starts

When a request triggers an async `InstanceStart` goroutine, the OTel span context from the triggering request is propagated into the background context, so provider API calls in the goroutine appear as children of the original HTTP request's trace, without being bound by the request's deadline or cancellation.

When a subsequent request joins the same pending goroutine, a `sablier.instance.join_pending_start` span event is recorded on the joining request's span, referencing the original trace ID and span ID so the two requests can be correlated in the tracing backend.

### Webhook deliveries

The webhook dispatcher HTTP client is wrapped with `otelhttp.NewTransport` so outbound webhook HTTP calls appear in the trace.

### Quieter logs

gRPC internal logs (channel state changes, resolver updates, etc.) are now routed through `slog` instead of being written directly to stderr. gRPC `Info`-level events are demoted to `slog.Debug`, so they remain invisible at the default log level.

## Quick start with Jaeger

A ready-to-run Docker Compose example is included in `examples/tracing/jaeger/`.

The Jaeger UI will show the full trace: HTTP handler span → async `InstanceStart` span → Docker API calls.

## Files changed

- `pkg/config/tracing.go`: `Tracing` config struct with defaults
- `pkg/tracing/setup.go`: `TracerProvider` bootstrap (exporters, resource, propagators, sampler)
- `pkg/tracing/grpclog.go`: `grpclog.LoggerV2` → `slog` bridge (demotes Info to Debug)
- `pkg/tracing/setup_test.go`
- `internal/server/server.go`: `otelgin` middleware on the HTTP router
- `internal/server/logging.go`
- `internal/server/tracing_test.go`
- `pkg/sabliercmd/provider.go`
- `pkg/sabliercmd/start.go`
- `pkg/sabliercmd/root.go`
- `pkg/sablier/instance_request.go`
- `pkg/webhook/dispatcher.go`
- `examples/tracing/jaeger/`
- `docs/tracing.md`
- `sablier.sample.yaml`
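
The Info-to-Debug demotion performed by the `grpclog.LoggerV2` → `slog` bridge can be sketched as follows. This is an illustration under assumptions, not the contents of `pkg/tracing/grpclog.go`: `grpcBridge` and `visible` are hypothetical names, only three of the interface's methods are shown, and the default log level is assumed to be Info.

```go
package main

import (
	"bytes"
	"fmt"
	"log/slog"
	"strings"
)

// grpcBridge is a hypothetical, partial stand-in for a grpclog.LoggerV2
// implementation. The real interface has more methods (Infof, Warningln,
// Fatal, V, ...), which are omitted here.
type grpcBridge struct{ log *slog.Logger }

// Info is demoted: gRPC Info events are emitted at slog's Debug level.
func (b grpcBridge) Info(args ...any) { b.log.Debug(fmt.Sprint(args...)) }

// Warning and Error keep their severity.
func (b grpcBridge) Warning(args ...any) { b.log.Warn(fmt.Sprint(args...)) }
func (b grpcBridge) Error(args ...any)   { b.log.Error(fmt.Sprint(args...)) }

// visible logs one gRPC Info event and one gRPC Warning event, then reports
// which of them appear in the output at Info level (debug=false) or at
// Debug level (debug=true).
func visible(debug bool) (infoShown, warnShown bool) {
	level := slog.LevelInfo
	if debug {
		level = slog.LevelDebug
	}
	var buf bytes.Buffer
	logger := slog.New(slog.NewTextHandler(&buf, &slog.HandlerOptions{Level: level}))

	b := grpcBridge{log: logger}
	b.Info("channel state changed to READY")
	b.Warning("resolver returned no addresses")

	out := buf.String()
	return strings.Contains(out, "channel state changed"),
		strings.Contains(out, "resolver returned")
}

func main() {
	infoShown, warnShown := visible(false)
	fmt.Println("at Info level: info shown =", infoShown, "warning shown =", warnShown)
}
```

At Info level only the warning is written. Lowering the handler level to Debug makes the demoted gRPC events visible again.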