feat(civisibility): Bazel offline cache and payload-file modes #4503
- Add unified test optimization env vars and mode resolver with Bazel manifest lookup
- Read settings/known/skippable/test-management from cache/http in manifest mode with no network fallback
- Disable git upload paths and selected CI enrichment in offline/file modes
- Write test-cycle and coverage payloads as JSON files in payload-files mode
- Add/extend unit tests for mode resolution, cache-first APIs, payload writing, and tag stripping
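A minimal sketch of what the mode resolver in the first bullet might look like. Only the two environment variable names come from this PR; the `Mode` type, the `resolveMode` function, and the boolean parsing rule are assumptions made for illustration, not the actual dd-trace-go code.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// Mode is a hypothetical summary of which offline features are active.
type Mode struct {
	ManifestPath    string // non-empty when manifest (cache-only) mode is requested
	PayloadsInFiles bool   // true when payloads should be written as files
}

// resolveMode reads both variables through a lookup function so that tests
// can supply a fake environment instead of mutating the real one.
func resolveMode(getenv func(string) string) Mode {
	m := Mode{
		ManifestPath: strings.TrimSpace(getenv("DD_TEST_OPTIMIZATION_MANIFEST_FILE")),
	}
	// Assumption: the flag is a boolean; values that do not parse count as false.
	raw := strings.TrimSpace(getenv("DD_TEST_OPTIMIZATION_PAYLOADS_IN_FILES"))
	if v, err := strconv.ParseBool(raw); err == nil {
		m.PayloadsInFiles = v
	}
	return m
}

func main() {
	fmt.Printf("%+v\n", resolveMode(os.Getenv))
}
```

Passing the lookup function as a parameter is one way to get the deterministic test seams mentioned later in the thread without touching process-wide state.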
✅ Tests
🎉 All green!
❄️ No new flaky tests detected
🎯 Code Coverage (details)
🔗 Commit SHA: 9fe65ea
Benchmarks
Benchmark execution time: 2026-04-16 09:06:12
Comparing candidate commit 9fe65ea in PR branch
Found 0 performance improvements and 1 performance regressions! Performance is the same for 269 metrics, 8 unstable metrics.
- Add deterministic seams and reset helpers for offline/file mode tests
- Cover missing output-dir failures for test-cycle and coverage payload files
- Assert payload-file mode skips git enrichment and offline init skips upload/logs
- Fix malformed manifest settings cache handling discovered by the new tests
- Apply local import grouping expected by Static Checks
- Reformat the affected CI visibility test files without changing behavior

Tests: go test -count=1 ./internal/civisibility/utils/net/... && go test -count=1 ./internal/civisibility/integrations/... && go test -count=1 ./ddtrace/tracer -run 'TestCiVisibilityTransportPayloadFilesMode'
- Narrow the offline log test to a direct helper instead of re-running global CI visibility init
- Stop resetting package-wide init state in the new integration test helper
- Keep the offline/upload assertions while preserving manual_api_mocktracer test stability

Tests: INTEGRATION=true go test -shuffle=on -count=20 ./internal/civisibility/integrations && INTEGRATION=true go test -count=1 ./internal/civisibility/integrations && go test -count=1 ./internal/civisibility/utils/net/... && go test -count=1 ./ddtrace/tracer -run 'TestCiVisibilityTransportPayloadFilesMode'
mtoffl01
left a comment
Approving on behalf of sdk-capabilities
@codex review
Codex Review: Didn't find any major issues. Already looking forward to the next diff.
/merge
## Description

Adds Bazel-focused CI Visibility support with two offline execution modes, mirroring the Go implementation in DataDog/dd-trace-go#4503:

- **Manifest mode** (`DD_TEST_OPTIMIZATION_MANIFEST_FILE`): reads settings, known tests, and test management data from pre-fetched JSON cache files inside `.testoptimization/`, enabling CI Visibility in Bazel's hermetic sandbox without network access.
- **Payload-files mode** (`DD_TEST_OPTIMIZATION_PAYLOADS_IN_FILES`): writes test event, coverage, and telemetry payloads as JSON files to `TEST_UNDECLARED_OUTPUTS_DIR/payloads/{tests,coverage,telemetry}/` instead of sending HTTP requests.

### Key changes

- **`offline_mode.py`**: `OfflineMode` singleton detects and validates both modes from env vars. Manifest version parsing supports plain `"1"` and `version=1` assignment syntax (matching Go). Runfiles resolution via `RUNFILES_DIR`, `RUNFILES_MANIFEST_FILE`, and `TEST_SRCDIR`.
- **`cached_file_provider.py`**: `CachedFileDataProvider` implements the `TestOptDataProvider` protocol, reading from cache files. Skippable tests return empty unconditionally (hard no-op in manifest mode, matching Go).
- **`writer.py`**: `TestOptWriter` and `TestCoverageWriter` intercept `_send_events` in payload-files mode to write JSON files. Filenames use the `{kind}-{timestamp}-{pid}-{seq}.json` pattern, matching Go's DDTestRunner expectations. Telemetry files use ordinal-first naming (`telemetry-{seq_padded}-{pid}.json`) for deterministic replay.
- **`telemetry.py`**: `TelemetryAPI` accumulates CI Visibility metrics in payload-files mode and writes them to `payloads/telemetry/` on `finish()`, matching Go's telemetry file-sink behavior.
- **`session_manager.py`**: Swaps `APIClient` for `CachedFileDataProvider` in manifest mode. Uses `NoOpBackendConnectorSetup` for writers/telemetry. Forces test skipping off in manifest mode. Skips git data upload in both offline modes.
- **`env_tags.py`**: In payload-files mode, reads CI/Git tags from `DD_TEST_OPTIMIZATION_ENV_DATA_FILE` instead of invoking git CLI. Falls back to `ci.provider.name = "bazel"` when no other provider is detected.
- **`constants.py`**: New `DD_TEST_OPTIMIZATION_ENV_DATA_FILE` constant.

## Testing

- Unit tests added/updated across 4 test files covering all new functionality:
  - `test_offline_mode.py`: manifest version parsing (plain, assignment syntax, blank lines, invalid), runfiles resolution, `OfflineMode` initialization
  - `test_cached_file_provider.py`: cache reading, skippable tests hard no-op
  - `test_payload_files.py`: payload file naming (`{kind}-{ts}-{pid}-{seq}.json`), telemetry ordinal naming, telemetry file output on `TelemetryAPI.finish()`
  - `test_bazel_offline_session_manager.py`: provider selection, git upload skipping, env data file reading, bazel provider fallback, skipping forced off in manifest mode
- All 90 tests pass on Python 3.12 with pytest ~=8.0.

## Risks

- **Payload file naming**: Changed from `payload_<n>.json` to `{kind}-{ts}-{pid}-{seq}.json` to match Go's DDTestRunner expectations. Any consumer that relied on the old naming would need updating.
- Manifest mode disables test skipping unconditionally; this is intentional and matches Go behavior.

## Additional Notes

- Mirrors Go PR: DataDog/dd-trace-go#4503
- Features not ported (not applicable to Python's architecture): per-span CI/git tag stripping (Python sets tags at metadata level), impacted tests suppression (no such concept in Python yet), low-level git CLI guard (only called from guarded `upload_git_data`), CI log shipping suppression (Python plugin has no log shipping).

Co-authored-by: federico.mon <federico.mon@datadoghq.com>
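The manifest version parsing described above (plain `1`, `version=1` assignment syntax, blank lines, invalid input) can be sketched in Go as follows. This is an illustration under stated assumptions, not the code from either PR: the function name `parseManifestVersion` is hypothetical, and the rule that only the first non-blank line is inspected is a guess at the behavior.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// parseManifestVersion returns the version declared by the first non-blank
// line, accepting either a bare integer ("1") or an assignment ("version=1").
func parseManifestVersion(content string) (int, error) {
	for _, line := range strings.Split(content, "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue // skip leading blank lines
		}
		// Drop the optional "version=" prefix, then parse what remains.
		value := strings.TrimSpace(strings.TrimPrefix(line, "version="))
		v, err := strconv.Atoi(value)
		if err != nil {
			return 0, fmt.Errorf("invalid manifest version %q: %w", line, err)
		}
		return v, nil
	}
	return 0, errors.New("manifest has no version line")
}

func main() {
	for _, in := range []string{"1", "\nversion=1\n", "version=abc"} {
		v, err := parseManifestVersion(in)
		fmt.Printf("%q -> %d, %v\n", in, v, err)
	}
}
```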
# What Does This Do

Adds Bazel-focused CI Visibility support with two offline execution modes, mirroring [dd-trace-go#4503](DataDog/dd-trace-go#4503) and [dd-trace-py#17197](DataDog/dd-trace-py#17197):

- **Manifest mode** (`DD_TEST_OPTIMIZATION_MANIFEST_FILE`): reads settings, known tests, flaky tests, and test management data from pre-fetched JSON cache files instead of hitting the backend.
- **Payload-files mode** (`DD_TEST_OPTIMIZATION_PAYLOADS_IN_FILES`): writes CI test cycle, coverage, and tracer telemetry to `$TEST_UNDECLARED_OUTPUTS_DIR/payloads/{tests,coverage,telemetry}/*.json` instead of POSTing them.

## Key Changes

- `BazelMode` (internal-api): detects both modes, resolves the manifest path via Bazel's rlocation algorithm, parses the `version=<int>` header, and exposes the `tests/`, `coverage/`, and `telemetry/` output directories.
- `FileBasedConfigurationApi` (agent-ci-visibility): reads the same JSON envelopes as the HTTP API from disk; null paths return safe defaults.
- `FileBasedPayloadDispatcher` (dd-trace-core): serializes CI test cycle and coverage spans as JSON files; strips `ci.*`/`git.*`/`runtime.*`/`os.*` tags to avoid cache invalidation; writes atomically via temp file plus rename. Writes `trace_id`/`span_id`/`parent_id` as unsigned 64-bit JSON numbers (not strings) so backend schema validation passes.
- `FileBasedTelemetryClient` (telemetry): subclass of `TelemetryClient` that writes the existing Moshi-encoded telemetry request body to a file; `TelemetryRouter` gets a single-client path that skips feature discovery; `TelemetrySystem` swaps in the file-based client when Bazel mode is active.
- `WriterFactory` / `CiVisibilityServices` / `CiVisibilityRepoServices`: wire the file-based dispatcher/config API, disable the git client, and skip git-data upload when Bazel mode is active.
- `CoreTracer`: in Bazel payload-files mode, uses `StreamingTraceCollector` (streams each CI Visibility span individually) and `DDIntakeTraceInterceptor` (not the APM-protocol interceptor, which strips `test_{session,module,suite}_end` spans). This is the same treatment as agentless, so all CITESTCYCLE events reach the file dispatcher.
- `JUnit4TracingListener` / `JUnit4Utils`: lazy-register the test suite in `testStarted` so runners that don't fire `testSuiteStarted` still produce a proper suite span; unwrap `com.google.testing.junit.junit4.runner.RunNotifierWrapper` in `runListenersFromRunNotifier` so the idempotency check sees listeners installed on the inner notifier (fixes duplicate-listener installation under `BazelTestRunner`).
- `Config`: adds `DD_TEST_OPTIMIZATION_MANIFEST_FILE` and `DD_TEST_OPTIMIZATION_PAYLOADS_IN_FILES`; skips API-key validation in these modes. `TEST_UNDECLARED_OUTPUTS_DIR` is read directly in `BazelMode` (it's a Bazel-provided env var, not a DD config).

# Motivation

Bazel can run tests in hermetic sandboxes with no network access. The existing CI Visibility pipeline requires HTTP calls to fetch configuration and submit payloads, which is incompatible with Bazel's execution model. Most of our operations, such as tagging tests with git metadata, also invalidate Bazel's cache. This PR enables CI Visibility under Bazel by reading configuration from pre-fetched cache files and writing payloads/telemetry to files, with the orchestration of everything else handled by our custom testing rule.

# Additional Notes

- Unit tests cover each new component: `BazelModeTest`, `FileBasedConfigurationApiTest` (shares the existing `*-response.ftl` fixtures with `ConfigurationApiImplTest` to keep the HTTP and file code paths in sync), `FileBasedPayloadDispatcherTest`, `FileBasedTelemetryClientTest`, and extended `TelemetryRouterSpecification`.
- End-to-end repro validated locally against `DataDog/rules_test_optimization_tests`: 3 `test` + 1 `test_suite_end` + 1 `test_module_end` + 1 `test_session_end` events emitted to the payload file, no duplicate listener errors, no schema-validation failures.
- Future work, left out of this PR to keep the change from growing too large:
  - Include instrumentation improvements to better handle Bazel's custom JUnit4 test runner
  - Refactor the configuration-API DTOs into common utilities
  - Define a specification for mapping and serializing CI Vis spans to avoid logic mirroring between the two approaches (original vs file-based)
  - Possibly introduce a smoke test for e2e testing of the Bazel process, to avoid depending on an external repository.

# Contributor Checklist

- [x] Format the title according to [the contribution guidelines](https://github.com/DataDog/dd-trace-java/blob/master/CONTRIBUTING.md#title-format)
- [ ] Assign the `type:` and (`comp:` or `inst:`) labels in addition to [any other useful labels](https://github.com/DataDog/dd-trace-java/blob/master/CONTRIBUTING.md#labels)
- [ ] Update the [CODEOWNERS](https://github.com/DataDog/dd-trace-java/blob/master/.github/CODEOWNERS) file on source file addition, migration, or deletion
- [ ] Update [public documentation](https://docs.datadoghq.com/tracing/trace_collection/library_config/java/) with any new configuration flags or behaviors

Jira ticket: [SDTEST-3335]

***Note:*** **Once your PR is ready to merge, add it to the merge queue by commenting `/merge`.** `/merge -c` cancels the queue request. `/merge -f --reason "reason"` skips all merge queue checks; please use this judiciously, as some checks do not run at the PR-level. For more information, see [this doc](https://datadoghq.atlassian.net/wiki/spaces/DEVX/pages/3121612126/MergeQueue).

[SDTEST-3335]: https://datadoghq.atlassian.net/browse/SDTEST-3335?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ

Co-authored-by: daniel.mohedano <daniel.mohedano@datadoghq.com>
Summary
This PR adds Bazel-focused CI Visibility support in `dd-trace-go` with two offline execution modes:

- `DD_TEST_OPTIMIZATION_MANIFEST_FILE`, with strict manifest resolution and cache-only reads for supported endpoints.
- `DD_TEST_OPTIMIZATION_PAYLOADS_IN_FILES`, writing JSON payload envelopes into Bazel undeclared outputs under `TEST_UNDECLARED_OUTPUTS_DIR`.

The goal is to let CI Visibility operate in Bazel environments without relying on the usual online Git and payload transport paths, and to make the same payload-file flow available to shared instrumentation telemetry.
Main changes
- `internal/bazel` package.
- `internal/telemetry`, writing raw top-level telemetry payloads under `payloads/telemetry`.

Tests
Added or extended tests for:
- `internal/bazel`.

Validation
Focused validation run for this work:
- `go test ./internal/bazel`
- `go test ./internal/telemetry/...`
- `go test -race ./internal/telemetry/...`
- `go test ./internal/civisibility/...`
- `go test -race ./internal/civisibility/...`
- `go test ./ddtrace/tracer -run 'TestTelemetryEnabled|TestCiVisibilityTransport|TestCIVisibilityTransportSecureLogging|TestCiVisibilityTransportPayloadFilesModeWritesJSON|TestCiVisibilityTransportPayloadFilesModeMissingOutputDir'`
- `go test ./profiler -run 'TestTelemetryEnabled'`