feat(civisibility): Bazel offline cache and payload-file modes #4503

Merged
gh-worker-dd-mergequeue-cf854d[bot] merged 41 commits into main from codex/rfc-bazel-topt-files-go on Apr 16, 2026
Conversation

@tonyredondo

@tonyredondo tonyredondo commented Mar 5, 2026

Member

Summary

This PR adds Bazel-focused CI Visibility support in dd-trace-go with two offline execution modes:

  • Manifest mode via DD_TEST_OPTIMIZATION_MANIFEST_FILE, with strict manifest resolution and cache-only reads for supported endpoints.
  • Payload-files mode via DD_TEST_OPTIMIZATION_PAYLOADS_IN_FILES, writing JSON payload envelopes into Bazel undeclared outputs under TEST_UNDECLARED_OUTPUTS_DIR.

The goal is to let CI Visibility operate in Bazel environments without relying on the usual online Git and payload transport paths, and to make the same payload-file flow available to shared instrumentation telemetry.
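As a rough illustration of how the two modes could be detected, here is a minimal sketch. The `Modes` type, `resolveModes`, and `resolveFromEnv` are hypothetical names, not the API of `internal/bazel`, and parsing the payload-files variable as a Go boolean string is an assumption; only the three environment variable names come from this PR.

```go
package main

import (
	"os"
	"strconv"
	"strings"
)

// Modes reports which offline modes are requested.
// Hypothetical type: the real internal/bazel package may look different.
type Modes struct {
	ManifestFile string // non-empty when manifest mode is requested
	PayloadFiles bool   // true when payloads should be written as files
	OutputDir    string // value of TEST_UNDECLARED_OUTPUTS_DIR, may be empty
}

// resolveModes reads settings through lookup so tests can inject values.
// Assumption: the payload-files variable is parsed with strconv.ParseBool.
func resolveModes(lookup func(string) (string, bool)) Modes {
	var m Modes
	if v, ok := lookup("DD_TEST_OPTIMIZATION_MANIFEST_FILE"); ok {
		m.ManifestFile = strings.TrimSpace(v)
	}
	if v, ok := lookup("DD_TEST_OPTIMIZATION_PAYLOADS_IN_FILES"); ok {
		if b, err := strconv.ParseBool(strings.TrimSpace(v)); err == nil {
			m.PayloadFiles = b
		}
	}
	if v, ok := lookup("TEST_UNDECLARED_OUTPUTS_DIR"); ok {
		m.OutputDir = v
	}
	return m
}

// resolveFromEnv is the process-environment entry point.
func resolveFromEnv() Modes { return resolveModes(os.LookupEnv) }
```

Passing the lookup function in keeps the resolver free of global state, which is one way to get the kind of deterministic test seam this work relies on.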

Main changes

  • Extracted Bazel mode resolution and payload-file helpers into the new shared internal/bazel package.
  • Switched settings, known-tests, and test-management reads to manifest-cache behavior when manifest mode is enabled.
  • Disabled repository upload in manifest and payload-files modes.
  • Disabled impacted-tests flows in payload-files mode.
  • Skipped local Git enrichment in payload-files mode and relied on environmental data files instead.
  • Short-circuited test-cycle and coverage transport paths in payload-files mode to write JSON files instead of sending HTTP payloads.
  • Added shared telemetry payload-file support in internal/telemetry, writing raw top-level telemetry payloads under payloads/telemetry.
  • Added explicit sink tracking in the shared telemetry writer so file-backed writes do not affect HTTP telemetry metrics or flush speed heuristics.
  • Allowed shared telemetry client creation in payload-files mode without requiring HTTP endpoints.
  • Skipped CI log initialization in offline/file-based modes.
  • Disabled Git CLI invocation in payload-files mode in the guarded Git execution paths.

Tests

Added or extended tests for:

  • Bazel mode resolution and manifest parsing in internal/bazel.
  • Manifest-cache endpoint behavior with no HTTP fallback.
  • Payload-file generation and JSON shape for test-cycle and coverage payloads.
  • Shared telemetry file-sink behavior, ordering, and metric handling.
  • Payload-files tag behavior for CI/Git/OS/runtime metadata.
  • Repository-upload suppression in offline/file-based modes.
  • Git CLI disabling on the guarded execution paths.

Validation

Focused validation run for this work:

  • go test ./internal/bazel
  • go test ./internal/telemetry/...
  • go test -race ./internal/telemetry/...
  • go test ./internal/civisibility/...
  • go test -race ./internal/civisibility/...
  • go test ./ddtrace/tracer -run 'TestTelemetryEnabled|TestCiVisibilityTransport|TestCIVisibilityTransportSecureLogging|TestCiVisibilityTransportPayloadFilesModeWritesJSON|TestCiVisibilityTransportPayloadFilesModeMissingOutputDir'
  • go test ./profiler -run 'TestTelemetryEnabled'

- Add unified test optimization env vars and mode resolver with Bazel manifest lookup
- Read settings/known/skippable/test-management from cache/http in manifest mode with no network fallback
- Disable git upload paths and selected CI enrichment in offline/file modes
- Write test-cycle and coverage payloads as JSON files in payload-files mode
- Add/extend unit tests for mode resolution, cache-first APIs, payload writing, and tag stripping
@datadog-official

datadog-official Bot commented Mar 5, 2026

Contributor

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage (details)
Patch Coverage: 80.45%
Overall Coverage: 61.03%

🔗 Commit SHA: 9fe65ea

@pr-commenter

pr-commenter Bot commented Mar 5, 2026


Benchmarks

Benchmark execution time: 2026-04-16 09:06:12

Comparing candidate commit 9fe65ea in PR branch codex/rfc-bazel-topt-files-go with baseline commit 5e83e21 in branch main.

Found 0 performance improvements and 1 performance regression! Performance is the same for 269 metrics; 8 metrics are unstable.

Explanation

This is an A/B test comparing a candidate commit's performance against that of a baseline commit. Performance changes are noted in the tables below as:

  • 🟩 = significantly better candidate vs. baseline
  • 🟥 = significantly worse candidate vs. baseline

We compute a confidence interval (CI) over the relative difference of means between metrics from the candidate and baseline commits, considering the baseline as the reference.

If the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD), the change is considered significant.

Feel free to reach out to #apm-benchmarking-platform on Slack if you have any questions.

More details about the CI and significant changes

You can imagine this CI as a range of values that is likely to contain the true difference of means between the candidate and baseline commits.

CIs of the difference of means are often centered around 0%, because changes are often small:

---------------------------------(------|---^--------)-------------------------------->
                              -0.6%    0%  0.3%     +1.2%
                                 |          |        |
         lower bound of the CI --'          |        |
sample mean (center of the CI) -------------'        |
         upper bound of the CI ----------------------'

As described above, a change is considered significant if the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD).

For instance, for an execution time metric, this confidence interval indicates a significantly worse performance:

----------------------------------------|---------|---(---------^---------)---------->
                                       0%        1%  1.3%      2.2%      3.1%
                                                  |   |         |         |
       significant impact threshold --------------'   |         |         |
                      lower bound of CI --------------'         |         |
       sample mean (center of the CI) --------------------------'         |
                      upper bound of CI ----------------------------------'
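The significance rule described above reduces to a small interval check. The sketch below assumes the threshold is applied symmetrically around 0% and that all values are percentages; `isSignificant` is a hypothetical name, not part of the benchmarking platform.

```go
package main

// isSignificant reports whether the confidence interval [lower, upper] lies
// entirely outside the band [-threshold, +threshold]. All values are percent.
// Assumption: the threshold is symmetric around zero.
func isSignificant(lower, upper, threshold float64) bool {
	return lower > threshold || upper < -threshold
}
```

With the numbers from the two diagrams and a 1% threshold, `isSignificant(1.3, 3.1, 1.0)` is true, while `isSignificant(-0.6, 1.2, 1.0)` is false because that interval contains 0%.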

scenario:BenchmarkOTLPProtoSize/1span-25

  • 🟥 execution_time [+8.088ns; +9.412ns] or [+2.191%; +2.550%]

@codecov

codecov Bot commented Mar 5, 2026


Codecov Report

❌ Patch coverage is 81.43133% with 96 lines in your changes missing coverage. Please review.
✅ Project coverage is 61.65%. Comparing base (5e83e21) to head (9fe65ea).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/bazel/mode.go | 84.00% | 19 Missing and 13 partials ⚠️ |
| internal/telemetry/internal/writer.go | 59.37% | 10 Missing and 3 partials ⚠️ |
| ...civisibility/integrations/civisibility_features.go | 75.75% | 3 Missing and 5 partials ⚠️ |
| ...nal/civisibility/integrations/manual_api_ddtest.go | 70.83% | 5 Missing and 2 partials ⚠️ |
| ...ivisibility/utils/net/test_management_tests_api.go | 81.08% | 4 Missing and 3 partials ⚠️ |
| internal/civisibility/utils/environmentTags.go | 89.13% | 2 Missing and 3 partials ⚠️ |
| internal/civisibility/utils/net/known_tests_api.go | 84.84% | 3 Missing and 2 partials ⚠️ |
| internal/civisibility/utils/net/settings_api.go | 87.50% | 3 Missing and 2 partials ⚠️ |
| internal/civisibility/utils/net/coverage.go | 78.94% | 2 Missing and 2 partials ⚠️ |
| internal/civisibility/utils/git.go | 50.00% | 2 Missing and 1 partial ⚠️ |
| ... and 3 more | | |
Additional details and impacted files
| Files with missing lines | Coverage Δ |
|---|---|
| ...nal/civisibility/integrations/manual_api_common.go | 85.71% <100.00%> (ø) |
| ...ternal/civisibility/utils/net/searchcommits_api.go | 79.48% <100.00%> (ø) |
| ...ternal/civisibility/utils/net/sendpackfiles_api.go | 86.20% <100.00%> (ø) |
| internal/civisibility/utils/net/skippable.go | 65.71% <100.00%> (ø) |
| internal/civisibility/utils/telemetry/telemetry.go | 100.00% <ø> (ø) |
| ...al/civisibility/utils/telemetry/telemetry_count.go | 4.79% <100.00%> (ø) |
| internal/telemetry/client_config.go | 83.52% <100.00%> (ø) |
| ddtrace/tracer/civisibility_transport.go | 61.79% <77.77%> (ø) |
| internal/civisibility/integrations/civisibility.go | 65.43% <85.71%> (ø) |
| internal/civisibility/utils/git.go | 56.48% <50.00%> (ø) |
| ... and 10 more | |

... and 423 files with indirect coverage changes


- Add deterministic seams and reset helpers for offline/file mode tests
- Cover missing output-dir failures for test-cycle and coverage payload files
- Assert payload-file mode skips git enrichment and offline init skips upload/logs
- Fix malformed manifest settings cache handling discovered by the new tests
- Apply local import grouping expected by Static Checks
- Reformat the affected CI visibility test files without changing behavior

Tests: go test -count=1 ./internal/civisibility/utils/net/... && go test -count=1 ./internal/civisibility/integrations/... && go test -count=1 ./ddtrace/tracer -run 'TestCiVisibilityTransportPayloadFilesMode'
- Narrow the offline log test to a direct helper instead of re-running global CI visibility init
- Stop resetting package-wide init state in the new integration test helper
- Keep the offline/upload assertions while preserving manual_api_mocktracer test stability

Tests: INTEGRATION=true go test -shuffle=on -count=20 ./internal/civisibility/integrations && INTEGRATION=true go test -count=1 ./internal/civisibility/integrations && go test -count=1 ./internal/civisibility/utils/net/... && go test -count=1 ./ddtrace/tracer -run 'TestCiVisibilityTransportPayloadFilesMode'
Comment thread internal/civisibility/integrations/civisibility.go Outdated
Comment thread internal/civisibility/integrations/civisibility.go Outdated
Comment thread internal/civisibility/integrations/civisibility_features.go
Comment thread internal/civisibility/utils/net/skippable.go Outdated
Comment thread internal/civisibility/utils/net/skippable_test.go Outdated
Comment thread internal/civisibility/utils/environmentTags.go Outdated
@tonyredondo tonyredondo marked this pull request as ready for review April 6, 2026 15:35
@tonyredondo tonyredondo requested review from a team as code owners April 6, 2026 15:35

@mtoffl01 mtoffl01 left a comment

Contributor

Approving on behalf of sdk-capabilities

@darccio

darccio commented Apr 16, 2026

Member

@codex review

@chatgpt-codex-connector


Codex Review: Didn't find any major issues. Already looking forward to the next diff.


@darccio darccio left a comment

Member

LGTM

@tonyredondo

Member Author

/merge

@gh-worker-devflow-routing-ef8351

gh-worker-devflow-routing-ef8351 Bot commented Apr 16, 2026


View all feedbacks in Devflow UI.

2026-04-16 09:18:45 UTC ℹ️ Start processing command /merge


2026-04-16 09:18:50 UTC ℹ️ MergeQueue: pull request added to the queue

The expected merge time in main is approximately 16m (p90).


2026-04-16 09:32:53 UTC ℹ️ MergeQueue: This merge request was merged

@gh-worker-dd-mergequeue-cf854d gh-worker-dd-mergequeue-cf854d Bot merged commit 194346a into main Apr 16, 2026
219 checks passed
@gh-worker-dd-mergequeue-cf854d gh-worker-dd-mergequeue-cf854d Bot deleted the codex/rfc-bazel-topt-files-go branch April 16, 2026 09:32
gh-worker-dd-mergequeue-cf854d Bot pushed a commit to DataDog/dd-trace-py that referenced this pull request Apr 17, 2026
## Description

Adds Bazel-focused CI Visibility support with two offline execution modes, mirroring the Go implementation in DataDog/dd-trace-go#4503:

- **Manifest mode** (`DD_TEST_OPTIMIZATION_MANIFEST_FILE`): reads settings, known tests, and test management data from pre-fetched JSON cache files inside `.testoptimization/`, enabling CI Visibility in Bazel's hermetic sandbox without network access.
- **Payload-files mode** (`DD_TEST_OPTIMIZATION_PAYLOADS_IN_FILES`): writes test event, coverage, and telemetry payloads as JSON files to `TEST_UNDECLARED_OUTPUTS_DIR/payloads/{tests,coverage,telemetry}/` instead of sending HTTP requests.

### Key changes

- **`offline_mode.py`**: `OfflineMode` singleton detects and validates both modes from env vars. Manifest version parsing supports plain `"1"` and `version=1` assignment syntax (matching Go). Runfiles resolution via `RUNFILES_DIR`, `RUNFILES_MANIFEST_FILE`, and `TEST_SRCDIR`.
- **`cached_file_provider.py`**: `CachedFileDataProvider` implements the `TestOptDataProvider` protocol, reading from cache files. Skippable tests return empty unconditionally (hard no-op in manifest mode, matching Go).
- **`writer.py`**: `TestOptWriter` and `TestCoverageWriter` intercept `_send_events` in payload-files mode to write JSON files. Filenames use `{kind}-{timestamp}-{pid}-{seq}.json` pattern matching Go's DDTestRunner expectations. Telemetry files use ordinal-first naming (`telemetry-{seq_padded}-{pid}.json`) for deterministic replay.
- **`telemetry.py`**: `TelemetryAPI` accumulates CI Visibility metrics in payload-files mode and writes them to `payloads/telemetry/` on `finish()`, matching Go's telemetry file-sink behavior.
- **`session_manager.py`**: Swaps `APIClient` for `CachedFileDataProvider` in manifest mode. Uses `NoOpBackendConnectorSetup` for writers/telemetry. Forces test skipping off in manifest mode. Skips git data upload in both offline modes.
- **`env_tags.py`**: In payload-files mode, reads CI/Git tags from `DD_TEST_OPTIMIZATION_ENV_DATA_FILE` instead of invoking git CLI. Falls back to `ci.provider.name = "bazel"` when no other provider is detected.
- **`constants.py`**: New `DD_TEST_OPTIMIZATION_ENV_DATA_FILE` constant.
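For reference, the two accepted manifest version syntaxes (plain `1` and `version=1`) could be parsed along these lines. This is a sketch in Go, not the code from either repository; it assumes the version sits on the first non-blank line, and `parseManifestVersion` is a hypothetical name.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseManifestVersion returns the version found on the first non-blank line.
// It accepts a plain integer ("1") or an assignment ("version=1").
// Assumption: any other key, or a non-integer value, is an error.
func parseManifestVersion(content string) (int, error) {
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue // skip blank lines before the version
		}
		if key, value, ok := strings.Cut(line, "="); ok {
			if strings.TrimSpace(key) != "version" {
				return 0, fmt.Errorf("unexpected manifest key %q", key)
			}
			line = strings.TrimSpace(value)
		}
		v, err := strconv.Atoi(line)
		if err != nil {
			return 0, fmt.Errorf("invalid manifest version %q: %w", line, err)
		}
		return v, nil
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("manifest has no version line")
}
```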

## Testing

- Unit tests added/updated across 4 test files covering all new functionality:
  - `test_offline_mode.py` — manifest version parsing (plain, assignment syntax, blank lines, invalid), runfiles resolution, `OfflineMode` initialization
  - `test_cached_file_provider.py` — cache reading, skippable tests hard no-op
  - `test_payload_files.py` — payload file naming (`{kind}-{ts}-{pid}-{seq}.json`), telemetry ordinal naming, telemetry file output on `TelemetryAPI.finish()`
  - `test_bazel_offline_session_manager.py` — provider selection, git upload skipping, env data file reading, bazel provider fallback, skipping forced off in manifest mode
- All 90 tests pass on Python 3.12 with pytest ~=8.0.

## Risks

- **Payload file naming**: Changed from `payload_<n>.json` to `{kind}-{ts}-{pid}-{seq}.json` to match Go's DDTestRunner expectations. Any consumer that relied on the old naming would need updating.
- Manifest mode disables test skipping unconditionally — this is intentional and matches Go behavior.
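The two naming patterns mentioned in this description can be sketched as plain format strings. The timestamp unit (Unix nanoseconds here) and the six-digit zero padding are assumptions, since the description does not specify them; both function names are hypothetical.

```go
package main

import "fmt"

// payloadFileName follows the {kind}-{timestamp}-{pid}-{seq}.json pattern.
// Assumption: the timestamp is an integer such as Unix nanoseconds.
func payloadFileName(kind string, ts int64, pid int, seq uint64) string {
	return fmt.Sprintf("%s-%d-%d-%d.json", kind, ts, pid, seq)
}

// telemetryFileName follows telemetry-{seq_padded}-{pid}.json.
// Assumption: the sequence is zero-padded to six digits, so that sorting
// file names lexically also sorts them by sequence number.
func telemetryFileName(seq uint64, pid int) string {
	return fmt.Sprintf("telemetry-%06d-%d.json", seq, pid)
}
```

Putting the padded ordinal first is what makes replay order deterministic: a consumer can sort the directory listing as strings and process files in emission order.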

## Additional Notes

- Mirrors Go PR: DataDog/dd-trace-go#4503
- Features not ported (not applicable to Python's architecture): per-span CI/git tag stripping (Python sets tags at metadata level), impacted tests suppression (no such concept in Python yet), low-level git CLI guard (only called from guarded `upload_git_data`), CI log shipping suppression (Python plugin has no log shipping).

Co-authored-by: federico.mon <federico.mon@datadoghq.com>
dubloom pushed a commit to DataDog/dd-trace-py that referenced this pull request Apr 21, 2026
gh-worker-dd-mergequeue-cf854d Bot pushed a commit to DataDog/dd-trace-java that referenced this pull request Apr 27, 2026
# What Does This Do

Adds Bazel-focused CI Visibility support with two offline execution modes, mirroring [dd-trace-go#4503](DataDog/dd-trace-go#4503) and [dd-trace-py#17197](DataDog/dd-trace-py#17197):

- **Manifest mode** (`DD_TEST_OPTIMIZATION_MANIFEST_FILE`): reads settings, known tests, flaky tests, and test management data from pre-fetched JSON cache files instead of hitting the backend.
- **Payload-files mode** (`DD_TEST_OPTIMIZATION_PAYLOADS_IN_FILES`): writes CI test cycle, coverage, and tracer telemetry to `$TEST_UNDECLARED_OUTPUTS_DIR/payloads/{tests,coverage,telemetry}/*.json` instead of POSTing them.

## Key Changes

- `BazelMode` (internal-api): detects both modes, resolves the manifest path via Bazel's rlocation algorithm, parses the `version=<int>` header, and exposes the `tests/`, `coverage/`, and `telemetry/` output directories.
- `FileBasedConfigurationApi` (agent-ci-visibility): reads the same JSON envelopes as the HTTP API from disk; null paths return safe defaults.
- `FileBasedPayloadDispatcher` (dd-trace-core): serializes CI test cycle and coverage spans as JSON files; strips `ci.*`/`git.*`/`runtime.*`/`os.*` tags to avoid cache invalidation; atomic temp-file + rename. Writes `trace_id`/`span_id`/`parent_id` as unsigned 64-bit JSON numbers (not strings) so backend schema validation passes.
- `FileBasedTelemetryClient` (telemetry): subclass of `TelemetryClient` that writes the existing Moshi-encoded telemetry request body to a file; `TelemetryRouter` gets a single-client path that skips feature discovery; `TelemetrySystem` swaps in the file-based client when Bazel mode is active.
- `WriterFactory` / `CiVisibilityServices` / `CiVisibilityRepoServices`: wire the file-based dispatcher/config API, disable the git client, and skip git-data upload when Bazel mode is active.
- `CoreTracer`: in Bazel payload-files mode, uses `StreamingTraceCollector` (streams each CI Visibility span individually) and `DDIntakeTraceInterceptor` (not the APM-protocol interceptor, which strips `test_{session,module,suite}_end` spans) — same treatment as agentless, so all CITESTCYCLE events reach the file dispatcher.
- `JUnit4TracingListener` / `JUnit4Utils`: lazy-register the test suite in `testStarted` so runners that don't fire `testSuiteStarted` still produce a proper suite span; unwrap `com.google.testing.junit.junit4.runner.RunNotifierWrapper` in `runListenersFromRunNotifier` so the idempotency check sees listeners installed on the inner notifier (fixes duplicate-listener installation under `BazelTestRunner`).
- `Config`: adds `DD_TEST_OPTIMIZATION_MANIFEST_FILE` and `DD_TEST_OPTIMIZATION_PAYLOADS_IN_FILES`; skips API-key validation in these modes. `TEST_UNDECLARED_OUTPUTS_DIR` is read directly in `BazelMode` (it's a Bazel-provided env var, not a DD config).
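The tag-stripping step in the file-based dispatcher amounts to a prefix filter over span tags. The sketch below is written in Go to match the original PR and is not the Java implementation; `stripEnvTags` and `strippedPrefixes` are hypothetical names, and a flat `map[string]string` is an assumed tag representation.

```go
package main

import "strings"

// strippedPrefixes lists the tag namespaces removed before writing payloads,
// as named in the description above.
var strippedPrefixes = []string{"ci.", "git.", "runtime.", "os."}

// stripEnvTags returns a copy of tags without environment-specific keys,
// so payload content does not vary with CI, Git, runtime, or OS details.
func stripEnvTags(tags map[string]string) map[string]string {
	out := make(map[string]string, len(tags))
	for k, v := range tags {
		if hasStrippedPrefix(k) {
			continue
		}
		out[k] = v
	}
	return out
}

// hasStrippedPrefix reports whether key starts with any stripped namespace.
func hasStrippedPrefix(key string) bool {
	for _, p := range strippedPrefixes {
		if strings.HasPrefix(key, p) {
			return true
		}
	}
	return false
}
```

Matching on the trailing dot matters: a key such as `osx` is kept, while `os.platform` is dropped.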

# Motivation

Bazel can run tests in hermetic sandboxes with no network access. The existing CI Visibility pipeline requires HTTP calls to fetch configuration and submit payloads, which is incompatible with Bazel's execution model. Most of our operations, such as tagging tests with git metadata, also invalidate Bazel's cache. This PR enables CI Visibility under Bazel by reading configuration from pre-fetched cache files and writing payloads/telemetry to files, with the orchestration of everything else being handled by our custom testing rule.

# Additional Notes

- Unit tests cover each new component: `BazelModeTest`, `FileBasedConfigurationApiTest` (shares the existing `*-response.ftl` fixtures with `ConfigurationApiImplTest` to keep the HTTP and file code paths in sync), `FileBasedPayloadDispatcherTest`, `FileBasedTelemetryClientTest`, and extended `TelemetryRouterSpecification`.
- End-to-end repro validated locally against `DataDog/rules_test_optimization_tests`: 3 `test` + 1 `test_suite_end` + 1 `test_module_end` + 1 `test_session_end` events emitted to the payload file, no duplicate listener errors, no schema-validation failures.
- Future work, left out of this PR to keep the change set manageable:
	- Include instrumentation improvements to better handle Bazel's custom JUnit4 test runner
	- Refactor the configuration API DTOs into common utilities
	- Define a specification for mapping and serializing CI Vis spans to avoid logic mirroring between the two approaches (original vs file-based)
	- Possibly introduce a smoke test for e2e testing of the Bazel process, to avoid depending on an external repository.

# Contributor Checklist

- [x] Format the title according to [the contribution guidelines](https://github.com/DataDog/dd-trace-java/blob/master/CONTRIBUTING.md#title-format)
- [ ] Assign the `type:` and (`comp:` or `inst:`) labels in addition to [any other useful labels](https://github.com/DataDog/dd-trace-java/blob/master/CONTRIBUTING.md#labels)
- [ ] Update the [CODEOWNERS](https://github.com/DataDog/dd-trace-java/blob/master/.github/CODEOWNERS) file on source file addition, migration, or deletion
- [ ] Update [public documentation](https://docs.datadoghq.com/tracing/trace_collection/library_config/java/) with any new configuration flags or behaviors

Jira ticket: [SDTEST-3335]

***Note:*** **Once your PR is ready to merge, add it to the merge queue by commenting \`/merge\`.** \`/merge -c\` cancels the queue request. \`/merge -f --reason "reason"\` skips all merge queue checks; please use this judiciously, as some checks do not run at the PR-level. For more information, see [this doc](https://datadoghq.atlassian.net/wiki/spaces/DEVX/pages/3121612126/MergeQueue).

[SDTEST-3335]: https://datadoghq.atlassian.net/browse/SDTEST-3335?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ

Co-authored-by: daniel.mohedano <daniel.mohedano@datadoghq.com>
emmettbutler pushed a commit to DataDog/dd-trace-py that referenced this pull request May 6, 2026
4 participants