feat: add Kubernetes synthetic RCA test harness by ebrahim-sameh · Pull Request #583 · Tracer-Cloud/opensre

ebrahim-sameh · 2026-04-14T19:54:31Z

Fixes #260

Describe the changes you have made in this PR -

Adds a fully-wired Kubernetes synthetic RCA test harness under tests/synthetic/eks/, mirroring the existing RDS Postgres suite at tests/synthetic/rds_postgres/. Future Kubernetes failure scenarios (issues #261, #262, #263) can now plug in as scenario directories rather than hand-rolled one-off test files, and detect_sources plus the wired EKS and Datadog tools transparently accept injected fixture backends in test mode while leaving real-credential behaviour untouched.

What this PR adds

New files — harness infrastructure (under `tests/synthetic/`)

tests/synthetic/k8s_schemas.py — controlled vocabularies (4 engines, 6 workload types, 11 failure modes, 7 evidence sources, 7 trajectory actions), TypedDicts for the alert envelope and every evidence fixture, a K8sScenarioEvidence dataclass, and validators following the same patterns as tests/synthetic/schemas.py in the RDS suite. Kept deliberately separate from the RDS schemas so the two suites can evolve independently.
tests/synthetic/mock_eks_backend/ — EKSBackend @runtime_checkable Protocol plus FixtureEKSBackend and SelectiveEKSBackend. The fixture backend exposes list_pods, get_events, list_deployments, get_node_health, get_pod_logs, each returning the exact envelope the real tool function in app/tools/EKS*/ produces. The selective subclass records every tool invocation into an audit log for Axis 2 reasoning-quality scoring.
tests/synthetic/mock_datadog_backend/ — DatadogBackend Protocol plus FixtureDatadogBackend and SelectiveDatadogBackend, wrapping datadog_logs.json / datadog_monitors.json fixtures in the envelopes that query_datadog_logs and query_datadog_monitors return in production.
tests/synthetic/eks/scenario_loader.py — mirror of tests/synthetic/rds_postgres/scenario_loader.py: load_all_scenarios, load_scenario, single-level base inheritance with chained-inheritance rejection, file-level evidence fallback from scenario directory to base directory.
tests/synthetic/eks/run_suite.py — CLI runner plus scorer. TrajectoryScore / ReasoningScore / ScenarioScore dataclasses and score_trajectory / score_reasoning / score_result functions parallel the RDS suite. run_scenario builds resolved_integrations with both aws (EKS) and datadog entries containing the injected _backend objects, then delegates to run_investigation.
tests/synthetic/eks/test_suite.py — pytest coverage. Loader validation, schema compliance, mock backend shape assertions, scorer unit tests, a TestScenarioInheritance class mirroring the RDS suite (metadata inheritance, evidence fallback, local override, chained-inheritance rejection, missing-base rejection), and a TestHarnessEndToEnd class that drives the full run_investigation pipeline against the placeholder with a monkey-patched planner (no real LLM call required). The parametrised test_level1..4_scenario tests use skipif guards so they collect zero cases while only the placeholder scenario exists.
tests/synthetic/eks/test_suite_axis2.py — Axis 2 pytest module using the selective backends and score_reasoning for adversarial reasoning-quality checks. Parameter set is empty until scenarios declare ruling_out_keywords or required_queries.
tests/synthetic/eks/000-healthy/ — placeholder scenario directory with scenario_difficulty: 0 so it stays out of the level-1..4 parametrizations. Contains scenario.yml, alert.json, answer.yml, and every declared evidence fixture (eks_pods.json, eks_events.json, eks_deployments.json, eks_node_health.json, datadog_logs.json, datadog_monitors.json). eks_pod_logs is deliberately omitted from available_evidence to exercise the "missing evidence source raises ValueError" path in the mock backend.
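The backend layering described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the envelope keys, the `get_pod_logs` signature, and the audit-log entry shape are assumptions chosen for readability, and only two of the five methods are shown.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class EKSBackend(Protocol):
    """Structural interface the wired EKS tools call into (one method shown)."""

    def list_pods(self, cluster_name: str, namespace: str) -> dict[str, Any]:
        """Return the same envelope the real list-pods tool produces."""


class FixtureEKSBackend:
    """Serves canned evidence and refuses sources the scenario did not declare."""

    def __init__(self, evidence: dict[str, Any]) -> None:
        # Hypothetical layout: one key per declared evidence source.
        self._evidence = evidence

    def _require(self, source: str) -> Any:
        if source not in self._evidence:
            raise ValueError(f"evidence source {source!r} not declared by scenario")
        return self._evidence[source]

    def list_pods(self, cluster_name: str, namespace: str) -> dict[str, Any]:
        pods = self._require("eks_pods")
        # Envelope keys are illustrative, not the real tool's schema.
        return {"source": "eks", "cluster": cluster_name, "namespace": namespace, "pods": pods}

    def get_pod_logs(self, cluster_name: str, namespace: str, pod_name: str) -> dict[str, Any]:
        logs = self._require("eks_pod_logs")
        return {"source": "eks", "cluster": cluster_name, "pod": pod_name, "logs": logs}


class SelectiveEKSBackend(FixtureEKSBackend):
    """Same behaviour, plus an audit log of every call for Axis 2 scoring."""

    def __init__(self, evidence: dict[str, Any]) -> None:
        super().__init__(evidence)
        self.audit_log: list[dict[str, Any]] = []

    def list_pods(self, cluster_name: str, namespace: str) -> dict[str, Any]:
        self.audit_log.append(
            {"method": "list_pods", "cluster_name": cluster_name, "namespace": namespace}
        )
        return super().list_pods(cluster_name, namespace)
```

Because the Protocol is runtime-checkable, `isinstance(backend, EKSBackend)` can be used as a cheap shape assertion in tests. A scenario that omits `eks_pod_logs` from its evidence makes `get_pod_logs` raise `ValueError`, which is the path the placeholder scenario is meant to exercise.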

Modified files — minimal agent-side wiring

The harness has to feed the fixture backends into the real pipeline the same way the RDS synthetic suite feeds FixtureGrafanaBackend into the Grafana tools. That pattern was not yet present for EKS or Datadog, so:

app/nodes/plan_actions/detect_sources.py — extend the EKS and Datadog paths to accept a pre-injected _backend key, matching how the existing Grafana path already works. The EKS integration continues to live under resolved_integrations["aws"] as before; detect_sources now reads _backend from that dict and propagates it to sources["eks"]["_backend"]. Same treatment for Datadog. Critically, backend-only mode deliberately does NOT set connection_verified: True. This keeps the 6 unwired EKS tools and the 4 unwired Datadog tools inactive in test mode, so a scenario cannot accidentally trigger a real AWS or Datadog call.
app/tools/EKSListClustersTool/__init__.py — introduce a new _eks_available_or_backend(sources) helper that returns True when either connection_verified or _backend is present. Only the 5 wired tools import this helper; the other 6 EKS tools keep using the existing _eks_available check. Also relaxes _eks_creds from eks["role_arn"] to eks.get("role_arn", "") so it no longer KeyErrors when called from the backend-only path.
5 EKS tools (EKSListPodsTool, EKSEventsTool, EKSListDeploymentsTool, EKSNodeHealthTool, EKSPodLogsTool) — each gets an eks_backend: Any = None kwarg added to its function signature (plus an eks_backend = eks.get("_backend") line in its extract_params helper). The function body short-circuits to eks_backend.<method>(...) when the kwarg is set, before any call to build_k8s_clients. The result is cast to dict[str, Any] to satisfy mypy's warn_return_any. role_arn was also loosened from positional-required to default-empty to support the backend-only call path.
app/tools/DataDogLogsTool/__init__.py and app/tools/DataDogMonitorsTool/__init__.py — add a _dd_available_or_backend helper in DataDogLogsTool (imported by DataDogMonitorsTool), plus a datadog_backend: Any = None kwarg on each tool function and a short-circuit to datadog_backend.query_logs(...) / datadog_backend.query_monitors(...) before the real make_client path. Same cast pattern as the EKS tools.

Makefile — add a test-k8s-synthetic target mirroring test-rds-synthetic:

test-k8s-synthetic:
	$(PYTHON) -m tests.synthetic.eks.run_suite $(if $(SCENARIO),--scenario $(SCENARIO),)

Out of scope for this PR

Real Kubernetes failure scenarios (#261: CrashLoopBackOff, OOMKilled, ImagePullBackOff; #262: Node NotReady, Pending Pods, Stuck Rollouts; #263: Eviction, DNS failures, Probe failures, Quota limits). The issue description is explicit: "This issue covers the test harness itself, not the individual scenarios." This PR ships the harness plus a single 000-healthy placeholder whose only job is to prove the harness wiring works end-to-end. Difficulty-level parametrizations stay empty until those issues add real scenarios.
Wiring the 6 remaining EKS tools and 4 remaining Datadog tools. Only the tools whose output corresponds to a declared evidence source in the issue scope are wired; this keeps the blast radius minimal. When future scenarios need additional sources, the same short 3-line pattern copied from the wired tools applies.

Pre-existing gaps flagged separately (not fixed in this PR)

While building the harness I identified two pre-existing gaps in the existing EKS plumbing. Both are NOT part of the harness scope, but #261 / #262 / #263 will need them to work end-to-end. Both have been filed as standalone bug reports before this PR was opened so they can be picked up in parallel by any contributor:

#581, `[BUG] EKS tool output silently dropped by merge_evidence — no mappers in post_process.py`: EVIDENCE_MAPPERS in app/nodes/investigate/processing/post_process.py has mappers for Grafana, Datadog, CloudWatch, S3, Lambda, GitHub, Honeycomb, Coralogix and Vercel, but no entries for any list_eks_* / get_eks_* / describe_eks_* action name. Tool output is silently discarded by merge_evidence(). Small additive fix (~50 lines in a single file). Complete proposed patch in the issue body.
#582, `[BUG] is_clearly_healthy short-circuit never fires for pure-EKS healthy states — eks_* keys missing from _INVESTIGATED_EVIDENCE_KEYS`: _INVESTIGATED_EVIDENCE_KEYS in app/nodes/root_cause_diagnosis/evidence_checker.py has no eks_* entries, so a pure-Kubernetes healthy state never short-circuits out of the reasoning LLM. Five-line fix. Depends on #581 landing first. Complete proposed patch in the issue body.
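To illustrate why the first gap drops EKS output without any error, here is a deliberately simplified sketch of a mapper registry. The `merge_evidence` signature, the mapper shape, and the evidence key names are assumptions for illustration only and are not taken from post_process.py.

```python
from typing import Any, Callable

Mapper = Callable[[dict[str, Any]], dict[str, Any]]

# Hypothetical registry: action name -> function mapping tool output to evidence keys.
EVIDENCE_MAPPERS: dict[str, Mapper] = {
    "query_datadog_logs": lambda out: {"datadog_logs": out.get("logs", [])},
}


def merge_evidence(
    evidence: dict[str, Any], action_name: str, tool_output: dict[str, Any]
) -> dict[str, Any]:
    """Merge one tool result into the evidence dict via its registered mapper."""
    mapper = EVIDENCE_MAPPERS.get(action_name)
    if mapper is None:
        # No mapper registered: the output is dropped without any diagnostic.
        return evidence
    return {**evidence, **mapper(tool_output)}
```

In this sketch a `list_eks_pods` result leaves the evidence dict unchanged until an entry for that action name is registered, which is the kind of additive change the follow-up issue proposes.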

Because these are pre-existing and independent, this PR's end-to-end smoke test (TestHarnessEndToEnd::test_placeholder_runs_through_full_pipeline) only asserts on datadog_* evidence keys — the existing Datadog mappers are enough to trigger the healthy short-circuit for the 000-healthy placeholder. A scope-note in that test's class docstring points forward to the follow-up issues so the EKS assertions can be enabled once #581 lands.

Testing

All three gate commands pass locally on Python 3.12 and are the same commands CI runs:

make lint         # ruff check app/ tests/  → All checks passed!
make typecheck    # mypy app/               → Success: no issues found in 340 source files
make test-cov     # pytest -n auto --ignore tests/synthetic -m "not synthetic"  → 2137 passed, 1 skipped

Targeted verification of the new suite and every touched tool file:

pytest tests/synthetic/eks/                                           → 21 passed, 5 skipped
pytest tests/tools/test_eks_list_pods_tool.py                         → passed (contract + happy path + error path)
pytest tests/tools/test_eks_events_tool.py                            → passed
pytest tests/tools/test_eks_list_deployments_tool.py                  → passed
pytest tests/tools/test_eks_node_health_tool.py                       → passed
pytest tests/tools/test_eks_pod_logs_tool.py                          → passed
pytest tests/tools/test_datadog_logs_tool.py                          → passed
pytest tests/tools/test_datadog_monitors_tool.py                      → passed
pytest tests/nodes/plan_actions/                                      → 32 passed

The end-to-end smoke test drives the full LangGraph pipeline against 000-healthy with a canned planner plan, asserting that detect_sources picks up the injected backends, the executor routes each of list_eks_pods / get_eks_events / list_eks_deployments / get_eks_node_health / query_datadog_logs / query_datadog_monitors to its fixture mock, Datadog evidence flows into state, and diagnose_root_cause returns root_cause_category: healthy via the healthy short-circuit without any real LLM call. No Anthropic or OpenAI API key is required: monkeypatch.setenv("ANTHROPIC_API_KEY", "sk-test-dummy") is enough to satisfy LLMSettings because every real LLM call in the run is either mocked (plan_actions) or bypassed (diagnose_root_cause healthy short-circuit).

Screenshots of the UI changes (If any) -

N/A — no user-facing UI changes. This PR touches only test infrastructure plus the backend-injection seams in a handful of tool files and detect_sources. Production behaviour against real EKS / Datadog credentials is identical; the new code paths are only reachable when a `_backend` object is present in the integration dict, which is only produced by the synthetic harness.

Impact analysis

Backward compatibility: fully preserved. Every app-side change is either a new additive helper (_eks_available_or_backend, _dd_available_or_backend) or an additive kwarg with a default (eks_backend=None, datadog_backend=None) that only matters when explicitly set. Real-credential investigations take the unchanged code path.
Coverage of existing EKS / Datadog tool tests: unchanged — the 81 existing tests across the 7 touched tool files all still pass.
Performance: no measurable impact on real investigations. Each tool gains a single if eks_backend is not None: / if datadog_backend is not None: check at the top of its function, taken once per invocation.
Secrets: no new environment variables, no .env writes, no credentials stored anywhere.
Dependencies: no new runtime or dev dependencies added.

Code Understanding and AI Usage

Did you use AI assistance (ChatGPT, Claude, Copilot, etc.) to write any part of this code?

No, I wrote all the code myself
Yes, I used AI assistance (continue below)

If you used AI assistance:

I have reviewed every single line of the AI-generated code
I can explain the purpose and logic of each function/component I added
I have tested edge cases and understand how the code handles them
I have modified the AI output to follow this project's coding standards and conventions

Explain your implementation approach:

Problem solved: OpenSRE already has an RDS Postgres synthetic suite under tests/synthetic/rds_postgres/ that gives the team a reproducible, offline way to benchmark the agent's root-cause reasoning against fixture data — no real cloud calls, no flaky APIs, clear pass/fail per scenario. There was no equivalent for Kubernetes, so any improvement to the K8s investigation path could not be measured the same way. This PR adds the parallel infrastructure so follow-up issues can drop in K8s failure scenarios as scenario directories.

Alternatives considered:

Mock at the Kubernetes Python SDK layer instead of the tool-function layer. Rejected: mocking build_k8s_clients would force every test to reason about raw K8s SDK objects rather than the higher-level tool response shapes the pipeline actually consumes. The issue text is explicit that "Response shapes must match what the EKS tools in `app/tools/EKS*/` and Datadog tools return", which points at the tool-function layer as the right seam.
Mock at the run_investigation entry point using unittest.mock.patch. Rejected: it only exercises the glue and gives zero confidence that real scenarios will actually drive detect_sources → plan_actions → executor → tools correctly, which is the whole point of a synthetic suite.
Wire backends into every EKS and Datadog tool upfront, even the ones the initial placeholder does not declare. Rejected to minimise blast radius. Only the 5 EKS tools and 2 Datadog tools whose output corresponds to a declared evidence source in the issue scope are wired. The unwired tools continue to use connection_verified alone as their availability gate, so they stay completely inactive in test mode and cannot accidentally hit real AWS or Datadog. When future scenarios need additional sources, the same short 3-line pattern is trivial to copy.

Why this implementation:

Mirrors RDS exactly. The loader, scorer, runner, pytest suite, and fixture layout are structural copies of the RDS suite with K8s-specific substitutions. This makes the review straightforward — reviewers already know the patterns — and keeps both suites easy to maintain side by side.
Uses the same _backend injection seam that the Grafana tools already use. No new abstractions are introduced; the pattern is one the team has already accepted during the RDS harness work, which minimises cognitive load for reviewers.
Placeholder is intentionally scenario_difficulty: 0. The test_level1..4_scenario parametrizations collect zero cases until real failure scenarios land in follow-up issues. The placeholder's only job is to exercise the harness plumbing once, not to grade an LLM on a synthetic case the LLM was not trained or evaluated on.
Pre-existing gaps were kept strictly out of scope. The two bug reports (#581 and #582) were filed before opening this PR so the harness work stays cleanly scoped, and the pre-existing gaps can be reviewed and fixed on their own merits by any contributor.

Key components and their jobs:

K8sScenarioEvidence + validators (k8s_schemas.py) — structural gate between raw JSON fixture files and the typed container the loader returns. Raises ValueError with file-qualified context on any malformed fixture.
K8sScenarioFixture + load_scenario / load_all_scenarios (scenario_loader.py) — discover, validate, and return a typed snapshot of a scenario directory. Handles single-level base inheritance and file-level evidence fallback.
FixtureEKSBackend / FixtureDatadogBackend — satisfy runtime-checkable Protocols, wrap scenario fixtures in the exact envelopes the real tool functions produce, and raise ValueError when the caller requests a source the scenario did not declare.
SelectiveEKSBackend / SelectiveDatadogBackend — subclass their non-selective counterparts and record each tool invocation into an audit log for Axis 2 reasoning-quality checks.
run_scenario in run_suite.py — builds the resolved_integrations dict with either fresh fixture backends or pre-built selective backends, delegates to run_investigation, collects any audit log produced by selective backends, and calls score_result.
score_trajectory / score_reasoning / score_result — Axis 1 correctness (category, keywords, forbidden terms, required evidence sources, trajectory efficiency) and Axis 2 adversarial reasoning (ruling-out keywords plus required-query audit). Direct parallel of the RDS scorer with _EVIDENCE_KEY_MAP adjusted for K8s.
detect_sources.py EKS and Datadog path changes — accept _backend in the incoming resolved_integrations dict, propagate it to the relevant sources[...] entry, and deliberately skip setting connection_verified in backend-only mode so only fixture-aware tools activate.
5 EKS tools plus 2 Datadog tools — each gains a *_backend: Any = None kwarg that short-circuits to the mock when set. Real-credential behaviour is completely untouched.
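The loader rules listed above (single-level base inheritance, chained-base rejection, file-level evidence fallback) can be sketched as follows. This is an assumption-laden illustration: the `base` metadata key, the function names, and the use of an already-parsed metadata dict instead of scenario.yml are all hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Optional


def resolve_base(name: str, metadata: dict[str, dict[str, Any]]) -> Optional[str]:
    """Return the base scenario name, rejecting missing and chained bases."""
    base = metadata[name].get("base")
    if base is None:
        return None
    if base not in metadata:
        raise ValueError(f"{name}: base scenario {base!r} not found")
    if metadata[base].get("base") is not None:
        raise ValueError(f"{name}: chained inheritance via {base!r} is not supported")
    return str(base)


def load_evidence(scenario_dir: Path, base_dir: Optional[Path], filename: str) -> Any:
    """Prefer the scenario's own fixture file; fall back to the base directory."""
    for directory in (scenario_dir, base_dir):
        if directory is not None and (directory / filename).is_file():
            return json.loads((directory / filename).read_text())
    raise ValueError(f"{scenario_dir.name}: evidence file {filename} not found")
```

The fallback is per file rather than per directory, so a derived scenario can override a single fixture such as eks_pods.json and inherit the rest from its base.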

Checklist before requesting a review

I have added proper PR title and linked to the issue
I have performed a self-review of my code
I can explain the purpose of every function, class, and logic block I added
I understand why my changes work and have tested them thoroughly
I have considered potential edge cases and how my code handles them
If it is a core feature, I have added thorough tests
My code follows the project's style guidelines and conventions

Mirror the existing RDS Postgres synthetic suite under tests/synthetic/ for Kubernetes workloads so future K8s scenarios (crash-loop, OOM-killed, image-pull-backoff, etc.) can plug in as scenario directories rather than one-off test files.

Harness components:

* tests/synthetic/k8s_schemas.py: TypedDicts, controlled vocabularies and validators for K8s scenario metadata, alert envelopes, and every declared evidence source (eks_pods, eks_events, eks_deployments, eks_node_health, eks_pod_logs, datadog_logs, datadog_monitors).
* tests/synthetic/mock_eks_backend/ and mock_datadog_backend/: fixture backends that satisfy runtime-checkable Protocols and return the exact envelope shape the real tool functions under app/tools/EKS*/ and app/tools/DataDog*/ produce. Each package ships a straight fixture backend for Axis 1 and a selective variant that records every tool invocation for Axis 2 reasoning-quality scoring.
* tests/synthetic/eks/scenario_loader.py: scenario discovery, base scenario inheritance (single-level, chained bases rejected), evidence file fallback from scenario dir to base dir, typed ScenarioFixture.
* tests/synthetic/eks/run_suite.py: CLI runner plus score_trajectory / score_reasoning / score_result functions mirroring the RDS harness, with an _EVIDENCE_KEY_MAP hook for future refinement.
* tests/synthetic/eks/test_suite.py: unit tests for the loader, backend shapes, scorer, and scenario inheritance, plus an end-to-end smoke test that drives run_investigation() with a canned plan (no real LLM call required).
* tests/synthetic/eks/test_suite_axis2.py: pytest parametrisation for adversarial reasoning tests using the selective backends.
* tests/synthetic/eks/000-healthy/: placeholder scenario directory with scenario_difficulty: 0 so the parametrised level1..4 collections stay empty until the first real failure scenarios land under Tracer-Cloud#261 / Tracer-Cloud#262 / Tracer-Cloud#263. Its only job is to exercise the full harness wiring end-to-end.
* Makefile: adds test-k8s-synthetic target, mirroring test-rds-synthetic.

Agent-side wiring for the mock backends:

* app/nodes/plan_actions/detect_sources.py: extend the EKS and Datadog paths to accept a pre-injected _backend dict key the same way Grafana already does. Backend-only mode deliberately does not set connection_verified, so only the five fixture-supported EKS tools activate (list_eks_pods, get_eks_events, list_eks_deployments, get_eks_node_health, get_eks_pod_logs) and the six unsupported ones stay quiet.
* app/tools/EKSListClustersTool/__init__.py: introduce _eks_available_or_backend helper for the wired tools; make _eks_creds safe against missing role_arn (backend-only call path).
* app/tools/EKS*Tool/__init__.py (5 wired tools) and app/tools/DataDog{Logs,Monitors}Tool/__init__.py: add an eks_backend / datadog_backend kwarg to each tool function and short-circuit to the mock when present, matching the pattern the Grafana tools already use.

Known follow-ups flagged as separate issues before this PR was opened:

* Issue for EKS evidence mappers in app/nodes/investigate/processing/post_process.py: the EKS tools' output is currently dropped by merge_evidence because the EVIDENCE_MAPPERS registry has no entries for them. Until that lands, the end-to-end smoke test in this PR only asserts on datadog_* evidence keys.
* Issue for is_clearly_healthy _INVESTIGATED_EVIDENCE_KEYS: the frozenset used by the healthy short-circuit has no eks_* entries, so pure-EKS healthy scenarios will not fast-path out of the reasoning LLM call. Also independent of this PR; flagged for a follow-up.

greptile-apps · 2026-04-14T20:01:29Z

Greptile Summary

This PR adds a Kubernetes synthetic RCA test harness under tests/synthetic/eks/, mirroring the existing RDS Postgres suite. It introduces typed schemas, fixture-backed mock EKS and Datadog backends, a scenario loader with single-level inheritance, a scorer (Axis 1 + Axis 2), and a 000-healthy placeholder scenario. App-side wiring adds backend-injection seams to detect_sources, five EKS tools, and two Datadog tools via additive kwargs with no impact on real-credential paths.

All three concerns from the previous review round have been addressed: the _eks_available_or_backend / _dd_available_or_backend helpers were extracted to app/tools/utils/availability.py, the detect_sources EKS path now falls back to cluster_names[0] when a backend is injected but no annotation provides cluster_name, and the sequencing_ok field is now documented as an intentional set-membership check.
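The set-membership reading of sequencing_ok mentioned above can be sketched like this. Only the `sequencing_ok` field name comes from the PR; the `missing_actions` field and the `score_trajectory` signature are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrajectoryScore:
    sequencing_ok: bool
    missing_actions: tuple[str, ...]  # hypothetical field, for diagnostics only


def score_trajectory(expected_actions: list[str], executed_actions: list[str]) -> TrajectoryScore:
    """Check that every expected action ran, ignoring order and extra actions."""
    # Order is not compared: actions run in parallel, so completion order varies.
    missing = sorted(set(expected_actions) - set(executed_actions))
    return TrajectoryScore(sequencing_ok=not missing, missing_actions=tuple(missing))
```

Under this reading, a run that executes the expected actions in a different order, or alongside additional actions, still scores `sequencing_ok=True`.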

Confidence Score: 5/5

Safe to merge — all three prior review concerns are fully resolved; remaining findings are P2 cleanup items that don't affect correctness.

All production-path changes are strictly additive (new kwarg defaults, new if-backend-not-None short-circuits, shared availability helpers). The harness is well-tested with 21 passing unit/integration tests, no new runtime dependencies, and backward compatibility is preserved. Remaining findings are dead code and a duplicated constant in test infrastructure.

tests/synthetic/eks/run_suite.py (unused ResolvedBackends), tests/synthetic/mock_datadog_backend/backend.py (duplicated _ERROR_KEYWORDS), Makefile (missing .PHONY entry).

Important Files Changed

| Filename | Overview |
| --- | --- |
| tests/synthetic/eks/run_suite.py | Core runner and scorer; contains an unused `ResolvedBackends` dataclass (dead code) and identity `_EVIDENCE_KEY_MAP` noted as temporary. |
| tests/synthetic/mock_datadog_backend/backend.py | Fixture Datadog backend; duplicates `_ERROR_KEYWORDS` from production code, risking drift if the production list is updated. |
| app/nodes/plan_actions/detect_sources.py | Backend injection seam for EKS and Datadog added cleanly; includes cluster_name fallback and intentional omission of connection_verified in backend-only mode. |
| app/tools/utils/availability.py | New shared module for eks_available_or_backend and datadog_available_or_backend; correctly addresses the prior cross-tool coupling concern. |
| Makefile | test-k8s-synthetic target added but missing from the .PHONY declaration, unlike the parallel test-rds-synthetic which is declared. |
| tests/synthetic/eks/test_suite.py | 21 tests covering loader, mock backend shapes, scorer, inheritance, and end-to-end pipeline smoke test; no LLM credentials required. |
| tests/synthetic/k8s_schemas.py | Controlled vocabularies, TypedDicts, and validators for K8s fixtures; thorough field-level validation with descriptive error messages. |
| tests/synthetic/eks/scenario_loader.py | Mirrors RDS loader with chained-inheritance rejection and file-level evidence fallback; clean and well-tested. |
| app/tools/EKSListPodsTool/__init__.py | Backend short-circuit added correctly with cast for mypy; role_arn defaulted to empty string to support backend-only path. |

Sequence Diagram

sequenceDiagram
    participant T as test_suite.py
    participant RS as run_suite.py
    participant RI as run_investigation
    participant DS as detect_sources
    participant FEB as FixtureEKSBackend
    participant FDB as FixtureDatadogBackend
    participant ET as EKS Tools (x5)
    participant DT as Datadog Tools (x2)

    T->>RS: run_scenario(fixture, use_mock_backends=True)
    RS->>RS: _build_resolved_integrations()
    RS->>RI: run_investigation(resolved_integrations)
    RI->>DS: detect_sources(resolved_integrations)
    DS->>DS: sources[eks][_backend]=FEB
    DS->>DS: sources[datadog][_backend]=FDB
    DS-->>RI: sources with _backend keys
    RI->>ET: list_eks_pods(eks_backend=FEB)
    ET->>FEB: list_pods(cluster_name, namespace)
    FEB-->>ET: fixture envelope
    ET-->>RI: dict[str, Any]
    RI->>DT: query_datadog_logs(datadog_backend=FDB)
    DT->>FDB: query_logs(query)
    FDB-->>DT: fixture envelope
    DT-->>RI: dict[str, Any]
    RI-->>RS: final_state
    RS->>RS: score_result(fixture, final_state)
    RS-->>T: (state_dict, ScenarioScore)

Prompt To Fix All With AI

This is a comment left during a code review.
Path: Makefile
Line: 108-109

Comment:
**`test-k8s-synthetic` missing from `.PHONY`**

`test-rds-synthetic` is declared in the `.PHONY` list (line 4) but `test-k8s-synthetic` is not. If a file or directory named `test-k8s-synthetic` were ever created, `make test-k8s-synthetic` would silently no-op instead of running the suite. Add `test-k8s-synthetic` to the `.PHONY` line alongside `test-rds-synthetic`.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: tests/synthetic/eks/run_suite.py
Line: 77-82

Comment:
**`ResolvedBackends` dataclass is dead code**

`ResolvedBackends` is defined here but never referenced anywhere in `run_suite.py`, `test_suite.py`, `test_suite_axis2.py`, or the rest of the codebase. The backends are passed directly as kwargs to `run_scenario` and collected via `_collect_queried_tools`, so this dataclass serves no purpose. Removing it avoids confusion for future contributors who might try to use it.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: tests/synthetic/mock_datadog_backend/backend.py
Line: 26-38

Comment:
**Duplicated `_ERROR_KEYWORDS` risks drift from production**

This tuple is an exact copy of the one in `app/tools/DataDogLogsTool/__init__.py`. If a keyword is added or removed in production (e.g. adding `"segfault"`), the mock backend's `error_logs` filtering will silently diverge — synthetic scenarios that rely on `error_logs` counts would then test against a stale classification. Consider importing directly:
```python
from app.tools.DataDogLogsTool import _ERROR_KEYWORDS
```

How can I resolve this? If you propose a fix, please make it concise.

Reviews (2): Last reviewed commit: "fix: address review findings on k8s synt..."

CodeQL py/ineffectual-statement (7 alerts) and py/unused-global-variable (1 alert):

* Remove the trailing `...` Ellipsis literal from every Protocol method body in tests/synthetic/mock_eks_backend/backend.py (5 methods) and tests/synthetic/mock_datadog_backend/backend.py (2 methods). In Python a docstring is itself a valid function body, so the `...` was a dead expression that CodeQL flagged as ineffectual.
* Remove the unused _BASE_SCENARIO_YML module-level constant from tests/synthetic/eks/test_suite.py. It was left over from an earlier refactor where the template was inlined per test method.

Greptile P2 follow-ups:

* detect_sources.py: when a backend is pre-injected under resolved_integrations["aws"]["_backend"] but the alert's annotations do not carry a cluster_name / eks_cluster key, fall back to the first entry in cluster_names on the integration dict. Without this fallback, future synthetic scenarios that forget to put the cluster name in commonAnnotations silently produce zero EKS tool activity with no diagnostic.
* Lift the backend-aware availability helpers out of tool-specific modules and into a new shared app/tools/utils/availability.py. The previous layout had _eks_available_or_backend defined in EKSListClustersTool/__init__.py and imported by 5 other EKS tools, and _dd_available_or_backend defined in DataDogLogsTool/__init__.py and imported by DataDogMonitorsTool. The new file exposes eks_available_or_backend and datadog_available_or_backend at the module level, and every wired tool imports directly from the utils module. No behaviour change; this is a pure relocation of the two helpers.
* Clarify the set-membership semantics of TrajectoryScore.sequencing_ok in tests/synthetic/eks/run_suite.py. The field stays named sequencing_ok for parallelism with the RDS synthetic suite (which also uses the set-membership check), but the comment now makes it explicit that ordering is intentionally not enforced because actions run in parallel and completion order is non-deterministic.

All three gate commands still pass locally:

make lint → All checks passed!
make typecheck → Success: no issues found in 341 source files
make test-cov → 2137 passed, 1 skipped (pre-existing pyenv-shim failures in tests/cli_smoke_test.py are unchanged and pass in CI)

Targeted tests also pass:

pytest tests/synthetic/eks/ → 21 passed, 5 skipped
pytest tests/tools/test_eks_* → 51 passed
pytest tests/tools/test_datadog_* → 30 passed

Relates to Tracer-Cloud#260.
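The cluster-name fallback described in this commit can be sketched as a small helper. The function name and its placement are hypothetical; only the annotation keys (cluster_name, eks_cluster), the `_backend` key, and the `cluster_names` list are taken from the commit message.

```python
from typing import Any, Optional


def resolve_cluster_name(
    annotations: dict[str, Any], aws_integration: dict[str, Any]
) -> Optional[str]:
    """Pick the EKS cluster name for source detection."""
    name = annotations.get("cluster_name") or annotations.get("eks_cluster")
    if name:
        return str(name)
    # Fallback applies only when a fixture backend was injected.
    if aws_integration.get("_backend") is not None:
        cluster_names = aws_integration.get("cluster_names") or []
        if cluster_names:
            return str(cluster_names[0])
    return None
```

An explicit annotation still wins; the first entry of `cluster_names` is used only when the alert carries no cluster annotation and a backend is present.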

github-advanced-security AI found potential problems Apr 14, 2026

greptile-apps Bot reviewed Apr 14, 2026

Comment thread tests/synthetic/eks/run_suite.py

Comment thread app/nodes/plan_actions/detect_sources.py

Comment thread app/tools/DataDogLogsTool/__init__.py Outdated

This was referenced Apr 14, 2026

fix: plumb EKS tool output into state evidence via post_process mappers #584

Merged

fix: recognise EKS evidence keys in is_clearly_healthy short-circuit #585

Merged

Build the K8s synthetic test harness #260

Closed

davincios merged commit 7d6d624 into Tracer-Cloud:main Apr 18, 2026
8 checks passed

This was referenced Apr 19, 2026

test(synthetic): add 8 Kubernetes RCA scenarios #661

Merged

[BUG] EKS tool output not reaching the agent needs EVIDENCE_MAPPERS entries #662

Closed

davincios merged 2 commits into Tracer-Cloud:main from ebrahim-sameh:issue/260-k8s-synthetic-test-harness


3 participants
