test(synthetic): add 8 Kubernetes RCA scenarios by hamzzaaamalik · Pull Request #661 · Tracer-Cloud/opensre

hamzzaaamalik · 2026-04-19T07:53:54Z

Summary

Adds 8 Kubernetes failure scenarios to the synthetic test suite introduced in #583 (feat: add Kubernetes synthetic RCA test harness), covering the most common production K8s issues: out-of-memory crashes, bad image tags, pending pods, broken health probes, quota exhaustion, DNS failures, node failures, and stuck rollouts.
Each scenario is a small folder of fixture files (alert + evidence + expected answer) that the agent investigates end-to-end.
The agent passes all 9 scenarios on a clean run.
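
As a rough illustration of the folder layout described above, a structural check could confirm that every scenario folder contains the files it needs. This is only a sketch: `missing_files`, `validate_suite`, and `REQUIRED_FILES` are hypothetical names, and only `scenario.yml` and `answer.yml` are file names taken from this PR. The real harness from #583 may check more than this.

```python
from pathlib import Path

# Assumption: only these two file names are confirmed by this PR; the alert and
# evidence fixtures are left out because their exact names are not stated here.
REQUIRED_FILES = ("scenario.yml", "answer.yml")


def missing_files(scenario_dir: Path, required=REQUIRED_FILES) -> list[str]:
    """Return the required file names that are absent from one scenario folder."""
    return [name for name in required if not (scenario_dir / name).is_file()]


def validate_suite(suite_dir: Path) -> dict[str, list[str]]:
    """Map each incomplete scenario folder to its missing files.

    An empty dict means every scenario folder passed the check.
    """
    problems = {}
    for scenario_dir in sorted(p for p in suite_dir.iterdir() if p.is_dir()):
        missing = missing_files(scenario_dir)
        if missing:
            problems[scenario_dir.name] = missing
    return problems
```

Calling `validate_suite(Path("tests/synthetic/eks"))` would then return an empty dict when every scenario folder is complete.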

Why this matters

Gives the team automated coverage for the K8s diagnosis pipeline: any future agent change can be checked against these scenarios before shipping.
Each scenario is designed to be hard: healthy pods sit alongside the broken one and surface symptoms hide the real cause, so passing them genuinely tests the agent.

One thing to fix later

Two scenarios (quota and rollout-stuck) needed a small workaround because EKS tool output is currently dropped before reaching the agent (known issue from #583). For now I mirrored the key signals into Datadog logs so the agent can see them. Once the EKS evidence wiring is fixed in a follow-up, the workaround can be removed.
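
The mirroring workaround can be pictured as a small transformation from Kubernetes-style events into log records. The sketch below is illustrative only: `mirror_events_to_logs` is a hypothetical helper, and the output record shape (`service`, `status`, `message`) is an assumption, not the fixture format used in this PR. The input fields `type`, `reason`, and `message` follow the standard Kubernetes Event object.

```python
def mirror_events_to_logs(events: list[dict], service: str) -> list[dict]:
    """Turn Kubernetes-style events into simple log records.

    Assumption: the output shape is illustrative, not the real Datadog
    fixture schema used by the test suite.
    """
    logs = []
    for event in events:
        logs.append({
            "service": service,
            # Kubernetes events are typed "Normal" or "Warning".
            "status": "error" if event.get("type") == "Warning" else "info",
            "message": f"[k8s event] {event.get('reason', 'Unknown')}: {event.get('message', '')}",
        })
    return logs
```

A quota rejection surfaced as a `Warning` event would become an `error` log line that the agent can read through the Datadog evidence path.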

Test plan

Structural validation passes
Full suite scoring run: 9/9 pass
Reviewer can re-run to confirm

greptile-apps · 2026-04-19T07:59:11Z

Greptile Summary

Adds 8 synthetic Kubernetes RCA scenarios (OOMKilled, ImagePullBackOff, pending/unschedulable, liveness-probe killing, resource-quota exceeded, DNS failure, node-not-ready, stuck rollout) to the existing test suite from #583. Each scenario follows the base-override fixture model and includes alert, evidence, and graded answer.yml files.

Prior review concerns (contradictory deployment counters in 002/008, missing available_evidence in 005/006, root-cause tokens in ruling_out_keywords for 004/008) appear to have been addressed. One pattern remains: scenarios 003, 005, and 006 still carry trivially-positive ruling_out_keywords tokens that would appear in any correct answer, giving Axis 2 scoring a free pass without confirming the agent actually reasoned through alternative hypotheses.
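
To make the concern concrete, here is a minimal sketch of keyword matching. It assumes the scorer does a case-insensitive substring check, which is a guess: the actual Axis 2 implementation is not shown in this PR, and `matched_keywords` is a hypothetical name. The sample answer reuses the phrasing from the scenario 003 review comment.

```python
def matched_keywords(answer: str, keywords: list[str]) -> list[str]:
    """Return the keywords that appear in the answer, ignoring case."""
    text = answer.lower()
    return [kw for kw in keywords if kw.lower() in text]


# A diagnosis that names the root cause but never rules out any alternative.
answer = (
    "The pod is Pending because both nodes are Ready=True "
    "but their allocatable CPU is nearly zero."
)

# Positive-evidence tokens match even though nothing was ruled out.
print(matched_keywords(answer, ["nodes", "Ready"]))  # → ['nodes', 'Ready']

# Negative-framing tokens match only when the answer states the exclusion.
print(matched_keywords(answer, ["not node failure", "not OOM"]))  # → []
```

Under this assumption, the positive tokens pass for any correct diagnosis, while the negative-framing tokens require the agent to state explicitly what it excluded.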

Confidence Score: 5/5

Safe to merge; all remaining findings are P2 quality improvements to scoring metadata, not runtime or data-correctness issues.

All P0/P1 issues from previous review rounds (contradictory ready/unavailable counts, missing available_evidence, root-cause tokens in ruling_out_keywords for 004 and 008) are resolved. The only remaining finding is a P2 concern about weak ruling_out_keywords in three answer.yml files — a scoring-signal quality issue that doesn't affect test correctness or the agent's pass/fail outcome.

tests/synthetic/eks/003-pending-insufficient-resources/answer.yml, tests/synthetic/eks/005-resource-quota-exceeded/answer.yml, tests/synthetic/eks/006-dns-resolution-failure/answer.yml — ruling_out_keywords should use negative-framing tokens.

Important Files Changed

Filename	Overview
tests/synthetic/eks/003-pending-insufficient-resources/answer.yml	ruling_out_keywords ("nodes", "Ready") are trivially positive-evidence tokens, not negative-framing checks that confirm the agent ruled out node failure or OOM.
tests/synthetic/eks/004-liveness-probe-killing/answer.yml	ruling_out_keywords now correct ("not OOM", "exit code 0") after prior review feedback; scenario and fixtures look consistent.
tests/synthetic/eks/005-resource-quota-exceeded/answer.yml	ruling_out_keywords ("existing pods", "Ready") are positive-evidence tokens that trivially appear in any correct diagnosis; doesn't confirm the agent ruled out scheduler/capacity or other hypotheses.
tests/synthetic/eks/006-dns-resolution-failure/answer.yml	ruling_out_keywords ("Ready", "restart") are trivial tokens for this scenario; better negative-framing tokens would be "not OOM", "not crashloop", or "not probe".
tests/synthetic/eks/008-deployment-rollout-stuck/answer.yml	ruling_out_keywords corrected to "not OOM", "not quota", "old ReplicaSet" per prior review feedback. Looks good.
tests/synthetic/eks/005-resource-quota-exceeded/scenario.yml	available_evidence now explicitly listed including Datadog workaround entries; prior concern about missing field is resolved.
tests/synthetic/eks/006-dns-resolution-failure/scenario.yml	available_evidence explicitly declared with datadog_logs/monitors; EKS-level fixtures fall back to healthy base, correctly expressing the adversarial all-pods-healthy signal.
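
The fallback behaviour noted for scenario 006 (a fixture missing from the scenario folder resolves to the healthy base) could be sketched as below. `resolve_fixture` and the idea of a single shared base directory are assumptions made for illustration; the harness from #583 may resolve fixtures differently.

```python
from pathlib import Path


def resolve_fixture(scenario_dir: Path, base_dir: Path, name: str) -> Path:
    """Prefer the scenario's own fixture; otherwise fall back to the shared base."""
    override = scenario_dir / name
    if override.is_file():
        return override
    fallback = base_dir / name
    if fallback.is_file():
        return fallback
    raise FileNotFoundError(f"{name} not found in {scenario_dir} or {base_dir}")
```

With this model, a scenario only needs to ship the fixtures that differ from the healthy baseline, which is how an all-pods-healthy signal can be expressed without copying every file.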

Prompt To Fix All With AI

This is a comment left during a code review.
Path: tests/synthetic/eks/003-pending-insufficient-resources/answer.yml
Line: 17-19

Comment:
**`ruling_out_keywords` are positive-evidence tokens, not negative-framing**

`nodes` and `Ready` will appear trivially in any correct diagnosis of this scenario (the answer naturally mentions "both nodes are Ready=True but their allocatable CPU is nearly zero"). They carry no signal that the agent specifically ruled out an alternative hypothesis (e.g. node failure, OOM, image pull).

Compare the fixed usage in scenarios 004 and 008: `"not OOM"`, `"exit code 0"`, `"not quota"`. For scenario 003 the intended ruling-out claim is that the nodes themselves are healthy and the block is CPU capacity, so tokens like `"not node failure"` or `"not OOM"` would actually test that conclusion. The same pattern applies to scenario 005 (`existing pods`, `Ready`) and scenario 006 (`Ready`, `restart`).

How can I resolve this? If you propose a fix, please make it concise.

Reviews (4): Last reviewed commit: "fix(synthetic): make available_evidence ..."

test(synthetic): add 8 Kubernetes RCA scenarios

e46e9d3

greptile-apps Bot reviewed Apr 19, 2026

Comment thread tests/synthetic/eks/002-image-pull-backoff/eks_deployments.json Outdated

Comment thread tests/synthetic/eks/008-deployment-rollout-stuck/answer.yml Outdated

Comment thread tests/synthetic/eks/008-deployment-rollout-stuck/eks_deployments.json Outdated

fix(synthetic): address Greptile review on K8s scenarios

85ecaa4

greptile-apps Bot reviewed Apr 19, 2026

Comment thread tests/synthetic/eks/004-liveness-probe-killing/answer.yml Outdated

fix(synthetic): replace 004 ruling_out_keywords with negative evidence

1a9f29b

greptile-apps Bot reviewed Apr 19, 2026

Comment thread tests/synthetic/eks/005-resource-quota-exceeded/scenario.yml

hamzzaaamalik mentioned this pull request Apr 19, 2026

[BUG] EKS tool output not reaching the agent needs EVIDENCE_MAPPERS entries #662

Closed

fix(synthetic): make available_evidence explicit on 005 and 006

1d61ecb

davincios merged commit a01f6a1 into Tracer-Cloud:main Apr 19, 2026
11 checks passed

davincios merged 4 commits into Tracer-Cloud:main from hamzzaaamalik:k8s-synthetic-scenarios

2 participants
