
fix: plumb EKS tool output into state evidence via post_process mappers#584

Merged
Devesh36 merged 1 commit into Tracer-Cloud:main from ebrahim-sameh:issue/581-eks-evidence-mappers on Apr 16, 2026

ebrahim-sameh (Collaborator) commented Apr 14, 2026

Fixes #581

Describe the changes you have made in this PR -

The EKS investigation tools under app/tools/EKS*/ already return their results in a consistent envelope shape, but merge_evidence() in app/nodes/investigate/processing/post_process.py has no EVIDENCE_MAPPERS entries for any of the list_eks_* / get_eks_* / describe_eks_* action names. The mapper dict contains entries for Grafana, Datadog, CloudWatch, S3, Lambda, GitHub, Honeycomb, Coralogix and Vercel — EKS was missed.

As a result, successfully-executed EKS tools silently drop their data on the floor: merge_evidence loops over execution_results, looks up the action name in EVIDENCE_MAPPERS, gets None back, and never stores the result in state["evidence"]. Downstream, diagnose_root_cause builds its prompt from a state dict that has zero eks_* keys regardless of what the tools returned. The agent effectively investigates Kubernetes incidents without access to the data it just gathered — the opposite of what the EKS toolset was added for.
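The drop described above can be reproduced with a small stand-alone model of the loop. This is a minimal sketch, not the code in post_process.py: ActionExecutionResult is reduced to the three attributes the loop reads, and the tool field names ("pods", "total_pods") and the simplified EKS mapper are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

Evidence = dict[str, Any]
Mapper = Callable[[dict[str, Any]], Evidence]


@dataclass
class ActionExecutionResult:
    # Hypothetical stand-in: only the attributes the loop below reads.
    action_name: str
    success: bool
    data: dict[str, Any] = field(default_factory=dict)


def merge_evidence(results: list[ActionExecutionResult], mappers: dict[str, Mapper]) -> Evidence:
    """Fold successful tool results into one evidence dict via a mapper registry."""
    evidence: Evidence = {}
    for result in results:
        if not result.success:
            continue  # failed tool calls never contribute evidence
        mapper = mappers.get(result.action_name)
        if mapper is None:
            continue  # no registry entry: the result is discarded with no warning
        evidence.update(mapper(result.data))
    return evidence


# Registry without EKS entries (other evidence families omitted for brevity).
MAPPERS_BEFORE: dict[str, Mapper] = {}

# Same registry plus a hypothetical, simplified list_eks_pods mapper.
MAPPERS_AFTER: dict[str, Mapper] = {
    **MAPPERS_BEFORE,
    "list_eks_pods": lambda d: {
        "eks_pods": d.get("pods", []),
        "eks_total_pods": d.get("total_pods", 0),
    },
}

ok = ActionExecutionResult("list_eks_pods", True, {"pods": [{"name": "api-0"}], "total_pods": 1})
failed = ActionExecutionResult("list_eks_pods", False, {"pods": [{"name": "api-1"}], "total_pods": 1})
print(merge_evidence([ok], MAPPERS_BEFORE))          # {}
print(merge_evidence([ok, failed], MAPPERS_AFTER))   # {'eks_pods': [{'name': 'api-0'}], 'eks_total_pods': 1}
```

The same successful result yields nothing until a registry entry exists, while the failed result is skipped either way.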

build_evidence_summary() had the same gap: no elif branches for EKS, so the tracker message emitted at the end of node_investigate never mentioned any EKS activity either.

What this PR changes

app/nodes/investigate/processing/post_process.py — five new mapper functions, five new registry entries, and five new summary branches. No other files under app/ are touched.

  1. New mapper functions (placed next to the existing _map_datadog_* mappers, same shape):

    • _map_eks_pods(data) → {eks_pods, eks_failing_pods, eks_high_restart_pods, eks_total_pods}
    • _map_eks_events(data) → {eks_events, eks_total_warning_count}
    • _map_eks_deployments(data) → {eks_deployments, eks_degraded_deployments, eks_total_deployments}
    • _map_eks_node_health(data) → {eks_node_health, eks_not_ready_count, eks_total_nodes}
    • _map_eks_pod_logs(data) → {eks_pod_logs, eks_pod_logs_pod_name, eks_pod_logs_namespace}

    Each one reads the fields the corresponding tool function populates (verified by reading each tool's return statement in app/tools/EKS*Tool/__init__.py) and writes them to dedicated state keys with .get(..., <sensible default>) so a missing or partial tool result does not raise.

  2. Registered in EVIDENCE_MAPPERS alongside the existing entries:

    "list_eks_pods": _map_eks_pods,
    "get_eks_events": _map_eks_events,
    "list_eks_deployments": _map_eks_deployments,
    "get_eks_node_health": _map_eks_node_health,
    "get_eks_pod_logs": _map_eks_pod_logs,
  3. New elif branches in build_evidence_summary() so the tracker output mirrors the existing Grafana / Datadog style:

    • list_eks_pods → eks:3 pods (1 failing)
    • get_eks_events → eks:2 warning events
    • list_eks_deployments → eks:2 deployments (1 degraded)
    • get_eks_node_health → eks:3 nodes (1 not ready)
    • get_eks_pod_logs → eks:42 log lines from payments-api-7f9

Scope boundary

Wiring the remaining six EKS tools (list_eks_clusters, list_eks_namespaces, describe_eks_cluster, describe_eks_addon, get_eks_deployment_status, get_eks_nodegroup_health) is deliberately out of scope for this PR. Those tools surface supplementary data (cluster inventory, addon versions, nodegroup health metadata) that no current investigation or synthetic scenario consumes. When a future scenario or code path needs one, the same 5-line mapper pattern applies and can be added in a follow-up PR.

Testing

All three gate commands pass locally on Python 3.12:

make lint         # ruff check app/ tests/  → All checks passed!
make typecheck    # mypy app/               → Success: no issues found in 340 source files
make test-cov     # pytest -n auto --ignore tests/synthetic -m "not synthetic"  → 2156 passed, 1 skipped

The baseline before this PR was 2137 passed. The delta of 19 matches the 19 new tests added in tests/nodes/investigate/test_post_process.py:

pytest tests/nodes/investigate/test_post_process.py -v  → 19 passed

The new test module covers:

  • TestEKSMappersRegistered — asserts each of the 5 EKS action names is present in EVIDENCE_MAPPERS (one test per action, so if any single mapper is later removed, its own test fails and names the missing action).
  • TestListEKSPodsMapper — feeds a representative list_eks_pods result (3 pods, 1 failing, 0 high-restart) and asserts all four eks_* keys appear with the expected values; a second test covers the "missing fields" defaults path.
  • TestGetEKSEventsMapper — a Warning-events-present case (OOMKilled) and a healthy no-events case.
  • TestListEKSDeploymentsMapper — mixed case with one healthy and one degraded deployment; asserts that eks_degraded_deployments contains only the degraded one.
  • TestGetEKSNodeHealthMapper — two nodes, one not ready, asserts eks_not_ready_count == 1 and eks_total_nodes == 2.
  • TestGetEKSPodLogsMapper — three-line log fixture with a fatal marker, asserts the marker round-trips to eks_pod_logs.
  • TestMergeEvidenceSkipsFailedResults — passes a failed ActionExecutionResult for list_eks_pods and asserts no eks_* keys appear in the returned evidence dict (matches the existing if not result.success: continue behaviour at post_process.py:349).
  • TestBuildEvidenceSummaryEKS — six cases covering the five tool summary branches, each asserting the exact substring the tracker will emit.

Screenshots of the UI changes (If any) -

N/A — no user-facing UI changes. This PR touches the evidence post-processing layer only. Production behaviour for Grafana, Datadog, CloudWatch and every other existing evidence source is identical: the change is additive.

Impact analysis

  • Backward compatibility: fully preserved. Every change is additive — five new mapper functions, five new dict entries, five new elif branches. No existing keys, functions or code paths are modified. Real-credential EKS investigations that were previously silently discarding tool output will now populate state["evidence"]["eks_*"] as they should always have done. If any downstream consumer was coincidentally relying on the absence of eks_* keys (unlikely but possible), they would see the new keys appear — hence flagged in this section rather than assumed to be a non-issue.
  • Performance: negligible. Each mapper builds a small dict from a handful of fields and is called at most once per EKS tool execution.
  • Secrets: no new environment variables, no .env writes, no credentials handled.
  • Dependencies: no new runtime or dev dependencies added.

Related issues


Code Understanding and AI Usage

Did you use AI assistance (ChatGPT, Claude, Copilot, etc.) to write any part of this code?

  • No, I wrote all the code myself
  • Yes, I used AI assistance (continue below)

If you used AI assistance:

  • I have reviewed every single line of the AI-generated code
  • I can explain the purpose and logic of each function/component I added
  • I have tested edge cases and understand how the code handles them
  • I have modified the AI output to follow this project's coding standards and conventions

Explain your implementation approach:

Problem solved: EKS investigation tools were silently dropping their output at the merge_evidence step because the EVIDENCE_MAPPERS registry in post_process.py had no entries for any EKS action name. This was discovered while scoping the Kubernetes synthetic test harness work (#260): I noticed the harness would never see EKS evidence in state["evidence"] no matter what the tools returned, traced it back to the mapper gap, and filed this issue so the fix could be reviewed as a standalone scoped change rather than bundled into the harness PR.

Alternatives considered:

  1. Add a fallback in merge_evidence that stores raw tool results under <action_name>_raw keys when no mapper exists. Rejected: it hides the real bug (missing mapper) and creates a different problem (diagnose_root_cause does not look at arbitrary *_raw keys), so the agent still would not see the data.

  2. Write a single generic EKS mapper that stores the entire result dict under one key per tool. Rejected: breaks the pattern every other evidence family follows (one dict key per conceptual evidence slice, e.g. datadog_logs and datadog_error_logs are separate keys), and makes downstream consumers in diagnose_root_cause / build_diagnosis_prompt harder to write.

  3. Fix all 11 EKS tools' mappers in one pass, including the six tools whose output no current scenario consumes. Rejected to keep the PR focused. The unused six can land in a follow-up whenever a scenario needs them. Shipping the five that matter now immediately unblocks #260 (Build the K8s synthetic test harness), #261 (K8s scenarios: CrashLoopBackOff, OOMKilled, ImagePullBackOff), #262 (K8s scenarios: Node NotReady, Pending Pods, Stuck Rollouts) and #263 (K8s scenarios: Eviction, DNS failures, Probe failures, Quota limits).

Why this implementation:

  • Follows the exact pattern of the existing Datadog mappers. _map_datadog_logs and _map_datadog_monitors are the closest analogues because Datadog tools return similar envelope shapes (a top-level dict with source, available, typed fields, error). Each new mapper mirrors that style so the whole file remains visually consistent and reviewers already know what to look for.

  • Uses .get() with sensible defaults rather than data["key"] so a partial tool result (e.g. an EKS call that succeeded structurally but returned zero pods) populates the evidence dict cleanly with empty collections instead of raising KeyError. The existing Datadog mappers do the same thing (data.get("logs", []), data.get("total", 0)).

  • build_evidence_summary branches use data.get(X) is not None rather than truthiness for list fields that can legitimately be empty in healthy scenarios (warning_events = [], failing_pods = []). This keeps the tracker from silently swallowing healthy-state summaries: a counting line such as "eks:3 pods (0 failing)" is more informative than no line at all.
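The difference between the two checks is easiest to see on a healthy result. This is an illustrative sketch only: the two function names and the tool fields failing_pods and total_pods are hypothetical, and the real branches live inside build_evidence_summary().

```python
from typing import Any, Optional


def summarize_pods_truthy(data: dict[str, Any]) -> Optional[str]:
    # Truthiness check: an empty failing_pods list suppresses the line entirely.
    failing = data.get("failing_pods")
    if failing:
        return f"eks:{data.get('total_pods', 0)} pods ({len(failing)} failing)"
    return None


def summarize_pods_present(data: dict[str, Any]) -> Optional[str]:
    # Presence check: an empty list still yields a line with a zero count.
    failing = data.get("failing_pods")
    if failing is not None:
        return f"eks:{data.get('total_pods', 0)} pods ({len(failing)} failing)"
    return None


healthy = {"total_pods": 3, "failing_pods": []}
print(summarize_pods_truthy(healthy))   # None
print(summarize_pods_present(healthy))  # eks:3 pods (0 failing)
```

Both variants still return None when the key is absent altogether, so a malformed result does not produce a misleading zero count.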

Key components and their jobs:

  • _map_eks_pods(data) — projects a list_eks_pods tool result into four state keys. eks_total_pods is the raw count, eks_pods is the full list, eks_failing_pods is the subset whose phase is not Running or Succeeded, eks_high_restart_pods is the subset whose containers have more than 3 restarts. These subset splits are computed by the tool itself in app/tools/EKSListPodsTool/__init__.py:81-83, so the mapper just passes them through.

  • _map_eks_events(data) — projects get_eks_events output. eks_events is the Warning-event list, eks_total_warning_count is the raw count. The tool filters at source so only type == "Warning" events appear.

  • _map_eks_deployments(data) — projects list_eks_deployments. eks_deployments is all deployments, eks_degraded_deployments is the subset where unavailable > 0 or ready < desired (computed by the tool at app/tools/EKSListDeploymentsTool/__init__.py:70-72).

  • _map_eks_node_health(data) — projects get_eks_node_health. eks_node_health is the full node list (with flattened string condition fields, per the tool's output shape), eks_not_ready_count is the integer count of nodes whose Ready condition is not "True" (computed by the tool at app/tools/EKSNodeHealthTool/__init__.py:72).

  • _map_eks_pod_logs(data) — projects get_eks_pod_logs. Flat string log output plus the pod name and namespace so downstream consumers can cite the source.

  • The EVIDENCE_MAPPERS registry additions are inserted directly after the existing Datadog block (EKS sorts next to Datadog alphabetically), matching the file's implicit grouping by evidence family.

  • The new build_evidence_summary branches are placed in the same Datadog-adjacent block so they show up in the same relative position when the tracker emits them.
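The subset rules quoted above are computed by the tools, not by the mappers, but they can be restated as plain predicates. A minimal sketch, assuming dict-shaped pods, deployments and nodes with hypothetical field names (phase, containers[].restart_count, unavailable, ready, desired, and a flattened ready condition string), and assuming the restart threshold applies per container. The tool code at the cited lines may differ.

```python
from typing import Any

Item = dict[str, Any]

HEALTHY_POD_PHASES = {"Running", "Succeeded"}
RESTART_THRESHOLD = 3  # "more than 3 restarts"


def failing_pods(pods: list[Item]) -> list[Item]:
    # Any phase other than Running or Succeeded (e.g. Pending, Failed, Unknown).
    return [p for p in pods if p.get("phase") not in HEALTHY_POD_PHASES]


def high_restart_pods(pods: list[Item]) -> list[Item]:
    # Assumption: a pod qualifies if any single container exceeds the threshold.
    return [
        p
        for p in pods
        if any(c.get("restart_count", 0) > RESTART_THRESHOLD for c in p.get("containers", []))
    ]


def degraded_deployments(deployments: list[Item]) -> list[Item]:
    # Degraded if any replica is unavailable or fewer replicas are ready than desired.
    return [
        d
        for d in deployments
        if d.get("unavailable", 0) > 0 or d.get("ready", 0) < d.get("desired", 0)
    ]


def not_ready_count(nodes: list[Item]) -> int:
    # Flattened condition value: anything other than the string "True"
    # counts as not ready, including "False" and "Unknown".
    return sum(1 for n in nodes if n.get("ready") != "True")
```

Keeping these rules in the tools means the mappers stay pure pass-throughs, which is why each mapper above only copies fields.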


Checklist before requesting a review

  • I have added proper PR title and linked to the issue
  • I have performed a self-review of my code
  • I can explain the purpose of every function, class, and logic block I added
  • I understand why my changes work and have tested them thoroughly
  • I have considered potential edge cases and how my code handles them
  • If it is a core feature, I have added thorough tests
  • My code follows the project's style guidelines and conventions

The EKS investigation tools under app/tools/EKS*/ already return their
results in a consistent envelope shape, but merge_evidence() in
app/nodes/investigate/processing/post_process.py has no EVIDENCE_MAPPERS
entries for any list_eks_* / get_eks_* / describe_eks_* action name.
The mapper dict contains entries for Grafana, Datadog, CloudWatch, S3,
Lambda, GitHub, Honeycomb, Coralogix and Vercel — EKS was missed.

As a result, successfully-executed EKS tools silently drop their data
on the floor: merge_evidence loops over execution_results, looks up
the action name in EVIDENCE_MAPPERS, gets None back, and never stores
the result in state["evidence"]. Downstream, diagnose_root_cause builds
its prompt from a state dict that has zero eks_* keys regardless of
what the tools returned. The agent effectively investigates Kubernetes
incidents without access to the data it just gathered.

This patch closes the gap for the five fixture-supported EKS tools:

* list_eks_pods     → eks_pods / eks_failing_pods / eks_high_restart_pods / eks_total_pods
* get_eks_events    → eks_events / eks_total_warning_count
* list_eks_deployments → eks_deployments / eks_degraded_deployments / eks_total_deployments
* get_eks_node_health  → eks_node_health / eks_not_ready_count / eks_total_nodes
* get_eks_pod_logs  → eks_pod_logs / eks_pod_logs_pod_name / eks_pod_logs_namespace

Each mapper follows the same shape as the existing Datadog mappers
(_map_datadog_logs, _map_datadog_monitors, etc.): read the fields the
corresponding tool function populates, write them to dedicated state
keys. Registered in the EVIDENCE_MAPPERS dict alongside the existing
entries.

build_evidence_summary() gets matching elif branches so the tracker
message ("eks:3 pods (0 failing)", "eks:2 deployments (1 degraded)",
...) mirrors the existing Datadog / Grafana branches.

Wiring the remaining six EKS tools (list_eks_clusters,
list_eks_namespaces, describe_eks_cluster, describe_eks_addon,
get_eks_deployment_status, get_eks_nodegroup_health) is deliberately
out of scope until a scenario needs them — same rationale as the scope
boundary in the parallel synthetic K8s harness work under Tracer-Cloud#260.

A new test module at tests/nodes/investigate/test_post_process.py
covers:

* each EKS action is registered in EVIDENCE_MAPPERS (5 cases)
* each mapper returns the expected keys when given a representative
  successful tool result
* each mapper returns sensible defaults when fields are missing
* merge_evidence() skips failed results (ActionExecutionResult with
  success=False) for EKS actions the same way it does for other tools
* build_evidence_summary() emits the expected human-readable strings
  for every EKS tool

Fixes Tracer-Cloud#581.
greptile-apps (bot, Contributor) commented Apr 14, 2026

Greptile Summary

This PR wires five EKS investigation tools into the evidence pipeline by adding mapper functions, EVIDENCE_MAPPERS registry entries, and build_evidence_summary branches for list_eks_pods, get_eks_events, list_eks_deployments, get_eks_node_health, and get_eks_pod_logs. All mapper field names have been verified against the corresponding tool return statements, defaults are sensible, and 19 new tests cover every new path. The change is fully additive and does not touch any existing mapper or summary branch.

Confidence Score: 5/5

Safe to merge — purely additive change with no modifications to existing code paths.

All five mapper field names were verified against the corresponding EKS tool return statements and match exactly. Defaults are sensible (empty list / 0). The EVIDENCE_MAPPERS registry entries and build_evidence_summary branches are consistent with each other and with existing patterns. 19 new tests cover mapper correctness, failed-result skipping, missing-fields defaults, and summary output. No existing mappers, keys, or code paths are modified.

No files require special attention.

Important Files Changed

| Filename | Overview |
|---|---|
| app/nodes/investigate/processing/post_process.py | Adds 5 mapper functions, 5 EVIDENCE_MAPPERS entries, and 5 build_evidence_summary branches for EKS tools; all field names verified against tool source; no existing code paths modified. |
| tests/nodes/investigate/test_post_process.py | 19 new tests covering mapper correctness, registry membership, failed-result skipping, missing-fields defaults, and summary output for all 5 EKS tools. |
| tests/nodes/investigate/__init__.py | Empty init file to make the test directory a Python package; no logic. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[EKS Tool Execution] --> B{result.success?}
    B -- No --> C[Skip — no evidence stored]
    B -- Yes --> D{action_name in EVIDENCE_MAPPERS?}
    D -- No --> E[Silent drop — prior behaviour for EKS]
    D -- Yes --> F[mapper_fn called on result.data]
    F --> G[evidence.update with eks_* keys]
    G --> H[state evidence dict]

    subgraph "New mappers added by this PR"
        M1[list_eks_pods → _map_eks_pods]
        M2[get_eks_events → _map_eks_events]
        M3[list_eks_deployments → _map_eks_deployments]
        M4[get_eks_node_health → _map_eks_node_health]
        M5[get_eks_pod_logs → _map_eks_pod_logs]
    end

    F --> M1 & M2 & M3 & M4 & M5

    H --> I[diagnose_root_cause]
    H --> J[build_evidence_summary]

Reviews (1): Last reviewed commit: "fix: plumb EKS tool output into state ev..."

Copilot AI (Contributor) left a comment

Pull request overview

This PR fixes missing post-processing plumbing for EKS investigation tools so their successful outputs are merged into state["evidence"] and reflected in the investigator’s evidence summary (Fixes #581).

Changes:

  • Add five EKS evidence mapper functions and register them in EVIDENCE_MAPPERS.
  • Extend build_evidence_summary() with EKS-specific summary branches.
  • Add unit tests covering EKS mapper registration, merge behavior, and summary rendering.

Reviewed changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| app/nodes/investigate/processing/post_process.py | Adds EKS mappers, registers them, and emits EKS collection summaries so tool output is preserved and visible downstream. |
| tests/nodes/investigate/test_post_process.py | Adds focused tests verifying EKS evidence mapping/merging and evidence-summary strings. |
| tests/nodes/investigate/__init__.py | Package marker for the new test module. |


summary_parts.append(
f"eks:{data.get('total_nodes', 0)} nodes ({not_ready} not ready)"
)
elif action_name == "get_eks_pod_logs" and data.get("logs"):

Copilot AI Apr 15, 2026


build_evidence_summary() only emits the get_eks_pod_logs summary when data.get("logs") is truthy. Since the EKS tool returns logs as a string, a successful call that returns an empty string (valid when the container has no log output in the requested window) will currently produce no EKS summary entry, which makes the tracker output look like the tool never ran. Consider checking data.get("logs") is not None (or checking for the presence of the logs key) and emitting eks:0 log lines from <pod> when empty, to stay consistent with the other EKS branches that report zero counts.

Suggested change:
- elif action_name == "get_eks_pod_logs" and data.get("logs"):
+ elif action_name == "get_eks_pod_logs" and data.get("logs") is not None:
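Applied to the pod-logs branch, the suggested presence check turns an empty log string into a zero-count line instead of no line. A minimal sketch, assuming the line count comes from str.splitlines() and the pod name from a pod_name field (both assumptions; the function name is hypothetical).

```python
from typing import Any, Optional


def summarize_pod_logs(data: dict[str, Any]) -> Optional[str]:
    logs = data.get("logs")
    if logs is None:
        return None  # no logs key at all: nothing to summarize
    # "".splitlines() == [], so an empty log window reports 0 lines.
    line_count = len(logs.splitlines())
    return f"eks:{line_count} log lines from {data.get('pod_name', 'unknown pod')}"


print(summarize_pod_logs({"logs": "", "pod_name": "payments-api-7f9"}))
# eks:0 log lines from payments-api-7f9
```

str.splitlines() also ignores a single trailing newline, so "a\nb\nc\n" counts as three lines rather than four.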


Development

Successfully merging this pull request may close these issues.

[BUG] EKS tool output silently dropped by merge_evidence — no mappers in post_process.py
