Summary
EKS investigation tools (list_eks_pods, get_eks_events, list_eks_deployments, get_eks_node_health, get_eks_pod_logs, and others under app/tools/EKS*/) execute successfully and return correctly-shaped result dicts, but their output is silently dropped by merge_evidence() in app/nodes/investigate/processing/post_process.py because the EVIDENCE_MAPPERS registry has no entries for any list_eks_* / get_eks_* / describe_eks_* action name. As a result, state["evidence"] never gets any eks_* keys, and diagnose_root_cause builds its prompt without ever seeing pod status, events, deployment health, or node conditions that were gathered earlier in the investigation.
Expected vs actual behavior
Expected: When list_eks_pods runs during an investigation and returns its standard envelope
{
    "source": "eks",
    "available": True,
    "cluster_name": "...",
    "namespace": "...",
    "total_pods": 3,
    "pods": [...],
    "failing_pods": [...],
    "high_restart_pods": [...],
    "error": None,
}
the relevant fields should be written into state["evidence"] under dedicated keys (eks_pods, eks_failing_pods, eks_high_restart_pods, eks_total_pods), matching the pattern the Grafana and Datadog tools already follow via _map_grafana_logs, _map_grafana_metrics, _map_datadog_logs, _map_datadog_monitors, etc. diagnose_root_cause should then see EKS evidence when building its diagnosis prompt.
Actual: merge_evidence() at app/nodes/investigate/processing/post_process.py:346-360 looks up each executed action name in the EVIDENCE_MAPPERS dict:
mapper = EVIDENCE_MAPPERS.get(action_name)
if mapper:
    evidence.update(mapper(result.data))
If mapper is None, the result is simply not merged into evidence. There is no fall-through that stores the raw dict. The EVIDENCE_MAPPERS dict contains entries for the Grafana toolset, the Datadog toolset, get_cloudwatch_logs, S3 tools, Lambda tools, GitHub tools, Honeycomb, Coralogix, and Vercel — but no entries for any of the 11 EKS tools (describe_eks_addon, describe_eks_cluster, get_eks_deployment_status, get_eks_events, get_eks_node_health, get_eks_nodegroup_health, get_eks_pod_logs, list_eks_clusters, list_eks_deployments, list_eks_namespaces, list_eks_pods).
build_evidence_summary() at post_process.py:395-474 has the same gap: it has dedicated elif branches for Grafana, Datadog, S3, Lambda, CloudWatch, Honeycomb, Coralogix, and diagnostic-code tools, but none for EKS. The tracker message emitted at the end of node_investigate therefore never reports anything from EKS runs, even when the tools executed and returned data.
Downstream consequence: diagnose_root_cause reads state["evidence"] to build its prompt. Because the EKS tools' output was discarded, the agent effectively investigates every Kubernetes incident without access to the data it just gathered. It will diagnose as if no EKS telemetry existed — the opposite of the behavior the EKS toolset was added for.
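The drop can be modelled in isolation. The sketch below is a simplified stand-in, not the code in post_process.py: the ActionExecutionResult fields, the merge_evidence signature, and the datadog_logs key are assumptions made for illustration. It only shows why a result whose action name is missing from the registry never reaches the evidence dict.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ActionExecutionResult:
    """Hypothetical stand-in for the repo's result type (only the fields used here)."""
    success: bool
    data: dict[str, Any] = field(default_factory=dict)


def _map_datadog_logs(data: dict) -> dict:
    # Illustrative mapper; the real one may emit different keys.
    return {"datadog_logs": data.get("logs", [])}


# Registry with a mapped family and, as in the report, no EKS entries.
EVIDENCE_MAPPERS: dict[str, Callable[[dict], dict]] = {
    "query_datadog_logs": _map_datadog_logs,
}


def merge_evidence(evidence: dict, results: dict[str, ActionExecutionResult]) -> dict:
    """Assumed shape of the merge loop: look up a mapper per action, merge if found."""
    merged = dict(evidence)
    for action_name, result in results.items():
        mapper = EVIDENCE_MAPPERS.get(action_name)
        if mapper:
            merged.update(mapper(result.data))
        # No else branch: a result without a mapper is discarded without a log.
    return merged


results = {
    "query_datadog_logs": ActionExecutionResult(True, {"logs": ["timeout"]}),
    "list_eks_pods": ActionExecutionResult(
        True, {"total_pods": 3, "pods": ["a", "b", "c"], "failing_pods": ["c"]}
    ),
}
evidence = merge_evidence({}, results)
print(sorted(evidence))  # ['datadog_logs']
```

Both tools "succeeded", yet only the Datadog result survives the merge.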
Steps to reproduce
- Clone the repo and install dev dependencies: pip install -e ".[dev]".
- Run the following check in a Python shell from the repo root:
from app.nodes.investigate.processing.post_process import EVIDENCE_MAPPERS
eks_action_prefixes = ("list_eks", "get_eks", "describe_eks")
eks_mappers = [k for k in EVIDENCE_MAPPERS if k.startswith(eks_action_prefixes)]
datadog_mappers = [k for k in EVIDENCE_MAPPERS if k.startswith("query_datadog")]
grafana_mappers = [k for k in EVIDENCE_MAPPERS if k.startswith("query_grafana")]
print(f"EKS mappers registered: {eks_mappers}")
print(f"Datadog mappers registered: {datadog_mappers}")
print(f"Grafana mappers registered: {grafana_mappers}")
- Observe the output:
EKS mappers registered: []
Datadog mappers registered: ['query_datadog_logs', 'query_datadog_monitors', 'query_datadog_events', 'query_datadog_all']
Grafana mappers registered: ['query_grafana_logs', 'query_grafana_traces', 'query_grafana_metrics', 'query_grafana_alert_rules', 'query_grafana_service_names']
EKS is the only major evidence family with zero mappers.
- Integration-level reproducer: run any investigation flow in which the planner selects list_eks_pods (for example a Kubernetes alert where the aws integration supplies role_arn and the alert annotations include cluster_name). Log state["evidence"] after node_investigate returns. No eks_* keys will appear regardless of what the tool returned.
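For the integration-level check, a small helper along these lines could be dropped into a debug session after node_investigate returns. The helper name eks_evidence_keys and the two sample state dicts are hypothetical; only the state["evidence"] key and the eks_ prefix come from this report.

```python
def eks_evidence_keys(state: dict) -> list[str]:
    """Return the sorted eks_* keys present in state["evidence"], if any."""
    evidence = state.get("evidence") or {}
    return sorted(key for key in evidence if key.startswith("eks_"))


# Hypothetical state as observed today: other families present, no EKS keys.
state_now = {"evidence": {"datadog_logs": [], "cloudwatch_logs": []}}

# Hypothetical state once EKS mappers are registered.
state_fixed = {"evidence": {"eks_pods": [], "eks_total_pods": 0, "datadog_logs": []}}

print(eks_evidence_keys(state_now))    # []
print(eks_evidence_keys(state_fixed))  # ['eks_pods', 'eks_total_pods']
```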
Can you reproduce it consistently?
Yes
How often does it occur?
Every time
Operating system
Linux
Logs and error output
N/A — the failure is silent. The EKS tool itself logs a line like
[eks] list_eks_pods cluster=payments-prod-eks ns=payments
and returns its populated dict successfully. The drop happens in the post-processing layer, which emits no log when a mapper is missing. This is a large part of why the gap has been easy to miss.
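While debugging, the silent drop can be made visible with a warning at the lookup site. This is an optional diagnostic sketch rather than part of the proposed fix; the lookup_mapper helper and the logger name are hypothetical and do not exist in the repo.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("post_process")


def lookup_mapper(action_name: str, mappers: dict) -> Optional[Callable]:
    """Return the mapper for action_name, warning when none is registered."""
    mapper = mappers.get(action_name)
    if mapper is None:
        logger.warning(
            "no evidence mapper registered for action %r; result not merged",
            action_name,
        )
    return mapper
```

With a check like this in place, a run that executes list_eks_pods would log one warning per unmapped action instead of discarding the result without a trace.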
Additional context
Proposed fix (additive, low-risk, suitable for a first-time contributor — follows the exact pattern of the existing _map_datadog_logs / _map_datadog_monitors mappers):
- Add five mapper functions to app/nodes/investigate/processing/post_process.py:
def _map_eks_pods(data: dict) -> dict:
    return {
        "eks_pods": data.get("pods", []),
        "eks_failing_pods": data.get("failing_pods", []),
        "eks_high_restart_pods": data.get("high_restart_pods", []),
        "eks_total_pods": data.get("total_pods", 0),
    }

def _map_eks_events(data: dict) -> dict:
    return {
        "eks_events": data.get("warning_events", []),
        "eks_total_warning_count": data.get("total_warning_count", 0),
    }

def _map_eks_deployments(data: dict) -> dict:
    return {
        "eks_deployments": data.get("deployments", []),
        "eks_degraded_deployments": data.get("degraded_deployments", []),
        "eks_total_deployments": data.get("total_deployments", 0),
    }

def _map_eks_node_health(data: dict) -> dict:
    return {
        "eks_node_health": data.get("nodes", []),
        "eks_not_ready_count": data.get("not_ready_count", 0),
        "eks_total_nodes": data.get("total_nodes", 0),
    }

def _map_eks_pod_logs(data: dict) -> dict:
    return {
        "eks_pod_logs": data.get("logs", ""),
        "eks_pod_logs_pod_name": data.get("pod_name", ""),
        "eks_pod_logs_namespace": data.get("namespace", ""),
    }
- Register each in the EVIDENCE_MAPPERS dict alongside the existing entries:
"list_eks_pods": _map_eks_pods,
"get_eks_events": _map_eks_events,
"list_eks_deployments": _map_eks_deployments,
"get_eks_node_health": _map_eks_node_health,
"get_eks_pod_logs": _map_eks_pod_logs,
- Add corresponding elif branches to build_evidence_summary() so the tracker reports EKS activity in the same "source:count ..." style the Grafana and Datadog branches use. Example:
elif action_name == "list_eks_pods" and data.get("pods") is not None:
    failing = len(data.get("failing_pods", []))
    summary_parts.append(f"eks:{data.get('total_pods', 0)} pods ({failing} failing)")
elif action_name == "get_eks_events" and data.get("warning_events") is not None:
    summary_parts.append(f"eks:{data.get('total_warning_count', 0)} warning events")
# ... one branch per EKS tool
- Add a unit test that feeds a canned ActionExecutionResult(success=True, data={...}) into merge_evidence for each of the five EKS action names and asserts the expected keys appear in the returned evidence dict. There is no existing test file for post_process.py yet, so creating tests/nodes/investigate/test_post_process.py with parametrized cases (one per tool) is a reasonable entry point. If Datadog mapper tests exist elsewhere in the suite, they can serve as a reference; otherwise the pattern is straightforward.
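The unit test described above could take roughly the following shape. To stay runnable on its own, this sketch redefines a stand-in ActionExecutionResult and a simplified merge_evidence, both of which are assumptions about the real signatures. In the repo the test would import them from post_process.py instead, and the case table would typically become a pytest.mark.parametrize list. The canned payloads are invented fixtures.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ActionExecutionResult:
    """Stand-in for the repo's result type (assumed fields)."""
    success: bool
    data: dict[str, Any] = field(default_factory=dict)


# Two of the five proposed mappers, copied from the fix above for brevity.
def _map_eks_pods(data: dict) -> dict:
    return {
        "eks_pods": data.get("pods", []),
        "eks_failing_pods": data.get("failing_pods", []),
        "eks_high_restart_pods": data.get("high_restart_pods", []),
        "eks_total_pods": data.get("total_pods", 0),
    }


def _map_eks_events(data: dict) -> dict:
    return {
        "eks_events": data.get("warning_events", []),
        "eks_total_warning_count": data.get("total_warning_count", 0),
    }


EVIDENCE_MAPPERS = {
    "list_eks_pods": _map_eks_pods,
    "get_eks_events": _map_eks_events,
}


def merge_evidence(evidence: dict, results: dict) -> dict:
    """Simplified stand-in for the real merge loop (assumed signature)."""
    merged = dict(evidence)
    for action_name, result in results.items():
        mapper = EVIDENCE_MAPPERS.get(action_name)
        if mapper:
            merged.update(mapper(result.data))
    return merged


# (action name, canned tool payload, keys expected in evidence)
CASES = [
    (
        "list_eks_pods",
        {"pods": [{"name": "api-0"}], "failing_pods": [], "high_restart_pods": [], "total_pods": 1},
        {"eks_pods", "eks_failing_pods", "eks_high_restart_pods", "eks_total_pods"},
    ),
    (
        "get_eks_events",
        {"warning_events": [{"reason": "BackOff"}], "total_warning_count": 1},
        {"eks_events", "eks_total_warning_count"},
    ),
]


def test_eks_results_reach_evidence() -> None:
    for action_name, data, expected_keys in CASES:
        result = ActionExecutionResult(success=True, data=data)
        evidence = merge_evidence({}, {action_name: result})
        missing = expected_keys - set(evidence)
        assert not missing, f"{action_name} did not produce {sorted(missing)}"


test_eks_results_reach_evidence()
```

Extending CASES with list_eks_deployments, get_eks_node_health, and get_eks_pod_logs covers the remaining three tools in scope.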
Why this matters now: the Kubernetes synthetic test harness being built under #260 assumes EKS tool output flows into state["evidence"] before diagnose_root_cause runs. The follow-up scenario issues #261, #262, #263 will declare required_evidence_sources: [eks_pods, eks_events, ...] in their answer.yml files. Until this is fixed, those scenarios will fail the scorer's evidence check even when the agent calls the right tools.
Related: #260, #261, #262, #263. Scope of this fix: the five fixture-supported tools above. Wiring the remaining six EKS tools (list_eks_clusters, list_eks_namespaces, describe_eks_cluster, describe_eks_addon, get_eks_deployment_status, get_eks_nodegroup_health) is out of scope for this issue and can be handled in a separate follow-up once scenarios that need them exist.