fix(bench, CI): calling real EKS during bench running by YauhenBichel · Pull Request #2756 · Tracer-Cloud/opensre

YauhenBichel · 2026-06-05T09:06:46Z

Fixes #2074 (bench runs were calling real EKS)

Describe the changes you have made in this PR -

Issue: bench runs still call real EKS:

June 5, 2026, 09:49 | Tool get_eks_events failed: AccessDeniedException: An error occurred (AccessDeniedException) when calling the DescribeCluster operation: User: arn:aws:sts::395261708130:assumed-role/opensre-bench-task/76e3713611064ea0b1ea69ab0f09ba0b is not authorized to perform: eks:DescribeCluster on resource: arn:aws:eks:us-east-1:395261708130:cluster/train-ticket because no identity-based policy allows the eks:DescribeCluster action

Tool get_eks_events failed: AccessDeniedException: An error occurred (AccessDeniedException) when calling the DescribeCluster operation: User: arn:aws:sts::395261708130:assumed-role/opensre-bench-task/76e3713611064ea0b1ea69ab0f09ba0b is not authorized to perform: eks:DescribeCluster on resource: arn:aws:eks:us-east-1:395261708130:cluster/cloudopsbench because no identity-based policy allows the eks:DescribeCluster action

June 5, 2026, 10:00 | botocore.errorfactory.AccessDeniedException: An error occurred (AccessDeniedException) when calling the DescribeCluster operation: User: arn:aws:sts::395261708130:assumed-role/opensre-bench-task/76e3713611064ea0b1ea69ab0f09ba0b is not authorized to perform: eks:DescribeCluster on resource: arn:aws:eks:us-east-1:395261708130:cluster/train-ticket because no identity-based policy allows the eks:DescribeCluster action

June 5, 2026, 10:07 | Tool get_eks_events failed: ClientError: An error occurred (AccessDenied) when calling the AssumeRole operation: User: arn:aws:sts::395261708130:assumed-role/opensre-bench-task/76e3713611064ea0b1ea69ab0f09ba0b is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::account:role

Code Understanding and AI Usage

Did you use AI assistance (ChatGPT, Claude, Copilot, etc.) to write any part of this code?

No, I wrote all the code myself
Yes, I used AI assistance (continue below)

If you used AI assistance:

I have reviewed every single line of the AI-generated code
I can explain the purpose and logic of each function/component I added
I have tested edge cases and understand how the code handles them
I have modified the AI output to follow this project's coding standards and conventions

Explain your implementation approach:

Root cause. The CloudOpsBench investigation agent was reaching into PRODUCTION opensre tools (app/tools/EKSEventsTool/, app/tools/HermesLogsTool/, etc.) that hit live AWS / Hermes endpoints. The bench Fargate task role intentionally cannot reach those — bench cases are supposed to run against deterministic State-Snapshot replay data per the Cloud-OpsBench paper protocol. Both production tools and bench-package tools were visible to the agent because the registry merges them and the agent's tool_schemas payload included everything.

Fix. One generic hook on the production agent, one override on the bench subclass:

File	Change
`app/agent/investigation.py`	New `_filter_tools(self, tools)` hook on `ConnectedInvestigationAgent`. Default returns input unchanged. Wired into `run()` between `_get_available_tools(resolved)` and `_build_connected_tool_context(resolved, tools)`. Filtered tools also disappear from `state["available_sources"]` / `state["available_action_names"]` — agent is not told sources exist that it can't reach. Zero behavior change for production.
`tests/benchmarks/cloudopsbench/bench_agent.py`	`BenchInvestigationAgent` overrides `_filter_tools` to keep only tools whose `origin_module` starts with `tests.benchmarks.cloudopsbench.tools.`. Whitelist is a `ClassVar[tuple[str, ...]]` (`ALLOWED_TOOL_MODULE_PREFIXES`) so a one-off experiment can override without rebuilding the agent — same convention as `MIN_TOOL_CALLS`.
`tests/benchmarks/cloudopsbench/test_bench_agent.py`	4 new tests: production default is identity, bench keeps only bench-package tools, bench returns empty when no bench tools registered (defensive), `ALLOWED_TOOL_MODULE_PREFIXES` is overridable.
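The hook and its override can be sketched roughly as follows. This is a minimal illustration, not the merged code: the `Tool` dataclass, the tool names, and the example `origin_module` values are hypothetical stand-ins, and only `_filter_tools`, `ALLOWED_TOOL_MODULE_PREFIXES`, and the class names come from the description above.

```python
from dataclasses import dataclass
from typing import ClassVar


@dataclass(frozen=True)
class Tool:
    """Hypothetical stand-in for a registered tool (only the fields used here)."""

    name: str
    origin_module: str


class ConnectedInvestigationAgent:
    def _filter_tools(self, tools: list[Tool]) -> list[Tool]:
        """Filter available tools before state is derived. Default: identity."""
        return tools


class BenchInvestigationAgent(ConnectedInvestigationAgent):
    # Class-level allowlist, so an experiment can override it in a subclass.
    ALLOWED_TOOL_MODULE_PREFIXES: ClassVar[tuple[str, ...]] = (
        "tests.benchmarks.cloudopsbench.tools.",
    )

    def _filter_tools(self, tools: list[Tool]) -> list[Tool]:
        # str.startswith accepts a tuple and matches any of its prefixes.
        return [
            t
            for t in tools
            if t.origin_module.startswith(self.ALLOWED_TOOL_MODULE_PREFIXES)
        ]


# Hypothetical registry contents: one production tool, one bench replay tool.
tools = [
    Tool("get_eks_events", "app.tools.EKSEventsTool"),
    Tool("replay_events", "tests.benchmarks.cloudopsbench.tools.replay"),
]
prod_names = [t.name for t in ConnectedInvestigationAgent()._filter_tools(tools)]
bench_names = [t.name for t in BenchInvestigationAgent()._filter_tools(tools)]
```

In this sketch the production agent still sees both tools, while the bench agent sees only the one registered from the bench package.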

Separation-of-concerns hygiene in the same PR (same conceptual change — production code should not know about benchmarking):

File	Change
`app/agent/investigation.py` (`_filter_tools` docstring)	Describes WHAT the hook does (filter tools, runs before state derivation) and WHEN to override it (allowlist policies). No mention of bench / replay / EKS / Hermes.
`app/tools/registry.py` (`register_external_tool_package` comment + docstring)	Removed "external benchmark harnesses" and "importing a bench package" examples. The mechanism is generic — production should not name a specific consumer when describing it.

Checklist before requesting a review

I have added proper PR title and linked to the issue
I have performed a self-review of my code
I can explain the purpose of every function, class, and logic block I added
I understand why my changes work and have tested them thoroughly
I have considered potential edge cases and how my code handles them
If it is a core feature, I have added thorough tests
My code follows the project's style guidelines and conventions

Note: Please check Allow edits from maintainers if you would like us to assist in the PR.

github-actions · 2026-06-05T09:06:55Z

Greptile code review

This repo uses Greptile for automated review. Before merge, aim for Confidence Score: 5/5 with zero unresolved review threads — see CONTRIBUTING.md.

Run a review — add a PR comment with:

@greptile review

Give it ~5-10 minutes (sometimes longer) for results, then fix feedback and re-trigger until you reach Confidence Score: 5/5.

Optional: automate with the greploop skill.

YauhenBichel · 2026-06-05T09:12:54Z

@greptile review

greptile-apps · 2026-06-05T09:17:15Z

Greptile Summary

This PR fixes bench runs incorrectly calling live AWS/EKS and Hermes endpoints by adding a _filter_tools hook to ConnectedInvestigationAgent and overriding it in BenchInvestigationAgent to allowlist only tools whose origin_module starts with the bench-package prefix — so production tools never reach the bench LLM loop.

app/agent/investigation.py: adds _filter_tools(tools) → tools (identity default) wired between _get_available_tools and _build_connected_tool_context, ensuring the filtered set also drives tool_schemas, seed calls, and state fields — zero production behavior change.
tests/benchmarks/cloudopsbench/bench_agent.py: overrides _filter_tools with an origin_module-prefix allowlist (ALLOWED_TOOL_MODULE_PREFIXES ClassVar), handles the empty-origin_module edge case with a WARNING log, and documents the trailing-dot exclusion in comments and tests.
app/tools/registry.py: removes bench-specific examples from docstrings to keep production code neutral about consumers.

Confidence Score: 5/5

Safe to merge — the hook is an identity no-op on the base class, so production investigations are unaffected; the bench subclass now consistently restricts the tool set to replay-only tools throughout the entire agent run.

The base-class _filter_tools returns its input unchanged, so every existing production code path continues without modification. The bench override is self-contained in the test tree, well-tested with five new unit tests, and applies uniformly to tool schemas, seed calls, and all parallel execution. No migration or schema changes are involved.

No files require special attention.

Important Files Changed

Filename	Overview
app/agent/investigation.py	Adds _filter_tools hook (identity default) wired between _get_available_tools and _build_connected_tool_context; filtered list flows through tool_schemas, seed calls, and all parallel execution — no production behavior change.
app/tools/registry.py	Comment-only cleanup: removes bench-specific examples from docstrings to keep production code neutral about consumers. No logic changed.
tests/benchmarks/cloudopsbench/bench_agent.py	Adds _filter_tools override that allowlists by origin_module prefix; edge cases (trailing-dot root, empty origin_module) are documented in comments and handled with a WARNING log; ALLOWED_TOOL_MODULE_PREFIXES ClassVar mirrors the MIN_TOOL_CALLS override convention.
tests/benchmarks/cloudopsbench/test_bench_agent.py	Adds 5 targeted unit tests covering: identity on base class, bench-only filtering, empty-toolset defensive path, prefix ClassVar override, trailing-dot root exclusion, and WARNING log on empty origin_module.
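The two edge cases named in the table can be illustrated with a small standalone check. This is a sketch under assumptions: `is_allowed`, the logger name, and the sample module paths are hypothetical, and the reading of "trailing-dot root exclusion" (the package root itself does not match a prefix that ends in a dot) is an interpretation of the summary rather than something the PR text spells out.

```python
import logging

logger = logging.getLogger("bench_agent_sketch")  # hypothetical logger name

PREFIXES = ("tests.benchmarks.cloudopsbench.tools.",)


def is_allowed(origin_module: str, prefixes: tuple[str, ...] = PREFIXES) -> bool:
    """Return True only for modules strictly inside an allowed package."""
    if not origin_module:
        # Assumed behaviour: warn and exclude tools with no recorded origin.
        logger.warning("tool has empty origin_module; excluding it")
        return False
    return origin_module.startswith(prefixes)


results = {
    module: is_allowed(module)
    for module in (
        "tests.benchmarks.cloudopsbench.tools.replay",   # inside the package
        "tests.benchmarks.cloudopsbench.tools",          # package root, no trailing dot
        "tests.benchmarks.cloudopsbench.tools_extra.x",  # sibling with a shared stem
        "app.tools.EKSEventsTool",                       # production tool
        "",                                              # missing origin
    )
}
```

Because the prefix ends in a dot, only the first entry matches; the package root and the look-alike sibling package are both rejected.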

Sequence Diagram

sequenceDiagram
    participant Run as run()
    participant GAT as _get_available_tools()
    participant FT as _filter_tools()
    participant BCT as _build_connected_tool_context()
    participant LLM as LLM loop

    Run->>GAT: resolved_integrations
    GAT-->>Run: all available tools (prod + bench)
    Run->>FT: all available tools
    note over FT: Base class: identity (production)<br/>BenchAgent: keep origin_module<br/>startswith(ALLOWED_TOOL_MODULE_PREFIXES)
    FT-->>Run: filtered tools
    Run->>BCT: filtered tools
    BCT-->>Run: available_sources, available_action_names
    Run->>LLM: tool_schemas(filtered tools), seed_calls(filtered tools), _run_parallel(filtered tools)
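The ordering in the diagram matters because everything downstream is derived from the filtered list. A rough sketch of that wiring, assuming a simplified `run()` and hypothetical tool records (dicts with `name`, `source`, and `origin_module` keys); the real method bodies are not shown in this PR page:

```python
from typing import Any

Tool = dict[str, str]  # hypothetical tool record: name, source, origin_module


class AgentSketch:
    def _get_available_tools(self, resolved: dict[str, Any]) -> list[Tool]:
        return list(resolved["tools"])

    def _filter_tools(self, tools: list[Tool]) -> list[Tool]:
        return tools  # identity default

    def _build_connected_tool_context(
        self, resolved: dict[str, Any], tools: list[Tool]
    ) -> dict[str, Any]:
        # State fields are computed from whatever tools were passed in.
        return {
            "available_sources": sorted({t["source"] for t in tools}),
            "available_action_names": [t["name"] for t in tools],
        }

    def run(self, resolved: dict[str, Any]) -> dict[str, Any]:
        tools = self._get_available_tools(resolved)
        tools = self._filter_tools(tools)  # hook runs before state derivation
        state = self._build_connected_tool_context(resolved, tools)
        state["tool_schemas"] = [{"name": t["name"]} for t in tools]
        return state


class BenchSketch(AgentSketch):
    def _filter_tools(self, tools: list[Tool]) -> list[Tool]:
        prefix = "tests.benchmarks.cloudopsbench.tools."
        return [t for t in tools if t["origin_module"].startswith(prefix)]


resolved = {
    "tools": [
        {"name": "get_eks_events", "source": "eks",
         "origin_module": "app.tools.EKSEventsTool"},
        {"name": "replay_events", "source": "replay",
         "origin_module": "tests.benchmarks.cloudopsbench.tools.replay"},
    ]
}
prod_state = AgentSketch().run(resolved)
bench_state = BenchSketch().run(resolved)
```

With the filter applied first, the bench state never lists the `eks` source or the `get_eks_events` action, so the model is not offered a tool it cannot reach.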

Reviews (3): Last reviewed commit: "fixed greptile notes"

YauhenBichel · 2026-06-05T09:21:35Z

@greptile review

github-actions · 2026-06-05T09:26:24Z

🐸 Rebase? Handled. Conflicts? Squashed. CI? Vibing. @YauhenBichel touched the untouchable and lived. 🫡

👋 Join us on Discord - OpenSRE : hang out, contribute, or hunt for features and issues. Everyone's welcome.

fix calling read EKS during bench running

72f7a42

refactoring, code clean

e5baf67

greptile-apps Bot reviewed Jun 5, 2026

Comment thread tests/benchmarks/cloudopsbench/bench_agent.py Outdated

Comment thread tests/benchmarks/cloudopsbench/bench_agent.py Outdated

fixed greptile notes

5721285

YauhenBichel marked this pull request as ready for review June 5, 2026 09:25

YauhenBichel merged commit 2d70e77 into main Jun 5, 2026
17 checks passed

YauhenBichel deleted the fix/2074-prod-EKS--bench-calls-denied branch June 5, 2026 09:26

YauhenBichel merged 3 commits into main from fix/2074-prod-EKS--bench-calls-denied
