fix(bench, CI): calling real EKS during bench running#2756

Merged
YauhenBichel merged 3 commits into
mainfrom
fix/2074-prod-EKS--bench-calls-denied
Jun 5, 2026
@YauhenBichel YauhenBichel commented Jun 5, 2026

Collaborator

Fixes #2074 (bench runs were calling real EKS)

Describe the changes you have made in this PR -

Issue: bench runs still call real EKS, as these task logs show:

June 5, 2026, 09:49 | Tool get_eks_events failed: AccessDeniedException: An error occurred (AccessDeniedException) when calling the DescribeCluster operation: User: arn:aws:sts::395261708130:assumed-role/opensre-bench-task/76e3713611064ea0b1ea69ab0f09ba0b is not authorized to perform: eks:DescribeCluster on resource: arn:aws:eks:us-east-1:395261708130:cluster/train-ticket because no identity-based policy allows the eks:DescribeCluster action
Tool get_eks_events failed: AccessDeniedException: An error occurred (AccessDeniedException) when calling the DescribeCluster operation: User: arn:aws:sts::395261708130:assumed-role/opensre-bench-task/76e3713611064ea0b1ea69ab0f09ba0b is not authorized to perform: eks:DescribeCluster on resource: arn:aws:eks:us-east-1:395261708130:cluster/cloudopsbench because no identity-based policy allows the eks:DescribeCluster action
June 5, 2026, 10:00 | botocore.errorfactory.AccessDeniedException: An error occurred (AccessDeniedException) when calling the DescribeCluster operation: User: arn:aws:sts::395261708130:assumed-role/opensre-bench-task/76e3713611064ea0b1ea69ab0f09ba0b is not authorized to perform: eks:DescribeCluster on resource: arn:aws:eks:us-east-1:395261708130:cluster/train-ticket because no identity-based policy allows the eks:DescribeCluster action
June 5, 2026, 10:07 | Tool get_eks_events failed: ClientError: An error occurred (AccessDenied) when calling the AssumeRole operation: User: arn:aws:sts::395261708130:assumed-role/opensre-bench-task/76e3713611064ea0b1ea69ab0f09ba0b is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::account:role


Code Understanding and AI Usage

Did you use AI assistance (ChatGPT, Claude, Copilot, etc.) to write any part of this code?

  • No, I wrote all the code myself
  • Yes, I used AI assistance (continue below)

If you used AI assistance:

  • I have reviewed every single line of the AI-generated code
  • I can explain the purpose and logic of each function/component I added
  • I have tested edge cases and understand how the code handles them
  • I have modified the AI output to follow this project's coding standards and conventions

Explain your implementation approach:

Root cause. The CloudOpsBench investigation agent was reaching into PRODUCTION opensre tools (app/tools/EKSEventsTool/, app/tools/HermesLogsTool/, etc.) that hit live AWS / Hermes endpoints. The bench Fargate task role intentionally cannot reach those — bench cases are supposed to run against deterministic State-Snapshot replay data per the Cloud-OpsBench paper protocol. Both production tools and bench-package tools were visible to the agent because the registry merges them and the agent's tool_schemas payload included everything.

Fix. One generic hook on the production agent, one override on the bench subclass:

| File | Change |
| --- | --- |
| app/agent/investigation.py | New _filter_tools(self, tools) hook on ConnectedInvestigationAgent. Default returns input unchanged. Wired into run() between _get_available_tools(resolved) and _build_connected_tool_context(resolved, tools). Filtered tools also disappear from state["available_sources"] / state["available_action_names"], so the agent is not told about sources it can't reach. Zero behavior change for production. |
| tests/benchmarks/cloudopsbench/bench_agent.py | BenchInvestigationAgent overrides _filter_tools to keep only tools whose origin_module starts with "tests.benchmarks.cloudopsbench.tools." (trailing dot included). Whitelist is a ClassVar[tuple[str, ...]] (ALLOWED_TOOL_MODULE_PREFIXES) so a one-off experiment can override without rebuilding the agent; same convention as MIN_TOOL_CALLS. |
| tests/benchmarks/cloudopsbench/test_bench_agent.py | 4 new tests: production default is identity, bench keeps only bench-package tools, bench returns empty when no bench tools registered (defensive), ALLOWED_TOOL_MODULE_PREFIXES is overridable. |
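
The hook-plus-override pattern described above can be sketched roughly as follows. This is an illustration only, not the code merged in this PR: the `Tool` dataclass, the simplified class bodies, and the example module paths are assumptions; only the names `_filter_tools`, `origin_module`, `ALLOWED_TOOL_MODULE_PREFIXES`, `ConnectedInvestigationAgent`, and `BenchInvestigationAgent` come from the description.

```python
from dataclasses import dataclass
from typing import ClassVar


@dataclass(frozen=True)
class Tool:
    """Hypothetical stand-in for a registered tool (not the real registry type)."""
    name: str
    origin_module: str


class ConnectedInvestigationAgent:
    def _filter_tools(self, tools: list[Tool]) -> list[Tool]:
        # Default hook: return the input unchanged, so production is unaffected.
        return tools


class BenchInvestigationAgent(ConnectedInvestigationAgent):
    # Class-level allowlist; a subclass can override it for a one-off experiment.
    ALLOWED_TOOL_MODULE_PREFIXES: ClassVar[tuple[str, ...]] = (
        "tests.benchmarks.cloudopsbench.tools.",
    )

    def _filter_tools(self, tools: list[Tool]) -> list[Tool]:
        # str.startswith accepts a tuple and matches if any prefix matches.
        return [
            t for t in tools
            if t.origin_module.startswith(self.ALLOWED_TOOL_MODULE_PREFIXES)
        ]


tools = [
    Tool("get_eks_events", "app.tools.EKSEventsTool.tool"),  # hypothetical module path
    Tool("replay_events", "tests.benchmarks.cloudopsbench.tools.events"),  # hypothetical
]
prod_names = [t.name for t in ConnectedInvestigationAgent()._filter_tools(tools)]
bench_names = [t.name for t in BenchInvestigationAgent()._filter_tools(tools)]
```

In this sketch the production agent still sees both tools, while the bench agent sees only the replay tool, so a live EKS call is never offered to the bench LLM loop.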

Separation-of-concerns hygiene in the same PR (same conceptual change — production code should not know about benchmarking):

| File | Change |
| --- | --- |
| app/agent/investigation.py (_filter_tools docstring) | Describes WHAT the hook does (filter tools, runs before state derivation) and WHEN to override it (allowlist policies). No mention of bench / replay / EKS / Hermes. |
| app/tools/registry.py (register_external_tool_package comment + docstring) | Removed "external benchmark harnesses" and "importing a bench package" examples. The mechanism is generic; production should not name a specific consumer when describing it. |

Checklist before requesting a review

  • I have added proper PR title and linked to the issue
  • I have performed a self-review of my code
  • I can explain the purpose of every function, class, and logic block I added
  • I understand why my changes work and have tested them thoroughly
  • I have considered potential edge cases and how my code handles them
  • If it is a core feature, I have added thorough tests
  • My code follows the project's style guidelines and conventions

Note: Please check Allow edits from maintainers if you would like us to assist in the PR.

@github-actions

github-actions Bot commented Jun 5, 2026

Contributor

Greptile code review

This repo uses Greptile for automated review. Before merge, aim for Confidence Score: 5/5 with zero unresolved review threads — see CONTRIBUTING.md.

Run a review — add a PR comment with:

@greptile review

Give it ~5-10 minutes (sometimes longer) for results, then fix feedback and re-trigger until you reach Confidence Score: 5/5.

Optional: automate with the greploop skill.

@YauhenBichel

Collaborator Author

@greptile review

@greptile-apps

greptile-apps Bot commented Jun 5, 2026

Contributor

Greptile Summary

This PR fixes bench runs incorrectly calling live AWS/EKS and Hermes endpoints by adding a _filter_tools hook to ConnectedInvestigationAgent and overriding it in BenchInvestigationAgent to allowlist only tools whose origin_module starts with the bench-package prefix — so production tools never reach the bench LLM loop.

  • app/agent/investigation.py: adds _filter_tools(tools) → tools (identity default) wired between _get_available_tools and _build_connected_tool_context, ensuring the filtered set also drives tool_schemas, seed calls, and state fields — zero production behavior change.
  • tests/benchmarks/cloudopsbench/bench_agent.py: overrides _filter_tools with an origin_module-prefix allowlist (ALLOWED_TOOL_MODULE_PREFIXES ClassVar), handles the empty-origin_module edge case with a WARNING log, and documents the trailing-dot exclusion in comments and tests.
  • app/tools/registry.py: removes bench-specific examples from docstrings to keep production code neutral about consumers.
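
The two edge cases mentioned in the summary (the trailing dot in the prefix and an empty origin_module) can be illustrated with a small standalone predicate. This is a sketch under assumptions: the function name `keep_tool`, the logger name, the example module paths, and the choice to drop tools with an empty origin_module are hypothetical and not taken from the PR's code.

```python
import logging

logger = logging.getLogger("bench_filter_sketch")  # hypothetical logger name

# The trailing dot means only submodules of the tools package match.
PREFIXES = ("tests.benchmarks.cloudopsbench.tools.",)


def keep_tool(name: str, origin_module: str) -> bool:
    """Return True if the tool should stay visible to the bench agent."""
    if not origin_module:
        # Assumption: a tool with no recorded origin cannot be attributed to
        # the bench package, so it is dropped and a WARNING is logged.
        logger.warning("Dropping tool %r: empty origin_module", name)
        return False
    return origin_module.startswith(PREFIXES)


results = {
    "submodule": keep_tool("a", "tests.benchmarks.cloudopsbench.tools.events"),
    # The package root itself lacks the trailing dot, so it does not match.
    "package_root": keep_tool("b", "tests.benchmarks.cloudopsbench.tools"),
    # A sibling package sharing the textual prefix is excluded as well.
    "lookalike": keep_tool("c", "tests.benchmarks.cloudopsbench.tools_extra.x"),
    "empty": keep_tool("d", ""),
}
```

Keeping the dot in the prefix trades a slightly surprising exclusion of the package root for protection against look-alike package names.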

Confidence Score: 5/5

Safe to merge — the hook is an identity no-op on the base class, so production investigations are unaffected; the bench subclass now consistently restricts the tool set to replay-only tools throughout the entire agent run.

The base-class _filter_tools returns its input unchanged, so every existing production code path continues without modification. The bench override is self-contained in the test tree, well-tested with five new unit tests, and applies uniformly to tool schemas, seed calls, and all parallel execution. No migration or schema changes are involved.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| app/agent/investigation.py | Adds _filter_tools hook (identity default) wired between _get_available_tools and _build_connected_tool_context; filtered list flows through tool_schemas, seed calls, and all parallel execution; no production behavior change. |
| app/tools/registry.py | Comment-only cleanup: removes bench-specific examples from docstrings to keep production code neutral about consumers. No logic changed. |
| tests/benchmarks/cloudopsbench/bench_agent.py | Adds _filter_tools override that allowlists by origin_module prefix; edge cases (trailing-dot root, empty origin_module) are documented in comments and handled with a WARNING log; ALLOWED_TOOL_MODULE_PREFIXES ClassVar mirrors the MIN_TOOL_CALLS override convention. |
| tests/benchmarks/cloudopsbench/test_bench_agent.py | Adds 5 targeted unit tests covering: identity on base class, bench-only filtering, empty-toolset defensive path, prefix ClassVar override, trailing-dot root exclusion, and WARNING log on empty origin_module. |

Sequence Diagram

sequenceDiagram
    participant Run as run()
    participant GAT as _get_available_tools()
    participant FT as _filter_tools()
    participant BCT as _build_connected_tool_context()
    participant LLM as LLM loop

    Run->>GAT: resolved_integrations
    GAT-->>Run: all available tools (prod + bench)
    Run->>FT: all available tools
    note over FT: Base class: identity (production)<br/>BenchAgent: keep origin_module<br/>startswith(ALLOWED_TOOL_MODULE_PREFIXES)
    FT-->>Run: filtered tools
    Run->>BCT: filtered tools
    BCT-->>Run: available_sources, available_action_names
    Run->>LLM: tool_schemas(filtered tools), seed_calls(filtered tools), _run_parallel(filtered tools)
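
The ordering in the diagram matters because state fields are derived after filtering. A rough Python sketch of that wiring follows; the method bodies, the `Tool` fields, and the class names `AgentSketch` and `BenchSketch` are assumptions made for illustration, and only the method names and state keys come from the PR text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    """Hypothetical tool record; the real type is not shown in this PR page."""
    name: str
    source: str
    origin_module: str


class AgentSketch:
    def _get_available_tools(self, resolved):
        # Assumption: the real method resolves tools from integrations.
        return list(resolved)

    def _filter_tools(self, tools):
        return tools  # identity default

    def _build_connected_tool_context(self, resolved, tools):
        # State is built from the filtered list, so dropped tools leave no trace.
        return {
            "available_sources": sorted({t.source for t in tools}),
            "available_action_names": [t.name for t in tools],
        }

    def run(self, resolved):
        tools = self._get_available_tools(resolved)
        tools = self._filter_tools(tools)  # the hook sits between the two calls
        return self._build_connected_tool_context(resolved, tools)


class BenchSketch(AgentSketch):
    def _filter_tools(self, tools):
        prefix = "tests.benchmarks.cloudopsbench.tools."
        return [t for t in tools if t.origin_module.startswith(prefix)]


resolved = [
    Tool("get_eks_events", "eks", "app.tools.EKSEventsTool.tool"),  # hypothetical
    Tool("replay_events", "replay", "tests.benchmarks.cloudopsbench.tools.events"),
]
state = BenchSketch().run(resolved)
```

Here `state` lists only the replay source and action, which matches the claim that the agent is not told about sources it cannot reach.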

Reviews (3): Last reviewed commit: "fixed greptile notes"

Comment thread tests/benchmarks/cloudopsbench/bench_agent.py Outdated
Comment thread tests/benchmarks/cloudopsbench/bench_agent.py Outdated
@YauhenBichel

Collaborator Author

@greptile review

@YauhenBichel YauhenBichel marked this pull request as ready for review June 5, 2026 09:25
@YauhenBichel YauhenBichel merged commit 2d70e77 into main Jun 5, 2026
17 checks passed
@YauhenBichel YauhenBichel deleted the fix/2074-prod-EKS--bench-calls-denied branch June 5, 2026 09:26
@github-actions

github-actions Bot commented Jun 5, 2026

Contributor

🐸 Rebase? Handled. Conflicts? Squashed. CI? Vibing. @YauhenBichel touched the untouchable and lived. 🫡


👋 Join us on Discord - OpenSRE : hang out, contribute, or hunt for features and issues. Everyone's welcome.

Development

Successfully merging this pull request may close these issues.

Benchmark opensre+LLM vs LLM-alone (Cloudopsbench)
