refactor(bench): _framework decoupling Phase 1 + 2 — capability flags by YauhenBichel · Pull Request #2800 · Tracer-Cloud/opensre

YauhenBichel · 2026-06-12T11:12:17Z

Fixes #2074

Describe the changes you have made in this PR -

First two of five planned phases to decouple _framework/ from
CloudOpsBench-specific assumptions. Today the framework hardcodes
if config.benchmark != "cloudopsbench" in multiple places — any new
adapter (OpenRCA, ToolCallBench) needs framework changes to opt in to
features it actually supports. This PR replaces name-based dispatch
with a capability flag layer.

Phase 1 — `AdapterCapabilities` model + ABC integration

New AdapterCapabilities pydantic model in _framework/adapter_base.py.
Frozen, extra="forbid". Two flags to start: supports_agent_variant,
supports_predictor_variant. Default all-False so a new adapter is
locked down to the minimum surface until it opts in deliberately.
BenchmarkAdapter ABC gets capabilities: ClassVar[AdapterCapabilities]
with all-False default.
CloudOpsBenchAdapter declares
capabilities = AdapterCapabilities(supports_agent_variant=True, supports_predictor_variant=True) — exactly what the previous
hardcoded check let through.

Phase 2 — Config validation goes capability-aware

New capabilities_for(name) helper in _framework/registry.py —
returns the registered adapter's flags without forcing the caller to
instantiate one.
_framework/config.py replaces both if benchmark != "cloudopsbench"
guards with if not adapter_caps.supports_<feature>. Unknown
adapter → all-False default → guard refuses the gated knob (a typo
in config.benchmark surfaces with a clear error for each gated
knob, plus the underlying unknown-benchmark error at runner build
time).

Backward compat

Existing CloudOpsBench configs (every *.yml in the repo) continue
to validate cleanly — the capability declaration on the adapter
matches the previous hardcoded behavior bit-for-bit.
The shim at _framework/adapters.py re-exports AdapterCapabilities
- capabilities_for so external imports use the same path as before.
Pydantic extra="forbid" catches capability-name typos at adapter
declaration time, not at run time.

Code Understanding and AI Usage

Did you use AI assistance (ChatGPT, Claude, Copilot, etc.) to write any part of this code?

No, I wrote all the code myself
Yes, I used AI assistance (continue below)

If you used AI assistance:

I have reviewed every single line of the AI-generated code
I can explain the purpose and logic of each function/component I added
I have tested edge cases and understand how the code handles them
I have modified the AI output to follow this project's coding standards and conventions

Checklist before requesting a review

I have added proper PR title and linked to the issue
I have performed a self-review of my code
I can explain the purpose of every function, class, and logic block I added
I understand why my changes work and have tested them thoroughly
I have considered potential edge cases and how my code handles them
If it is a core feature, I have added thorough tests
My code follows the project's style guidelines and conventions

Note: Please check Allow edits from maintainers if you would like us to assist in the PR.

github-actions · 2026-06-12T11:12:27Z

Greptile code review

This repo uses Greptile for automated review. Before merge, aim for Confidence Score: 5/5 with zero unresolved review threads — see CONTRIBUTING.md.

Run a review — add a PR comment with:

@greptile review

Give it ~5-10 minutes (sometimes longer) for results, then fix feedback and re-trigger until you reach Confidence Score: 5/5.

Optional: automate with the greploop skill.

YauhenBichel · 2026-06-12T11:12:47Z

@greptile review

greptile-apps · 2026-06-12T11:18:13Z

Greptile Summary

This PR replaces two hardcoded if config.benchmark != "cloudopsbench" guards in _framework/config.py with a generic capability-flag layer, making the framework open to new adapters without framework changes. All three concerns raised in the prior review round (unnecessary adapter instantiation to read a ClassVar, misleading late-import circular-dependency comment, and the resulting _capabilities_for_lint shim) are cleanly resolved.

Phase 1 adds a frozen, extra-forbidden AdapterCapabilities Pydantic model and attaches it as a ClassVar to BenchmarkAdapter with an all-False safe default.
Phase 2 adds capabilities_for to the registry — reads the ClassVar directly for class factories (no instantiation) and falls back to factory().capabilities for closure factories — and threads it through _capabilities_for_lint in config.py, replacing both name-based guards.
CloudOpsBenchAdapter declares supports_agent_variant=True, supports_predictor_variant=True, reproducing the previous hardcoded behaviour bit-for-bit; existing configs continue to validate cleanly.

Confidence Score: 5/5

Safe to merge — the refactor is strictly additive, backward compatibility is preserved bit-for-bit, and all previous review concerns have been addressed.

The capability-flag layer is well-scoped: frozen Pydantic model prevents runtime mutation, extra="forbid" catches declaration typos at construction time, and capabilities_for avoids adapter instantiation for the common class-factory pattern. Existing CloudOpsBench configs validate identically to before. The only gap is the closure-factory fallback branch in capabilities_for having no test coverage, but this path is explicitly marked uncommon and affects no current adapter.

No files require special attention. test_adapter_capabilities.py could add a closure-factory test to complete coverage of the two-branch capabilities_for implementation.

Important Files Changed

Filename	Overview
tests/benchmarks/_framework/adapter_base.py	Adds `AdapterCapabilities` frozen Pydantic model with `extra="forbid"` and `ClassVar[AdapterCapabilities]` on the ABC — well-designed; safe default (all-False) correctly locks down new adapters.
tests/benchmarks/_framework/registry.py	New `capabilities_for` reads the `ClassVar` directly from the class when the factory IS the class (no instantiation); falls back to `factory().capabilities` for closure factories. Previous reviewer concern about unnecessary instantiation is fully resolved.
tests/benchmarks/_framework/config.py	Replaces both `benchmark != "cloudopsbench"` guards with capability lookups via `_capabilities_for_lint`. Previous review's misleading late-import comment is gone; `capabilities_for` is imported at module level. Logic is correct.
tests/benchmarks/_framework/adapters.py	Adds `AdapterCapabilities` and `capabilities_for` to `__all__` and re-exports — clean shim update, backward-compatible.
tests/benchmarks/_framework/tests/test_adapter_capabilities.py	Thorough tests for the new capability layer including the no-instantiation contract; closure-factory fallback path in `capabilities_for` has no test coverage.
tests/benchmarks/_framework/tests/test_config.py	Test names and assertions updated to match capability-based error messages; new passing tests for CloudOpsBench's declared capabilities — well-aligned with the refactor.
tests/benchmarks/cloudopsbench/adapter.py	Adds `capabilities = AdapterCapabilities(supports_agent_variant=True, supports_predictor_variant=True)` — exactly matches what the previous hardcoded checks allowed, preserving full backward compatibility.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["BenchmarkConfig.lint()"] --> B["_capabilities_for_lint(benchmark)"]
    B --> C["capabilities_for(name) — registry.py"]
    C --> D{Factory is a class?}
    D -- Yes --> E["factory.capabilities (ClassVar read, no __init__)"]
    D -- No --> F["factory().capabilities (closure fallback)"]
    E --> G["AdapterCapabilities"]
    F --> G
    C -- KeyError / ImportError --> H["AdapterCapabilities() — all-False default"]
    B --> G
    G --> I{agent_variant != 'default'?}
    I -- supports_agent_variant=False --> J["Error: adapter must declare supports_agent_variant=True"]
    I -- supports_agent_variant=True --> K["✓ Accepted"]
    G --> L{predictor_variant == 'structured'?}
    L -- supports_predictor_variant=False --> M["Error: adapter must declare supports_predictor_variant=True"]
    L -- supports_predictor_variant=True --> N["Check OpenAI LLM constraint"]

_{Reviews (3): Last reviewed commit: "fixed issues" | Re-trigger Greptile}

YauhenBichel · 2026-06-12T11:30:14Z

@greptile review

github-actions · 2026-06-12T11:37:00Z

🧑‍💻 @YauhenBichel has entered the contributor hall of fame. Merged. Done. Shipped. Go touch grass (then come back with another PR). 🌱

👋 Join us on Discord - OpenSRE : hang out, contribute, or hunt for features and issues. Everyone's welcome.

low coupling between bench frameworks and benchmarks

5e6c911

greptile-apps Bot reviewed Jun 12, 2026

View reviewed changes

Comment thread tests/benchmarks/_framework/registry.py Outdated

Comment thread tests/benchmarks/_framework/config.py Outdated

Comment thread tests/benchmarks/_framework/config.py

fixed issues

037e2af

YauhenBichel marked this pull request as ready for review June 12, 2026 11:36

YauhenBichel merged commit 2d9b732 into main Jun 12, 2026
16 checks passed

YauhenBichel deleted the fix/2074-bench-low-coupling branch June 12, 2026 11:36

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(bench): _framework decoupling Phase 1 + 2 — capability flags#2800

refactor(bench): _framework decoupling Phase 1 + 2 — capability flags#2800
YauhenBichel merged 2 commits into
mainfrom
fix/2074-bench-low-coupling

YauhenBichel commented Jun 12, 2026

Uh oh!

github-actions Bot commented Jun 12, 2026

Uh oh!

YauhenBichel commented Jun 12, 2026

Uh oh!

greptile-apps Bot commented Jun 12, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

YauhenBichel commented Jun 12, 2026

Uh oh!

Uh oh!

github-actions Bot commented Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

YauhenBichel commented Jun 12, 2026

Describe the changes you have made in this PR -

Phase 1 — AdapterCapabilities model + ABC integration

Phase 2 — Config validation goes capability-aware

Backward compat

Code Understanding and AI Usage

Checklist before requesting a review

Uh oh!

github-actions Bot commented Jun 12, 2026

Greptile code review

Uh oh!

YauhenBichel commented Jun 12, 2026

Uh oh!

greptile-apps Bot commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Flowchart

Uh oh!

Uh oh!

Uh oh!

Uh oh!

YauhenBichel commented Jun 12, 2026

Uh oh!

Uh oh!

github-actions Bot commented Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Phase 1 — `AdapterCapabilities` model + ABC integration

greptile-apps Bot commented Jun 12, 2026 •

edited

Loading