fix(bench): seed ANTHROPIC_API_KEY in llm_alone dispatch acceptance test#2775
Conversation
Greptile code reviewThis repo uses Greptile for automated review. Before merge, aim for Confidence Score: 5/5 with zero unresolved review threads — see CONTRIBUTING.md. Run a review — add a PR comment with: Give it ~5-10 minutes (sometimes longer) for results, then fix feedback and re-trigger until you reach Confidence Score: 5/5. Optional: automate with the greploop skill. |
Greptile SummaryThis PR fixes a CI failure on fork PRs where
Confidence Score: 5/5Safe to merge — a one-line test setup addition with no production code changes. The change touches a single test, adds a dummy env var that is automatically cleaned up by pytest's monkeypatch fixture, and mirrors an already-proven pattern used elsewhere in the test suite. No production paths are affected, no real API calls can occur (run_investigation is patched out), and the other six tests in the file are unaffected. No files require special attention. Important Files Changed
Sequence DiagramsequenceDiagram
participant T as Test
participant MP as monkeypatch
participant R as BenchmarkRunner
participant D as LLMDispatcher.activate()
participant CI as run_investigation (patched)
T->>MP: setenv("ANTHROPIC_API_KEY", "test-key")
T->>R: BenchmarkRunner(config, adapter)
T->>R: run_without_integrity()
R->>R: _run_inner() pre-flight gate
R->>D: activate("claude-4-sonnet")
D->>D: Check ANTHROPIC_API_KEY ✓
D-->>R: spec context
R->>CI: run_investigation(...) [patched stub]
CI-->>R: "{root_cause: ok}"
R-->>T: outcome (not aborted)
T->>MP: teardown → unset ANTHROPIC_API_KEY
Reviews (1): Last reviewed commit: "fix(bench): seed ANTHROPIC_API_KEY in ll..." | Re-trigger Greptile |
|
@muddlebee @cerencamkiran kindly review |
The runner's pre-flight gate test exercises LLMDispatcher.activate(), which requires ANTHROPIC_API_KEY to be set even when the downstream LLM call is patched out. CI on main passes because the secret is injected from secrets.ANTHROPIC_API_KEY, but GitHub Actions strips secrets from fork-PR workflows, so any external contributor sees the test fail. Set a hermetic test-key via monkeypatch.setenv — same pattern as test_llm_dispatch.py uses 5+ times. Test is now hermetic whether ANTHROPIC_API_KEY is set or unset. Closes Tracer-Cloud#2774
0d3bdae to
3657dc1
Compare
|
🍵 @Davidson3556 made tea, opened a PR, and merged before it cooled. No notes. ☕ 👋 Join us on Discord - OpenSRE : hang out, contribute, or hunt for features and issues. Everyone's welcome. |

Fixes #2774
Describe the changes you have made in this PR -
test_run_inner_accepts_llm_alone_when_adapter_provides_baselinewas failing on every PR opened from a fork. The test callsrunner.run_without_integrity(), which activates the LLM viaLLMDispatcher.activate()intests/benchmarks/_framework/llm_dispatch.py:173-180. That activation raisesMissingAPIKeyifANTHROPIC_API_KEYis unset. CI on main passes because the secret is injected via.github/workflows/ci.yml:168,370, but GitHub Actions strips repository secrets from fork-PR workflows, so every external contributor's PR hits the same red.Change: add
monkeypatch: pytest.MonkeyPatchto the test and callmonkeypatch.setenv("ANTHROPIC_API_KEY", "test-key")before constructing the runner. Same patterntests/benchmarks/_framework/test_llm_dispatch.pyalready uses in 5+ places.The other 6 tests in the file are unaffected: they call
_run_one_celldirectly with an explicitspec=LLM_SPECS["claude-4-sonnet"]and never triggeractivate().Demo/Screenshot for feature changes and bug fixes -
Before (no key — reproduces the fork-PR CI failure):
After (no key, same command):
Also verified hermetic with a stub key (
ANTHROPIC_API_KEY=fake-keyset): 7 passed. The test is now green whether the env var is set or unset.Code Understanding and AI Usage
Did you use AI assistance (ChatGPT, Claude, Copilot, etc.) to write any part of this code?
If you used AI assistance:
Explain your implementation approach:
The test is verifying that the runner's pre-flight gate accepts an adapter that returns a non-None
baseline_agent_class. The downstream investigation pipeline is patched out viapatch("app.pipeline.runners.run_investigation", ...), but the dispatcher's environment-variable check fires before the patch is ever reached. So the test was broken by an upstream concern (env presence) that has nothing to do with what it's actually asserting.I considered three approaches:
pytest.mark.skipif(not os.getenv("ANTHROPIC_API_KEY")). Would skip the test on fork PRs but also silently drop it on any local dev run without a key. The point of the test is to gate the runner's pre-flight, and that gate should be exercised on every PR, not just ones where someone happens to have a key.ANTHROPIC_API_KEYto the workflow'senv:block unconditionally with a fake value. Too invasive: changes CI for every job, and the dummy key value would show up in workflow logs.monkeypatch.setenv("ANTHROPIC_API_KEY", "test-key")inside the test. Hermetic, scoped to the one test that needs it, automatically reverted after the test by pytest's monkeypatch fixture. Already the established pattern intests/benchmarks/_framework/test_llm_dispatch.py(5+ uses).I went with option 3. The dispatcher only checks env-var presence, not key validity, so a dummy
"test-key"value is enough. Withrun_investigationalready patched, no real Anthropic API call ever happens, so a fake key is safe.One thing I deliberately did not change: the dispatcher's
MissingAPIKeyexception itself. It is correct behavior for production runs (you want to fail loud if the key is missing). The bug was that one specific test was indirectly exercising that production check without the test setup it needed.Checklist before requesting a review