
fix(bench): seed ANTHROPIC_API_KEY in llm_alone dispatch acceptance test #2775

Merged
Devesh36 merged 1 commit into Tracer-Cloud:main from Davidson3556:fix/bench-anthropic-api-key-test
Jun 9, 2026

Conversation

@Davidson3556

Contributor

Fixes #2774

Describe the changes you have made in this PR -

test_run_inner_accepts_llm_alone_when_adapter_provides_baseline was failing on every PR opened from a fork. The test calls runner.run_without_integrity(), which activates the LLM via LLMDispatcher.activate() in tests/benchmarks/_framework/llm_dispatch.py:173-180. That activation raises MissingAPIKey if ANTHROPIC_API_KEY is unset. CI on main passes because the secret is injected via .github/workflows/ci.yml:168,370, but GitHub Actions strips repository secrets from fork-PR workflows, so every external contributor's PR fails the same way.

Change: add monkeypatch: pytest.MonkeyPatch to the test signature and call monkeypatch.setenv("ANTHROPIC_API_KEY", "test-key") before constructing the runner. This is the same pattern that tests/benchmarks/_framework/test_llm_dispatch.py already uses in 5+ places.
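For readers who want to see the shape of the change, here is a minimal sketch of the pattern. It is not the repository's code: MissingAPIKey and activate() below are hypothetical stand-ins for the real dispatcher, reduced to the presence check described above, and the test name is shortened for illustration.

```python
import os

import pytest


class MissingAPIKey(RuntimeError):
    """Hypothetical stand-in for the dispatcher's missing-key error."""


def activate() -> str:
    """Stand-in for LLMDispatcher.activate(): checks presence only (assumption)."""
    if "ANTHROPIC_API_KEY" not in os.environ:
        raise MissingAPIKey("ANTHROPIC_API_KEY is not set")
    return "activated"


def test_accepts_llm_alone(monkeypatch: pytest.MonkeyPatch) -> None:
    # Seed a dummy key before anything reaches the env check.
    # pytest undoes this change when the test finishes.
    monkeypatch.setenv("ANTHROPIC_API_KEY", "test-key")
    assert activate() == "activated"
```

Under pytest, the monkeypatch argument is supplied by the built-in fixture, so the test needs no extra setup or teardown code.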

The other 6 tests in the file are unaffected: they call _run_one_cell directly with an explicit spec=LLM_SPECS["claude-4-sonnet"] and never trigger activate().

Demo/Screenshot for feature changes and bug fixes -

Before (no key — reproduces the fork-PR CI failure):

[Screenshots: test output before the fix]

After (no key, same command):

[Screenshot: test output after the fix]

Also verified with a stub key set (ANTHROPIC_API_KEY=fake-key): 7 passed. The test is now green whether the env var is set or unset.


Code Understanding and AI Usage

Did you use AI assistance (ChatGPT, Claude, Copilot, etc.) to write any part of this code?

  • No, I wrote all the code myself
  • Yes, I used AI assistance (continue below)

If you used AI assistance:

  • I have reviewed every single line of the AI-generated code
  • I can explain the purpose and logic of each function/component I added
  • I have tested edge cases and understand how the code handles them
  • I have modified the AI output to follow this project's coding standards and conventions

Explain your implementation approach:

The test is verifying that the runner's pre-flight gate accepts an adapter that returns a non-None baseline_agent_class. The downstream investigation pipeline is patched out via patch("app.pipeline.runners.run_investigation", ...), but the dispatcher's environment-variable check fires before the patch is ever reached. So the test was broken by an upstream concern (env presence) that has nothing to do with what it's actually asserting.

I considered three approaches:

  1. pytest.mark.skipif(not os.getenv("ANTHROPIC_API_KEY")). Would skip the test on fork PRs but also silently drop it on any local dev run without a key. The point of the test is to gate the runner's pre-flight, and that gate should be exercised on every PR, not just ones where someone happens to have a key.
  2. Add ANTHROPIC_API_KEY to the workflow's env: block unconditionally with a fake value. Too invasive: changes CI for every job, and the dummy key value would show up in workflow logs.
  3. monkeypatch.setenv("ANTHROPIC_API_KEY", "test-key") inside the test. Hermetic, scoped to the one test that needs it, automatically reverted after the test by pytest's monkeypatch fixture. Already the established pattern in tests/benchmarks/_framework/test_llm_dispatch.py (5+ uses).

I went with option 3. The dispatcher only checks env-var presence, not key validity, so a dummy "test-key" value is enough. With run_investigation already patched, no real Anthropic API call ever happens, so a fake key is safe.
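The scoping claim for option 3 can be checked outside a test run as well. This is a small sketch, assuming a pytest version that exposes pytest.MonkeyPatch publicly; the helper name seeded_then_restored is made up for illustration.

```python
import os
from typing import Optional, Tuple

import pytest


def seeded_then_restored() -> Tuple[Optional[str], Optional[str], Optional[str]]:
    """Return the key's value before, during, and after a scoped setenv."""
    before = os.environ.get("ANTHROPIC_API_KEY")
    with pytest.MonkeyPatch.context() as mp:
        # Same call the test makes; the context manager undoes it on exit,
        # just as the monkeypatch fixture does at test teardown.
        mp.setenv("ANTHROPIC_API_KEY", "test-key")
        during = os.environ.get("ANTHROPIC_API_KEY")
    after = os.environ.get("ANTHROPIC_API_KEY")
    return before, during, after
```

The value seen during the scope is the dummy key, and the value afterwards matches whatever was there before, whether that was a real key or nothing at all.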

One thing I deliberately did not change: the dispatcher's MissingAPIKey exception itself. It is correct behavior for production runs (you want to fail loud if the key is missing). The bug was that one specific test was indirectly exercising that production check without the test setup it needed.


Checklist before requesting a review

  • I have added proper PR title and linked to the issue
  • I have performed a self-review of my code
  • I can explain the purpose of every function, class, and logic block I added
  • I understand why my changes work and have tested them thoroughly
  • I have considered potential edge cases and how my code handles them
  • If it is a core feature, I have added thorough tests
  • My code follows the project's style guidelines and conventions

@github-actions

github-actions Bot commented Jun 8, 2026

Contributor

Greptile code review

This repo uses Greptile for automated review. Before merge, aim for Confidence Score: 5/5 with zero unresolved review threads — see CONTRIBUTING.md.

Run a review — add a PR comment with:

@greptile review

Give it ~5-10 minutes (sometimes longer) for results, then fix feedback and re-trigger until you reach Confidence Score: 5/5.

Optional: automate with the greploop skill.

@greptile-apps

greptile-apps Bot commented Jun 8, 2026

Contributor

Greptile Summary

This PR fixes a CI failure on fork PRs where test_run_inner_accepts_llm_alone_when_adapter_provides_baseline raised MissingAPIKey because ANTHROPIC_API_KEY is stripped from fork workflow environments by GitHub Actions. The fix adds monkeypatch.setenv("ANTHROPIC_API_KEY", "test-key") before runner.run_without_integrity() is called, matching the pattern already used in five-plus tests in test_llm_dispatch.py.

  • Adds monkeypatch: pytest.MonkeyPatch to the test signature and injects a dummy key so the dispatcher's env-check passes without triggering a real API call.
  • The patch is automatically reverted after the test by pytest's monkeypatch fixture, so there is no env leakage to other tests.

Confidence Score: 5/5

Safe to merge — a one-line test setup addition with no production code changes.

The change touches a single test, adds a dummy env var that is automatically cleaned up by pytest's monkeypatch fixture, and mirrors an already-proven pattern used elsewhere in the test suite. No production paths are affected, no real API calls can occur (run_investigation is patched out), and the other six tests in the file are unaffected.

No files require special attention.

Important Files Changed

Filename Overview
tests/benchmarks/_framework/test_runner_llm_alone_dispatch.py Adds monkeypatch.setenv("ANTHROPIC_API_KEY", "test-key") to the one test that reaches LLMDispatcher.activate(); the change is minimal, correctly scoped, and follows the established pattern in test_llm_dispatch.py.

Sequence Diagram

sequenceDiagram
    participant T as Test
    participant MP as monkeypatch
    participant R as BenchmarkRunner
    participant D as LLMDispatcher.activate()
    participant CI as run_investigation (patched)

    T->>MP: setenv("ANTHROPIC_API_KEY", "test-key")
    T->>R: BenchmarkRunner(config, adapter)
    T->>R: run_without_integrity()
    R->>R: _run_inner() pre-flight gate
    R->>D: activate("claude-4-sonnet")
    D->>D: Check ANTHROPIC_API_KEY ✓
    D-->>R: spec context
    R->>CI: run_investigation(...) [patched stub]
    CI-->>R: "{root_cause: ok}"
    R-->>T: outcome (not aborted)
    T->>MP: teardown → unset ANTHROPIC_API_KEY

Reviews (1): Last reviewed commit: "fix(bench): seed ANTHROPIC_API_KEY in ll..."

@Davidson3556

Contributor Author

@muddlebee @cerencamkiran kindly review

@psyberck psyberck mentioned this pull request Jun 8, 2026
13 tasks
The runner's pre-flight gate test exercises LLMDispatcher.activate(),
which requires ANTHROPIC_API_KEY to be set even when the downstream
LLM call is patched out. CI on main passes because the secret is
injected from secrets.ANTHROPIC_API_KEY, but GitHub Actions strips
secrets from fork-PR workflows, so any external contributor sees the
test fail. Set a hermetic test-key via monkeypatch.setenv — same
pattern as test_llm_dispatch.py uses 5+ times. Test is now hermetic
whether ANTHROPIC_API_KEY is set or unset.

Closes Tracer-Cloud#2774
@Davidson3556 Davidson3556 force-pushed the fix/bench-anthropic-api-key-test branch from 0d3bdae to 3657dc1 June 8, 2026 19:57
@Devesh36 Devesh36 merged commit 1f5a454 into Tracer-Cloud:main Jun 9, 2026
14 checks passed
@github-actions

github-actions Bot commented Jun 9, 2026

Contributor

🍵 @Davidson3556 made tea, opened a PR, and merged before it cooled. No notes. ☕


👋 Join us on Discord - OpenSRE : hang out, contribute, or hunt for features and issues. Everyone's welcome.


Development

Successfully merging this pull request may close these issues.

[BUG] benchmark dispatch test fails on fork PRs (missing ANTHROPIC_API_KEY)