
test(conftest): isolate CODEX_HOME so token-refresh writeback doesn't leak between tests#10927

Closed
luigileap wants to merge 1 commit into NousResearch:main from luigileap:fix/conftest-isolate-codex-home

@luigileap

Problem

Four tests fail intermittently under pytest-xdist whenever the
openai-codex token-refresh test happens to run on the same worker
before them:

FAILED tests/hermes_cli/test_auth_commands.py::test_auth_remove_accepts_label_target
FAILED tests/hermes_cli/test_auth_commands.py::test_auth_remove_prefers_exact_numeric_label_over_index
FAILED tests/hermes_cli/test_opencode_go_in_model_list.py::test_opencode_go_appears_when_api_key_set
FAILED tests/hermes_cli/test_overlay_slug_resolution.py::test_kimi_for_coding_overlay_uses_hermes_slug

Each one observes an extra phantom openai-codex credential it never
seeded. For example:

>   assert len(entries) == 1
E   AssertionError: assert 2 == 1

Root cause

Production _refresh_codex_auth_tokens writes refreshed OAuth tokens
back to $CODEX_HOME/auth.json (defaulting to Path.home() / ".codex" / "auth.json") so the Codex CLI / VS Code extension stay in
sync with Hermes:

# hermes_cli/auth.py (abridged; "..." marks elided arguments)
def _refresh_codex_auth_tokens(tokens, timeout_seconds):
    refreshed = refresh_codex_oauth_pure(...)
    # Construction of updated_tokens from `refreshed` is elided here.
    _save_codex_tokens(updated_tokens)
    # Write back to ~/.codex/auth.json so Codex CLI / VS Code stay in sync.
    _write_codex_cli_tokens(refreshed["access_token"], refreshed["refresh_token"], ...)
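The fallback rule for where that writeback lands can be sketched as follows. `codex_auth_path` is a hypothetical helper, not the real Hermes function; it takes the environment and home directory as parameters so the rule can be checked without touching the real HOME, and treating an empty `CODEX_HOME` as unset is an assumption.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def codex_auth_path(
    env: Optional[Mapping[str, str]] = None,
    home: Optional[Path] = None,
) -> Path:
    """Hypothetical helper: where a Codex CLI token writeback would land.

    Uses $CODEX_HOME when it is set and non-empty (treating "" as unset is
    an assumption); otherwise falls back to <home>/.codex.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else home
    codex_home = env.get("CODEX_HOME")
    base = Path(codex_home) if codex_home else home / ".codex"
    return base / "auth.json"
```

With `env={}`, the result is always under the home directory, which is exactly the shared location the tests below end up writing to.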

tests/agent/test_credential_pool.py::test_try_refresh_current_updates_only_current_entry
mocks the inner refresh_codex_oauth_pure but not the outer
_refresh_codex_auth_tokens, so the writeback runs for real. The test
only sets HERMES_HOME, never CODEX_HOME, so
_write_codex_cli_tokens falls back to Path.home() / ".codex" / "auth.json".
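Why mocking only the inner call is not enough can be shown with a toy module. Everything here is hypothetical: `fake_auth` stands in for `hermes_cli.auth`, and the explicit `auth_path` argument is invented so the effect is observable in a temp directory. Only the call structure (outer refresh calls the inner network refresh, then the writeback) mirrors the excerpt above.

```python
import contextlib
import json
import tempfile
import types
from pathlib import Path
from unittest import mock

# Hypothetical stand-in for hermes_cli.auth; bodies are invented.
auth = types.ModuleType("fake_auth")


def _refresh_pure(refresh_token):
    raise RuntimeError("network call: tests are expected to mock this")


def _write_cli_tokens(auth_path, access_token, refresh_token):
    auth_path.parent.mkdir(parents=True, exist_ok=True)
    auth_path.write_text(json.dumps({"access_token": access_token,
                                     "refresh_token": refresh_token}))


def _refresh_auth_tokens(tokens, auth_path):
    # Helpers are looked up on the module at call time, so patching only
    # the network refresh still leaves the writeback live.
    refreshed = auth.refresh_codex_oauth_pure(tokens["refresh_token"])
    auth._write_codex_cli_tokens(auth_path, refreshed["access_token"],
                                 refreshed["refresh_token"])
    return refreshed


auth.refresh_codex_oauth_pure = _refresh_pure
auth._write_codex_cli_tokens = _write_cli_tokens
auth._refresh_codex_auth_tokens = _refresh_auth_tokens


def writeback_happens(mock_outer: bool) -> bool:
    """Run one refresh and report whether auth.json was written."""
    fake = {"access_token": "new-at", "refresh_token": "new-rt"}
    with tempfile.TemporaryDirectory() as home:
        auth_path = Path(home) / ".codex" / "auth.json"
        with contextlib.ExitStack() as stack:
            # What the leaking test does: mock only the inner refresh.
            stack.enter_context(mock.patch.object(
                auth, "refresh_codex_oauth_pure", return_value=fake))
            if mock_outer:
                # Also mocking the outer function skips the writeback.
                stack.enter_context(mock.patch.object(
                    auth, "_refresh_codex_auth_tokens", return_value=fake))
            auth._refresh_codex_auth_tokens({"refresh_token": "old-rt"}, auth_path)
        return auth_path.exists()
```

Mocking the outer function is one way out; the fix below instead keeps the writeback real and redirects it via `CODEX_HOME`, so the test still exercises the production path.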

Under the project's standard HOME=$(mktemp -d) test wrapper, that HOME
directory is shared by every test in the run. The leaked
auth.json therefore persists for the rest of the session, and any
later test that calls load_pool("openai-codex") re-imports those
stale tokens via _seed_from_singletons ->
_import_codex_cli_tokens, ending up with the phantom credential.
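A heavily simplified model of this ordering dependence is sketched below. The function names and the JSON layout are invented for illustration; only the shape of the failure (one HOME shared across the session, a later test importing whatever an earlier test left behind) follows the description above.

```python
import json
import tempfile
from pathlib import Path


def refresh_test(home: Path) -> None:
    """Earlier test: CODEX_HOME unset, so the writeback lands in HOME/.codex."""
    auth_file = home / ".codex" / "auth.json"
    auth_file.parent.mkdir(parents=True, exist_ok=True)
    auth_file.write_text(json.dumps({"access_token": "stale", "refresh_token": "stale"}))


def load_pool_entries(home: Path, seeded: list[dict]) -> list[dict]:
    """Later test: its own seeded entries plus anything found in the CLI file."""
    entries = list(seeded)
    cli_file = home / ".codex" / "auth.json"
    if cli_file.exists():  # plays the role of _import_codex_cli_tokens
        entries.append({"source": "codex-cli", **json.loads(cli_file.read_text())})
    return entries


def entry_count(refresh_runs_first: bool) -> int:
    # One HOME for the whole "session", as with HOME=$(mktemp -d).
    with tempfile.TemporaryDirectory() as shared:
        home = Path(shared)
        if refresh_runs_first:
            refresh_test(home)
        return len(load_pool_entries(home, seeded=[{"source": "test-seed"}]))


print(entry_count(refresh_runs_first=False), entry_count(refresh_runs_first=True))  # 1 2
```

The second count reproduces the `assert 2 == 1` failure: the result depends only on whether the refresh test happened to run first on the same worker.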

Fix

Pin CODEX_HOME to a per-test scratch dir from the autouse
_isolate_hermes_home fixture, mirroring the existing HERMES_HOME
treatment:

fake_codex = tmp_path / "codex_test"
fake_codex.mkdir()
monkeypatch.setenv("CODEX_HOME", str(fake_codex))

The writeback now lands inside tmp_path, which pytest creates fresh for
each test, so no later test ever sees the file. Add a regression
test in tests/agent/test_credential_pool.py that exercises
pool.try_refresh_current() for openai-codex and asserts
Path.home() / ".codex" / "auth.json" is not created.
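The shape of that regression check can be sketched with the standard library's `unittest` in place of pytest's `tmp_path`/`monkeypatch` fixtures. The real test drives `pool.try_refresh_current()`; here `write_codex_cli_tokens` is a hypothetical stand-in for the writeback, and `Path.home` is patched directly rather than `HOME` so the check does not depend on the platform's home-directory lookup.

```python
import json
import os
import tempfile
import unittest
from pathlib import Path
from unittest import mock


def write_codex_cli_tokens(access_token: str, refresh_token: str) -> Path:
    """Hypothetical stand-in for the writeback: $CODEX_HOME, else ~/.codex."""
    codex_home = os.environ.get("CODEX_HOME")
    base = Path(codex_home) if codex_home else Path.home() / ".codex"
    base.mkdir(parents=True, exist_ok=True)
    target = base / "auth.json"
    target.write_text(json.dumps({"access_token": access_token,
                                  "refresh_token": refresh_token}))
    return target


class CodexWritebackRegressionTest(unittest.TestCase):
    def setUp(self) -> None:
        # One scratch root per test holding a fake HOME and a pinned CODEX_HOME.
        tmp = tempfile.TemporaryDirectory()
        self.addCleanup(tmp.cleanup)
        root = Path(tmp.name)
        self.fake_home = root / "home"
        self.fake_home.mkdir()
        self.codex_home = root / "codex_test"
        self.codex_home.mkdir()
        for patcher in (
            mock.patch.object(Path, "home", return_value=self.fake_home),
            mock.patch.dict(os.environ, {"CODEX_HOME": str(self.codex_home)}),
        ):
            patcher.start()
            self.addCleanup(patcher.stop)  # cleanups run LIFO, before tmp removal

    def test_writeback_stays_out_of_home(self) -> None:
        written = write_codex_cli_tokens("new-at", "new-rt")
        self.assertEqual(written, self.codex_home / "auth.json")
        self.assertFalse((Path.home() / ".codex" / "auth.json").exists())

    def test_control_without_codex_home_hits_home(self) -> None:
        # Sanity check that the assertion above can fail: with CODEX_HOME
        # removed, the same call falls back to the (fake) home directory.
        with mock.patch.dict(os.environ):
            del os.environ["CODEX_HOME"]
            written = write_codex_cli_tokens("new-at", "new-rt")
        self.assertEqual(written, self.fake_home / ".codex" / "auth.json")
```

The control test guards against a vacuous regression test: if the stand-in ignored `CODEX_HOME` entirely, the first test could still pass for the wrong reason.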

Tests that need a specific CODEX_HOME continue to override it via
monkeypatch.setenv(...) as before.

Verification

$ python -m pytest tests/ -q
...
11270 passed, 79 skipped in 187.13s

(All four originally failing tests pass under both serial and xdist
runs; the new regression test passes; nothing in the rest of the
suite regresses.)

…into Path.home()

Production ``_refresh_codex_auth_tokens`` writes refreshed OAuth tokens
back to ``$CODEX_HOME/auth.json`` (defaulting to ``Path.home() /
".codex" / "auth.json"``) so the Codex CLI / VS Code extension stay in
sync.  When a test triggers a refresh without explicitly setting
``CODEX_HOME``, the writeback lands in the test process's *real* HOME.

Under the project's standard ``HOME=$(mktemp -d)`` test wrapper, that
HOME directory is shared by every test in the run, so the file persists
for the remainder of the session.  Any later test that calls
``load_pool("openai-codex")`` then re-imports those stale tokens via
``_seed_from_singletons`` -> ``_import_codex_cli_tokens``, and ends up
with an extra phantom credential in the pool.

Concretely this caused four tests to fail intermittently under xdist
when ``test_try_refresh_current_updates_only_current_entry`` ran on the
same worker before any of:

  - tests/hermes_cli/test_auth_commands.py::test_auth_remove_accepts_label_target
  - tests/hermes_cli/test_auth_commands.py::test_auth_remove_prefers_exact_numeric_label_over_index
  - tests/hermes_cli/test_opencode_go_in_model_list.py::test_opencode_go_appears_when_api_key_set
  - tests/hermes_cli/test_overlay_slug_resolution.py::test_kimi_for_coding_overlay_uses_hermes_slug

Pin ``CODEX_HOME`` to a per-test scratch dir from the autouse
``_isolate_hermes_home`` fixture so writebacks always land in the
test's own ``tmp_path``, which no later test reuses.  Mirrors the existing
HERMES_HOME treatment.  Add a regression test that exercises
``pool.try_refresh_current()`` for ``openai-codex`` and asserts
``Path.home() / ".codex" / "auth.json"`` is not created.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@teknium1

Contributor

Thanks for the submission @luigileap. Closing as superseded — the token-refresh writeback that caused cross-test leakage was removed by #12360; CODEX_HOME is no longer touched during refresh.

Hermes's Codex auth design was reworked in #12360 ("Hermes owns its own Codex auth; stop touching ~/.codex/auth.json") to stop sharing refresh tokens with the Codex CLI / VS Code extension (they rotate on every use, so shared access caused refresh_token_reused races). Users who want to adopt Codex CLI credentials get a one-time explicit prompt via hermes auth openai-codex instead.
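The race described here can be illustrated with a toy token endpoint. This is a simplified model, not OpenAI's actual server: it assumes every successful refresh retires the presented refresh token and issues a new one, and it reports reuse with the `refresh_token_reused` name mentioned above (`invalid_grant` is the generic OAuth 2.0 error code).

```python
import secrets


class RotatingRefreshServer:
    """Toy token endpoint with refresh-token rotation and reuse detection."""

    def __init__(self) -> None:
        self.current = secrets.token_hex(8)
        self.retired: set[str] = set()

    def refresh(self, refresh_token: str) -> dict[str, str]:
        if refresh_token in self.retired:
            raise PermissionError("refresh_token_reused")
        if refresh_token != self.current:
            raise PermissionError("invalid_grant")
        # Rotate: the presented token is retired and a new one is issued.
        self.retired.add(refresh_token)
        self.current = secrets.token_hex(8)
        return {"access_token": secrets.token_hex(8), "refresh_token": self.current}


def shared_token_race() -> str:
    """Two clients read the same token from a shared auth.json; one loses."""
    server = RotatingRefreshServer()
    hermes_copy = cli_copy = server.current  # both read the shared file
    server.refresh(cli_copy)  # the Codex CLI refreshes first and rotates it
    try:
        server.refresh(hermes_copy)  # Hermes now presents a retired token
    except PermissionError as exc:
        return str(exc)
    return "ok"


print(shared_token_race())  # refresh_token_reused
```

A single owner that always presents the token returned by its previous refresh never hits this path, which is the property Hermes gains by keeping its own Codex credentials.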

The valid adjacent fixes from this batch (error parsing, fallback chain on auth failure, reauth UX) landed together in #15104.

teknium1 closed this on Apr 24, 2026
alt-glitch added the labels type/test (Test coverage or test infrastructure), P3 (Low — cosmetic, nice to have), area/auth (Authentication, OAuth, credential pools), and provider/openai (OpenAI / Codex Responses API) on Apr 24, 2026