fix(auth): recover stale codex auth from CLI session #3279

Closed

lsaether wants to merge 1 commit into NousResearch:main from lsaether:fix/codex-auth-cli-recovery

@lsaether

Contributor

Title
fix(auth): recover stale codex auth from CLI session

Summary

  • recover from a stale Hermes Codex session when ~/.codex/auth.json already contains a newer valid CLI session
  • mark raw 401 Codex refresh failures as relogin_required
  • add regression tests covering both the recovery path and the non-relogin failure path

Why
Hermes keeps its own Codex auth store in ~/.hermes/auth.json to avoid refresh-token rotation conflicts with Codex CLI. In practice, the two stores can drift:

  • Hermes can end up with an older stale refresh token
  • Codex CLI can still have a newer valid local session

Before this change, Hermes would try to refresh its stale session, fail, and force a manual re-login even though a valid Codex CLI session already existed on the machine.

What changed

  • in _refresh_codex_auth_tokens(), raw HTTP 401 responses now set relogin_required=True
  • in resolve_codex_runtime_credentials(), Hermes only attempts CLI-session recovery when the refresh failure actually indicates the Hermes session is invalid and requires re-authentication
  • recovery imports valid tokens from ~/.codex/auth.json into Hermes's auth store without writing back to the Codex CLI store
  • added a regression test to ensure Hermes does not recover from CLI tokens on non-relogin refresh failures
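
A minimal Python sketch of the gating described above, under stated assumptions: it is a simplified model, not the code in hermes_cli/auth.py. RefreshError, classify_refresh_response() and resolve_credentials() are hypothetical stand-ins for the logic in _refresh_codex_auth_tokens() and resolve_codex_runtime_credentials(), whose real signatures are not shown here, and the storage callables are injected so the example runs without touching disk.

```python
from typing import Callable, Optional


class RefreshError(Exception):
    """Hypothetical error raised when a Codex token refresh fails."""

    def __init__(self, status: int, message: str = "", relogin_required: bool = False):
        super().__init__(f"refresh failed (HTTP {status}): {message}")
        self.status = status
        self.relogin_required = relogin_required


def classify_refresh_response(status: int, body: str) -> RefreshError:
    # Per the PR: a raw HTTP 401 means the stored refresh token is no longer
    # accepted, so the failure is flagged as requiring re-authentication.
    return RefreshError(status, body, relogin_required=(status == 401))


def resolve_credentials(
    refresh: Callable[[], dict],
    load_cli_session: Callable[[], Optional[dict]],
    save_hermes_session: Callable[[dict], None],
) -> dict:
    """Hypothetical stand-in for resolve_codex_runtime_credentials()."""
    try:
        return refresh()
    except RefreshError as err:
        # Failures that are not relogin-required (e.g. a 503) are re-raised
        # unchanged; CLI recovery is reserved for an invalid Hermes session.
        if not err.relogin_required:
            raise
        cli_tokens = load_cli_session()  # read-only view of ~/.codex/auth.json
        if not cli_tokens or "access_token" not in cli_tokens:
            raise
        save_hermes_session(cli_tokens)  # writes to the Hermes store only
        return cli_tokens
```

In this model a 401 during refresh returns the CLI tokens and saves them to the Hermes store, while a 503 propagates the original RefreshError without reading the CLI store or writing anything.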

How to test

  1. Put an expiring/stale Codex token pair into ~/.hermes/auth.json
  2. Put a newer valid Codex token pair into ~/.codex/auth.json
  3. Trigger resolve_codex_runtime_credentials()
  4. Verify Hermes adopts the valid CLI access token instead of forcing a manual re-login
  5. Simulate a non-relogin refresh failure and verify Hermes does not fall back to CLI tokens
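
The steps above can be approximated with a self-contained script. This is a sketch, not the PR's actual test in tests/test_auth_codex_provider.py: recover(), run_scenario() and RefreshFailed are hypothetical, temporary files stand in for ~/.hermes/auth.json and ~/.codex/auth.json, and the token values are placeholders.

```python
import json
import tempfile
from pathlib import Path


class RefreshFailed(Exception):
    """Hypothetical refresh failure carrying the relogin_required flag."""

    def __init__(self, relogin_required: bool):
        super().__init__("refresh failed")
        self.relogin_required = relogin_required


def recover(hermes_path: Path, codex_path: Path, refresh) -> str:
    """Hypothetical file-based model of the recovery path; returns an access token."""
    hermes = json.loads(hermes_path.read_text())
    try:
        return refresh(hermes["refresh_token"])
    except RefreshFailed as err:
        if not err.relogin_required or not codex_path.exists():
            raise
        cli = json.loads(codex_path.read_text())  # read only, never written back
        hermes_path.write_text(json.dumps(cli))   # import into the Hermes store
        return cli["access_token"]


def run_scenario(relogin_required: bool):
    with tempfile.TemporaryDirectory() as tmp:
        hermes_path = Path(tmp, "hermes_auth.json")
        codex_path = Path(tmp, "codex_auth.json")
        # Steps 1-2: stale Hermes pair, newer CLI pair.
        hermes_path.write_text(json.dumps({"access_token": "old", "refresh_token": "stale"}))
        codex_path.write_text(json.dumps({"access_token": "cli-new", "refresh_token": "cli-rt"}))
        codex_before = codex_path.read_text()

        def refresh(_token: str) -> str:
            # Every refresh of the stale Hermes token fails in this scenario.
            raise RefreshFailed(relogin_required)

        # Step 3: trigger resolution.
        try:
            token = recover(hermes_path, codex_path, refresh)
        except RefreshFailed:
            token = None
        hermes_after = json.loads(hermes_path.read_text())
        return token, hermes_after, codex_path.read_text() == codex_before


# Step 4: a relogin-required failure adopts the CLI access token.
token, hermes, codex_untouched = run_scenario(relogin_required=True)
assert token == "cli-new" and hermes["access_token"] == "cli-new" and codex_untouched

# Step 5: a non-relogin failure leaves Hermes's store alone.
token, hermes, codex_untouched = run_scenario(relogin_required=False)
assert token is None and hermes["access_token"] == "old" and codex_untouched
```

Both scenarios also check that the stand-in for ~/.codex/auth.json is byte-for-byte unchanged, matching the note that Hermes never writes to the CLI store.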

Test plan

  • python3 -m py_compile hermes_cli/auth.py tests/test_auth_codex_provider.py
  • uv run --with pytest --with pytest-xdist python -m pytest -q tests/test_auth_codex_provider.py

Platforms tested

  • Linux

Notes

  • this keeps Hermes's existing auth-store isolation model intact
  • Hermes still does not write to ~/.codex/auth.json
  • the CLI-session recovery is intentionally narrow and only runs for relogin-required refresh failures

submit77 commented Apr 1, 2026

Contributor

Nice fix direction. This addresses the real stale-Codex-auth failure mode.

I tested a similar recovery path locally and found a few follow-up changes that would make this safer before merge:

  1. Validate imported ~/.codex/auth.json tokens before persisting them into Hermes

The current recovery path appears to persist imported Codex tokens into ~/.hermes/auth.json before proving they're actually usable. That means a stale or revoked local Codex auth file could overwrite Hermes's previous auth state and still fail on retry.

Safer pattern:

  • read ~/.codex/auth.json
  • validate the imported token pair in memory first, ideally via a refresh rather than only presence / JWT-expiry checks
  • only persist into ~/.hermes/auth.json after validation succeeds
  2. Keep recovery scoped to explicit invalid-local-auth cases

I'd keep the trigger narrow:

  • missing/invalid local Hermes auth
  • explicit invalid credential cases like invalid_grant / invalid_token

I'd avoid using broader backend failure classes as recovery triggers. Temporary auth-service outages or malformed backend responses are not evidence that Hermes should replace its local auth from another store.

  3. Update stale remediation text

A few auth messages still point users toward hermes login, but on current Hermes the repair path is:

  • hermes model
  • hermes setup model

Suggested regression tests:

  • external Codex auth exists but validation fails -> Hermes does not overwrite existing auth
  • generic refresh/backend failure does not trigger external recovery
  • original auth error is preserved when external Codex auth exists but validation fails
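
The review's safety delta (narrow trigger, validate before persist, preserve the original error) and the suggested regression tests can be sketched together in Python. This is a hypothetical model, not submit77's prototype or Hermes code: INVALID_LOCAL_AUTH_CODES, AuthError, should_try_external_recovery() and recover_with_validation() are illustrative names, and validate() stands in for an in-memory refresh of the imported token pair.

```python
from typing import Callable, Optional

# Hypothetical error codes that indicate the local Hermes credential itself is bad.
INVALID_LOCAL_AUTH_CODES = {"invalid_grant", "invalid_token"}


class AuthError(Exception):
    def __init__(self, code: str):
        super().__init__(code)
        self.code = code


def should_try_external_recovery(local: Optional[dict], err: Optional[AuthError]) -> bool:
    # Missing or malformed local Hermes auth is a valid trigger.
    if not local or "refresh_token" not in local:
        return True
    # Otherwise only explicit invalid-credential errors qualify; outages and
    # malformed backend responses do not.
    return err is not None and err.code in INVALID_LOCAL_AUTH_CODES


def recover_with_validation(
    original: AuthError,
    read_external: Callable[[], Optional[dict]],
    validate: Callable[[dict], dict],
    persist: Callable[[dict], None],
) -> dict:
    imported = read_external()
    if not imported:
        raise original
    try:
        # Validate in memory first; nothing has been written yet.
        validated = validate(imported)
    except AuthError:
        # Leave existing Hermes auth untouched and surface the original error.
        raise original from None
    persist(validated)  # only reached after validation succeeds
    return validated


# Regression checks mirroring the suggested tests.
def _reject(_tokens: dict) -> dict:
    raise AuthError("invalid_grant")


persisted: list = []
original = AuthError("invalid_grant")
try:
    recover_with_validation(original, lambda: {"refresh_token": "cli"}, _reject, persisted.append)
except AuthError as exc:
    preserved = exc is original
else:
    preserved = False
assert preserved and persisted == []  # original error kept, nothing overwritten
assert not should_try_external_recovery({"refresh_token": "x"}, AuthError("server_error"))
assert should_try_external_recovery({"refresh_token": "x"}, AuthError("invalid_grant"))
```

Keeping the trigger check separate from the validate-then-persist step makes each of the three suggested tests target one function.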

I prototyped this locally and verified the following behavior:

  • stale Hermes Codex auth + valid local Codex auth -> recovery succeeds
  • imported local Codex tokens are validated before persistence
  • generic backend refresh failures do not trigger external recovery
  • stale hermes login remediation text can be updated to hermes model / hermes setup model

I'm not planning to open a competing PR here, but wanted to leave the safety delta in one place in case it's useful before merge.

@teknium1

Contributor

Thanks for the submission @lsaether. Closing as superseded: this PR relied on the CLI auth-sync path that #12360 removed.

Hermes's Codex auth design was reworked in #12360 ("Hermes owns its own Codex auth; stop touching ~/.codex/auth.json") to stop sharing refresh tokens with the Codex CLI / VS Code extension (they rotate on every use, so shared access caused refresh_token_reused races). Users who want to adopt Codex CLI credentials get a one-time explicit prompt via hermes auth openai-codex instead.
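
To illustrate why sharing rotating refresh tokens races, here is a toy Python model of an auth server that rotates the refresh token on every use and rejects reuse. It is a simplification under stated assumptions, not the real OpenAI/Codex auth service: RotatingTokenServer is hypothetical, and its error strings only mirror the refresh_token_reused / invalid_grant names used in this thread.

```python
import secrets


class RotatingTokenServer:
    """Toy auth server (hypothetical): each refresh token is single-use."""

    def __init__(self) -> None:
        self._current = secrets.token_hex(8)
        self._used: set[str] = set()

    def issue(self) -> str:
        return self._current

    def refresh(self, token: str) -> str:
        if token in self._used:
            raise RuntimeError("refresh_token_reused")
        if token != self._current:
            raise RuntimeError("invalid_grant")
        # Rotate: the presented token is burned and a new one is issued.
        self._used.add(token)
        self._current = secrets.token_hex(8)
        return self._current


server = RotatingTokenServer()
shared = server.issue()             # Hermes and Codex CLI both store this token
cli_token = server.refresh(shared)  # Codex CLI refreshes first
try:
    server.refresh(shared)          # Hermes still holds the burned copy
except RuntimeError as err:
    print(err)                      # prints: refresh_token_reused
```

In this toy model the failure exists only because both clients hold the same token; clients that each completed their own login would hold independent tokens and could not burn each other's.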

The valid adjacent fixes from this batch (error parsing, fallback chain on auth failure, reauth UX) landed together in #15104.

teknium1 closed this Apr 24, 2026
alt-glitch added the labels type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), comp/cli (CLI entry point, hermes_cli/, setup wizard), area/auth (Authentication, OAuth, credential pools) and provider/openai (OpenAI / Codex Responses API) on Apr 24, 2026