[Bug] Docs and onboarding silently route users to the PI path instead of the Codex app-server runtime, causing chatgpt.com 403 in Cloudflare-sensitive environments #82978

@crhan

Draft: New issue to file against openclaw/openclaw

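Purpose: file this as a standalone issue. The topic is that docs/onboarding steer users onto a path that does not work.
Relative to #67670, the scope is broader (not limited to the Cloudflare 403 symptom).

Copy everything between the --- markers as the issue body. The title goes at the very top.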

Title: [Bug] Docs and onboarding silently route users to the PI path instead of the Codex app-server runtime, causing chatgpt.com 403 in Cloudflare-sensitive environments


Summary

Following the canonical onboarding instructions in
docs/providers/openai.md and docs/plugins/codex-harness.md (as of
2026.5.12) lands the user on the PI runtime path — OpenClaw's
internal Node fetch (undici) directly calling
chatgpt.com/backend-api. This path fails with HTTP 403
cf-mitigated: challenge from any egress IP that hasn't been
whitelisted by OpenAI's Cloudflare config, which in practice means
every user in mainland China and many users behind commodity
VPN/proxy egress nodes.

There is a working alternative path (Codex app-server runtime:
codex/gpt-* + agentRuntime.id="codex" plus @openclaw/codex
plugin) that bypasses the issue entirely — OpenAI evidently
whitelists the official codex CLI's TLS profile. Unfortunately,
nothing in the docs or in the openclaw models auth login /
onboarding flow guides users toward this path. The user typically
discovers it only after several hours of debugging and reading
source code.

This is a docs↔implementation drift / onboarding completeness bug,
distinct from #67670 (which proposes adding TLS-fingerprint
emulation to the PI path).

Reproduction (no Cloudflare workaround required)

Tested on OpenClaw 2026.5.12, Linux x86_64, Node 25, behind a
mihomo HTTP proxy whose egress IP is in Singapore (AWS).

  1. Fresh install, follow docs/providers/openai.md Step 2 / 3:
    openclaw models auth login --provider openai-codex --device-code
    # (device-code flag is documented but not implemented in CLI yet,
    #  see drift item A below; if user falls back to plain login,
    #  the OAuth flow completes successfully)
    openclaw config set agents.defaults.model.primary openai-codex/gpt-5.5
    openclaw gateway restart
  2. Send a turn:
    openclaw agent --agent main --message "Reply PONG" --json

Expected (per docs Step 3 "OpenAI agent turns select the native
Codex app-server runtime automatically"):

  • agentHarnessId: "codex"
  • Real codex app-server subprocess visible in ps
  • 200 OK from chatgpt.com

Actual:

  • agentHarnessId: "pi"
  • No codex subprocess (ps -ef | grep codex empty during the turn)
  • `[openai-transport] [responses] error provider=codex
    api=openai-codex-responses model=gpt-5.5 status=403 message=403 ...cf-mitigated: challenge`
  • Turn falls through to whatever fallback chain is configured
    (often another provider that happens to succeed by accident,
    masking the failure)
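
The Expected/Actual comparison above can be checked mechanically. The following is a minimal sketch, assuming the --json output contains an agentHarnessId string field somewhere in it. The field name comes from this report; the surrounding JSON shape is not shown here, so the helper only pattern-matches the field. harness_id and ran_on_codex are hypothetical helper names, not OpenClaw commands.

```shell
# Print the first agentHarnessId value found in JSON text on stdin.
# Assumption: the field appears as "agentHarnessId": "<value>" somewhere in
# the output; the exact JSON shape is not shown in this issue.
harness_id() {
  grep -o '"agentHarnessId"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | head -n 1 \
    | sed 's/.*"\([^"]*\)"$/\1/'
}

# Succeed only when the turn ran on the Codex app-server runtime.
ran_on_codex() {
  [ "$(harness_id)" = "codex" ]
}
```

For example, piping the repro command into it: openclaw agent --agent main --message "Reply PONG" --json | ran_on_codex || echo "turn did not run on codex". Text matching instead of a JSON parser keeps the sketch dependency-free, at the cost of ignoring nesting.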

Diagnosis

The openai-codex/* namespace exposed by the bundled openai
plugin (buildOpenAICodexProviderPlugin, pointing at
https://chatgpt.com/backend-api/codex) is a PI-direct provider
implemented in OpenClaw's own Node fetch stack. It has the same
authentication source (the openai-codex:* profile in
auth-profiles.json) as the Codex app-server path, but a
completely separate transport. The
openclaw models auth login --provider openai-codex command
registers the OAuth profile but does not:

  • install @openclaw/codex
  • mirror credentials to a per-agent
    ~/.openclaw/agents/<id>/agent/codex-home/auth.json
  • update agents.defaults.model to a value that triggers the
    Codex app-server runtime

So the post-login default state is "OAuth in place, but the only
ready provider is openai-codex/* which is PI-direct".
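
To see which of these pieces are actually in place after login, a small check along the following lines may help. It is a sketch under assumptions: the two paths are the ones quoted in this issue, the OpenClaw home is passed in explicitly (normally ~/.openclaw), and codex_prereqs is a hypothetical helper name. It does not inspect agents.defaults.model.

```shell
# Report which Codex app-server prerequisites exist under an OpenClaw home.
# $1: OpenClaw home directory (normally ~/.openclaw)
# $2: agent id (for example, main)
# Both paths below are taken from this issue, not from OpenClaw documentation.
codex_prereqs() {
  home_dir=$1
  agent_id=$2
  plugin_dir="$home_dir/npm/node_modules/@openclaw/codex"
  auth_file="$home_dir/agents/$agent_id/agent/codex-home/auth.json"

  if [ -d "$plugin_dir" ]; then
    echo "plugin: present"
  else
    echo "plugin: missing"
  fi

  if [ -f "$auth_file" ]; then
    echo "codex-home auth: present"
  else
    echo "codex-home auth: missing"
  fi
}
```

Running codex_prereqs "$HOME/.openclaw" main right after models auth login would be expected to print "missing" for both lines if the diagnosis above holds.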

Documentation ↔ implementation drift checklist

Cataloguing what I tripped over while diagnosing. Some are docs
errors, some are missing implementation pieces:

| # | Drift | Evidence |
|---|-------|----------|
| A | openclaw models auth login --provider openai-codex --device-code documented but --device-code not recognized by the 2026.5.12 CLI | Error: openclaw does not recognize option "--device-code" |
| B | config set agents.defaults.model.primary openai/gpt-5.5 rejected with Model override "X" is not allowed for agent "Y" until the user separately writes agents.defaults.models["openai/gpt-5.5"] = {}. Not documented anywhere | Reproducible on any fresh install |
| C | docs/plugins/codex-harness.md:170 claims agentRuntime.id: "codex" is optional "for normal OpenAI auto mode", implying the system picks Codex automatically. In practice the auto-mode picker always selects PI unless agentRuntime.id is set explicitly | See actual log signature in repro above |
| D | Two parallel auth stores exist (auth-profiles.json for PI provider + per-agent codex-home/auth.json for codex app-server). Docs don't explain which one each runtime path reads | Has to be discovered from source: extensions/codex/src/app-server/transport-stdio.ts (CODEX_HOME=per-agent) vs extensions/openai/src/codex-provider.ts (auth-profiles.json) |
| E | openclaw models list shows codex/gpt-* (from @openclaw/codex plugin) but not openai/gpt-*, while docs prescribe openai/gpt-*. No documented alias mechanism | The user can only set openai/gpt-* as primary if they manually add it to agents.defaults.models, but then no provider serves it |
| F | docs/providers/openai.md Step 3: "OpenClaw installs or repairs the bundled Codex plugin when this route is chosen". No such auto-install happens during models auth login | ~/.openclaw/npm/node_modules/@openclaw/codex remains absent after models auth login --provider openai-codex |
| G | Schema rejects contextWindow / input keys under agents.defaults.models[X], so users cannot correct stale model metadata registered by a plugin (e.g. codex plugin currently registers codex/gpt-5.5 with contextWindow=200000, OpenAI's actual gpt-5.5 limit is 272k) | config validate error: Unrecognized keys: "contextWindow", "input" |
| H | openclaw models auth login --provider openai-codex doesn't chmod 600 the per-agent auth.json it would write (if it wrote one); users who mirror credentials manually have to remember to do it | Minor, but a hardening gap |

Items A, C, D, E, F together are why even a careful user who reads
the docs end-to-end still ends up on the PI path.
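
For Drift H, users who mirror credentials by hand can at least make the permission step hard to forget. The following is a minimal sketch, assuming the caller already has a credentials file in whatever format the Codex CLI expects (this issue does not describe that format). install_private is a hypothetical helper name.

```shell
# Copy a credentials file so that only the owner can read or write it.
# $1: source file, $2: destination path (parent directories are created).
# Assumption: the source is already in the format the consumer expects;
# this helper only handles placement and permissions.
install_private() {
  src=$1
  dest=$2
  (
    # The subshell keeps the restrictive umask from leaking to the caller.
    umask 077
    mkdir -p "$(dirname "$dest")" &&
      cp "$src" "$dest" &&
      chmod 600 "$dest"
  )
}
```

The explicit chmod 600 matters when the destination already exists, because cp then keeps the existing file's mode. Directories created by mkdir -p under umask 077 end up owner-only as well.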

Why this matters now

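The PI path happens to work when the user's egress IP is in
OpenAI's accepted set; large ChatGPT account ranges (residential
ISPs in the US/EU) generally pass. As noted in the Summary, the path
fails for users in mainland China and for many users behind commodity
VPN/proxy egress nodes, including the AWS Singapore egress used in
the reproduction above.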

The codex/* + agentRuntime.id="codex" path is robust against all
of these because it uses the codex CLI's whitelisted client. Users
deserve to land on it by default.

Proposed fixes (high-level, open to direction)

In rough order of leverage:

  1. Consolidate openclaw models auth login --provider openai-codex
    into a one-shot Codex onboarding command that, in addition to
    the current OAuth profile write:
    • Installs @openclaw/codex if not already present (matches the
      docs Step 3 promise).
    • Mirrors the OAuth credentials to
      ~/.openclaw/agents/<default-agent>/agent/codex-home/auth.json
      (chmod 600).
    • Optionally (interactive) / informatively (non-interactive) sets
      agents.defaults.model.primary = codex/gpt-5.5 with
      agentRuntime.id: "codex" in the allow-list.
    • Prints a one-line verification: "Run openclaw agent ... PONG
      and look for agentHarnessId: codex".
  2. Implement the device-code flag (Drift A) in
    openclaw models auth login; it's already in the docs.
  3. Make auto mode actually pick Codex when it's available (Drift
    C), or update the docs to say "auto mode does not automatically
    choose codex runtime; explicit agentRuntime.id: "codex" is
    required".
  4. Improve the Model override "X" is not allowed for agent "Y"
    error message to point at the agents.defaults.models
    allow-list (Drift B).
  5. Deprecate openai-codex/* as a directly-configurable provider
    ref (or at least emit a warning at gateway startup) since it's
    the PI path that 99% of users don't want.
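
The startup warning mentioned in item 5 could look roughly like this. It is a sketch only: pi_direct_warning is a hypothetical helper, the message wording is made up for illustration, and the suggested replacement simply swaps the openai-codex/ prefix for codex/, as this report does by hand.

```shell
# Print a warning when a model ref uses the openai-codex/ namespace.
# Prints nothing for any other ref. The wording is illustrative only.
pi_direct_warning() {
  case "$1" in
    openai-codex/*)
      # Strip the namespace prefix to build the suggested codex/ ref.
      model=${1#openai-codex/}
      printf 'warning: %s uses the PI-direct transport; consider codex/%s with agentRuntime.id="codex"\n' "$1" "$model"
      ;;
  esac
}
```

Called with openai-codex/gpt-5.5 it prints one warning line suggesting codex/gpt-5.5; called with codex/gpt-5.5 it stays silent.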

I'm happy to file a PR for fixes 1 and 4: I have a working
reproduction and the relevant source code annotated. For the other
items I'll wait for maintainer direction.

Related

Environment

  • OpenClaw 2026.5.12 (f066dd2)
  • Linux 6.8.0 x86_64, Node 25.2.0
  • Codex CLI 0.130.0 (via @openclaw/codex plugin)
  • Egress: mihomo HTTP proxy → AWS Singapore

Metadata

Labels:

  • P2: Normal backlog priority with limited blast radius.
  • impact:auth-provider: Auth, provider routing, model choice, or SecretRef resolution may break.
