
fix(doctor): avoid impossible device token rotation advice #77688

Merged
obviyus merged 3 commits into openclaw:main from Conan-Scott:fix/doctor-stale-local-device-auth-advice
May 5, 2026

Conversation

@Conan-Scott (Contributor) commented May 5, 2026

Summary

  • Problem: openclaw doctor could advise openclaw devices rotate --role <role> for a stale local cached device-auth role that the gateway pairing record no longer approves.
  • Why it matters: the suggested rotate command is impossible in that state and fails with device token rotation denied, sending the user down the wrong recovery path.
  • What changed: when local cached auth has no matching active gateway token and the role is no longer approved, doctor now says that role is no longer approved and recommends reconnecting shared gateway auth or removing the stale cached role entry.
  • Changelog: added an Unreleased/Fixes entry for the doctor device-pairing advice change.
  • What did NOT change (scope boundary): rotate advice is still used for approved-role stale-token and stale-scope cases where the gateway pairing record has an active/approved role and rotation can succeed.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #
  • Related #
  • This PR fixes a bug or regression

Root Cause (if applicable)

  • Root cause: the local device-auth doctor check built rotate guidance before distinguishing between “approved role with no active token” and “local cached role that is no longer approved by the gateway pairing record.”
  • Missing detection / guardrail: there was no regression test for a local cache containing a stale role, such as cached node auth on a device currently approved only as operator.
  • Contributing context (if known): the gateway correctly denies rotation for an unapproved role, so the bug is in doctor’s recovery advice rather than in the server-side authorization decision.
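
The ordering problem described in the root cause can be illustrated with a minimal sketch. The names `LocalAuthCheck`, `Advice`, and `selectAdvice` are hypothetical and are not taken from `doctor-device-pairing.ts`; the sketch also ignores the stale-scope case. Only the decision order (check approval before building rotate guidance) reflects the fix described in this PR.

```typescript
// Hypothetical input shape; the real doctor check derives this from the
// local device-auth cache plus a gateway pairing snapshot.
interface LocalAuthCheck {
  cachedRole: string; // role found in the local cache
  approvedRoles: string[]; // roles the gateway pairing record approves
  hasActiveGatewayToken: boolean; // a matching active gateway token exists
}

type Advice =
  | { kind: "ok" }
  | { kind: "rotate"; command: string }
  | { kind: "role-no-longer-approved"; message: string };

function selectAdvice(check: LocalAuthCheck): Advice {
  if (check.hasActiveGatewayToken) {
    return { kind: "ok" };
  }
  // Check approval first: the gateway denies rotation for an unapproved role,
  // so rotate advice would be impossible to follow here.
  if (!check.approvedRoles.includes(check.cachedRole)) {
    return {
      kind: "role-no-longer-approved",
      message:
        "Reconnect with shared gateway auth to refresh local auth, " +
        `or remove the stale cached ${check.cachedRole} auth entry.`,
    };
  }
  // Build rotate guidance only once the role is known to be approved.
  return {
    kind: "rotate",
    command: `openclaw devices rotate --role ${check.cachedRole}`,
  };
}
```

With cached `node` auth on an operator-only device, this sketch returns the `role-no-longer-approved` branch; an approved role with no active token still gets rotate advice.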

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: src/commands/doctor-device-pairing.test.ts
  • Scenario the test should lock in: local identity/device-auth.json has cached node auth for a device whose gateway pairing record is approved only as operator; doctor warns about the stale local auth but does not suggest --role node rotation.
  • Why this is the smallest reliable guardrail: the bug is pure doctor message selection based on the local cache plus pairing snapshot, so the existing command-level test harness covers it without needing live gateway/device setup.
  • Existing test that already covers this (if any): none for the unapproved cached-role case.
  • If no new test is added, why not: N/A — this PR adds one.
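
The scenario above boils down to two assertions on the notes doctor emits: the stale-auth warning is present, and no rotate suggestion for the cached role appears. A minimal sketch of that check follows; `checkStaleRoleNotes` is a hypothetical helper, not the test added in `src/commands/doctor-device-pairing.test.ts`, and it assumes the warning text begins with the "Local cached <role> device auth" wording shown in the doctor output in this PR.

```typescript
// Hypothetical helper: returns a list of failed expectations for the notes
// a doctor run emitted about one cached role. An empty list means the
// regression scenario holds.
function checkStaleRoleNotes(notes: string[], role: string): string[] {
  const failures: string[] = [];

  // The stale local auth must still be reported.
  const warnsAboutStaleAuth = notes.some((note) =>
    note.includes(`Local cached ${role} device auth`),
  );
  if (!warnsAboutStaleAuth) {
    failures.push(`missing stale-auth warning for role "${role}"`);
  }

  // Rotation for an unapproved role must not be suggested.
  const suggestsRotate = notes.some((note) => note.includes(`--role ${role}`));
  if (suggestsRotate) {
    failures.push(`unexpected rotate suggestion for role "${role}"`);
  }

  return failures;
}
```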

User-visible / Behavior Changes

openclaw doctor gives different recovery advice for stale local cached device auth when the cached role is no longer approved. Instead of recommending an impossible token rotation, it says the role is no longer approved and suggests reconnecting shared gateway auth or removing the stale cached role entry.

Diagram (if applicable)

Before:
local cache has stale node role + gateway approves operator only
  -> doctor suggests rotate --role node
  -> gateway denies rotation

After:
local cache has stale node role + gateway approves operator only
  -> doctor says node role is no longer approved
  -> user is directed to refresh/remove stale local auth

Security Impact (required)

  • New permissions/capabilities? (Yes/No) No
  • Secrets/tokens handling changed? (Yes/No) No
  • New/changed network calls? (Yes/No) No
  • Command/tool execution surface changed? (Yes/No) No
  • Data access scope changed? (Yes/No) No
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: Linux container / gateway pod
  • Runtime/container: OpenClaw CLI / doctor command path
  • Model/provider: N/A
  • Integration/channel (if any): Device pairing / local CLI identity
  • Relevant config (redacted): local identity/device-auth.json with cached node role; gateway pairing record for same device approved as operator only

Steps

  1. Pair a local CLI/control device as operator only.
  2. Leave or create a stale local cached node token in identity/device-auth.json for the same device.
  3. Run openclaw doctor.

Expected

  • Doctor identifies the stale local cached node auth.
  • Doctor does not suggest openclaw devices rotate --role node, because that role is not approved and rotation will be denied.

Actual

  • Before this PR, doctor suggested openclaw devices rotate --role node even though the gateway pairing record did not approve node for that device.
  • Running the suggested command failed with device token rotation denied.

Real behavior proof

  • Behavior or issue addressed: openclaw doctor should not suggest openclaw devices rotate --role node when local cached node device auth exists but the gateway pairing record no longer approves the node role for that device.
  • Real environment tested: Linux OpenClaw gateway/container checkout running this PR branch's real CLI (node scripts/run-node.mjs doctor) against copied real local OpenClaw pairing/auth state. The copied state restored stale cached node auth from /tmp/openclaw-device-auth-backup-20260505-142808.json and used the current gateway pairing store for device dfe20bd46232a79ecf28cefa5adc9777f699d8cb4f1e358cab3ac2175704848a, which is approved as operator only.
  • Exact steps or command run after this patch:
proof_dir=/tmp/openclaw-doctor-device-proof-77688
rm -rf "$proof_dir"
mkdir -p "$proof_dir/identity" "$proof_dir/devices"
cp -p "$HOME/.openclaw/identity/device.json" "$proof_dir/identity/device.json"
cp -p /tmp/openclaw-device-auth-backup-20260505-142808.json "$proof_dir/identity/device-auth.json"
cp -p "$HOME/.openclaw/devices/paired.json" "$proof_dir/devices/paired.json"
OPENCLAW_STATE_DIR="$proof_dir" NO_COLOR=1 node scripts/run-node.mjs doctor --no-workspace-suggestions
  • Evidence after fix: copied live console output from the real openclaw doctor run:
◇  Device pairing ────────────────────────────────────────────────────────╮
│                                                                         │
│  - Local cached node device auth for cli                                │
│    (dfe20bd46232a79ecf28cefa5adc9777f699d8cb4f1e358cab3ac2175704848a)   │
│    no longer has a matching active gateway token, and that role is no   │
│    longer approved for this device. Reconnect with shared gateway auth  │
│    to refresh local auth, or remove the stale cached node auth entry.   │

Additional proof checks from that same captured output:

Local cached node device auth: True
remove the stale cached node auth entry: True
--role node: False
device token rotation denied: False
  • Observed result after fix: doctor still detects the stale local cached node auth, but the after-fix output no longer suggests openclaw devices rotate --role node; it instead says the role is no longer approved and points to refreshing/removing stale local auth.
  • What was not tested: full live node re-pairing on a physical node; full repository test suite.
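
The proof checks listed above are substring checks against the captured console output. A minimal sketch of how such checks could be computed is shown here; `flattenBoxedOutput` and `proofChecks` are hypothetical names, not part of the PR. Because doctor wraps its message inside a box, the sketch first strips the box-drawing characters and joins the lines so that phrases split across a line break still match.

```typescript
// Flatten boxed, wrapped console output into one line of plain text.
function flattenBoxedOutput(output: string): string {
  return output
    .split("\n")
    .map((line) => line.replace(/[│╮╯╭╰◇─]/g, " ").trim())
    .filter((line) => line.length > 0)
    .join(" ")
    .replace(/\s+/g, " ");
}

// Report, for each phrase, whether it appears in the flattened output.
function proofChecks(
  output: string,
  needles: string[],
): Record<string, boolean> {
  const text = flattenBoxedOutput(output);
  const result: Record<string, boolean> = {};
  for (const needle of needles) {
    result[needle] = text.includes(needle);
  }
  return result;
}
```

Calling `proofChecks` with the four phrases from the list above yields one boolean per phrase, which maps directly onto the True/False lines in the proof.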

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Added regression test:

  • does not suggest rotating local auth for a role that is no longer approved

Local verification:

  • pnpm exec oxfmt --check src/commands/doctor-device-pairing.ts src/commands/doctor-device-pairing.test.ts
  • node scripts/run-vitest.mjs run --config test/vitest/vitest.commands.config.ts src/commands/doctor-device-pairing.test.ts
  • git diff --check
  • node scripts/check-changelog-attributions.mjs

Real behavior proof from my setup:

I ran this branch's real openclaw doctor command against a copy of my local device-pairing state with the stale cached node auth restored. The gateway pairing record for the same device is operator-only, matching the field report that produced the bad advice.

Command shape:

OPENCLAW_STATE_DIR=/tmp/openclaw-doctor-device-proof-77688 \
  NO_COLOR=1 \
  node scripts/run-node.mjs doctor --no-workspace-suggestions

Relevant after-fix doctor output:

◇  Device pairing ────────────────────────────────────────────────────────╮
│                                                                         │
│  - Local cached node device auth for cli                                │
│    (dfe20bd46232a79ecf28cefa5adc9777f699d8cb4f1e358cab3ac2175704848a)   │
│    no longer has a matching active gateway token, and that role is no   │
│    longer approved for this device. Reconnect with shared gateway auth  │
│    to refresh local auth, or remove the stale cached node auth entry.   │

Proof checks from that run:

Local cached node device auth: True
remove the stale cached node auth entry: True
--role node: False
device token rotation denied: False
  • real openclaw doctor run via this branch against a copy of my local device-pairing setup containing the stale cached node role

Human Verification (required)

  • Verified scenarios:
    • focused doctor device-pairing test file passes with the new stale local-role regression test
    • formatting passes for touched files
    • whitespace diff check passes
    • changelog attribution check passes
    • real openclaw doctor output from a copied local pairing setup no longer suggests --role node
  • Edge cases checked:
    • wider pass over doctor-device-pairing.ts confirmed rotate guidance remains valid for approved-role stale token/scope cases
    • unapproved cached role no longer includes --role node rotation advice
  • What you did not verify:
    • full repository test suite
    • live device re-pairing flow on an actual node

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

No review conversations have been addressed yet.

Compatibility / Migration

  • Backward compatible? (Yes/No) Yes
  • Config/env changes? (Yes/No) No
  • Migration needed? (Yes/No) No
  • If yes, exact upgrade steps: N/A

Risks and Mitigations

  • Risk: the revised message tells users to remove a stale cached role entry but does not provide a dedicated CLI command for that exact local-cache cleanup.
    • Mitigation: it also keeps the safer reconnect/shared gateway auth path, and avoids suggesting a command known to fail for unapproved roles.

@openclaw-barnacle Bot added the commands (Command implementations) and size: XS labels May 5, 2026

@clawsweeper Bot (Contributor) commented May 5, 2026

Codex review: needs maintainer review before merge.

Summary
The PR changes doctor device-pairing advice for unapproved stale local cached device-auth roles, adds a command-level regression test, and adds a changelog entry.

Reproducibility: yes. Current main's local cached auth branch can print rotate advice for an unapproved role with no paired token, while the gateway rotation path rejects roles outside the approved pairing contract.

Real behavior proof
Sufficient (live_output): The PR body includes after-fix live CLI output from the real doctor command against copied stale pairing state showing the improved advice.

Next step before merge
No repair lane is needed; the remaining action is ordinary maintainer and CI merge handling for an otherwise focused contributor PR.

Security
Cleared: The diff changes doctor guidance text, a focused test, and changelog text without broadening token authority, secrets handling, network calls, dependencies, or command execution surface.

Review details

Best possible solution:

Merge the narrow doctor messaging, regression test, and changelog change after required checks pass while preserving the existing rotation authorization contract.

Do we have a high-confidence way to reproduce the issue?

Yes. Current main's local cached auth branch can print rotate advice for an unapproved role with no paired token, while the gateway rotation path rejects roles outside the approved pairing contract.

Is this the best way to solve the issue?

Yes. The PR changes only the unapproved no-token doctor guidance branch and keeps rotate advice for approved stale-token and stale-scope cases, which is the narrow maintainable fix.

What I checked:

  • current-main bad advice path: Current main builds a rotate command before checking whether the cached local role is still approved, then prints that rotate command when no matching gateway token exists for an unapproved cached role. (src/commands/doctor-device-pairing.ts:477, 5a8ccb6fe0ef)
  • rotation contract in docs: The devices CLI docs state that token rotation requires the target role to already exist in the approved pairing contract and cannot mint an unapproved role. Public docs: docs/cli/devices.md. (docs/cli/devices.md:91, 5a8ccb6fe0ef)
  • server-side enforcement: The token rotation implementation returns failure before updating state when the requested role is not in listApprovedPairedDeviceRoles(device), matching the PR's claimed denied rotate path. (src/infra/device-pairing.ts:986, 5a8ccb6fe0ef)
  • PR fix boundary: The PR replaces only the unapproved no-token local-cache doctor guidance with role-no-longer-approved reconnect/remove-stale-cache advice, while leaving rotate guidance for stale approved token and stale scope cases. (src/commands/doctor-device-pairing.ts:492, bd05539d0019)
  • regression coverage: The PR adds a test for cached node auth on an operator-only paired device and asserts the warning appears without a --role node rotation suggestion. (src/commands/doctor-device-pairing.test.ts:170, bd05539d0019)
  • real behavior proof: The PR body includes after-fix output from the real doctor command against copied local pairing state with stale node auth, showing the stale warning, no --role node suggestion, and no device token rotation denied text. (bd05539d0019)
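
The rotation contract the review relies on (deny before touching state when the role is not approved) can be sketched as below. This is an illustration only: `PairedDevice`, `RotateResult`, and `rotateToken` are hypothetical, and the actual implementation in `src/infra/device-pairing.ts` is not reproduced here. The only details taken from this PR are the approved-role check and the "device token rotation denied" failure text.

```typescript
// Hypothetical pairing record shape for illustration.
interface PairedDevice {
  approvedRoles: string[];
  tokens: Record<string, string>; // role -> active token
}

type RotateResult =
  | { ok: true; token: string }
  | { ok: false; reason: string };

function rotateToken(
  device: PairedDevice,
  role: string,
  mint: () => string, // hypothetical token factory
): RotateResult {
  // Fail before updating state: rotation cannot mint an unapproved role.
  if (!device.approvedRoles.includes(role)) {
    return { ok: false, reason: "device token rotation denied" };
  }
  const token = mint();
  device.tokens[role] = token;
  return { ok: true, token };
}
```

Since the server side rejects the request in this way, doctor advice that suggests rotating an unapproved role can never succeed, which is why the fix belongs in the advice selection rather than in the authorization path.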

Likely related people:

  • steipete: Local blame in this checkout for the doctor advice branch, rotation enforcement, and devices CLI docs points to recent current-main work, and earlier device-token auth/CLI commits introduced and refactored the rotation contract this doctor advice depends on. (role: current code-history owner and adjacent device-token owner; confidence: high; commits: 0d3b74e45a59, d88b239d3c8a, c92bcf24c446; files: src/commands/doctor-device-pairing.ts, src/commands/doctor-device-pairing.test.ts, src/infra/device-pairing.ts)
  • Jacob Tomlinson: Recent pairing approval history includes caller-scope forwarding and node approval restrictions adjacent to the approved-role boundary enforced by token rotation and surfaced by doctor. (role: adjacent pairing approval maintainer; confidence: medium; commits: 4ee4960de233, 4d7cc6bb4fac; files: src/infra/device-pairing.ts)

Remaining risk / open question:

  • This read-only review did not run the contributor's targeted checks or the full changed gate; required CI should still gate merge.
  • The new advice mentions removing the stale cached role entry without adding a dedicated cleanup command, though it also keeps the safer reconnect path.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 5a8ccb6fe0ef.

@openclaw-barnacle Bot added the triage: needs-real-behavior-proof label (Candidate: external PR needs after-fix proof from a real setup.) May 5, 2026
@Conan-Scott
Contributor Author

Added both requested pieces:

  • changelog entry in CHANGELOG.md (Unreleased / Fixes)
  • real after-fix openclaw doctor output in the PR Evidence section, produced by running this branch against a copy of my local device-pairing state with the stale cached node role restored

The proof run confirms the doctor warning still appears for the stale local cache, but no longer suggests --role node rotation and does not hit device token rotation denied.

@openclaw-barnacle Bot removed the triage: needs-real-behavior-proof label (Candidate: external PR needs after-fix proof from a real setup.) May 5, 2026
@obviyus obviyus self-assigned this May 5, 2026
@obviyus obviyus force-pushed the fix/doctor-stale-local-device-auth-advice branch from bd05539 to 809092f Compare May 5, 2026 06:53
@obviyus obviyus force-pushed the fix/doctor-stale-local-device-auth-advice branch from 809092f to 606408b Compare May 5, 2026 06:56
@obviyus (Contributor) left a comment

Verified the doctor device-pairing path no longer suggests openclaw devices rotate --role <role> when the cached local role is no longer approved by the gateway pairing record.

Maintainer follow-up: rebased onto latest main and moved the changelog note to the end of the active Fixes block with the PR reference.

Local gate: pnpm test src/commands/doctor-device-pairing.test.ts, plus targeted format/changelog/diff checks.

@obviyus obviyus merged commit 11d2bb1 into openclaw:main May 5, 2026
94 checks passed
@obviyus (Contributor) commented May 5, 2026

Landed on main.

Thanks @Conan-Scott.

vincentkoc added a commit to VintageAyu/openclaw that referenced this pull request May 5, 2026
…ainer-hardening

* origin/main: (843 commits)
  docs(changelog): relocate openclaw#77046 and openclaw#77280 entries from 2026.5.3 to Unreleased (openclaw#77728)
  docs: reorder unreleased changelog
  fix: expose ollama thinking profile before activation (openclaw#77617) (thanks @yfge)
  fix: expose ollama thinking profile before activation
  test(gateway): preserve dispatch timers in waiter
  test(gateway): keep startup context timer live
  docs: document cache-friendly activity helper
  ci: install ffmpeg for Mantis media previews
  fix: avoid impossible device token rotation advice (openclaw#77688) (thanks @Conan-Scott)
  docs(changelog): note doctor device pairing advice fix
  fix(doctor): avoid impossible device token rotation advice
  ci: use Crabbox media previews for Mantis
  docs: filter maintainer-owned triage noise
  test: cover GitHub activity helper
  fix(session-file-repair): drop null-role message entries instead of preserving them (openclaw#77288)
  fix: prune orphan session artifacts
  perf: reduce GitHub activity cache misses
  fix: cache session list model resolution (openclaw#77650) (thanks @ragesaq)
  ci: embed Mantis desktop previews
  fix(replay-history): drop trailing stream-error placeholder before provider send (openclaw#77287)
  ...

# Conflicts:
#	CHANGELOG.md
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 9, 2026

Labels

commands (Command implementations), size: XS

2 participants