Skip to content

Use daemon auth for unmanaged restart probes#59439

Closed
roytong9 wants to merge 1 commit into
openclaw:mainfrom
roytong9:fix/auth-aware-daemon-restart-probes
Closed

Use daemon auth for unmanaged restart probes#59439
roytong9 wants to merge 1 commit into
openclaw:mainfrom
roytong9:fix/auth-aware-daemon-restart-probes

Conversation

@roytong9

@roytong9 roytong9 commented Apr 2, 2026

Copy link
Copy Markdown
Contributor

Resubmitting after reducing my active PR queue below the repo limit. This is the same focused fix, with the existing lifecycle test passing locally.

Summary

Use daemon/service-command auth when probing unmanaged gateway restarts instead of relying only on the current shell environment.

What changed

  • Updated unmanaged restart probe auth resolution in src/cli/daemon-cli/lifecycle.ts
  • Read the daemon service command environment before probing the local gateway
  • Merged service-command environment over process.env for probe auth resolution
  • Resolved probe auth via resolveGatewayProbeAuthSafeWithSecretInputs(...) when config is available
  • Kept a safe fallback to empty auth when config cannot be loaded
  • Added a focused regression test covering the case where shell env is unset but the service command environment contains gateway auth
  • Updated the test’s config mock to preserve the real module exports while overriding only the needed functions

Why

The unmanaged restart probe path was previously reading only:

  • OPENCLAW_GATEWAY_TOKEN
  • OPENCLAW_GATEWAY_PASSWORD

from the current shell environment.

That could make restart health checks miss valid daemon auth that was present in the service command environment, even though other daemon-status paths already reason about daemon/service environment separately.

This change makes unmanaged restart probing use the daemon-side auth view instead of only the caller shell environment.

Scope boundary

This PR does not change the broader daemon status flow or the probe auth resolver itself. It only updates the unmanaged restart probe path in lifecycle.ts to use the daemon/service-command environment when resolving auth.

Manual verification

  1. Added a focused test in src/cli/daemon-cli/lifecycle.test.ts
  2. Verified that unmanaged restart probes now use service-command auth when shell env is unset
  3. Ran:
    • pnpm vitest run src/cli/daemon-cli/lifecycle.test.ts
  4. Ran the repo checks during commit

@openclaw-barnacle openclaw-barnacle Bot added cli CLI command changes size: S labels Apr 2, 2026
@greptile-apps

greptile-apps Bot commented Apr 2, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

The production change in lifecycle.ts is correct: it merges the daemon service-command environment over process.env and delegates probe-auth resolution to resolveGatewayProbeAuthSafeWithSecretInputs, with a safe { auth: {} } fallback when no config is available.

  • The new regression test in lifecycle.test.ts is missing a loadConfig setup. loadConfig.mockReset() in beforeEach leaves it returning nothing, so readBestEffortConfig() resolves to a falsy value, cfg is falsy, and the resolveGatewayProbeAuthSafeWithSecretInputs path is never entered. The assertion on auth.token will fail because probeGateway receives auth: {} instead of the expected service-command auth.

Confidence Score: 4/5

  • Safe to merge once the test setup is fixed; the production code change is correct.
  • One P1 finding: the new regression test is broken because loadConfig is not set up to return a config object, meaning the service-command auth code path is never exercised and the assertion would fail. The underlying production logic in lifecycle.ts is sound.
  • src/cli/daemon-cli/lifecycle.test.ts — the new test case at line 286 needs a loadConfig implementation configured before the act.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/cli/daemon-cli/lifecycle.test.ts
Line: 286-310

Comment:
**Test never reaches the new code path — missing `loadConfig` return value setup**

`loadConfig.mockReset()` runs in `beforeEach` (line 153) without a subsequent mock implementation, so `loadConfig()` returns nothing for this test. `readBestEffortConfig()` resolves to a falsy value, `cfg` is falsy in `assertUnmanagedGatewayRestartEnabled`, and the function takes the `{ auth: {} }` fallback — `resolveGatewayProbeAuthSafeWithSecretInputs` is never called. The assertion checking `auth.token` from the service command env will fail because `probeGateway` actually receives `auth: {}`.

The fix is to configure `loadConfig` to return an object in this test (e.g. via `loadConfig.mockImplementation(() => ({}))`) before calling `runDaemonRestart`, so that `cfg` is truthy and the merged-env auth resolution path is actually exercised.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "Use daemon auth for unmanaged restart pr..." | Re-trigger Greptile

Comment on lines +286 to +310
it("uses service command auth for unmanaged restart probes when shell env is unset", async () => {
findVerifiedGatewayListenerPidsOnPortSync.mockReturnValue([4200]);
mockUnmanagedRestart({ runPostRestartCheck: true });

delete process.env.OPENCLAW_GATEWAY_TOKEN;
delete process.env.OPENCLAW_GATEWAY_PASSWORD;

service.readCommand.mockResolvedValue({
programArguments: ["openclaw", "gateway", "--port", "18789"],
environment: {
OPENCLAW_GATEWAY_TOKEN: " service-token ",
},
});

await runDaemonRestart({ json: true });

expect(probeGateway).toHaveBeenCalledWith(
expect.objectContaining({
auth: {
token: "service-token",
password: undefined,
},
}),
);
});

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Test never reaches the new code path — missing loadConfig return value setup

loadConfig.mockReset() runs in beforeEach (line 153) without a subsequent mock implementation, so loadConfig() returns nothing for this test. readBestEffortConfig() resolves to a falsy value, cfg is falsy in assertUnmanagedGatewayRestartEnabled, and the function takes the { auth: {} } fallback — resolveGatewayProbeAuthSafeWithSecretInputs is never called. The assertion checking auth.token from the service command env will fail because probeGateway actually receives auth: {}.

The fix is to configure loadConfig to return an object in this test (e.g. via loadConfig.mockImplementation(() => ({}))) before calling runDaemonRestart, so that cfg is truthy and the merged-env auth resolution path is actually exercised.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/cli/daemon-cli/lifecycle.test.ts
Line: 286-310

Comment:
**Test never reaches the new code path — missing `loadConfig` return value setup**

`loadConfig.mockReset()` runs in `beforeEach` (line 153) without a subsequent mock implementation, so `loadConfig()` returns nothing for this test. `readBestEffortConfig()` resolves to a falsy value, `cfg` is falsy in `assertUnmanagedGatewayRestartEnabled`, and the function takes the `{ auth: {} }` fallback — `resolveGatewayProbeAuthSafeWithSecretInputs` is never called. The assertion checking `auth.token` from the service command env will fail because `probeGateway` actually receives `auth: {}`.

The fix is to configure `loadConfig` to return an object in this test (e.g. via `loadConfig.mockImplementation(() => ({}))`) before calling `runDaemonRestart`, so that `cfg` is truthy and the merged-env auth resolution path is actually exercised.

How can I resolve this? If you propose a fix, please make it concise.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1dfb403370

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

const tlsEnabled = !!cfg?.gateway?.tls?.enabled;
const scheme = tlsEnabled ? "wss" : "ws";
const probeAuth = cfg
? await resolveGatewayProbeAuthSafeWithSecretInputs({

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Catch probe-auth resolution failures in unmanaged restart check

This awaited auth-resolution call can throw non-GatewaySecretRefUnavailableError exceptions (for example SecretProviderResolutionError from a failing SecretRef backend), and the exception currently bubbles out of assertUnmanagedGatewayRestartEnabled. In that scenario runDaemonRestart exits before signaling the unmanaged gateway PID, even though this preflight is otherwise best-effort (probeGateway failures are swallowed). This turns transient secret-provider issues into hard restart failures for unmanaged deployments.

Useful? React with 👍 / 👎.

@roytong9

roytong9 commented Apr 2, 2026

Copy link
Copy Markdown
Contributor Author

The current CI / check failure appears unrelated to this change and is coming from extensions/telegram/src/monitor.ts (Cannot find name 'TelegramPollingSession' / TelegramExecApprovalHandler). Happy to wait for a rerun once that unrelated TS issue is cleared.

@vincentkoc

Copy link
Copy Markdown
Member

This is superseded by #72405, which has landed as the canonical ProjectClownfish fix path for this cluster.

Closing this now that the validated fix is merged. If this still reproduces on current main with a different path, reply here and we can reopen or split it back out.

@vincentkoc vincentkoc closed this Apr 28, 2026
@clawsweeper clawsweeper Bot mentioned this pull request May 2, 2026
20 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clawsweeper Tracked by ClawSweeper automation cli CLI command changes size: S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants