
fix: pass cloud provider env vars through to gateway service#62673

Closed
JiaDe-Wu wants to merge 4 commits into openclaw:main from JiaDe-Wu:fix/aws-service-env-passthrough

JiaDe-Wu commented Apr 7, 2026

PR: fix: pass cloud provider env vars through to gateway service

Target: openclaw/openclaw main branch
Branch: fix/aws-service-env-passthrough
Fixes: #61847


Summary

buildServiceEnvironment curates a whitelist of env vars for the systemd/launchd service, but excludes all cloud provider variables (AWS_PROFILE, AWS_REGION, AZURE_OPENAI_*, GOOGLE_CLOUD_*). This breaks every headless cloud deployment where credentials come from instance metadata (IMDS) or instance profiles — the most common setup on AWS, Azure, and GCP.
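A minimal sketch of this failure mode, assuming the curated environment is built by copying an allowlist of keys from the installer's shell. `CURATED_KEYS` and `curateServiceEnv` are hypothetical stand-ins; the real allowlist in `buildServiceEnvironment` is longer and is not reproduced here.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical pre-fix allowlist: basic and proxy keys only, no cloud keys.
const CURATED_KEYS = ["HOME", "PATH", "HTTP_PROXY", "HTTPS_PROXY"] as const;

// Copy allowlisted keys that have a non-empty value in the host env;
// every other key is silently dropped from the service environment.
function curateServiceEnv(host: Env, allowlist: readonly string[]): Record<string, string> {
  const out: Record<string, string> = {};
  for (const key of allowlist) {
    const value = host[key];
    if (value) out[key] = value;
  }
  return out;
}

// Installer shell on an EC2 instance with an IAM instance role.
const installerShell: Env = {
  HOME: "/home/ubuntu",
  PATH: "/usr/local/bin:/usr/bin:/bin",
  AWS_PROFILE: "default",
  AWS_REGION: "us-east-1",
};

const serviceEnv = curateServiceEnv(installerShell, CURATED_KEYS);
console.log(Object.keys(serviceEnv)); // [ 'HOME', 'PATH' ]
```

The AWS keys are present in the shell but never reach the service, so any SDK running inside it falls back to whatever it can find without them.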

Impact

This is a blocking regression for all AWS Bedrock users who upgrade past 2026.3.24.

We maintain sample-OpenClaw-on-AWS-with-Bedrock, an AWS-published CloudFormation template that deploys OpenClaw on EC2 with Amazon Bedrock. After the switch to pi-coding-agent in 2026.4.5, every new deployment and every upgrade fails with:

No API key found for amazon-bedrock.
Use /login or set an API key environment variable.

Real consequences:

  • Issue #64 (in sample-OpenClaw-on-AWS-with-Bedrock): a user reported a deploy failure, our first external report
  • Colleague deployments blocked: Internal team members deploying via our CloudFormation template hit this on first boot, cascading into offline agents, React crashes, and broken admin consoles
  • Upgrade path broken: Users on 2026.3.24 (which used "auth": "aws-sdk" in config) cannot upgrade — we had to pin the version and warn users not to select latest
  • Weeks of debugging: We initially assumed it was a config format issue and iterated through plugins.entries.amazon-bedrock.config.auth, models.providers.amazon-bedrock.auth, and plugin enabled: true — none worked because the root cause is the service environment, not the config file

Root Cause

On EC2 with an IAM instance role:

  1. User's shell has AWS_PROFILE=default → AWS SDK discovers credentials via IMDS ✅
  2. openclaw gateway install writes systemd service with curated env → strips AWS_PROFILE
  3. Gateway service starts → pi-coding-agent checks env vars for cloud credentials → finds nothing → "No API key found"
  4. openclaw gateway install --force (also run by upgrades and doctor --fix) overwrites any manual Environment= additions

The ~/.openclaw/.env workaround works (EnvironmentFile is preserved across reinstalls), but users must discover it themselves — there's no error message pointing to the solution.
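Steps 2 and 4 can be sketched as a unit renderer that regenerates the [Service] section from the curated env on every install. `renderServiceSection` and the exact unit layout are assumptions for illustration; the PR does not show the real template. The sketch shows why a hand-added `Environment=` line is lost on `--force`, while values kept in `~/.openclaw/.env` survive: the unit only references that file's path.

```typescript
type Env = Record<string, string>;

// Hypothetical renderer for the [Service] section that
// `openclaw gateway install` writes; the real template is not in the PR.
function renderServiceSection(curatedEnv: Env, envFilePath: string): string {
  const lines = ["[Service]", `EnvironmentFile=${envFilePath}`];
  for (const [key, value] of Object.entries(curatedEnv)) {
    lines.push(`Environment=${key}=${value}`);
  }
  return lines.join("\n");
}

const curated: Env = { PATH: "/usr/local/bin:/usr/bin:/bin" };
const envFile = "/home/ubuntu/.openclaw/.env";

// First install, then the user hand-edits the unit to add AWS_PROFILE.
let unit = renderServiceSection(curated, envFile);
unit += "\nEnvironment=AWS_PROFILE=default";

// `gateway install --force` regenerates the section from the curated env,
// so the hand-added line is gone while the EnvironmentFile= line remains.
unit = renderServiceSection(curated, envFile);

console.log(unit.includes("AWS_PROFILE")); // false
console.log(unit.includes(`EnvironmentFile=${envFile}`)); // true
```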

Solution

Add SERVICE_CLOUD_PROVIDER_ENV_KEYS — same pattern as the existing SERVICE_PROXY_ENV_KEYS — to pass through cloud provider env vars when present in the host environment:

  • AWS: AWS_PROFILE, AWS_REGION, AWS_DEFAULT_REGION, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN, AWS_BEARER_TOKEN_BEDROCK
  • Azure: AZURE_OPENAI_API_KEY, AZURE_OPENAI_BASE_URL, AZURE_OPENAI_RESOURCE_NAME, AZURE_OPENAI_API_VERSION, AZURE_OPENAI_DEPLOYMENT_NAME_MAP
  • Google: GOOGLE_CLOUD_PROJECT, GOOGLE_APPLICATION_CREDENTIALS

The implementation reuses the existing env reader pattern (refactored into shared readServiceKeysFromEnv helper) and adds one spread in buildCommonServiceEnvironment. Zero behavioral change for users who don't set these vars.
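A sketch of the shared-helper pattern, assuming `readServiceKeysFromEnv` takes an env map and a key list and returns only the keys that are set. The key lists are illustrative subsets, and both function signatures are assumptions rather than the code in `src/daemon/service-env.ts`.

```typescript
type Env = Record<string, string | undefined>;

// Illustrative subsets of the two allowlists (the PR's full lists are longer).
const SERVICE_PROXY_ENV_KEYS = ["HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY"] as const;
const SERVICE_CLOUD_PROVIDER_ENV_KEYS = ["AWS_PROFILE", "AWS_REGION", "GOOGLE_CLOUD_PROJECT"] as const;

// Assumed shape of the shared helper: return only the listed keys that are
// set to a non-blank value in the given environment.
function readServiceKeysFromEnv(env: Env, keys: readonly string[]): Record<string, string> {
  const out: Record<string, string> = {};
  for (const key of keys) {
    const value = env[key]?.trim();
    if (value) out[key] = value;
  }
  return out;
}

// Assumed signature; the point is the one extra spread for cloud keys.
function buildCommonServiceEnvironment(env: Env, base: Record<string, string>): Record<string, string> {
  return {
    ...base,
    ...readServiceKeysFromEnv(env, SERVICE_PROXY_ENV_KEYS),
    ...readServiceKeysFromEnv(env, SERVICE_CLOUD_PROVIDER_ENV_KEYS),
  };
}

const base = { HOME: "/home/ubuntu" };
const withAws = buildCommonServiceEnvironment({ AWS_PROFILE: "default", AWS_REGION: "us-west-2" }, base);
const withoutCloudVars = buildCommonServiceEnvironment({ PATH: "/usr/bin" }, base);
```

When no cloud keys are set, the extra spread contributes an empty object, so `withoutCloudVars` is identical to `base`. That is the "zero behavioral change" property the description claims.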

Changes

File: src/daemon/service-env.ts (+50, -2)

| Change | Location | Description |
|---|---|---|
| New constant | Line ~65 | SERVICE_CLOUD_PROVIDER_ENV_KEYS: AWS/Azure/Google env var whitelist |
| Refactor | Line ~95 | Extract readServiceKeysFromEnv shared helper from readServiceProxyEnvironment |
| Type update | Line ~43 | Add cloudProviderEnv to SharedServiceEnvironmentFields |
| Wire-up | Line ~368 | Spread cloudProviderEnv in buildCommonServiceEnvironment |
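Because the key list is declared `as const`, it can double as a type. The sketch below is a hypothetical typing for the new `cloudProviderEnv` field (the PR names the field but not its type), showing how the allowlist can constrain the field to exactly the forwarded keys. Only the new field of `SharedServiceEnvironmentFields` is shown, and `readCloudProviderEnv` is a hypothetical helper.

```typescript
const SERVICE_CLOUD_PROVIDER_ENV_KEYS = [
  "AWS_PROFILE",
  "AWS_REGION",
  "AZURE_OPENAI_API_KEY",
  "GOOGLE_CLOUD_PROJECT",
] as const;

// `as const` makes this a readonly tuple of string literals, so indexing it
// with `number` yields a union of exactly the allowlisted names.
type CloudProviderEnvKey = (typeof SERVICE_CLOUD_PROVIDER_ENV_KEYS)[number];

// Hypothetical typing for the new field: only allowlisted keys, all optional.
type CloudProviderEnv = Partial<Record<CloudProviderEnvKey, string>>;

// Only the field named in the PR is shown; the real interface has more.
interface SharedServiceEnvironmentFields {
  cloudProviderEnv: CloudProviderEnv;
}

function readCloudProviderEnv(env: Record<string, string | undefined>): CloudProviderEnv {
  const out: CloudProviderEnv = {};
  for (const key of SERVICE_CLOUD_PROVIDER_ENV_KEYS) {
    const value = env[key];
    if (value) out[key] = value;
  }
  return out;
}

const fields: SharedServiceEnvironmentFields = {
  cloudProviderEnv: readCloudProviderEnv({ AWS_REGION: "eu-west-1", UNRELATED: "x" }),
};
// fields.cloudProviderEnv.UNRELATED would be a compile-time error.
console.log(fields.cloudProviderEnv); // { AWS_REGION: 'eu-west-1' }
```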

Testing

| Scenario | Before | After |
|---|---|---|
| EC2 + IAM role + AWS_PROFILE=default in shell | ❌ "No API key found" | ✅ Bedrock works |
| EC2 + ~/.openclaw/.env workaround | ✅ works | ✅ still works (EnvironmentFile precedence) |
| Local dev (no cloud vars set) | ✅ no change | ✅ no change |
| gateway install --force | ❌ strips AWS vars | ✅ preserves them |

Verified on: OpenClaw 2026.4.5, EC2 t4g.small (Graviton ARM64), Ubuntu 24.04, Amazon Bedrock Nova 2 Lite.
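The "EnvironmentFile precedence" row relies on a documented systemd rule: variables read from `EnvironmentFile=` override the same names set with `Environment=`. The sketch below models only that systemd rule (launchd is not covered); `effectiveServiceEnv` is a hypothetical helper.

```typescript
type Env = Record<string, string>;

// Models systemd's rule for a unit that sets both directives: values read
// from EnvironmentFile= win over same-named Environment= entries.
function effectiveServiceEnv(inlineEnvironment: Env, environmentFile: Env): Env {
  return { ...inlineEnvironment, ...environmentFile };
}

// Environment= lines produced by the passthrough at install time.
const inline: Env = { AWS_PROFILE: "default", AWS_REGION: "us-east-1" };
// Contents of ~/.openclaw/.env from the earlier workaround.
const dotEnv: Env = { AWS_REGION: "us-west-2" };

const effective = effectiveServiceEnv(inline, dotEnv);
console.log(effective.AWS_REGION); // us-west-2
console.log(effective.AWS_PROFILE); // default
```

So a user who already set values in `~/.openclaw/.env` keeps them after the fix; the passthrough only fills in names the file does not define.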

Operation Steps

# Step 1: Clone fork
cd /tmp
git clone https://github.com/JiaDe-Wu/openclaw.git
cd openclaw

# Step 2: Create branch
git checkout -b fix/aws-service-env-passthrough

# Step 3: Replace src/daemon/service-env.ts with the modified version

# Step 4: Commit and push
git add src/daemon/service-env.ts
git commit -m "fix: pass cloud provider env vars through to gateway service (#61847)"
git push origin fix/aws-service-env-passthrough

# Step 5: Create PR
# Open: https://github.com/openclaw/openclaw/compare/main...JiaDe-Wu:openclaw:fix/aws-service-env-passthrough

fix: pass cloud provider env vars through to gateway service (openclaw#61847)
openclaw-barnacle (bot) added the gateway (Gateway runtime) and size: S labels Apr 7, 2026
greptile-apps (bot, Contributor) commented Apr 7, 2026

Greptile Summary

Clean, targeted fix that follows the existing proxy-env passthrough pattern exactly — adds SERVICE_CLOUD_PROVIDER_ENV_KEYS, extracts a shared readServiceKeysFromEnv helper, and wires the new field through buildCommonServiceEnvironment. No behavioral change for users without cloud env vars set.

Confidence Score: 5/5

Safe to merge; both findings are P2 suggestions that do not block correctness.

The implementation is correct, the refactor is clean, and all remaining findings are non-blocking style/test suggestions.

No files require special attention.

Comments Outside Diff (1)

  1. src/daemon/service-env.test.ts, lines 328-346

    P2 No test for cloud provider env passthrough

    The proxy passthrough has an explicit "forwards proxy environment variables for launchd/systemd runtime" test but the new cloudProviderEnv spread has no counterpart. Following the same pattern here would lock in the behavior and catch regressions.
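One possible shape for the missing test is a table-driven check over the key list: every allowlisted key is forwarded on its own, and nothing is added when none are set. The builder below is a hypothetical stand-in so the sketch runs on its own; in the repo the test would import the real builder and use the project's test runner instead of this plain loop.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical stand-in for the builder under test.
const SERVICE_CLOUD_PROVIDER_ENV_KEYS = [
  "AWS_PROFILE",
  "AWS_REGION",
  "AWS_WEB_IDENTITY_TOKEN_FILE",
  "AZURE_OPENAI_API_KEY",
  "GOOGLE_APPLICATION_CREDENTIALS",
] as const;

function buildServiceEnvironment(env: Env): Record<string, string> {
  const out: Record<string, string> = {};
  for (const key of SERVICE_CLOUD_PROVIDER_ENV_KEYS) {
    const value = env[key];
    if (value) out[key] = value;
  }
  return out;
}

const failures: string[] = [];

// Each allowlisted key is forwarded on its own when set...
for (const key of SERVICE_CLOUD_PROVIDER_ENV_KEYS) {
  const built = buildServiceEnvironment({ [key]: "test-value" });
  if (built[key] !== "test-value") failures.push(`${key} was not forwarded`);
}

// ...and nothing is added when none of them are set.
if (Object.keys(buildServiceEnvironment({ PATH: "/usr/bin" })).length !== 0) {
  failures.push("unset cloud keys leaked into the service env");
}

console.log(failures.length === 0 ? "ok" : failures.join("\n")); // ok
```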

Reviews (1): Last reviewed commit: "Add cloud provider environment handling"

Comment thread src/daemon/service-env.ts
Comment on lines +70 to +88
const SERVICE_CLOUD_PROVIDER_ENV_KEYS = [
  // AWS / Amazon Bedrock
  "AWS_PROFILE",
  "AWS_REGION",
  "AWS_DEFAULT_REGION",
  "AWS_ACCESS_KEY_ID",
  "AWS_SECRET_ACCESS_KEY",
  "AWS_SESSION_TOKEN",
  "AWS_BEARER_TOKEN_BEDROCK",
  // Azure OpenAI
  "AZURE_OPENAI_API_KEY",
  "AZURE_OPENAI_BASE_URL",
  "AZURE_OPENAI_RESOURCE_NAME",
  "AZURE_OPENAI_API_VERSION",
  "AZURE_OPENAI_DEPLOYMENT_NAME_MAP",
  // Google Cloud (Vertex AI / Gemini)
  "GOOGLE_CLOUD_PROJECT",
  "GOOGLE_APPLICATION_CREDENTIALS",
] as const;

Contributor


P2 Missing AWS IRSA env vars for EKS deployments

AWS_WEB_IDENTITY_TOKEN_FILE, AWS_ROLE_ARN, and AWS_ROLE_SESSION_NAME are injected by the EKS pod-identity webhook and are the standard mechanism for IRSA (IAM Roles for Service Accounts). Without them, Kubernetes-based deployments will still fail with "No API key found" even after this fix.

Suggested change (adds the three AWS IRSA keys):

const SERVICE_CLOUD_PROVIDER_ENV_KEYS = [
  // AWS / Amazon Bedrock
  "AWS_PROFILE",
  "AWS_REGION",
  "AWS_DEFAULT_REGION",
  "AWS_ACCESS_KEY_ID",
  "AWS_SECRET_ACCESS_KEY",
  "AWS_SESSION_TOKEN",
  "AWS_BEARER_TOKEN_BEDROCK",
  // AWS IRSA (EKS IAM Roles for Service Accounts)
  "AWS_WEB_IDENTITY_TOKEN_FILE",
  "AWS_ROLE_ARN",
  "AWS_ROLE_SESSION_NAME",
  // Azure OpenAI
  "AZURE_OPENAI_API_KEY",
  "AZURE_OPENAI_BASE_URL",
  "AZURE_OPENAI_RESOURCE_NAME",
  "AZURE_OPENAI_API_VERSION",
  "AZURE_OPENAI_DEPLOYMENT_NAME_MAP",
  // Google Cloud (Vertex AI / Gemini)
  "GOOGLE_CLOUD_PROJECT",
  "GOOGLE_APPLICATION_CREDENTIALS",
] as const;
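Why all of the IRSA keys matter together: a web-identity credential provider needs both the token file path and the role ARN before it can call `sts:AssumeRoleWithWebIdentity`, so forwarding only one of them yields no credentials. The sketch below roughly mirrors that precondition (as in the AWS SDK for JavaScript v3 token-file provider); `readWebIdentityConfig` is a hypothetical helper, and the role ARN is made up.

```typescript
type Env = Record<string, string | undefined>;

interface WebIdentityConfig {
  tokenFile: string;
  roleArn: string;
  roleSessionName?: string;
}

// Roughly the precondition of a web-identity credential provider: without
// both the token file path and the role ARN there is nothing to exchange.
// The session name is optional.
function readWebIdentityConfig(env: Env): WebIdentityConfig | undefined {
  const tokenFile = env.AWS_WEB_IDENTITY_TOKEN_FILE;
  const roleArn = env.AWS_ROLE_ARN;
  if (!tokenFile || !roleArn) return undefined;
  return { tokenFile, roleArn, roleSessionName: env.AWS_ROLE_SESSION_NAME };
}

// Values shaped like what IRSA exposes to a pod (the ARN is made up).
const podEnv: Env = {
  AWS_ROLE_ARN: "arn:aws:iam::123456789012:role/openclaw-gateway",
  AWS_WEB_IDENTITY_TOKEN_FILE: "/var/run/secrets/eks.amazonaws.com/serviceaccount/token",
};

const full = readWebIdentityConfig(podEnv);
const partial = readWebIdentityConfig({ AWS_ROLE_ARN: podEnv.AWS_ROLE_ARN });
console.log(full !== undefined, partial === undefined); // true true
```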

JiaDe-Wu (Author):

Good catch — added AWS_WEB_IDENTITY_TOKEN_FILE, AWS_ROLE_ARN, and AWS_ROLE_SESSION_NAME for EKS IRSA support. Also added test coverage mirroring the proxy env tests.

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 80c38fa3ed

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/daemon/service-env.ts
Comment on lines +75 to +77
"AWS_ACCESS_KEY_ID",
"AWS_SECRET_ACCESS_KEY",
"AWS_SESSION_TOKEN",


P1 Exclude short-lived AWS creds from service env passthrough

Passing AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY/AWS_SESSION_TOKEN through the install-time whitelist bakes whatever credentials are in the installer shell into the persisted service environment. In the gateway install flow, serviceEnvironment is merged last (src/commands/daemon-install-helpers.ts:78-90), so these values override durable .env/config values and the normal IMDS/instance-profile fallback; if the installer used temporary STS creds (common with aws-vault/assume-role), the service will start failing once they expire and keep failing until reinstalled.
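A sketch of the override the review describes, assuming the install flow merges layers with later ones winning. `mergeInstallEnvironment` and the layer names are hypothetical; only the "service environment merged last" order comes from the review. The AKIA and ASIA prefixes are AWS's markers for long-term and temporary access key IDs.

```typescript
type Env = Record<string, string>;

// Hypothetical reduction of the install-time merge the review cites; the
// layer names are assumptions, only the "service env last" order is from it.
function mergeInstallEnvironment(dotEnv: Env, configEnv: Env, serviceEnvironment: Env): Env {
  // Later spreads win, so captured shell values override durable layers.
  return { ...dotEnv, ...configEnv, ...serviceEnvironment };
}

// Durable, long-lived key kept in ~/.openclaw/.env (AKIA prefix).
const dotEnv: Env = { AWS_ACCESS_KEY_ID: "AKIAEXAMPLELONGLIVED" };
// Temporary STS credentials captured from an aws-vault shell (ASIA prefix).
const serviceEnvironment: Env = {
  AWS_ACCESS_KEY_ID: "ASIAEXAMPLETEMPORARY",
  AWS_SESSION_TOKEN: "example-session-token",
};

const merged = mergeInstallEnvironment(dotEnv, {}, serviceEnvironment);
console.log(merged.AWS_ACCESS_KEY_ID); // ASIAEXAMPLETEMPORARY
```

Once the STS session expires, the merged environment still points at the temporary key, which is the "keeps failing until reinstalled" scenario.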


JiaDe-Wu (Author):

Valid concern. A few clarifications:

  • Primary use case is AWS_PROFILE=default on EC2/ECS — the SDK resolves credentials dynamically via IMDS on every request, nothing is "baked in". The service env just tells the SDK where to look.
  • Same design as proxy passthrough: HTTP_PROXY is also forwarded from the installer's env and could go stale if the proxy moves. This is accepted behavior.
  • ~/.openclaw/.env remains the recommended approach for durable configuration (EnvironmentFile takes precedence over inline Environment in systemd).
  • If a user runs gateway install inside an aws-vault session with temporary STS creds, yes, those would be frozen — but that's a user-side anti-pattern, not something the installer should guard against. The same risk exists for any env var the installer captures.

I've added a JSDoc warning on the AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_SESSION_TOKEN entries to flag the ephemeral credential caveat.
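The distinction drawn in this reply can be made concrete by classifying keys. The split below is an editorial illustration, not code from the PR: "locator" keys only tell the SDK where and how to look up credentials at runtime, while "credential" keys are the credentials themselves and would be frozen into the unit if captured at install time. All names are hypothetical.

```typescript
type KeyKind = "locator" | "credential" | "other";

// Editorial classification (not from the PR).
const LOCATOR_KEYS: ReadonlySet<string> = new Set([
  "AWS_PROFILE",
  "AWS_REGION",
  "AWS_DEFAULT_REGION",
  "AWS_WEB_IDENTITY_TOKEN_FILE",
  "AWS_ROLE_ARN",
  "AWS_ROLE_SESSION_NAME",
]);
const CREDENTIAL_KEYS: ReadonlySet<string> = new Set([
  "AWS_ACCESS_KEY_ID",
  "AWS_SECRET_ACCESS_KEY",
  "AWS_SESSION_TOKEN",
  "AWS_BEARER_TOKEN_BEDROCK",
]);

function classifyKey(key: string): KeyKind {
  if (LOCATOR_KEYS.has(key)) return "locator";
  if (CREDENTIAL_KEYS.has(key)) return "credential";
  return "other";
}

// Credential keys that would be frozen into the unit if captured at install.
function frozenCredentialKeys(captured: Record<string, string>): string[] {
  return Object.keys(captured).filter((key) => classifyKey(key) === "credential");
}

console.log(frozenCredentialKeys({ AWS_PROFILE: "default", AWS_REGION: "us-east-1" })); // []
console.log(frozenCredentialKeys({ AWS_PROFILE: "dev", AWS_SESSION_TOKEN: "token" })); // [ 'AWS_SESSION_TOKEN' ]
```

The EC2 instance-role case in the PR only captures locator keys, which is why the author argues nothing is baked in there.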

JiaDe-Wu added 3 commits April 8, 2026 13:33
address review: add EKS IRSA vars, JSDoc warning for ephemeral creds
test: add cloud provider env passthrough tests

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 706b48c672


Comment thread src/daemon/service-env.ts
Comment on lines +80 to +83
"AWS_ACCESS_KEY_ID",
"AWS_SECRET_ACCESS_KEY",
"AWS_SESSION_TOKEN",
"AWS_BEARER_TOKEN_BEDROCK",


P1 Drop short-lived AWS secrets from service env passthrough

Persisting AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN, and AWS_BEARER_TOKEN_BEDROCK in the install-time whitelist can lock the daemon onto expiring shell credentials (for example aws-vault/STS/SSO sessions) instead of durable runtime auth. In this codebase, service env is merged last and overrides durable .env/config entries (src/commands/daemon-install-helpers.ts:78-90), and AWS auth resolution prefers bearer/static key envs before AWS_PROFILE/default chain (src/agents/model-auth.ts:292-313), so once those captured credentials expire the gateway can fail until reinstall/manual unit edits. Fresh evidence versus earlier discussion: this commit also persists AWS_BEARER_TOKEN_BEDROCK, which is checked first in auth selection.
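The ordering the review attributes to `src/agents/model-auth.ts` can be sketched as a simple precedence chain. `selectAwsAuthSource` is hypothetical and the real function's checks may differ in detail; the sketch only shows why a captured bearer token or static key pair shadows `AWS_PROFILE` and the IMDS-backed default chain.

```typescript
type Env = Record<string, string | undefined>;
type AwsAuthSource = "bearer-token" | "static-keys" | "profile" | "default-chain";

// Sketch of the ordering described in the review, not the real code.
function selectAwsAuthSource(env: Env): AwsAuthSource {
  if (env.AWS_BEARER_TOKEN_BEDROCK) return "bearer-token";
  if (env.AWS_ACCESS_KEY_ID && env.AWS_SECRET_ACCESS_KEY) return "static-keys";
  if (env.AWS_PROFILE) return "profile";
  return "default-chain"; // e.g. IMDS / instance profile
}

// A frozen bearer token shadows the profile that would have used IMDS.
const serviceEnv: Env = { AWS_BEARER_TOKEN_BEDROCK: "expired-token", AWS_PROFILE: "default" };
console.log(selectAwsAuthSource(serviceEnv)); // bearer-token
console.log(selectAwsAuthSource({ AWS_PROFILE: "default" })); // profile
```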


JiaDe-Wu (Author):

@steipete @obviyus — Friendly ping on this PR.

This is a blocking regression (#61847) since v2026.4.5 that breaks all cloud deployments relying on instance-level credentials (AWS IMDS/instance profiles, Azure managed identity, GCP service accounts). It now has 4 independent confirmations across EC2, Lightsail (#61847 comment), and Google Vertex (#64283).

The fix is small and conservative — +50/-2 in service-env.ts, reusing the exact same pattern as SERVICE_PROXY_ENV_KEYS. Zero behavioral change for users without cloud env vars set. Both Greptile (5/5) and Codex reviewed it; all feedback has been addressed (EKS IRSA vars added, JSDoc warnings for ephemeral creds, test coverage added).

@steipete I see you committed to this same file just today (d4e93e7 — persist private ws opt-in). Happy to rebase onto your latest changes if that helps move this forward.

We maintain sample-OpenClaw-on-AWS-with-Bedrock (370+ stars) and have had to pin users to v2026.3.24 because of this. Would really appreciate a review when you get a chance. 🙏

JiaDe-Wu (Author):

Closing this — the underlying issue has been resolved in v2026.4.10 via 8b76392e3e79 with a more comprehensive approach (generic safe custom env preservation + Bedrock IMDS default-chain auth fix).

Thanks @steipete for picking this up. Will verify against our CloudFormation template and unpin from v2026.3.24.

For anyone landing here from sample-OpenClaw-on-AWS-with-Bedrock: upgrade to v2026.4.10+ and the ~/.openclaw/.env workaround is no longer needed.

JiaDe-Wu closed this Apr 27, 2026

Labels

gateway (Gateway runtime), size: S

Development

Successfully merging this pull request may close these issues.

[Bug]: gateway install --force breaks AWS credential discovery on EC2 instances with instance roles
