feat(secrets): add env secret provider#43640
Conversation
Greptile SummaryThis PR introduces the Key issues found:
Confidence Score: 2/5
Prompt To Fix All With AIThis is a comment left during a code review.
Path: src/config/secret-resolution.ts
Line: 424-426
Comment:
**`String.replace()` corrupts secret values containing `$` patterns**
`working.replace(full, sv)` treats the second argument as a replacement string with special `$`-patterns. If the resolved secret value contains `$$`, `$&`, `` $` ``, `$'`, or `$n`, JavaScript will interpret them as replacement patterns rather than literal characters — silently corrupting the injected secret.
For example, a database password `pa$$w0rd` would become `pa$w0rd`, and a value like `$&extra` would expand to the matched pattern text prepended to `extra`.
Use a replacer function to prevent this:
```suggestion
for (const { full, value: sv } of resolved) {
working = working.replace(full, () => sv);
}
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: src/config/secret-resolution.ts
Line: 476-480
Comment:
**Cache TTL only reads from GCP provider config**
The cache TTL is always sourced exclusively from `secretsConfig.providers.gcp.cacheTtlSeconds`, even though `cacheTtlSeconds` is defined as a valid field on every provider in `SecretsConfig`. For the `env` provider, this means the 5-minute default TTL is always applied regardless of any per-provider `cacheTtlSeconds` configuration. While env var caching is generally harmless, this will silently ignore any `cacheTtlSeconds` setting on non-GCP providers and will become a footgun when future providers (AWS, Vault, etc.) are added.
Consider computing a per-resolution TTL from the set of referenced providers, or moving the TTL to a top-level `secrets.cacheTtlSeconds` field rather than nesting it inside a specific provider's config.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: src/config/secret-resolution.ts
Line: 83
Comment:
**Module-level cache is written by both `GcpSecretProvider` and `fetchWithCache`**
`GcpSecretProvider.getSecret()` writes to `secretCache` directly (line 201), and `fetchWithCache` also writes to `secretCache` after calling the provider (line 357). For GCP secrets, the cache key is identical in both places (`gcp:${name}#${ver}`), so the entry is written twice per miss — first inside the provider, then overwritten by `fetchWithCache`. The stale-while-revalidate fallback in `fetchWithCache` (lines 360–364) works correctly because the expired entry survives in the map until it is overwritten, but the double-write is architecturally confusing and makes the two caching paths hard to reason about.
Consider having `GcpSecretProvider.getSecret()` not write to `secretCache` directly and instead rely entirely on the `fetchWithCache` layer, or alternatively remove the `fetchWithCache` layer and push the stale-while-revalidate logic into each provider.
How can I resolve this? If you propose a fix, please make it concise.Last reviewed commit: edc372a |
| for (const { full, value: sv } of resolved) { | ||
| working = working.replace(full, sv); | ||
| } |
There was a problem hiding this comment.
String.replace() corrupts secret values containing $ patterns
working.replace(full, sv) treats the second argument as a replacement string with special $-patterns. If the resolved secret value contains $$, $&, $`, $', or $n, JavaScript will interpret them as replacement patterns rather than literal characters — silently corrupting the injected secret.
For example, a database password pa$$w0rd would become pa$w0rd, and a value like $&extra would expand to the matched pattern text prepended to extra.
Use a replacer function to prevent this:
| for (const { full, value: sv } of resolved) { | |
| working = working.replace(full, sv); | |
| } | |
| for (const { full, value: sv } of resolved) { | |
| working = working.replace(full, () => sv); | |
| } |
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/config/secret-resolution.ts
Line: 424-426
Comment:
**`String.replace()` corrupts secret values containing `$` patterns**
`working.replace(full, sv)` treats the second argument as a replacement string with special `$`-patterns. If the resolved secret value contains `$$`, `$&`, `` $` ``, `$'`, or `$n`, JavaScript will interpret them as replacement patterns rather than literal characters — silently corrupting the injected secret.
For example, a database password `pa$$w0rd` would become `pa$w0rd`, and a value like `$&extra` would expand to the matched pattern text prepended to `extra`.
Use a replacer function to prevent this:
```suggestion
for (const { full, value: sv } of resolved) {
working = working.replace(full, () => sv);
}
```
How can I resolve this? If you propose a fix, please make it concise.| const cacheTtlMs = | ||
| secretsConfig?.providers?.gcp?.cacheTtlSeconds != null | ||
| ? secretsConfig.providers.gcp.cacheTtlSeconds * 1000 | ||
| : defaultTtlMs; | ||
|
|
There was a problem hiding this comment.
Cache TTL only reads from GCP provider config
The cache TTL is always sourced exclusively from secretsConfig.providers.gcp.cacheTtlSeconds, even though cacheTtlSeconds is defined as a valid field on every provider in SecretsConfig. For the env provider, this means the 5-minute default TTL is always applied regardless of any per-provider cacheTtlSeconds configuration. While env var caching is generally harmless, this will silently ignore any cacheTtlSeconds setting on non-GCP providers and will become a footgun when future providers (AWS, Vault, etc.) are added.
Consider computing a per-resolution TTL from the set of referenced providers, or moving the TTL to a top-level secrets.cacheTtlSeconds field rather than nesting it inside a specific provider's config.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/config/secret-resolution.ts
Line: 476-480
Comment:
**Cache TTL only reads from GCP provider config**
The cache TTL is always sourced exclusively from `secretsConfig.providers.gcp.cacheTtlSeconds`, even though `cacheTtlSeconds` is defined as a valid field on every provider in `SecretsConfig`. For the `env` provider, this means the 5-minute default TTL is always applied regardless of any per-provider `cacheTtlSeconds` configuration. While env var caching is generally harmless, this will silently ignore any `cacheTtlSeconds` setting on non-GCP providers and will become a footgun when future providers (AWS, Vault, etc.) are added.
Consider computing a per-resolution TTL from the set of referenced providers, or moving the TTL to a top-level `secrets.cacheTtlSeconds` field rather than nesting it inside a specific provider's config.
How can I resolve this? If you propose a fix, please make it concise.| // --------------------------------------------------------------------------- | ||
|
|
||
| type CacheEntry = { value: string; expiresAt: number }; | ||
| const secretCache = new Map<string, CacheEntry>(); |
There was a problem hiding this comment.
Module-level cache is written by both GcpSecretProvider and fetchWithCache
GcpSecretProvider.getSecret() writes to secretCache directly (line 201), and fetchWithCache also writes to secretCache after calling the provider (line 357). For GCP secrets, the cache key is identical in both places (gcp:${name}#${ver}), so the entry is written twice per miss — first inside the provider, then overwritten by fetchWithCache. The stale-while-revalidate fallback in fetchWithCache (lines 360–364) works correctly because the expired entry survives in the map until it is overwritten, but the double-write is architecturally confusing and makes the two caching paths hard to reason about.
Consider having GcpSecretProvider.getSecret() not write to secretCache directly and instead rely entirely on the fetchWithCache layer, or alternatively remove the fetchWithCache layer and push the stale-while-revalidate logic into each provider.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/config/secret-resolution.ts
Line: 83
Comment:
**Module-level cache is written by both `GcpSecretProvider` and `fetchWithCache`**
`GcpSecretProvider.getSecret()` writes to `secretCache` directly (line 201), and `fetchWithCache` also writes to `secretCache` after calling the provider (line 357). For GCP secrets, the cache key is identical in both places (`gcp:${name}#${ver}`), so the entry is written twice per miss — first inside the provider, then overwritten by `fetchWithCache`. The stale-while-revalidate fallback in `fetchWithCache` (lines 360–364) works correctly because the expired entry survives in the map until it is overwritten, but the double-write is architecturally confusing and makes the two caching paths hard to reason about.
Consider having `GcpSecretProvider.getSecret()` not write to `secretCache` directly and instead rely entirely on the `fetchWithCache` layer, or alternatively remove the `fetchWithCache` layer and push the stale-while-revalidate logic into each provider.
How can I resolve this? If you propose a fix, please make it concise.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: edc372a693
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| const cacheKey = `${provider.name}:${name}#${version ?? "latest"}`; | ||
| const cached = secretCache.get(cacheKey); |
There was a problem hiding this comment.
Scope secret cache keys to provider instance
fetchWithCache keys entries only by provider name, secret name, and version, so two separate resolutions that use the same provider name and secret key but different backing config (for example different GCP projects or different injected env maps) will reuse the first value and skip the real provider lookup. This can return the wrong secret across configs until TTL expiry, which is a data-isolation and correctness issue; include provider-specific identity in the cache key (or keep cache per provider instance).
Useful? React with 👍 / 👎.
|
|
||
| # OpenAI | ||
| if [[ -f "$HOME/.config/openai/credentials.env" ]]; then | ||
| val=$(grep '^OPENAI_API_KEY=' "$HOME/.config/openai/credentials.env" 2>/dev/null | cut -d= -f2-) |
There was a problem hiding this comment.
Prevent optional grep lookups from aborting migration
This script enables set -euo pipefail, but these grep | cut command substitutions are used for optional key discovery; when a file exists but the key is missing, grep exits with status 1 and the whole migration exits immediately. In real environments with partial credential files, scanning stops early and no migration completes, so these lookups need a non-failing fallback (for example || true).
Useful? React with 👍 / 👎.
bf24cd3 to
78fbedb
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 78fbedb54f
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| const cacheKey = `gcp:${secretName}#${ver}`; | ||
| const cached = secretCache.get(cacheKey); |
There was a problem hiding this comment.
Include project identity in GCP cache key
This cache key is shared across all GcpSecretProvider instances but only includes secret name/version, so two providers pointed at different GCP projects (or credentials files) can collide and return each other’s cached value without contacting Secret Manager. In a multi-agent or multi-project process this can leak the wrong secret until TTL expiry; scope the key by provider instance identity (for example project + credentialsFile) or use per-instance cache storage.
Useful? React with 👍 / 👎.
| [[ -v SECRETS_TO_MIGRATE["openclaw-main-github-pat"] ]] && \ | ||
| purge_file "$HOME/.git-credentials" "openclaw-main-github-pat" | ||
|
|
||
| if [[ -v SECRETS_TO_MIGRATE["openclaw-chai-alpaca-key-id"] ]]; then |
There was a problem hiding this comment.
Purge Alpaca file when either migrated key exists
The Alpaca plaintext purge runs only when openclaw-chai-alpaca-key-id was discovered, but migration stores openclaw-chai-alpaca-secret-key independently. If a host only has ALPACA_SECRET_KEY set, migration reports success yet ~/.config/alpaca/sandbox.env is never rewritten, leaving the plaintext secret on disk. The purge condition should cover either migrated Alpaca key (or redact whichever key was migrated).
Useful? React with 👍 / 👎.
78fbedb to
3fecad7
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3fecad7b25
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| [[ -v SECRETS_TO_MIGRATE["openclaw-main-github-pat"] ]] && \ | ||
| purge_file "$HOME/.git-credentials" "openclaw-main-github-pat" |
There was a problem hiding this comment.
Avoid purging unrelated entries from .git-credentials
This purge rewrites ~/.git-credentials as soon as one GitHub PAT is migrated, but the script only extracts a single GitHub token (head -1) and does not account for other credentials that may be stored in the same file. On hosts that keep multiple GitHub accounts or non-GitHub entries there, running migration will silently delete still-needed credentials and break git auth for those remotes.
Useful? React with 👍 / 👎.
| ); | ||
|
|
||
| for (const { full, value: sv } of resolved) { | ||
| working = working.replace(full, sv); |
There was a problem hiding this comment.
Preserve inserted secret values during token replacement
Replacing refs by repeatedly calling working.replace(full, value) can mutate previously inserted secret data instead of the original placeholders. If one resolved secret contains text like another placeholder (for example secret A includes ${env:B} while the config also has ${env:B}), the later pass can replace inside A's injected value and leave the real B placeholder unresolved, producing incorrect config output.
Useful? React with 👍 / 👎.
| echo " 1. Store secrets: ./scripts/migrate-secrets.sh $PROJECT" | ||
| echo " 2. Set IAM bindings: ./scripts/set-iam-bindings.sh $PROJECT $AGENTS" |
There was a problem hiding this comment.
Print runnable next-step script paths in bootstrap output
The bootstrap summary tells operators to run ./scripts/migrate-secrets.sh and ./scripts/set-iam-bindings.sh, but this commit places both files under scripts/secrets/. Following the printed instructions results in command-not-found errors, which breaks the guided setup flow immediately after bootstrap completes.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: da7cfee538
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| curl -s -X POST -H "Authorization: Bearer $TOKEN" \ | ||
| "https://serviceusage.googleapis.com/v1/projects/$PROJECT/services/$API:enable" > /dev/null | ||
| echo "✅ (via REST API)" |
There was a problem hiding this comment.
Treat REST API enable failures as errors
The fallback branch unconditionally prints success after curl -s -X POST ... > /dev/null, but curl exits 0 even on HTTP 4xx/5xx unless --fail is used. In environments where gcloud services enable fails and the VM service account lacks Service Usage permissions, this path reports ✅ (via REST API) even though the API was not enabled, so later bootstrap steps fail with confusing errors.
Useful? React with 👍 / 👎.
| # GitHub PAT | ||
| if [[ -f "$HOME/.git-credentials" ]]; then | ||
| # Only extract PATs from github.com entries (ghp_ or gho_ prefix) | ||
| pat=$(grep 'github\.com' "$HOME/.git-credentials" 2>/dev/null | grep -oP '://[^:]+:\Kghp_[^@]+' | head -1) |
There was a problem hiding this comment.
Capture gho_ tokens when migrating GitHub credentials
The scan logic says it handles GitHub tokens with ghp_ or gho_ prefixes, but the extraction regex only matches ghp_. If .git-credentials contains a gho_ token (a common format from gh auth login), the script silently misses it, leaving that plaintext credential unmigrated.
Useful? React with 👍 / 👎.
3b151dc to
be2e729
Compare
Adds the EnvSecretProvider which resolves secret references from environment
variables. This is useful for local development and CI pipelines that inject
secrets via env vars.
Also includes:
- Secret reference resolution infrastructure (secret-resolution.ts)
- GCP Secret Manager provider (inline in resolution layer)
- Updated config.ts exports
- Updated test setup with GCP mock
- Documentation: docs/concepts/secrets.md, docs/gateway/secrets.md
- Bootstrap scripts: scripts/secrets/
Usage in openclaw.json:
${env:MY_SECRET_VAR}
Split from openclaw#24272 per @vincentkoc's request.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: be2e72998d
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| } catch (err) { | ||
| // Stale-while-revalidate: if we have an expired entry, use it | ||
| if (cached) { | ||
| return cached.value; | ||
| } |
There was a problem hiding this comment.
Avoid serving stale secrets after access is revoked
fetchWithCache falls back to an expired cached value on any provider error, so when a secret is deleted or IAM access is revoked (for example GCP NOT_FOUND/PERMISSION_DENIED), resolution still returns the old secret instead of failing. In long-lived processes this can keep revoked credentials usable indefinitely and defeats emergency rotation/revocation until restart or manual cache clear.
Useful? React with 👍 / 👎.
| const resolved = await Promise.all( | ||
| matches.map(async ({ provider: pName, name, version, full }) => { |
There was a problem hiding this comment.
Deduplicate concurrent secret fetches in one resolution pass
resolveString resolves all references in parallel, but fetchWithCache only writes to cache after awaiting provider.getSecret, so duplicate references encountered in the same pass can trigger multiple simultaneous backend reads before any cache entry exists. This defeats the intended per-pass dedupe and can significantly increase Secret Manager traffic/quota usage when configs reuse the same secret repeatedly.
Useful? React with 👍 / 👎.
be2e729 to
0ab9452
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0ab945260b
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| } | ||
|
|
||
| // Test: main reads main secret (should work) | ||
| console.log('main → main-*:', await test('~/.config/gcp/openclaw-main-sa.json', 'openclaw-main-SECRET')); |
There was a problem hiding this comment.
Use absolute key paths in verification snippet
The verification command asks users to replace only PROJECT/SECRET, but the sample passes key files as '~/.config/gcp/...' inside JavaScript strings, and Node does not expand ~ there. If operators run the snippet as printed, SecretManagerServiceClient receives a literal ~ path and fails with ENOENT, so the IAM isolation check reports errors even when bindings are correct.
Useful? React with 👍 / 👎.
|
Thanks for the contribution! After reviewing this PR alongside the related secrets provider PRs (#43640, #43641, #43643, #43644, #43645, #43646), we need to close these for an architectural rework before they can land. The issue: parallel implementation instead of extending the existing systemOpenClaw already has a mature secrets infrastructure in These PRs introduce a new parallel system — a new There's also a hidden dependency across the PR set: the aws, azure, keyring, and vault branches all import What we need insteadNew cloud-backend providers (AWS Secrets Manager, Azure Key Vault, HashiCorp Vault, OS keyring, env vars) should be implemented as resolvers within the existing
We'd love to see these re-submitted with the corrected architecture. |
Summary
Adds
EnvSecretProviderwhich resolves secret references from environment variables — useful as a simple local-dev fallback or for CI pipelines that inject secrets via env.This is also the foundational PR for the secrets system, introducing:
secret-resolution.ts— the${provider:name}reference resolution engine with GCP provider inlineconfig.tsexports for the resolution APItest/setup.tswith GCP mockdocs/concepts/secrets.md,docs/gateway/secrets.md, GCP console guidescripts/secrets/Provider: env
Config:
{ "secrets": { "providers": { "env": {} } } }Usage:
${env:SLACK_BOT_TOKEN}Resolves to
process.env.SLACK_BOT_TOKEN.Notes
Split from #24272 per @vincentkoc's request. Each provider is now a standalone PR. Subsequent PRs in this series: keyring, AWS, Azure, Vault, rotation.