Skip to content

feat(secrets): pluggable SecretRef sources via plugin SDK seam#72548

Open
akoscz wants to merge 31 commits into
openclaw:mainfrom
akoscz:feature/secrets-provider-sdk
Open

feat(secrets): pluggable SecretRef sources via plugin SDK seam#72548
akoscz wants to merge 31 commits into
openclaw:mainfrom
akoscz:feature/secrets-provider-sdk

Conversation

@akoscz

@akoscz akoscz commented Apr 27, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Problem: OpenClaw's SecretRef contract was closed to three built-in sources (env/file/exec). Every new vendor backend (GCP Secret Manager, OS keyring, AWS Secrets Manager, HashiCorp Vault, native 1Password) required a core PR that touched config types, zod schemas, gateway wire schemas, the resolver dispatch, and validation. Issue [Feature]: support to use plugin to implement secret ref provider and cover core schemas #71593 (cited by the Codex automated review on the issue) and the @Colstuwjx follow-up on closed PR feat(secrets): add AWS Secrets Manager provider #43643 ask for a manifest-first plugin seam so contributors can ship new sources without touching core.
  • Why it matters: The old shape blocked at least four community PRs on architecture, and Codex/clawsweeper has the gap publicly tracked across config, validation, resolver, and gateway-protocol layers. Closes the loop on [Feature]: support to use plugin to implement secret ref provider and cover core schemas #71593.
  • What changed: New manifest contract contracts.secretProviders: string[], new public SDK subpath openclaw/plugin-sdk/secret-provider, lazy public-artifact loader (src/plugins/secret-provider-public-artifacts.ts), source-id → plugin-id helper in the contract registry, and a one-line dispatch fall-through in src/secrets/resolve.ts. Two reference bundled plugins: extensions/secrets-gcp/ (Google Cloud Secret Manager) and extensions/secrets-keyring/ (macOS Keychain via security, Linux libsecret via secret-tool). Config zod schemas + Gateway wire-protocol schemas open additively to accept plugin-owned sources while keeping strict per-source validation for the built-in literals.
  • What did NOT change (scope boundary): env/file/exec resolution paths are byte-unchanged. Existing configs that don't reference plugin sources behave identically. The "broader SecretRef field-coverage" half of [Feature]: support to use plugin to implement secret ref provider and cover core schemas #71593 (making skills.entries.*.env, plugins.entries.*.config, channel credential schemas all accept SecretRef objects) is explicitly deferred. openclaw secrets configure interactive flow doesn't yet list plugin sources (documented in plugin pages). No env/file/exec fields in core config types were modified.

Change Type (select all)

  • Feature
  • Refactor required for the fix
  • Docs

Scope (select all touched areas)

  • Gateway / orchestration
  • Auth / tokens
  • API / contracts

Linked Issue/PR

Root Cause (if applicable)

N/A — feature work, not a bug fix.

Regression Test Plan (if applicable)

N/A — feature work. New tests cover the new behavior; see Evidence below.

Reproduction Steps (if applicable)

N/A — feature work. End-to-end usage is documented in docs/plugins/secrets-gcp.md and docs/plugins/secrets-keyring.md.

Evidence

  • Failing test/log before + passing after

Targeted test counts (all green):

Surface Tests
src/plugins/manifest-registry.test.ts (manifest contract) 42 / 42
src/plugins/contracts/registry.contract.test.ts 16 / 16
src/plugins/contracts/boundary-invariants.test.ts 13 / 13
src/plugins/bundled-capability-metadata.test.ts (uniqueness invariant) 3 / 3
src/plugins/secret-provider-public-artifacts.test.ts 4 / 4
src/plugins/secret-provider-resolver.test.ts (cache + dispatch) 6 / 6
src/secrets/resolve.test.ts (incl. real env+file+plugin batch integration) 25 / 25
src/secrets/exec-secret-ref-id-parity.test.ts (zod ↔ wire parity) 34 / 34
src/config/types.secrets.plugin-source.test.ts (open SecretRefSource) 5 / 5
src/config/zod-schema.plugin-secret-providers.test.ts (open zod schemas) 10 / 10
src/config/config.secrets-schema.test.ts (per-arm error preservation) 13 / 13
src/config/schema.base.generated.test.ts (introspection unchanged) 5 / 5
src/gateway/protocol/primitives.secretref.test.ts (open wire schemas) 5 / 5
extensions/secrets-gcp/secret-provider.test.ts 14 / 14
extensions/secrets-keyring/secret-provider.test.ts 18 / 18

Live end-to-end smokes against real backends (not just mocks):

  • Linux libsecret (kwalletd6 on CachyOS): round-tripped a real secret stored via secret-tool store ... through createKeyringSecretProvider().resolve(). Validation guards rejected leading-- service / shell metas / relative keychainPath / wrong suffix / leading-- ref id before any spawn.
  • macOS Keychain (Darwin 24.6.0, ssh to a Mac): round-tripped a real secret stored with NATIVE flags security add-generic-password -s openclaw -a openai-api-key -w ... through the plugin. The plugin uses macOS's native vocabulary, so users with existing Keychain entries don't need to re-store. Validation guards rejected the same set as Linux.
  • Real Google Cloud Secret Manager (private GCP project via Application Default Credentials): round-tripped a real test secret through createGcpSecretProvider().resolve() for both latest and explicit versionSuffix: "1". Test secret deleted after; gcloud account context restored.

Human Verification (required)

What I personally verified (not just CI):

  • Verified scenarios:
    • Linux libsecret round-trip end-to-end with secret-tool against running kwalletd6.
    • macOS Keychain round-trip end-to-end via SSH to a real Mac, secret stored with security's native -s service -a id flag mapping.
    • GCP Secret Manager round-trip end-to-end against a real GCP project, default latest version + explicit versionSuffix: "1".
    • Keyring validation guards reject all 5 attack vectors (leading-dash service, shell metas, relative path, wrong suffix, leading-dash ref id) on both Linux and macOS, before any execFile spawn.
    • GCP validation guards reject malformed project (too short, leading digit, uppercase, path traversal, trailing dash), malformed versionSuffix (bogus, 0, -1), wrong source.
    • Acceptance: a config with secrets.providers.myGcp = { source: "gcp", project: "<your-project-id>" } and apiKey: { source: "gcp", provider: "myGcp", id: "..." } parses through zod, dispatches through the resolver, and returns the value from real GCP.
    • pnpm tsgo:all clean.
    • pnpm docs:check-mdx (473 files), pnpm docs:check-links (3,644 internal links, 0 broken), pnpm config:docs:check, pnpm config:schema:check, pnpm plugin-sdk:api:check all green.
    • The Gateway wire schema fix preserves the zod ↔ wire parity invariant (exec-secret-ref-id-parity.test.ts 34/34) and rejects malformed built-in refs (no silent rescue by the plugin arm).
  • Edge cases checked:
    • Plugin SecretRef object that includes a built-in source name ({source:"env", id:"bad"}) does NOT get rescued by the plugin arm on either zod or wire side; it fails at the strict env arm with the original specific error message ("Env secret reference id must match...").
    • GCP secret name with hyphens (the real-world common case).
    • 256-character GCP ref id rejected; leading-hyphen GCP ref id accepted (per GCP grammar; safe because GCP path is URL-only with no shell).
    • PEM-style stored value with intentional trailing \n is preserved by keyring resolvers.
    • Linux secret-tool "command not found" produces an actionable install message; "secret missing in libsecret" produces a distinct error pointing at secret-tool store.
    • Generated baseline hashes (config + plugin SDK) regenerate deterministically post-rebase.
    • pnpm check:changed typecheck + lint + import cycles all green; the only "test" failures in a full sweep run were 3 unrelated extension tests (amazon-bedrock) plus a worker OOM on a flaky shard, none touched by this PR.
  • What I did NOT verify:
    • End-to-end against real AWS Secrets Manager, Azure Key Vault, HashiCorp Vault, or native 1Password. Those are not part of this PR; they would land as separate plugins on the seam this PR delivers.
    • macOS login-keychain unlock prompt UX — verified the dedicated-keychain workflow only.
    • Windows behavior: the keyring plugin throws on Windows (no backend). The other layers don't depend on platform.
    • Hot-reload of plugin runtime in dev mode against the per-source resolver cache (calls _resetSecretProviderResolverCache()); production reload via secrets.reload re-walks the dispatch.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? Yes. Configs that don't reference plugin sources resolve through identical code paths; built-in source error messages and behavior are unchanged.
  • Config/env changes? Additive only. New optional contracts.secretProviders manifest field and new optional secrets.providers.<alias> = { source: "<plugin-source>", ... } shape.
  • Migration needed? No. New sources are opt-in via plugin enablement (plugins.entries.secrets-gcp.enabled = true and similar).

Risks and Mitigations

  • Risk: The superRefine-routed schema in src/config/zod-schema.core.ts is a non-standard zod pattern compared to the pure discriminated-union approach. It exists to preserve per-arm error messages downstream of validation.ts:mapZodIssueToConfigIssue (the validation pipeline's union-error flattener doesn't drill into nested invalid_union issues by default).
    • Mitigation: Added a generic extractDeepestUnionArmIssue helper to validation.ts that drills into nested invalid_union issues so error messages survive the flattening. config.secrets-schema.test.ts (13 tests) locks in the per-arm message preservation behavior. JSON Schema introspection is preserved (schema.base.generated.test.ts passes; the generated baseline is unchanged for built-in source shapes).
  • Risk: The Gateway wire schema's plugin arm uses a JSON Schema pattern with a negative lookahead (^(?!(?:env|file|exec)$).+$) to prevent the plugin arm from rescuing a malformed built-in source. Lookaheads are valid ECMA 262 regex but historically vary across JSON Schema validators.
    • Mitigation: The repo uses AJV which supports ECMA 262 lookaheads. Test does not let the plugin arm silently rescue malformed built-in refs proves this works in practice.
  • Risk: Plugin authors writing a secret-provider.ts factory could accidentally export a function whose name matches the discovery convention (/^create.*SecretProvider$/) but returns a malformed object.
    • Mitigation: loadBundledSecretProviderEntriesFromDir validates the factory output via isSecretProviderPlugin (id, label, resolve presence) and returns null on shape failure. secret-provider-resolver.ts cross-checks entry.id === source and returns undefined on mismatch, leading to a clear "Unknown secret provider source" error rather than silent acceptance.

@akoscz akoscz requested a review from a team as a code owner April 27, 2026 03:16
@openclaw-barnacle openclaw-barnacle Bot added docs Improvements or additions to documentation channel: matrix Channel integration: matrix channel: telegram Channel integration: telegram app: web-ui App: web-ui gateway Gateway runtime scripts Repository scripts agents Agent runtime and tooling size: XL labels Apr 27, 2026
@greptile-apps

greptile-apps Bot commented Apr 27, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR introduces a pluggable SecretRef source system, adding a manifest contract (contracts.secretProviders), a public SDK subpath (openclaw/plugin-sdk/secret-provider), a lazy bundled-plugin loader, and two reference plugins (GCP Secret Manager and OS Keyring). The dispatch in src/secrets/resolve.ts falls through to the plugin seam after checking the three built-in sources via new type guards, and both the zod config schema and gateway wire schema open additively while using a negative-lookahead pattern to prevent the plugin arm from silently rescuing malformed built-in refs.

Confidence Score: 5/5

Safe to merge. No P0 or P1 issues found; all previously flagged race conditions and silent-failure paths have been addressed.

The implementation is thorough: built-in dispatch uses proper type guards, the resolver cache correctly handles misses with a null sentinel, partial factory failures surface via console.warn rather than silent drops, and the gateway negative-lookahead pattern prevents plugin-arm rescue of malformed built-in refs. TypeScript types and zod schemas are consistent. All three previously raised review issues are resolved.

No files require special attention.

Reviews (3): Last reviewed commit: "fix(secrets-plugins): address greptile P..." | Re-trigger Greptile

Comment thread extensions/secrets-gcp/secret-provider.ts
Comment thread src/plugins/secret-provider-public-artifacts.ts
Comment thread src/plugins/secret-provider-resolver.ts
@Colstuwjx

Copy link
Copy Markdown

Hi @akoscz , thanks for proposing this PR, just wonder if this PR could cover all config fields in openclaw.json no matter it's built-in config fields, e.g. skills.entries.xxx.env.* , or the other fields that might introduced from 3rd party plugins.

Thanks.

@Colstuwjx

Copy link
Copy Markdown

(making skills.entries..env, plugins.entries..config, channel credential schemas all accept SecretRef objects) is explicitly deferred.

BTW, just want to confirm whether you'll have follow-up PR to add all fields support, including skills.entries..env, plugins.entries..config, channel credential schemas etc. Since we have a lot of secret configs stored in skill env vars or self develop plugin config and channel config, it's really matter to me to support to cover all of these fields by using a secret provider e.g. aws sm provider to mask these info and make sure openclaw LLM itself can't read these credentials.

@akoscz akoscz force-pushed the feature/secrets-provider-sdk branch 2 times, most recently from dc7b922 to fe8ccd8 Compare April 28, 2026 00:25
@akoscz

akoscz commented Apr 28, 2026

Copy link
Copy Markdown
Contributor Author

@greptileai review

@akoscz

akoscz commented Apr 28, 2026

Copy link
Copy Markdown
Contributor Author

Hey @Colstuwjx — good question, and you've put your finger on exactly the gap.

What this PR does: opens the backendSecretRef.source is no longer a closed enum, so plugins (yours included) can ship a secret resolver for AWS SM, Vault, Bitwarden, OS keyring, etc. Any config field that today accepts a SecretRef will accept your custom source automatically once the plugin is loaded.

What it doesn't do: widen the frontend — i.e. teach more openclaw.json fields to accept SecretRef objects in addition to plain strings. Today the picture is:

  • ✅ Already accepts SecretRef: agents.*.providers.*.apiKey, channel credential schemas (Slack botToken/signingSecret, Telegram botToken, Discord token, Bluesky appPassword, Matrix password, etc.), skills.entries.*.apiKey, GCP serviceAccountRef. These all work with this PR's plugin sources today.
  • ❌ Still raw-string only: skills.entries.*.env, skills.entries.*.config, plugins.entries.*.config, hook env, agent runtime env records (these are z.record(string, string) or z.record(string, unknown)).

The unsupported group is what you're calling out, and it's the right next step. I scoped this PR narrowly so it's reviewable as a pure backend seam — adding SecretRef acceptance to those env/config records is a separate concern (they're records, not single fields, so the schema shape and the resolver wiring both need a small change), but it's a natural follow-up that this PR explicitly enables.

I'm not promising a follow-up PR myself right now — I want this one to land first and want to see what maintainers think the right next step is — but the seam this PR adds is the prerequisite for that work, and once it's in, widening skills.entries.*.env (the surface you cited as most painful) is a small, well-bounded change against the existing SecretInputSchema pattern. If maintainers agree on direction I'd be happy to take a swing at it as a follow-up; if you'd like to file an issue tracking the env/config widening I'll subscribe.

@akoscz

akoscz commented Apr 28, 2026

Copy link
Copy Markdown
Contributor Author

Round-2 P2 findings addressed in 12d9251e01.

1. GCP sequential calls (extensions/secrets-gcp/secret-provider.ts) — fan out per-ref accessSecretVersion calls in parallel via Promise.all, deduped by id. Cuts latency from N round-trips to 1 for any multi-ref config from the same provider. New test in secret-provider.test.ts asserts both the parallel in-flight count (maxInFlight === 2 for two unique ids) and dedup of duplicate ids (3 refs with one duplicate → 2 API calls).

2. Misshaped factory return silently dropped (src/plugins/secret-provider-public-artifacts.ts) — collectProviderFactories now pushes a descriptive Error into the errors array when isSecretProviderPlugin(candidate) returns false, naming the offending factory and the missing fields. This routes through the same paths added in round 1: full-failure throws Unable to initialize secret providers for plugin <id> with the shape error in cause; partial success (some factories valid, some misshaped) emits a console.warn with the plugin id and the diagnostic instead of silently dropping. Updated the existing "rejects misshaped output" test to assert the throw + cause chain, and added a partial-success warn test.

All touched-surface tests green; pnpm check:changed clean.

@akoscz

akoscz commented Apr 28, 2026

Copy link
Copy Markdown
Contributor Author

@greptileai review again plz

akoscz added 26 commits May 25, 2026 18:54
…single-newline strip, ENOENT via err.code, drop unused signal, doc + test polish)
- secrets-keyring: add curly braces in stripCliTrailingNewline (oxlint curly)
- secrets-gcp: cache in-flight init promise so concurrent first-callers share
  one client; clear promise on init failure to allow retry
- secret-provider-public-artifacts: warn on partial-success factory errors
  instead of silently dropping
- secret-provider-resolver: cache negative lookups (null sentinel) so missing
  sources don't re-scan the artifact on every call
- secrets-gcp: fan out per-ref accessSecretVersion calls in parallel
  via Promise.all, deduped by id. Cuts latency from N round-trips to 1
  for multi-ref configs. Added test asserting parallel in-flight count
  and dedup of duplicate ids.
- secret-provider-public-artifacts: surface shape-check failures so
  factories that return misshaped objects get a diagnostic instead of
  silently disappearing. Updated test to assert the throw + cause when
  no factory in the artifact passes the shape check, plus a new test
  asserting the partial-success warn path.
Addresses clawsweeper P2/medium-security finding openclaw#1: previously a SecretRef
referencing a plugin-owned source could execute the owning plugin's artifact
even when the plugin was disabled (via plugins.enabled: false, plugins.deny,
entry enabled: false, allowlist miss, or default-off bundled state). The
artifact can spawn credential CLIs (secret-tool, security) or open cloud
SDK clients, so this is a real privilege escalation.

Resolver now consults isPluginEnabled({ pluginId, config }) after pluginId
lookup and before artifact load. Cache rules:
  - Successful artifact loads: cached (artifact is static)
  - 'No bundled plugin owns this source' / 'no matching factory': cached
    (stable conditions)
  - Activation result: NEVER cached — config can change between calls, so
    we recheck isPluginEnabled on every lookup, including cache hits

config: OpenClawConfig is threaded from resolvePluginSecretRefs in
src/secrets/resolve.ts (already required on every public secrets entry
via ResolveSecretRefOptions).

Existing 'Unknown secret provider source... Install or enable a plugin...'
error message already reads correctly for the disabled case; no copy
change needed.

Tests: 6 new cases covering denylist/disabled-state, no cache poisoning
on enable→disable→re-enable cycles, activation re-check on cache hits,
and config forwarding through dispatch.

Defers clawsweeper finding openclaw#2 (resolve installed/workspace plugins, not
just bundled) to a follow-up PR — that's a real SDK-completion gap but
doesn't block landing the security fix.
- src/plugins/installed-plugin-index-record-builder.ts: take HEAD side;
  upstream commit 7ac23ee deleted hasRuntimeContractSurface() and
  isLegacyImplicitStartupSidecar() entirely as part of dropping the
  legacy implicit-startup-sidecar fallback. Our addition of
  secretProviders to the helper is moot.
- docs/plugins/manifest.md: keep both the new secretProviders contract
  row (ours) and upstream's simplified tools description.
- src/plugins/manifest-tool-availability.ts: switch the env narrowing
  from a string-literal check to the isEnvSecretProviderConfig type
  guard. The open PluginSecretProviderConfig variant has source: string
  (not a literal), so providerConfig.source !== "env" doesn't narrow
  it out, and providerConfig.allowlist falls back to unknown. The type
  guard narrows the union cleanly.
- pnpm-lock.yaml: regenerated against current package.json set after
  rebase + reinstall.
- docs/.generated/*: baselines regenerated.
…e callers

- src/commands/daemon-install-helpers.ts: switch the exec narrowing
  from `provider.source !== 'exec'` to isExecSecretProviderConfig().
  The open PluginSecretProviderConfig variant has source: string (not
  a literal), so a string-literal check leaves the plugin variant in
  the union and the subsequent .passEnv access fails with TS2488.
- extensions/qqbot/src/bridge/config.ts: same fix using
  isEnvSecretProviderConfig() (exposed via openclaw/plugin-sdk/secret-input).
  The .allowlist access on the un-narrowed union failed with TS2339.

These two callers were previously fine because the SecretProviderConfig
union was discriminated and closed; opening it for plugin-owned sources
turns string-literal narrowing into a type-checking gap. The
isXSecretProviderConfig guards already exist for exactly this purpose.

pnpm-lock.yaml: regenerated; baselines: regenerated.
- extensions/clickclack/src/accounts.ts: switch env narrowing from
  literal check to isEnvSecretProviderConfig() guard, same pattern as
  prior fixes for qqbot, daemon-install-helpers, and
  manifest-tool-availability. New upstream code adds yet another caller
  that hit the open PluginSecretProviderConfig variant.
- src/config/schema.base.generated.ts: deleted (upstream removed it).
- docs/docs.json, docs/plugins/manifest.md: trivial conflict resolutions
  keeping both upstream's neighboring entries and our new ones.
- pnpm-lock.yaml + docs/.generated/* baselines: regenerated.
@akoscz akoscz force-pushed the feature/secrets-provider-sdk branch from d8a06bf to cfe5425 Compare May 26, 2026 00:57
@github-actions github-actions Bot added the dependencies-changed PR changes dependency-related files label May 26, 2026
@github-actions

Copy link
Copy Markdown
Contributor

Dependency Changes Detected

This PR changes dependency-related files. Maintainers should confirm these changes are intentional.

Changed files:

  • extensions/secrets-gcp/package.json
  • extensions/secrets-keyring/package.json
  • package.json

Maintainer follow-up:

  • Review whether the dependency changes are intentional.
  • Inspect resolved package deltas when lockfile, shrinkwrap, or workspace dependency policy changes are present.
  • Treat package-lock.json and npm-shrinkwrap.json diffs as security-review surfaces.
  • Run pnpm deps:changes:report -- --base-ref origin/main --markdown /tmp/dependency-changes.md --json /tmp/dependency-changes.json locally for detailed release-style evidence.

@clawsweeper clawsweeper Bot added the merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. label May 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling app: web-ui App: web-ui channel: matrix Channel integration: matrix channel: qqbot channel: telegram Channel integration: telegram commands Command implementations dependencies-changed PR changes dependency-related files docs Improvements or additions to documentation gateway Gateway runtime merge-risk: 🚨 auth-provider 🚨 May break OAuth, tokens, provider routing, model choice, or credentials. merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. merge-risk: 🚨 security-boundary 🚨 May affect sandboxing, authorization, credentials, or sensitive data. P2 Normal backlog priority with limited blast radius. rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. scripts Repository scripts size: XL status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature]: support to use plugin to implement secret ref provider and cover core schemas

2 participants