
fix(google): support Vertex ADC from GCE metadata service accounts#79476

Closed
batson-j wants to merge 171 commits into openclaw:main from batson-j:vertex-gce-metadata-adc

Conversation


batson-j commented May 8, 2026

Summary

  • Problem: google-vertex only accepted file-based authorized_user ADC, so GCE Docker installs with attached VM service accounts could not authenticate even though Vertex AI worked directly.
  • Why it matters: GCE service-account ADC should work without gcloud auth application-default login or downloaded service-account keys.
  • What changed: Added GCE metadata token fallback for Vertex ADC, taught runtime auth, models status --probe, and doctor to recognize provider-declared live metadata auth, and routed local simple-completion Vertex calls through the OpenClaw Google Vertex transport.
  • What did NOT change (scope boundary): No new service-account key handling, no gcloud login requirement, no broad provider auth rewrite.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #
  • Related #
  • This PR fixes a bug or regression

Root Cause (if applicable)

  • Root cause: Google Vertex auth was gated on local authorized_user ADC discovery before the provider transport was exposed, and the ADC header resolver threw when no local ADC file existed or when the local ADC file was not authorized_user.
  • Missing detection / guardrail: Tests covered authorized_user ADC refresh, but not VM-attached service-account ADC through the GCE metadata server, nor probe/doctor discovery for live provider auth evidence.
  • Contributing context (if known): Docker on GCE can call Vertex AI successfully with the attached VM service account even when no local ADC JSON exists.
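The credential resolution order described in the root cause can be sketched as follows. This is a minimal illustration, not the PR's code: `chooseVertexAuthSource`, `AdcFile`, and `AuthSource` are hypothetical names, and the real resolver in extensions/google/vertex-adc.ts may be structured differently.

```typescript
// Hypothetical types for illustration; not taken from the OpenClaw source.
type AdcFile = { type: string } | undefined;
type AuthSource = { kind: "authorized_user" } | { kind: "gce-metadata" };

// Prefer a local authorized_user ADC file. Otherwise fall back to the GCE
// metadata server. Throw only when neither source is usable.
export function chooseVertexAuthSource(
  localAdc: AdcFile,
  metadataTokenAvailable: boolean,
): AuthSource {
  if (localAdc?.type === "authorized_user") {
    return { kind: "authorized_user" };
  }
  if (metadataTokenAvailable) {
    // Covers both "no local ADC file" and "local ADC is not authorized_user".
    return { kind: "gce-metadata" };
  }
  throw new Error("No usable Google Vertex credentials found");
}
```

Before the change, the second branch did not exist, so both the missing-file case and the non-authorized_user case ended in the thrown error.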

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: extensions/google/transport-stream.test.ts, src/agents/model-auth-live-evidence.test.ts, src/commands/models/list.probe.targets.test.ts, src/commands/models/list.status.test.ts, src/commands/doctor-memory-search.test.ts.
  • Scenario the test should lock in: no local ADC file plus metadata token succeeds; non-authorized_user ADC plus metadata token succeeds; unavailable metadata still throws the credential error; status/probe/doctor can discover live metadata auth without persisting tokens.
  • Why this is the smallest reliable guardrail: It covers the provider-owned ADC resolver plus the generic auth evidence surfaces that previously hid the provider from probe/doctor.
  • Existing test that already covers this (if any): Existing authorized_user ADC refresh coverage remains unchanged.
  • If no new test is added, why not: N/A.

User-visible / Behavior Changes

  • google-vertex can authenticate on GCE using the VM's attached service-account token from the metadata server when local ADC is absent or not authorized_user.
  • models status --probe and openclaw doctor recognize this as GCE metadata service account auth evidence.
  • Local infer model run --model google-vertex/... uses the OpenClaw Google Vertex transport path.

Diagram (if applicable)

Before:
[google-vertex on GCE] -> [no authorized_user ADC file] -> [provider unavailable/auth failure]

After:
[google-vertex on GCE] -> [metadata token probe] -> [Bearer token for Vertex request]

Security Impact (required)

  • New permissions/capabilities? (Yes/No): No
  • Secrets/tokens handling changed? (Yes/No): Yes
  • New/changed network calls? (Yes/No): Yes
  • Command/tool execution surface changed? (Yes/No): No
  • Data access scope changed? (Yes/No): No
  • If any Yes, explain risk + mitigation:
    • The Google Vertex provider can now request a VM service-account access token from the GCE metadata server when local ADC is absent or not authorized_user.
    • The request is fixed to Google’s metadata token URL, sends Metadata-Flavor: Google, has a 1s timeout, returns undefined on failure, and does not persist or log the access token.
    • models status --probe and doctor expose only the non-secret gcp-vertex-credentials marker/source, not the token.
    • This follows OpenClaw's documented trusted-operator/plugin trust model in SECURITY.md; the provider plugin is part of the gateway TCB. It also intentionally differs from generic user-controlled URL fetch surfaces documented in docs/security/network-proxy.md because this is a provider-owned fixed metadata endpoint for ADC.
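A minimal sketch of the bounded metadata request described above, under stated assumptions. The function names and the injectable `fetchImpl` parameter are hypothetical, and the URL shown is the default service-account token endpoint from Google's Compute Engine metadata documentation; the PR text only says the URL is fixed, so the exact constant in the PR may differ.

```typescript
// Assumption: the documented default service-account token endpoint.
const METADATA_TOKEN_URL =
  "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token";

// Narrow fetch shape so a fake can be injected in tests (hypothetical type).
type FetchLike = (
  url: string,
  init: { headers: Record<string, string>; signal: AbortSignal },
) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

// Extract access_token from a metadata response body, or undefined if absent.
export function parseMetadataTokenResponse(body: unknown): string | undefined {
  if (typeof body !== "object" || body === null) return undefined;
  const token = (body as { access_token?: unknown }).access_token;
  return typeof token === "string" && token.length > 0 ? token : undefined;
}

// Request a token with a hard timeout. Every failure mode (non-OK status,
// malformed JSON, missing token, abort, network error) yields undefined.
// The token is returned to the caller only; it is never logged or stored here.
export async function fetchGceMetadataToken(
  fetchImpl: FetchLike,
  timeoutMs = 1000,
): Promise<string | undefined> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl(METADATA_TOKEN_URL, {
      headers: { "Metadata-Flavor": "Google" },
      signal: controller.signal,
    });
    if (!res.ok) return undefined;
    return parseMetadataTokenResponse(await res.json());
  } catch {
    return undefined;
  } finally {
    clearTimeout(timer);
  }
}
```

In a runtime with a global `fetch`, a caller would pass it as `fetchImpl` and place the result in an `Authorization: Bearer` header for the Vertex request. Taking `fetchImpl` as a parameter keeps the sketch testable without network access.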

Repro + Verification

Environment

  • OS: Linux VM on GCE
  • Runtime/container: Docker Compose OpenClaw CLI/Gateway container
  • Model/provider: google-vertex/gemini-3.1-pro-preview
  • Integration/channel (if any): N/A
  • Relevant config (redacted): GOOGLE_CLOUD_PROJECT, GOOGLE_CLOUD_LOCATION, attached VM service account; no service-account key file required.

Steps

  1. Run OpenClaw in Docker on a GCE VM with an attached service account that can call Vertex AI.
  2. Configure default model as google-vertex/gemini-3.1-pro-preview.
  3. Run models status --probe --probe-provider google-vertex.
  4. Run infer model run --gateway --model google-vertex/gemini-3.1-pro-preview --prompt "Say exactly: vertex ok".

Expected

  • Status/probe detects source=GCE metadata service account and probes Vertex successfully.
  • Inference returns vertex ok.

Actual

  • Before this change, the provider required local authorized_user ADC and failed without a local ADC JSON file.
  • After this change, the GCE metadata service-account token path succeeds.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Verification was run locally through the Docker Compose CLI image because host node/pnpm were not on PATH:

  • pnpm docs:list
  • pnpm exec oxfmt --check --threads=1 CHANGELOG.md docs/auth-credential-semantics.md docs/cli/models.md docs/plugins/manifest.md docs/providers/google.md extensions/google/index.test.ts extensions/google/openclaw.plugin.json extensions/google/provider-registration.ts extensions/google/transport-stream.test.ts extensions/google/vertex-adc.ts src/agents/model-auth.ts src/agents/model-auth-live-evidence.ts src/agents/model-auth-live-evidence.test.ts src/agents/provider-transport-stream.ts src/agents/provider-transport-stream.test.ts src/agents/simple-completion-transport.ts src/agents/simple-completion-transport.test.ts src/commands/doctor-memory-search.test.ts src/commands/doctor-memory-search.ts src/commands/models/list.auth-overview.ts src/commands/models/list.probe.targets.test.ts src/commands/models/list.probe.ts src/commands/models/list.status-command.ts src/commands/models/list.status.test.ts src/plugins/manifest-registry.test.ts src/plugins/manifest.ts src/secrets/provider-env-vars.dynamic.test.ts src/secrets/provider-env-vars.ts
  • pnpm test extensions/google/transport-stream.test.ts extensions/google/index.test.ts src/agents/model-auth-live-evidence.test.ts src/agents/provider-transport-stream.test.ts src/agents/simple-completion-transport.test.ts src/commands/models/list.probe.targets.test.ts src/commands/models/list.status.test.ts src/commands/doctor-memory-search.test.ts src/plugins/manifest-registry.test.ts src/secrets/provider-env-vars.dynamic.test.ts -- --reporter=verbose
  • pnpm docs:check-i18n-glossary
  • git diff --check

Manual live proof from GCE Docker:

- google-vertex effective=env:gcp-vert...dentials | env=gcp-vert...dentials | source=GCE metadata service account
Auth probes: google-vertex/gemini-3.1-pro-preview ok
model.run via gateway -> output: vertex ok

Human Verification (required)

  • Verified scenarios: metadata ADC success with no local ADC; metadata unavailable fallback error; non-authorized_user ADC falling through to metadata; existing authorized_user ADC refresh behavior; status/probe/doctor auth evidence discovery; local simple-completion Vertex transport selection.
  • Edge cases checked: non-OK metadata response, malformed metadata JSON, missing token, required env gates, token not persisted/logged, non-secret marker in auth overview.
  • What you did not verify: Full pnpm check/full pnpm test; only focused touched-surface tests plus live Docker/GCE proof were run.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? (Yes/No): Yes
  • Config/env changes? (Yes/No): Yes
  • Migration needed? (Yes/No): No
  • If yes, exact upgrade steps: Existing local authorized_user ADC behavior is unchanged. GCE metadata ADC requires GOOGLE_CLOUD_PROJECT or GCLOUD_PROJECT, plus GOOGLE_CLOUD_LOCATION, and an attached VM service account with Vertex AI permission.
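The environment requirement above can be expressed as a small gate that runs before any metadata probe. This is a sketch only: `hasVertexMetadataEnvGate` is a hypothetical name, and whether the actual implementation trims values or treats empty strings as unset is an assumption.

```typescript
type Env = Record<string, string | undefined>;

// True only when a project (GOOGLE_CLOUD_PROJECT or GCLOUD_PROJECT) and a
// location (GOOGLE_CLOUD_LOCATION) are both set to non-empty values.
export function hasVertexMetadataEnvGate(env: Env): boolean {
  const project = env.GOOGLE_CLOUD_PROJECT?.trim() || env.GCLOUD_PROJECT?.trim();
  const location = env.GOOGLE_CLOUD_LOCATION?.trim();
  return Boolean(project && location);
}
```

A caller would typically pass `process.env` and skip the metadata request entirely when the gate returns false, so machines without Google Cloud configuration never pay the probe timeout.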

Risks and Mitigations

  • Risk: Non-GCE environments could be slowed by metadata probing.
    • Mitigation: Metadata probes are gated by explicit Google Cloud project/location env evidence and use a 1s abort timeout.
  • Risk: Metadata access tokens could leak through status/probe output.
    • Mitigation: Runtime requests use the token only in-memory for Authorization; status/probe/doctor return the non-secret gcp-vertex-credentials marker and source label only.
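The second mitigation can be illustrated with a sketch in which the status surface receives only a marker and a source label. The marker string and source label come from this PR description; `describeLiveAuth` and `LiveAuthEvidence` are hypothetical names, not the PR's API.

```typescript
const VERTEX_CREDENTIALS_MARKER = "gcp-vertex-credentials";

// Evidence reported by status/probe/doctor. It has no field for the token.
type LiveAuthEvidence = { marker: string; source: string };

// Convert "a token was obtained" into non-secret evidence. The token is used
// only to decide whether evidence exists; it is never copied into the result.
export function describeLiveAuth(token: string | undefined): LiveAuthEvidence | undefined {
  if (!token) return undefined;
  return {
    marker: VERTEX_CREDENTIALS_MARKER,
    source: "GCE metadata service account",
  };
}
```

Because the result type carries no token field, serializing it for status output cannot expose the credential.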

steipete and others added 30 commits May 5, 2026 02:42
Normalize WhatsApp onboarding allowlist entries to digit-only WhatsApp IDs and reject invalid owner-phone inputs during prompt validation.

(cherry picked from commit 68a500c)
* fix(telegram): reuse preview for long text finals

* test(qa): cover long telegram finals

* fix(qa): satisfy extension lint

* fix(qa): keep telegram long final fixture to two chunks

* test(telegram): cover three chunk finals

* fix(telegram): force long final preview boundary

(cherry picked from commit e03fe1e)
Bind the default loopback gateway listener only to `127.0.0.1` on Windows so libuv dual-stack `::1` behavior cannot wedge localhost HTTP requests.

Also keeps non-Windows dual-loopback behavior covered, replaces the redundant Windows passthrough test with guard coverage, and adds the required changelog entry.

Fixes openclaw#69674.

Tests:
- pnpm exec oxfmt --check --threads=1 CHANGELOG.md src/gateway/net.ts src/gateway/net.test.ts
- pnpm test src/gateway/net.test.ts
- pnpm check:changed
- GitHub required checks: green

Thanks @SARAMALI15792.

Co-authored-by: saram ali <140950904+SARAMALI15792@users.noreply.github.com>
Co-authored-by: Brad Groux <3053586+BradGroux@users.noreply.github.com>
(cherry picked from commit 978bc53)
…isted] (openclaw#74161)

Summary:
- The PR updates agents skill prompt guidance to require exact `<location>` paths for single- and multi-skill selection, adds prompt assertions, and records the fix in the changelog.
- Reproducibility: yes. Static source reproduction is enough: current main lacks the exact-`<location>` guard  ... illsSection()`, while the PR diff adds it to both selection branches and asserts the resulting prompt text.

Automerge notes:
- PR branch already contained follow-up commit before automerge: fix: enforce exact skill paths for all skill matches

Validation:
- ClawSweeper review passed for head 743c984.
- Required merge gates passed before the squash merge.

Prepared head SHA: 743c984
Review: openclaw#74161 (comment)

Co-authored-by: tianguicheng <tianguicheng@xiaomi.com>
Co-authored-by: sallyom <somalley@redhat.com>
(cherry picked from commit c739088)
Accept drive-absolute Windows sandbox Docker bind sources in config and runtime validation while keeping blocked-path and allowed-root comparisons case-insensitive for Windows drive paths.

Also remove a stale WhatsApp setup import that blocked extension lint after the rebase.

Co-authored-by: 6607changchun <84566142+6607changchun@users.noreply.github.com>
Co-authored-by: Brad Groux <3053586+BradGroux@users.noreply.github.com>
(cherry picked from commit d02fbc6)
Adds cap_drop and no-new-privileges hardening for the bundled gateway Docker Compose services.

Thanks @VintageAyu.

(cherry picked from commit f9da484)
…penclaw#77280)

Merged via squash.

Prepared head SHA: f4188b4
Co-authored-by: openperf <80630709+openperf@users.noreply.github.com>
Reviewed-by: @openperf

(cherry picked from commit 31da1fe)
Contributor

clawsweeper Bot commented May 8, 2026

Codex review: needs changes before merge.

Summary
The PR adds Google Vertex GCE metadata service-account ADC support across the Google provider transport, live auth/probe/status/doctor surfaces, docs, changelog, and targeted tests while also carrying unrelated release and workflow changes.

Reproducibility: yes. Source inspection shows current main throws when no local ADC file exists and only exposes the Vertex transport for authorized_user ADC; the PR body also gives a GCE Docker reproduction and after-fix output, though I did not run the live GCE repro in this read-only review.

Real behavior proof
Sufficient (live_output): The PR body includes copied GCE Docker live output showing metadata service-account auth, a successful Vertex probe, and model.run returning vertex ok.

Next step before merge
A repair worker can produce a narrow replacement branch by extracting the Google Vertex ADC changes and dropping the unrelated release/workflow/lockfile churn.

Security
Needs attention: The intended metadata-token change is bounded, but the PR also modifies release, workflow, dependency, and install-script surfaces outside its stated scope.

Review findings

  • [P2] Remove unrelated release and workflow changes — .github/workflows/openclaw-release-publish.yml:39-43
Review details

Best possible solution:

Land a narrow current-main branch containing only the Google Vertex metadata ADC implementation, docs/changelog, and focused regression tests, with unrelated release and workflow changes removed.

Do we have a high-confidence way to reproduce the issue?

Yes. Source inspection shows current main throws when no local ADC file exists and only exposes the Vertex transport for authorized_user ADC; the PR body also gives a GCE Docker reproduction and after-fix output, though I did not run the live GCE repro in this read-only review.

Is this the best way to solve the issue?

No, not as a merge artifact. The provider-local metadata fallback plus generic live-auth evidence direction is reasonable, but the PR must be rebased or replaced with a narrow diff before it is a maintainable fix.

Full review comments:

  • [P2] Remove unrelated release and workflow changes — .github/workflows/openclaw-release-publish.yml:39-43
    This PR is scoped to Google Vertex ADC, but the diff also changes release publishing workflows, package/version metadata, the lockfile, and hundreds of unrelated files. That makes the branch unsafe to merge as a focused auth fix; please rebase or rebuild it so only the Vertex ADC code, docs, changelog, and tests remain.
    Confidence: 0.96

Overall correctness: patch is incorrect
Overall confidence: 0.9

Security concerns:

  • [medium] Unrelated release and supply-chain surfaces — .github/workflows/openclaw-release-publish.yml:39
    The diff changes release publishing workflows, plugin publish workflows, pnpm-lock.yaml, and install/e2e scripts in a Google Vertex auth PR. Those execution and supply-chain surfaces need to be removed from this PR or reviewed separately before merge.
    Confidence: 0.95

Acceptance criteria:

  • pnpm exec oxfmt --check --threads=1 CHANGELOG.md docs/auth-credential-semantics.md docs/cli/models.md docs/providers/google.md extensions/google/index.test.ts extensions/google/openclaw.plugin.json extensions/google/provider-registration.ts extensions/google/transport-stream.test.ts extensions/google/vertex-adc.ts src/agents/model-auth.ts src/agents/model-auth-live-evidence.ts src/agents/model-auth-live-evidence.test.ts src/commands/doctor-memory-search.ts src/commands/doctor-memory-search.test.ts src/commands/models/list.auth-overview.ts src/commands/models/list.probe.ts src/commands/models/list.probe.targets.test.ts src/commands/models/list.status-command.ts src/commands/models/list.status.test.ts src/plugins/manifest.ts src/plugins/manifest-registry.test.ts src/secrets/provider-env-vars.ts src/secrets/provider-env-vars.dynamic.test.ts
  • pnpm test extensions/google/transport-stream.test.ts extensions/google/index.test.ts src/agents/model-auth-live-evidence.test.ts src/commands/models/list.probe.targets.test.ts src/commands/models/list.status.test.ts src/commands/doctor-memory-search.test.ts src/plugins/manifest-registry.test.ts src/secrets/provider-env-vars.dynamic.test.ts -- --reporter=verbose
  • git diff --check
  • When feasible, capture redacted GCE Docker live proof for models status --probe --probe-provider google-vertex and infer model run --gateway --model google-vertex/gemini-3.1-pro-preview --prompt "Say exactly: vertex ok".

What I checked:

  • Current main lacks metadata ADC fallback: Current main only resolves local ADC files and throws when no credentials file exists or when the file is not authorized_user; there is no metadata server token fallback in this resolver. (extensions/google/vertex-adc.ts:169, a130dd080b8a)
  • Current provider registration is gated on authorized_user ADC: The Google provider only returns the google-vertex stream function when hasGoogleVertexAuthorizedUserAdcSync() succeeds, so VM service-account ADC without a local authorized_user file cannot reach the provider-owned transport. (extensions/google/provider-registration.ts:56, a130dd080b8a)
  • Manifest evidence is local-file only on main: The Google Vertex setup metadata currently declares only local-file-with-env auth evidence for GOOGLE_APPLICATION_CREDENTIALS or gcloud fallback paths. (extensions/google/openclaw.plugin.json:592, a130dd080b8a)
  • Dependency contract supports the requested ADC path: Google Cloud ADC documentation lists the attached service account returned by the metadata server as an ADC source, and Compute metadata docs require the Metadata-Flavor: Google request header.
  • PR branch is far outside the stated scope: The GitHub compare API reports the PR head as diverged from current main with 171 commits ahead and 1534 behind; the PR metadata reports 612 changed files, including release workflows, package metadata, lockfile, and many unrelated subsystems. (aa02d366b618)
  • Real behavior proof is present in the PR body: The body includes copied GCE Docker live output showing source=GCE metadata service account, a successful google-vertex/gemini-3.1-pro-preview auth probe, and gateway model output vertex ok. (aa02d366b618)

Likely related people:

  • steipete: Introduced current Vertex authorized_user ADC support and has recent Google transport/provider manifest history around the affected files. (role: introduced behavior and recent Google provider maintainer; confidence: high; commits: 0b59964ec945, 1ad50a36ac72, 85826c83e4a1; files: extensions/google/vertex-adc.ts, extensions/google/provider-registration.ts, extensions/google/transport-stream.ts)
  • shakkernerd: Recent commits own provider auth evidence lookup, metadata reuse, and model status/probe alignment that this PR extends for live metadata auth. (role: adjacent auth-evidence and model-auth maintainer; confidence: high; commits: dd5b96c11dd0, 10b9adb010a6, 835b88460600; files: src/secrets/provider-env-vars.ts, src/agents/model-auth.ts, src/commands/models/list.probe.ts)
  • vincentkoc: Recent history includes provider env metadata, model status/list performance, and plugin contract surfaces adjacent to the generic parts of this change. (role: adjacent plugin and status surface maintainer; confidence: medium; commits: 6afac5208aac, 7536993397c6, b74401074b6e; files: src/secrets/provider-env-vars.ts, src/plugins/manifest.ts, src/agents/model-auth.ts)

Remaining risk / open question:

  • The branch is heavily diverged and mixes unrelated release/publish workflow, lockfile, package-version, and subsystem changes into an auth fix.
  • The intended metadata token path touches credential and fixed internal metadata egress behavior, so it should still get maintainer security review after the branch is narrowed.
  • No local tests were run because this was a read-only review; validation here is source inspection, PR body proof, and GitHub/API metadata.

Codex review notes: model gpt-5.5, reasoning high; reviewed against a130dd080b8a.

Contributor

obviyus commented May 9, 2026

Thanks for the PR. This branch carries unrelated release/main replay around the Google Vertex ADC change, so it is not reviewable as a focused PR. Please reopen as a narrow PR with only the intended fix.


Labels

agents Agent runtime and tooling app: android App: android app: ios App: ios app: macos App: macos app: web-ui App: web-ui channel: discord Channel integration: discord channel: feishu Channel integration: feishu channel: googlechat Channel integration: googlechat channel: imessage Channel integration: imessage channel: irc channel: line Channel integration: line channel: matrix Channel integration: matrix channel: mattermost Channel integration: mattermost channel: msteams Channel integration: msteams channel: nextcloud-talk Channel integration: nextcloud-talk channel: nostr Channel integration: nostr channel: qa-channel Channel integration: qa-channel channel: qqbot channel: signal Channel integration: signal channel: slack Channel integration: slack channel: synology-chat channel: telegram Channel integration: telegram channel: tlon Channel integration: tlon channel: twitch Channel integration: twitch channel: voice-call Channel integration: voice-call channel: whatsapp-web Channel integration: whatsapp-web channel: zalo Channel integration: zalo channel: zalouser Channel integration: zalouser cli CLI command changes commands Command implementations docker Docker and sandbox tooling docs Improvements or additions to documentation extensions: acpx extensions: anthropic extensions: arcee extensions: byteplus extensions: cerebras extensions: cloudflare-ai-gateway extensions: codex extensions: copilot-proxy Extension: copilot-proxy extensions: deepinfra extensions: deepseek extensions: diagnostics-otel Extension: diagnostics-otel extensions: diagnostics-prometheus extensions: duckduckgo extensions: fal extensions: gradium extensions: huggingface extensions: inworld Extension: inworld extensions: kilocode extensions: kimi-coding extensions: litellm extensions: llm-task Extension: llm-task extensions: lmstudio extensions: lobster Extension: lobster extensions: memory-core Extension: memory-core extensions: memory-lancedb Extension: memory-lancedb extensions: 
memory-wiki extensions: minimax extensions: moonshot extensions: nvidia extensions: open-prose Extension: open-prose extensions: openai extensions: qa-lab extensions: qianfan extensions: senseaudio extensions: stepfun extensions: synthetic extensions: tavily extensions: tencent extensions: together extensions: tokenjuice Changes to the bundled tokenjuice extension extensions: tts-local-cli extensions: venice extensions: vercel-ai-gateway extensions: volcengine extensions: webhooks extensions: xiaomi gateway Gateway runtime plugin: azure-speech Azure Speech plugin plugin: bonjour Plugin integration: bonjour plugin: file-transfer plugin: google-meet plugin: migrate-claude plugin: migrate-hermes scripts Repository scripts size: XL triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup.
