fix(backend): honour kubeconfig CA under Bun's native fetch#319

Merged
robert-cronin merged 2 commits into
kaito-project:mainfrom
surajssd:suraj/fix-frontend-authn-issues
Jun 9, 2026
Conversation

@surajssd surajssd commented Jun 8, 2026

Contributor

Description

Fixes Kubernetes authentication failures in the backend when running under Bun against clusters that use a private CA (e.g. AKS). Every API-server request was failing with UNABLE_TO_VERIFY_LEAF_SIGNATURE because Bun's native fetch ignores the kubeconfig CA that @kubernetes/client-node supplies via a Node.js https.Agent. This PR re-injects the TLS material through the per-request tls option that Bun honours, routes all backend client construction through a single Bun-safe helper, and hardens that helper (shared TLS mapping, SNI, cached extraction) with regression tests.

Get a visual overview of the PR here: https://suraj.io/share/prs/github.com/kaito-project/airunway/pull/319/PR-319-review-dashboard.html

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to change)
  • 📚 Documentation update
  • 🎨 UI/UX improvement
  • ♻️ Refactoring (no functional changes)
  • 🧪 Test update
  • 🔧 Build/CI configuration

Changes Made

Core fix — honour the kubeconfig CA under Bun (d44f5a0)

  • Add BunTlsHttpLibrary to backend/src/lib/kubeconfig.ts — a subclass of k8s.IsomorphicFetchHttpLibrary that overrides send() to call Bun's native fetch directly, translating the kubeconfig's TLS material into the per-request tls option Bun understands. Auth headers (Bearer tokens, etc.) are still applied upstream by the generated client via authMethods, so only the TLS material is re-injected.
  • Add a makeApiClient(kc, ApiClass) helper — a drop-in replacement for kc.makeApiClient(...) that reproduces the SDK's wiring (createConfiguration + ServerConfiguration) but swaps in BunTlsHttpLibrary.
  • Route all client construction through makeApiClient(...) across the backend services: auth.ts, autoscaler.ts, config.ts, kubernetes.ts, registry.ts, and secrets.ts — including the per-user token clients built from createUserKubeConfig(...).
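
The core idea can be sketched as follows. This is a minimal illustration, not the PR's `BunTlsHttpLibrary`: the names `BunTlsOptions`, `BunRequestInit`, `buildInit`, and `send` are hypothetical, and the fetch implementation is injected so the sketch does not depend on Bun or the SDK. It only shows that headers pass through unchanged while TLS material travels on the per-request `tls` field.

```typescript
// Hypothetical names throughout; this is not the PR's BunTlsHttpLibrary.

/** TLS fields forwarded on Bun's per-request `tls` fetch option. */
interface BunTlsOptions {
  ca?: string;
  cert?: string;
  key?: string;
  rejectUnauthorized?: boolean;
}

/** Request init in the shape Bun's fetch accepts: standard fields plus `tls`. */
interface BunRequestInit {
  method: string;
  headers: Record<string, string>;
  body?: string;
  tls?: BunTlsOptions;
}

type FetchLike = (url: string, init: BunRequestInit) => Promise<{ status: number }>;

/**
 * Build the init object for one request. Headers (including Authorization)
 * are copied through untouched; only the TLS material is added.
 */
function buildInit(
  method: string,
  headers: Record<string, string>,
  tls: BunTlsOptions,
  body?: string,
): BunRequestInit {
  const init: BunRequestInit = { method, headers: { ...headers }, tls: { ...tls } };
  if (body !== undefined) init.body = body;
  return init;
}

/** Send through an injected fetch; under Bun this would be the global fetch. */
function send(fetchImpl: FetchLike, url: string, init: BunRequestInit): Promise<{ status: number }> {
  return fetchImpl(url, init);
}
```

Injecting the fetch function is a choice made for this sketch so it can be exercised with a fake; the PR description says the real subclass calls Bun's native `fetch` directly.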

Hardening from PR review (c9157b7)

  • Extract a shared kubeConfigToBunTls() helper as the single source of truth for kubeconfig → Bun TLS mapping, and route both BunTlsHttpLibrary.send() and proxyServiceRequest() (kubernetes.ts) through it so the two TLS paths can no longer drift.
  • Forward the SNI override, mapping the Node-side servername to Bun's camelCase serverName (previously dropped — broke clusters that set tls-server-name).
  • Forward the client-key passphrase, previously dropped.
  • Resolve the TLS material once per client and cache it, so applyToHTTPSOptions no longer re-runs the auth/cert pipeline (e.g. exec credential plugins, disk reads) on every request.
  • Document pfx and cluster.proxy-url as known Bun limitations, note the @kubernetes/client-node@1.4.0 SDK coupling, and flag the multipart-body caveat.
  • Add backend/src/lib/kubeconfig.test.ts regression tests, including a guard that the default path never disables certificate verification.
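
The kubeconfig → Bun TLS mapping described above might look roughly like this. It is a sketch under assumptions, not the PR's `kubeConfigToBunTls()`: `NodeHttpsOptions`, `BunTlsOptions`, and `toBunTls` are hypothetical names, key material is modelled as plain strings, and the `serverName` spelling is taken from the PR description.

```typescript
// Hypothetical stand-in for the Node-style options the SDK populates.
interface NodeHttpsOptions {
  ca?: string;
  cert?: string;
  key?: string;
  passphrase?: string;
  servername?: string; // Node spelling of the SNI override
  rejectUnauthorized?: boolean;
}

// Hypothetical shape of the per-request `tls` option passed to Bun's fetch.
interface BunTlsOptions {
  ca?: string;
  cert?: string;
  key?: string;
  passphrase?: string;
  serverName?: string; // camelCase spelling, per the PR description
  rejectUnauthorized?: boolean;
}

/** Map Node-style HTTPS options onto the Bun-style `tls` object. */
function toBunTls(opts: NodeHttpsOptions): BunTlsOptions {
  const tls: BunTlsOptions = {};
  if (opts.ca !== undefined) tls.ca = opts.ca;
  if (opts.cert !== undefined) tls.cert = opts.cert;
  if (opts.key !== undefined) tls.key = opts.key;
  if (opts.passphrase !== undefined) tls.passphrase = opts.passphrase;
  // Rename the SNI override instead of dropping it.
  if (opts.servername !== undefined) tls.serverName = opts.servername;
  // Forward verification settings only when explicitly set, so the default
  // path never disables certificate verification.
  if (opts.rejectUnauthorized !== undefined) {
    tls.rejectUnauthorized = opts.rejectUnauthorized;
  }
  return tls;
}
```

Keeping the mapping in one pure function is what lets both call sites share it and lets a test assert that `rejectUnauthorized` stays unset unless the kubeconfig asks for `skipTLSVerify`.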

Testing

  • Unit tests pass (bun run test)
  • Manual testing performed
  • Tested with a Kubernetes cluster

Automated: bun run test passes — 864 tests, 0 failing (backend 736, frontend 128), including 13 new tests for the Bun TLS shim. The new tests cover: makeApiClient throws with no cluster; send() passes TLS material and the Authorization header to fetch; skipTLSVerify maps to rejectUnauthorized:false; the default path leaves verification on; SNI serverName mapping; non-2xx still yields a ResponseContext; and the TLS material is resolved only once across requests.
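
The "resolved only once" behaviour can be illustrated with a small memoising wrapper. This is a hedged sketch rather than the PR's implementation: `TlsMaterial` and `memoizeTls` are hypothetical names, and the resolver stands in for whatever runs the SDK's auth/cert pipeline.

```typescript
// Hypothetical shape; the real material would also carry cert, key, and so on.
interface TlsMaterial {
  ca?: string;
  rejectUnauthorized?: boolean;
}

/**
 * Wrap an expensive resolver so it runs at most once per client.
 * The promise itself is cached, so concurrent first requests share one run.
 */
function memoizeTls(resolve: () => Promise<TlsMaterial>): () => Promise<TlsMaterial> {
  let cached: Promise<TlsMaterial> | undefined;
  return () => {
    if (cached === undefined) {
      cached = resolve();
    }
    return cached;
  };
}

// Usage: count how often the (fake) resolver is invoked.
let resolverCalls = 0;
const getTls = memoizeTls(async () => {
  resolverCalls += 1;
  return { ca: "PEM" };
});
const first = getTls();
const second = getTls();
```

One trade-off of caching the promise is that a rejected resolution is cached too; whether the PR retries in that case is not stated in the description.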

Manual: Verified end-to-end against a live AKS cluster (aks-0, private CA) via the running Web UI:

  • GET /api/cluster/status → 200 with {"connected":true,"clusterName":"aks-0","providerInstallation":{"installed":true,"crdFound":true}}
  • GET /api/deployments → 200 (CRD list succeeds; renders "No deployments yet")
  • GET /api/installation/gpu-capacity → 200
  • Zero UNABLE_TO_VERIFY_LEAF_SIGNATURE / certificate errors in backend logs or browser console.

Manual Testing w/ & w/o changes

git checkout main
bun install
bun run dev

You will start seeing logs like this in the terminal:

@airunway/backend dev $ bun --watch src/index.ts
│ [39 lines elided]
│ {"level":"error","time":"2026-06-08T22:12:13.274Z","pid":38386,"hostname":"Users-MacBook-Pro.local","error":{"code":"UNABLE_TO_VERIFY_LEAF_SIGNATURE","path":"https://suraj-cluster.hcp.southcentralus.azmk8s.io/api/v1/namespaces/kube-system/configmaps/cluster-autoscaler-status","errno":0},"msg":"Error getting autoscaler status"}
│ {"level":"error","time":"2026-06-08T22:12:13.434Z","pid":38386,"hostname":"Users-MacBook-Pro.local","error":{"code":"UNABLE_TO_VERIFY_LEAF_SIGNATURE","path":"https://suraj-cluster.hcp.southcentralus.azmk8s.io/apis/apps/v1/namespaces/kube-system/deployments?labelSelector=app%3Dcluster-autoscaler","errno":0},"msg":"Error detecting cluster-autoscaler"}
│ {"level":"error","time":"2026-06-08T22:12:13.440Z","pid":38386,"hostname":"Users-MacBook-Pro.local","operation":"listPVCs","namespace":"dynamo-system","errorMessage":"unable to verify the first certificate","statusCode":500,"rawError":{"message":"unable to verify the first certificate","stack":"Error: unable to verify the first certificate\n    at async fetch (node-fetch:96:41)\n    at async withRetry (/Users/user/code/kubeairunway/backend/src/lib/retry.ts:96:20)\n    at async listPVCs (/Users/user/code/kubeairunway/backend/src/services/kubernetes.ts:2204:28)\n    at async <anonymous> (/Users/user/code/kubeairunway/backend/src/routes/deployments.ts:1053:44)\n    at async dispatch (/Users/user/code/kubeairunway/node_modules/.bun/hono@4.11.9/node_modules/hono/dist/compose.js:22:23)\n    at async <anonymous> (/Users/user/code/kubeairunway/node_modules/.bun/hono@4.11.9/node_modules/hono/dist/validator/validator.js:81:18)\n    at async dispatch (/Users/user/code/kubeairunway/node_modules/.bun/hono@4.11.9/node_modules/hono/dist/compose.js:22:23)\n    at async dispatch (/Users/user/code/kubeairunway/node_modules/.bun/hono@4.11.9/node_modules/hono/dist/compose.js:22:23)\n    at async <anonymous> (/Users/user/code/kubeairunway/backend/src/hono-app.ts:119:9)\n    at async dispatch (/Users/user/code/kubeairunway/node_modules/.bun/hono@4.11.9/node_modules/hono/dist/compose.js:22:23)"},"msg":"Kubernetes API error: unable to verify the first certificate"}
│ {"level":"error","time":"2026-06-08T22:12:13.441Z","pid":38386,"hostname":"Users-MacBook-Pro.local","error":{"status":500},"stack":"Error: Failed to list storage disks: unable to verify the first certificate\n    at <anonymous> (/Users/user/code/kubeairunway/backend/src/routes/deployments.ts:1061:17)\n    at async dispatch (/Users/user/code/kubeairunway/node_modules/.bun/hono@4.11.9/node_modules/hono/dist/compose.js:22:23)\n    at async <anonymous> (/Users/user/code/kubeairunway/node_modules/.bun/hono@4.11.9/node_modules/hono/dist/validator/validator.js:81:18)\n    at async dispatch (/Users/user/code/kubeairunway/node_modules/.bun/hono@4.11.9/node_modules/hono/dist/compose.js:22:23)\n    at processTicksAndRejections (native:7:39)","msg":"Error: Failed to list storage disks: unable to verify the first certificate"}
│ {"level":"info","time":"2026-06-08T22:12:21.390Z","pid":38386,"hostname":"Users-MacBook-Pro.local","method":"GET","url":"http://localhost:3001/api/installation/gpu-capacity","msg":"GET /api/installation/gpu-capacity"}
│ {"level":"info","time":"2026-06-08T22:12:21.402Z","pid":38386,"hostname":"Users-MacBook-Pro.local","method":"GET","url":"http://localhost:3001/api/cluster/status","msg":"GET /api/cluster/status"}
│ {"level":"error","time":"2026-06-08T22:12:21.543Z","pid":38386,"hostname":"Users-MacBook-Pro.local","error":{"code":"UNABLE_TO_VERIFY_LEAF_SIGNATURE","path":"https://suraj-cluster.hcp.southcentralus.azmk8s.io/api/v1/nodes","errno":0},"msg":"Error getting cluster GPU capacity"}
│ {"level":"info","time":"2026-06-08T22:12:51.561Z","pid":38386,"hostname":"Users-MacBook-Pro.local","method":"GET","url":"http://localhost:3001/api/installation/gpu-capacity","msg":"GET /api/installation/gpu-capacity"}
│ {"level":"info","time":"2026-06-08T22:12:51.577Z","pid":38386,"hostname":"Users-MacBook-Pro.local","method":"GET","url":"http://localhost:3001/api/cluster/status","msg":"GET /api/cluster/status"}
│ {"level":"error","time":"2026-06-08T22:12:51.720Z","pid":38386,"hostname":"Users-MacBook-Pro.local","error":{"code":"UNABLE_TO_VERIFY_LEAF_SIGNATURE","path":"https://suraj-cluster.hcp.southcentralus.azmk8s.io/api/v1/nodes","errno":0},"msg":"Error getting cluster GPU capacity"}

Now open http://localhost:5173/; you should see a Disconnected state in the top right.

Now with this PR's changes you should see a connected state:

gh pr checkout 319
bun install
bun run dev

Checklist

  • My code follows the project's style guidelines
  • I have added tests that prove my fix/feature works
  • New and existing unit tests pass locally
  • I have run bun run lint
  • I have updated documentation if needed
  • My changes generate no new warnings

Additional Notes

  • Scope is limited to backend client construction (backend/src/); no public API, CRD, or frontend changes. package.json and bun.lock are untouched.
  • bun run lint is not ticked because it fails for a pre-existing, unrelated reason: ESLint v9+ requires a flat eslint.config.js, which is absent on this branch (the migration lives in a separate PR). It is not caused by these changes.
  • Addresses the GitHub Copilot review comment requesting unit tests for BunTlsHttpLibrary / makeApiClient.
  • Known Bun limitations (documented in code): kubeconfig pfx client certs and cluster.proxy-url are not honoured under Bun's fetch.

- Add `BunTlsHttpLibrary` and a `makeApiClient` helper in
  `kubeconfig.ts`. The SDK's default `IsomorphicFetchHttpLibrary`
  passes the kubeconfig CA via a Node.js `https.Agent`, which Bun's
  native `fetch` ignores — it only honours TLS material on the
  per-request `tls` option. This caused
  `UNABLE_TO_VERIFY_LEAF_SIGNATURE` on every request to clusters with
  a private CA (e.g. AKS). The subclass re-injects
  `ca`/`cert`/`key`/`rejectUnauthorized` via `tls`; auth headers are
  still applied upstream via `authMethods`.

- Route all client construction in `kubernetes.ts`, `auth.ts`,
  `autoscaler.ts`, `config.ts`, `registry.ts`, and `secrets.ts`
  through `makeApiClient(...)` instead of `kc.makeApiClient(...)`,
  making the helper the single source of truth for Bun-safe TLS.

Signed-off-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>
Copilot AI review requested due to automatic review settings June 8, 2026 20:51
@surajssd surajssd requested a review from a team as a code owner June 8, 2026 20:51

Copilot AI left a comment

Contributor

Pull request overview

This PR fixes Kubernetes API authentication failures that occur when the backend runs under Bun against clusters using a private CA (e.g., AKS). Bun's native fetch ignores the https.Agent that @kubernetes/client-node normally uses to supply TLS material, causing UNABLE_TO_VERIFY_LEAF_SIGNATURE errors. The fix introduces a custom HTTP library that re-injects TLS material through Bun's per-request tls option.

Changes:

  • Adds BunTlsHttpLibrary (extends IsomorphicFetchHttpLibrary) that extracts kubeconfig TLS material via applyToHTTPSOptions and passes it through the Bun-native tls fetch option, mirroring the existing proxyServiceRequest pattern.
  • Adds a makeApiClient(kc, ApiClass) factory function as a drop-in replacement for kc.makeApiClient(...) that wires the custom HTTP library into the SDK configuration.
  • Migrates all backend service client construction to use the new helper.
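
The factory wiring can be sketched with simplified stand-ins. Every type here (`Cluster`, `KubeConfigLike`, `HttpLibrary`, `Configuration`, `ApiConstructor`) is a hypothetical simplification and not the SDK's real type, and the third `httpApi` parameter is an assumption added so the sketch can be tested; the PR's helper takes only `(kc, ApiClass)`.

```typescript
// Simplified, hypothetical stand-ins for the SDK types involved.
interface Cluster {
  server: string;
}
interface KubeConfigLike {
  getCurrentCluster(): Cluster | null;
}
interface HttpLibrary {
  name: string;
}
interface Configuration {
  baseServer: string;
  authMethods: { default: KubeConfigLike };
  httpApi: HttpLibrary;
}
type ApiConstructor<T> = new (config: Configuration) => T;

/**
 * Build an API client whose configuration carries a custom HTTP library.
 * Auth stays with the kubeconfig via `authMethods`; only the transport changes.
 */
function makeApiClient<T>(
  kc: KubeConfigLike,
  ApiClass: ApiConstructor<T>,
  httpApi: HttpLibrary, // injected here for testability (assumption)
): T {
  const cluster = kc.getCurrentCluster();
  if (!cluster) {
    throw new Error("No active cluster!");
  }
  return new ApiClass({
    baseServer: cluster.server,
    authMethods: { default: kc },
    httpApi,
  });
}

// Usage with a fake API class that just records its configuration.
class FakeApi {
  config: Configuration;
  constructor(config: Configuration) {
    this.config = config;
  }
}
```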

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| backend/src/lib/kubeconfig.ts | Adds BunTlsHttpLibrary class and makeApiClient helper function |
| backend/src/services/auth.ts | Switches to makeApiClient for AuthenticationV1Api |
| backend/src/services/autoscaler.ts | Switches to makeApiClient for CoreV1Api, AppsV1Api, CustomObjectsApi |
| backend/src/services/config.ts | Switches to makeApiClient for CoreV1Api |
| backend/src/services/kubernetes.ts | Switches to makeApiClient for all API clients including per-user token clients |
| backend/src/services/registry.ts | Switches to makeApiClient for CoreV1Api, AppsV1Api |
| backend/src/services/secrets.ts | Switches to makeApiClient for CoreV1Api |

Comment thread backend/src/lib/kubeconfig.ts
The TLS-extraction logic was duplicated across two call sites and
dropped some kubeconfig fields.

- Extract a shared `kubeConfigToBunTls()` helper as the single source of
  truth; route both `BunTlsHttpLibrary.send()` and
  `proxyServiceRequest()` through it so the two TLS paths can no longer
  drift.
- Forward the SNI override, mapping the Node-side `servername` to Bun's
  camelCase `serverName` (previously dropped, breaking clusters that set
  `tls-server-name`).
- Forward the client-key `passphrase`, previously dropped.
- Resolve the TLS material once per client and cache it, so
  `applyToHTTPSOptions` no longer re-runs the auth/cert pipeline (e.g.
  exec credential plugins) on every request.
- Document `pfx` and `cluster.proxy-url` as known Bun limitations, note
  the `@kubernetes/client-node@1.4.0` SDK coupling, and flag the
  multipart-body caveat.
- Add `kubeconfig.test.ts` regression tests, including a guard that the
  default path never disables certificate verification.

Signed-off-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>

@robert-cronin robert-cronin merged commit ab4f152 into kaito-project:main Jun 9, 2026
11 checks passed
@surajssd surajssd deleted the suraj/fix-frontend-authn-issues branch June 9, 2026 15:20
@surajssd surajssd added this to the 0.7.0 milestone Jun 10, 2026