fix(kaito): detect KAITO installed via AKS AI-toolchain add-on #322

Merged
robert-cronin merged 4 commits into kaito-project:main from surajssd:kaito-provider-not-identified on Jun 11, 2026

surajssd commented on Jun 9, 2026

Description

KAITO installed via the AKS AI-toolchain-operator add-on (az aks ... --enable-ai-toolchain-operator) was reported as "Not Installed" in the AI Runway UI, even though the operator was running and healthy. The add-on runs the KAITO controller in the kube-system namespace labeled app=ai-toolchain-operator / app.kubernetes.io/name=ai-toolchain-operator, but both detection paths only matched the upstream Helm chart's app.kubernetes.io/name=workspace label (expected in the kaito-workspace namespace), so neither found the operator.
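
A minimal TypeScript sketch of why the old selector missed the add-on. The matchesEquality helper and the example label maps are hypothetical; only the label keys and values come from the description above. An equality-based selector requires every listed key to carry exactly the given value, so the add-on controller's ai-toolchain-operator value never satisfies app.kubernetes.io/name=workspace, no matter which namespace is searched.

```typescript
// Labels as a plain key/value map, like Kubernetes metadata.labels.
type Labels = Record<string, string>;

// Equality-based selector: every selector key must be present with exactly that value.
function matchesEquality(labels: Labels, selector: Labels): boolean {
  return Object.entries(selector).every(([key, value]) => labels[key] === value);
}

// The upstream Helm chart label that both detection paths matched before this PR.
const helmSelector: Labels = { "app.kubernetes.io/name": "workspace" };

// Illustrative label sets for the two install paths.
const helmControllerLabels: Labels = { "app.kubernetes.io/name": "workspace" };
const addonControllerLabels: Labels = {
  app: "ai-toolchain-operator",
  "app.kubernetes.io/name": "ai-toolchain-operator",
};

console.log(matchesEquality(helmControllerLabels, helmSelector)); // true
console.log(matchesEquality(addonControllerLabels, helmSelector)); // false
```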

This PR teaches both detection paths to recognize the AKS add-on install in addition to the upstream Helm install.

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to change)
  • 📚 Documentation update
  • 🎨 UI/UX improvement
  • ♻️ Refactoring (no functional changes)
  • 🧪 Test update
  • 🔧 Build/CI configuration

Related Issues

Fixes #318
Relates to #179

Changes Made

  • Web backend probe (backend/src/services/kubernetes.ts): added a KAITO_AKS_ADDON_POD_SELECTOR (app=ai-toolchain-operator) and appended it to the KAITO probe's fallbackPodSelectors in RUNTIME_INSTALLATION_PROBES. The existing cross-namespace fallback in findReadyOperatorPod now finds the add-on operator pod in kube-system, so operatorRunning (and therefore installed) becomes true.
  • Provider shim (providers/kaito/upstream_health.go): broadened listWorkspaceController to match both app.kubernetes.io/name=workspace and app.kubernetes.io/name=ai-toolchain-operator using a set-based (selection.In) label selector via client.MatchingLabelsSelector. The list stays cluster-wide, so the controller is detected whether it runs in kaito-workspace (chart) or kube-system (add-on). This keeps InferenceProviderConfig.status.ready=true so provider auto-selection still picks KAITO at deploy time.
  • User-facing message (providers/kaito/upstream_health.go): reworded controllerMissingUserMessage to present enabling the AKS add-on (az aks update --enable-ai-toolchain-operator ...) as a valid install path, instead of instructing users to disable it.
  • Tests: added the test "reports KAITO as installed when the AKS AI-toolchain-operator add-on pod is running in kube-system" to kubernetes-runtime-status.test.ts, and TestProbe_ControllerReady_AKSAddon (plus a newAKSAddonDeployment fixture) to upstream_health_test.go.
  • Chore (.gitignore): ignore .playwright-mcp/ artifacts and add a trailing newline.
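
The backend probe change relies on a cross-namespace fallback: when no ready operator pod matches the primary selector in the expected namespace, extra selectors are tried cluster-wide. Below is a simplified, in-memory sketch of that search order. KAITO_AKS_ADDON_POD_SELECTOR and its value come from this PR; PodInfo, ProbeConfig, findReadyPod, the primary selector, and the exact search order are hypothetical stand-ins for findReadyOperatorPod and RUNTIME_INSTALLATION_PROBES, which query the Kubernetes API and may order their queries differently.

```typescript
// Hypothetical, simplified pod shape; the real backend works with Kubernetes API objects.
interface PodInfo {
  name: string;
  namespace: string;
  labels: Record<string, string>;
  ready: boolean;
}

// Hypothetical probe shape loosely modelled on the PR description.
interface ProbeConfig {
  namespace: string; // namespace where the operator is normally expected
  podSelector: string; // primary "key=value" selector
  fallbackPodSelectors: string[]; // extra "key=value" selectors searched cluster-wide
}

// Check a single "key=value" equality selector (assumes the string contains "=").
function matchesSelector(pod: PodInfo, selector: string): boolean {
  const idx = selector.indexOf("=");
  const key = selector.slice(0, idx);
  const value = selector.slice(idx + 1);
  return pod.labels[key] === value;
}

// Search the expected namespace with the primary selector first, then try each
// fallback selector across all namespaces; return the first ready pod found.
function findReadyPod(pods: PodInfo[], probe: ProbeConfig): PodInfo | undefined {
  const primary = pods.find(
    (p) => p.ready && p.namespace === probe.namespace && matchesSelector(p, probe.podSelector),
  );
  if (primary !== undefined) return primary;
  for (const selector of probe.fallbackPodSelectors) {
    const hit = pods.find((p) => p.ready && matchesSelector(p, selector));
    if (hit !== undefined) return hit;
  }
  return undefined;
}

// Selector value taken from the PR; the rest of the probe config is assumed.
const KAITO_AKS_ADDON_POD_SELECTOR = "app=ai-toolchain-operator";
const kaitoProbe: ProbeConfig = {
  namespace: "kaito-workspace",
  podSelector: "app.kubernetes.io/name=workspace",
  fallbackPodSelectors: [KAITO_AKS_ADDON_POD_SELECTOR],
};

// A cluster running only the AKS add-on operator (hypothetical pod name).
const addonOnlyCluster: PodInfo[] = [
  {
    name: "ai-toolchain-operator-0",
    namespace: "kube-system",
    labels: { app: "ai-toolchain-operator" },
    ready: true,
  },
];

const found = findReadyPod(addonOnlyCluster, kaitoProbe);
console.log(found?.namespace); // kube-system
```

Without the add-on selector in the fallback list, the same search returns nothing, which corresponds to the "Not Installed" state shown before the fix.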

Testing

  • Unit tests pass (bun run test)
  • Manual testing performed
  • Tested with a Kubernetes cluster

Backend: bun test src/services/kubernetes-runtime-status.test.ts: 17 pass. Go: go build ./... clean and go test ./... pass (including the new TestProbe_ControllerReady_AKSAddon); gofmt clean. On a live AKS cluster with --enable-ai-toolchain-operator, /api/runtimes/status now returns installed: true, operatorRunning: true, healthy: true for KAITO, and the deploy page (verified via Playwright) shows the green "Installed" badge with the Deploy Model button enabled.

Checklist

  • My code follows the project's style guidelines
  • I have run bun run lint
  • I have added tests that prove my fix/feature works
  • New and existing unit tests pass locally
  • I have updated documentation if needed
  • My changes generate no new warnings

Screenshots

Before: KAITO runtime card shows "Not Installed" with an "Install KAITO before deploying" warning and a disabled "Runtime Not Installed" button.

After: KAITO runtime card shows the green "Installed" badge, the warning is gone, and the "Deploy Model" button is enabled.

Additional Notes

  • The fix has two independent halves. The backend probe change alone corrects the visible badge and unblocks the Deploy button (web backend hot-reloads). The provider shim change is required so the InferenceProviderConfig CR reports status.ready=true — without it, the controller's provider auto-selection could still skip KAITO at actual deploy time. Picking up the shim change requires rebuilding and redeploying the airunway-kaito-provider image.
  • No RBAC changes were needed: the shim's List was already cluster-scoped and authorized, and the web backend already performs an all-namespaces pod search.

surajssd added 2 commits June 9, 2026 14:21
- Add `.playwright-mcp/` to ignore Playwright MCP artifacts
- Add missing trailing newline at end of file

Signed-off-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>

KAITO installed through the AKS AI-toolchain-operator add-on runs in
`kube-system` labeled `app=ai-toolchain-operator`, so neither detection
path matched it and the UI reported KAITO as "Not Installed".

- backend probe: add `app=ai-toolchain-operator` to the KAITO fallback
  pod selectors so the cross-namespace search finds the operator pod in
  `kube-system`
- provider shim: broaden `listWorkspaceController` to match both
  `app.kubernetes.io/name=workspace` and `=ai-toolchain-operator` via a
  set-based selector
- provider shim: reword `controllerMissingUserMessage` to suggest
  enabling the AKS add-on instead of disabling it
- add backend and Go tests covering the AKS add-on install

Signed-off-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>
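
The broadened shim selector corresponds to the set-based expression app.kubernetes.io/name in (workspace,ai-toolchain-operator). The Go change builds it with selection.In and passes it through client.MatchingLabelsSelector; the TypeScript sketch below does not use that API and only models the matching semantics, using a hypothetical InRequirement type and illustrative label sets. Note that In requires the key to be present, so objects without app.kubernetes.io/name never match.

```typescript
// Hypothetical model of a single set-based label requirement: `key in (values...)`.
interface InRequirement {
  key: string;
  values: ReadonlySet<string>;
}

// `In` matches only when the key is present and its value is one of `values`.
function matchesIn(labels: Record<string, string>, req: InRequirement): boolean {
  const value = labels[req.key];
  return value !== undefined && req.values.has(value);
}

// app.kubernetes.io/name in (workspace,ai-toolchain-operator)
const workspaceControllerRequirement: InRequirement = {
  key: "app.kubernetes.io/name",
  values: new Set(["workspace", "ai-toolchain-operator"]),
};

// Illustrative Deployment label sets.
const helmChartLabels = { "app.kubernetes.io/name": "workspace" };
const aksAddonLabels = {
  app: "ai-toolchain-operator",
  "app.kubernetes.io/name": "ai-toolchain-operator",
};
const unrelatedLabels = { "app.kubernetes.io/name": "unrelated-controller" };

console.log(matchesIn(helmChartLabels, workspaceControllerRequirement)); // true
console.log(matchesIn(aksAddonLabels, workspaceControllerRequirement)); // true
console.log(matchesIn(unrelatedLabels, workspaceControllerRequirement)); // false
```

A single set-based requirement keeps one cluster-wide List call covering both install paths, rather than issuing one equality-based List per label value.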
surajssd requested a review from a team as a code owner June 9, 2026 21:27
Copilot AI review requested due to automatic review settings June 9, 2026 21:27

Copilot AI left a comment

Pull request overview

This PR fixes a bug where KAITO installed via the AKS AI-toolchain-operator add-on (az aks ... --enable-ai-toolchain-operator) was not detected by either the web backend probe or the Go provider shim. The add-on uses different labels (app=ai-toolchain-operator / app.kubernetes.io/name=ai-toolchain-operator) and runs in kube-system rather than kaito-workspace, so both detection paths needed to be updated.

Changes:

  • Extended both the TypeScript backend pod probe (cross-namespace fallback) and the Go provider shim's Deployment selector to recognize the AKS add-on labels alongside the upstream Helm chart labels.
  • Updated the user-facing error message to present enabling the AKS add-on as a valid install path rather than instructing users to disable it.
  • Added tests covering the AKS add-on detection path in both the TypeScript and Go test suites.

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 1 comment.

Summary per file

  • providers/kaito/upstream_health.go: Broadened listWorkspaceController to use a set-based label selector (In operator) matching both workspace and ai-toolchain-operator values; updated the user-facing error message.
  • providers/kaito/upstream_health_test.go: Added newAKSAddonDeployment fixture and TestProbe_ControllerReady_AKSAddon test validating detection of the add-on deployment in kube-system.
  • backend/src/services/kubernetes.ts: Defined KAITO_AKS_ADDON_POD_SELECTOR (app=ai-toolchain-operator) and added it to the KAITO probe's fallbackPodSelectors for cross-namespace discovery.
  • backend/src/services/kubernetes-runtime-status.test.ts: Added test verifying KAITO is reported as installed when only the AKS add-on pod is running in kube-system.
  • .gitignore: Added .playwright-mcp/ entry and ensured trailing newline.

Comment thread on providers/kaito/upstream_health.go

Follow-up to the AKS AI-toolchain-operator detection fix, addressing
code-review feedback after verifying the add-on labels on a live
cluster.

- backend: move `app=ai-toolchain-operator` from the shared
  `fallbackPodSelectors` into an explicit
  `crossNamespaceFallbackPodSelectors` for KAITO, so add-on detection no
  longer relies on an implicit default and avoids a guaranteed-empty
  query against `kaito-workspace`
- document the verified label asymmetry in both paths with
  cross-references: the add-on Pod carries only `app`, while its
  Deployment carries both `app` and `app.kubernetes.io/name`, so the TS
  pod probe and Go Deployment probe match different keys on purpose
- reword `controllerMissingUserMessage` to cover both the Helm and AKS
  add-on install paths, pointing at the namespace to inspect for each
- test: stamp `newAKSAddonDeployment` with the real observed label set
  (both keys) and add `TestProbe_ControllerNotReady_AKSAddon`

Signed-off-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>
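
The label asymmetry documented in this commit is why the two probes key on different labels. A small sketch, assuming hypothetical label maps that contain only the two keys discussed here (real objects typically carry additional labels); hasLabel is an illustrative helper, not code from the repository.

```typescript
type Labels = Record<string, string>;

// Equality check for one label key.
function hasLabel(labels: Labels, key: string, value: string): boolean {
  return labels[key] === value;
}

const ADDON = "ai-toolchain-operator";

// Illustrative: of the two keys discussed, the add-on Pod carries only `app`,
// while its Deployment carries both `app` and `app.kubernetes.io/name`.
const addonPodLabels: Labels = { app: ADDON };
const addonDeploymentLabels: Labels = { app: ADDON, "app.kubernetes.io/name": ADDON };

// Backend pod probe keys on `app`.
const podProbeMatches = hasLabel(addonPodLabels, "app", ADDON);
// Go shim Deployment probe keys on `app.kubernetes.io/name`.
const deploymentProbeMatches = hasLabel(addonDeploymentLabels, "app.kubernetes.io/name", ADDON);
// Using the Deployment's key against the Pod would miss the add-on.
const podProbeOnDeploymentKey = hasLabel(addonPodLabels, "app.kubernetes.io/name", ADDON);

console.log(podProbeMatches, deploymentProbeMatches, podProbeOnDeploymentKey); // true true false
```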
surajssd requested a review from robert-cronin June 10, 2026 22:14

robert-cronin left a comment

Verified on a fresh AKS cluster with --enable-ai-toolchain-operator.

Copilot AI review requested due to automatic review settings June 11, 2026 00:27

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 5 changed files in this pull request and generated no new comments.

robert-cronin merged commit aa14a55 into kaito-project:main Jun 11, 2026
11 checks passed
surajssd deleted the kaito-provider-not-identified branch June 11, 2026 19:34
Development

Successfully merging this pull request may close these issues.

[Bug]: kaito provider still not identified when installed via AKS --enable-ai-toolchain-operator
