fix(kaito): detect KAITO installed via AKS AI-toolchain add-on #322
Merged
robert-cronin merged 4 commits on Jun 11, 2026
- Add `.playwright-mcp/` to ignore Playwright MCP artifacts
- Add missing trailing newline at end of file

Signed-off-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>
KAITO installed through the AKS AI-toolchain-operator add-on runs in `kube-system` labeled `app=ai-toolchain-operator`, so neither detection path matched it and the UI reported KAITO as "Not Installed".

- backend probe: add `app=ai-toolchain-operator` to the KAITO fallback pod selectors so the cross-namespace search finds the operator pod in `kube-system`
- provider shim: broaden `listWorkspaceController` to match both `app.kubernetes.io/name=workspace` and `=ai-toolchain-operator` via a set-based selector
- provider shim: reword `controllerMissingUserMessage` to suggest enabling the AKS add-on instead of disabling it
- add backend and Go tests covering the AKS add-on install

Signed-off-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>
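For reference, a set-based requirement such as `app.kubernetes.io/name in (workspace, ai-toolchain-operator)` matches an object when the key is present and its value is any member of the set. The stdlib-only Go sketch below models just that matching rule; `requirementIn` is a hypothetical helper for illustration, not the shim's code, which uses the real `selection.In` operator from `k8s.io/apimachinery`.

```go
package main

import "fmt"

// requirementIn models a set-based label requirement "key in (v1, v2, ...)".
// Hypothetical helper for illustration only; it is not the shim's code.
type requirementIn struct {
	key    string
	values []string
}

// matches reports whether labels contain r.key with a value listed in r.values.
// A missing key never matches, mirroring the semantics of the "in" operator.
func (r requirementIn) matches(labels map[string]string) bool {
	v, ok := labels[r.key]
	if !ok {
		return false
	}
	for _, allowed := range r.values {
		if v == allowed {
			return true
		}
	}
	return false
}

func main() {
	// Accept both the upstream Helm chart value and the AKS add-on value.
	sel := requirementIn{
		key:    "app.kubernetes.io/name",
		values: []string{"workspace", "ai-toolchain-operator"},
	}

	helmChart := map[string]string{"app.kubernetes.io/name": "workspace"}
	aksAddon := map[string]string{"app.kubernetes.io/name": "ai-toolchain-operator"}
	// An arbitrary unrelated value, chosen for illustration.
	unrelated := map[string]string{"app.kubernetes.io/name": "gpu-provisioner"}

	fmt.Println(sel.matches(helmChart), sel.matches(aksAddon), sel.matches(unrelated))
}
```

This prints `true true false`: one selector covers both install paths, so a single cluster-wide list is enough instead of two separate equality queries.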
Pull request overview
This PR fixes a bug where KAITO installed via the AKS AI-toolchain-operator add-on (`az aks ... --enable-ai-toolchain-operator`) was not detected by either the web backend probe or the Go provider shim. The add-on uses different labels (`app=ai-toolchain-operator` / `app.kubernetes.io/name=ai-toolchain-operator`) and runs in `kube-system` rather than `kaito-workspace`, so both detection paths needed to be updated.
Changes:
- Extended both the TypeScript backend pod probe (cross-namespace fallback) and the Go provider shim's Deployment selector to recognize the AKS add-on labels alongside the upstream Helm chart labels.
- Updated the user-facing error message to present enabling the AKS add-on as a valid install path rather than instructing users to disable it.
- Added tests covering the AKS add-on detection path in both the TypeScript and Go test suites.
Reviewed changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `providers/kaito/upstream_health.go` | Broadened `listWorkspaceController` to use a set-based label selector (`In` operator) matching both `workspace` and `ai-toolchain-operator` values; updated the user-facing error message. |
| `providers/kaito/upstream_health_test.go` | Added `newAKSAddonDeployment` fixture and `TestProbe_ControllerReady_AKSAddon` test validating detection of the add-on deployment in `kube-system`. |
| `backend/src/services/kubernetes.ts` | Defined `KAITO_AKS_ADDON_POD_SELECTOR` (`app=ai-toolchain-operator`) and added it to the KAITO probe's `fallbackPodSelectors` for cross-namespace discovery. |
| `backend/src/services/kubernetes-runtime-status.test.ts` | Added a test verifying KAITO is reported as installed when only the AKS add-on pod is running in `kube-system`. |
| `.gitignore` | Added `.playwright-mcp/` entry and ensured trailing newline. |
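The backend's cross-namespace discovery can be modeled roughly as a two-stage search. This Go sketch assumes the probe first checks the expected namespace with its primary selectors and then searches every namespace with the fallback selectors; `runtimeProbe`, its fields, and this Go `findReadyOperatorPod` are hypothetical stand-ins for the TypeScript code in `backend/src/services/kubernetes.ts`, not a port of it.

```go
package main

import "fmt"

// pod is a simplified stand-in for a Kubernetes Pod.
type pod struct {
	namespace string
	labels    map[string]string
	ready     bool
}

// selector is a single key=value equality selector.
type selector struct{ key, value string }

// matches is true only if the label exists with exactly the wanted value.
func (s selector) matches(p pod) bool {
	v, ok := p.labels[s.key]
	return ok && v == s.value
}

// runtimeProbe loosely mirrors the probe shape described in the PR.
// Field names are hypothetical.
type runtimeProbe struct {
	namespace               string     // expected install namespace
	podSelectors            []selector // searched only in namespace
	crossNamespaceSelectors []selector // searched across all namespaces
}

// findReadyOperatorPod returns the first ready pod matched by the probe:
// first in the expected namespace, then cluster-wide.
func findReadyOperatorPod(p runtimeProbe, pods []pod) (pod, bool) {
	for _, s := range p.podSelectors {
		for _, c := range pods {
			if c.namespace == p.namespace && c.ready && s.matches(c) {
				return c, true
			}
		}
	}
	for _, s := range p.crossNamespaceSelectors {
		for _, c := range pods {
			if c.ready && s.matches(c) {
				return c, true
			}
		}
	}
	return pod{}, false
}

func main() {
	kaito := runtimeProbe{
		namespace:               "kaito-workspace",
		podSelectors:            []selector{{key: "app.kubernetes.io/name", value: "workspace"}},
		crossNamespaceSelectors: []selector{{key: "app", value: "ai-toolchain-operator"}},
	}
	// Only the AKS add-on operator pod is running, in kube-system.
	cluster := []pod{{
		namespace: "kube-system",
		labels:    map[string]string{"app": "ai-toolchain-operator"},
		ready:     true,
	}}
	found, operatorRunning := findReadyOperatorPod(kaito, cluster)
	fmt.Println(operatorRunning, found.namespace)
}
```

This prints `true kube-system`. Without the `app=ai-toolchain-operator` entry in the cross-namespace list, the same cluster would yield `false`, which is the "Not Installed" symptom the PR fixes.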
Follow-up to the AKS AI-toolchain-operator detection fix, addressing code-review feedback after verifying the add-on labels on a live cluster.

- backend: move `app=ai-toolchain-operator` from the shared `fallbackPodSelectors` into an explicit `crossNamespaceFallbackPodSelectors` for KAITO, so add-on detection no longer relies on an implicit default and avoids a guaranteed-empty query against `kaito-workspace`
- document the verified label asymmetry in both paths with cross-references: the add-on Pod carries only `app`, while its Deployment carries both `app` and `app.kubernetes.io/name`, so the TS pod probe and Go Deployment probe match different keys on purpose
- reword `controllerMissingUserMessage` to cover both the Helm and AKS add-on install paths, pointing at the namespace to inspect for each
- test: stamp `newAKSAddonDeployment` with the real observed label set (both keys) and add `TestProbe_ControllerNotReady_AKSAddon`

Signed-off-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>
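The label asymmetry is easy to see with concrete label sets. This sketch uses a simplified single `key=value` matcher (`matchesEquality` is hypothetical) and the two label keys reported in the commit above; any other labels on the real objects are omitted. The Go probe actually uses a set-based selector; the equality form here only isolates which key each object carries.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesEquality reports whether labels satisfy a single equality selector
// such as "app=ai-toolchain-operator". Simplified model for illustration.
func matchesEquality(selector string, labels map[string]string) bool {
	key, value, ok := strings.Cut(selector, "=")
	if !ok {
		return false // malformed selector
	}
	got, present := labels[key]
	return present && got == value
}

func main() {
	// Keys as described in the follow-up commit; other labels omitted.
	addonPod := map[string]string{"app": "ai-toolchain-operator"}
	addonDeployment := map[string]string{
		"app":                    "ai-toolchain-operator",
		"app.kubernetes.io/name": "ai-toolchain-operator",
	}

	podKey := "app=ai-toolchain-operator"                       // key the TS pod probe relies on
	deployKey := "app.kubernetes.io/name=ai-toolchain-operator" // key the Go Deployment probe relies on

	fmt.Println(matchesEquality(podKey, addonPod))           // Pod has app
	fmt.Println(matchesEquality(deployKey, addonPod))        // Pod lacks app.kubernetes.io/name
	fmt.Println(matchesEquality(deployKey, addonDeployment)) // Deployment has both keys
}
```

This prints `true`, `false`, `true` on separate lines: matching the Pod on `app.kubernetes.io/name` would miss the add-on, which is why the two probes key on different labels.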
robert-cronin (Member) approved these changes on Jun 11, 2026, commenting: "verified on a fresh AKS cluster with `--enable-ai-toolchain-operator`"
Description
KAITO installed via the AKS AI-toolchain-operator add-on (`az aks ... --enable-ai-toolchain-operator`) was reported as "Not Installed" in the AI Runway UI, even though the operator was running and healthy. The add-on runs the KAITO controller in the `kube-system` namespace labeled `app=ai-toolchain-operator` / `app.kubernetes.io/name=ai-toolchain-operator`, but both detection paths only matched the upstream Helm chart's `app.kubernetes.io/name=workspace` label (expected in the `kaito-workspace` namespace), so neither found the operator.

This PR teaches both detection paths to recognize the AKS add-on install in addition to the upstream Helm install.
Type of Change
Related Issues
Fixes #318
Relates to #179
Changes Made
- `backend/src/services/kubernetes.ts`: added a `KAITO_AKS_ADDON_POD_SELECTOR` (`app=ai-toolchain-operator`) and appended it to the KAITO probe's `fallbackPodSelectors` in `RUNTIME_INSTALLATION_PROBES`. The existing cross-namespace fallback in `findReadyOperatorPod` now finds the add-on operator pod in `kube-system`, so `operatorRunning` (and therefore `installed`) becomes `true`.
- `providers/kaito/upstream_health.go`: broadened `listWorkspaceController` to match both `app.kubernetes.io/name=workspace` and `app.kubernetes.io/name=ai-toolchain-operator` using a set-based (`selection.In`) label selector via `client.MatchingLabelsSelector`. The list stays cluster-wide, so the controller is detected whether it runs in `kaito-workspace` (chart) or `kube-system` (add-on). This keeps `InferenceProviderConfig.status.ready=true`, so provider auto-selection still picks KAITO at deploy time.
- `providers/kaito/upstream_health.go`: reworded `controllerMissingUserMessage` to present enabling the AKS add-on (`az aks update --enable-ai-toolchain-operator ...`) as a valid install path, instead of instructing users to disable it.
- Tests: added `reports KAITO as installed when the AKS AI-toolchain-operator add-on pod is running in kube-system` in `kubernetes-runtime-status.test.ts`, and `TestProbe_ControllerReady_AKSAddon` (plus a `newAKSAddonDeployment` fixture) in `upstream_health_test.go`.
- `.gitignore`: ignore `.playwright-mcp/` artifacts and add a trailing newline.

Testing
- `bun run test`

Backend: `bun test src/services/kubernetes-runtime-status.test.ts` → 17 pass. Go: `go build ./...` clean and `go test ./...` pass (including the new `TestProbe_ControllerReady_AKSAddon`); `gofmt` clean. On a live AKS cluster with `--enable-ai-toolchain-operator`, `/api/runtimes/status` now returns `installed: true, operatorRunning: true, healthy: true` for KAITO, and the deploy page (verified via Playwright) shows the green "Installed" badge with the Deploy Model button enabled.

Checklist

- `bun run lint`

Screenshots
Before: KAITO runtime card shows "Not Installed" with an "Install KAITO before deploying" warning and a disabled "Runtime Not Installed" button.
After: KAITO runtime card shows the green "Installed" badge, the warning is gone, and the "Deploy Model" button is enabled.
Additional Notes
- The Go provider shim change ensures the `InferenceProviderConfig` CR reports `status.ready=true`; without it, the controller's provider auto-selection could still skip KAITO at actual deploy time. Picking up the shim change requires rebuilding and redeploying the `airunway-kaito-provider` image.
- The shim's Deployment `List` was already cluster-scoped and authorized, and the web backend already performs an all-namespaces pod search.
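Why the cluster-wide `List` matters for the shim: the add-on Deployment lives in `kube-system`, so a list restricted to `kaito-workspace` (as controller-runtime's `client.InNamespace` option would do) returns nothing. The sketch below compares both scopes; `listControllers` is hypothetical, and the readiness rule used here (at least one ready replica) is an assumption, not necessarily the shim's exact check.

```go
package main

import "fmt"

// deployment is a simplified stand-in for an apps/v1 Deployment.
type deployment struct {
	namespace     string
	labels        map[string]string
	readyReplicas int32
}

// controllerNames are the app.kubernetes.io/name values the broadened selector accepts.
var controllerNames = []string{"workspace", "ai-toolchain-operator"}

// listControllers returns deployments whose app.kubernetes.io/name is in
// controllerNames. An empty namespace means "all namespaces". Hypothetical helper.
func listControllers(all []deployment, namespace string) []deployment {
	var out []deployment
	for _, d := range all {
		if namespace != "" && d.namespace != namespace {
			continue
		}
		name, ok := d.labels["app.kubernetes.io/name"]
		if !ok {
			continue
		}
		for _, want := range controllerNames {
			if name == want {
				out = append(out, d)
				break
			}
		}
	}
	return out
}

// controllerReady applies the assumed rule: some matching deployment has a ready replica.
func controllerReady(ds []deployment) bool {
	for _, d := range ds {
		if d.readyReplicas > 0 {
			return true
		}
	}
	return false
}

func main() {
	cluster := []deployment{{
		namespace: "kube-system",
		labels: map[string]string{
			"app":                    "ai-toolchain-operator",
			"app.kubernetes.io/name": "ai-toolchain-operator",
		},
		readyReplicas: 1,
	}}
	fmt.Println(controllerReady(listControllers(cluster, "kaito-workspace"))) // namespace-scoped
	fmt.Println(controllerReady(listControllers(cluster, "")))                // cluster-wide
}
```

This prints `false` and then `true`: only the cluster-wide list sees the add-on controller, which is what lets `status.ready` flip to `true` without any new permissions.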