Stop assigning Azure AI User role to per-agent managed identity after deploy#8941
Conversation
The Foundry service now grants each hosted agent's per-agent managed identity its required permissions internally, so the client-side Azure AI User role assignment in the post-deploy handler was redundant. It also failed noisily when the deploying user lacked roleAssignments/write, blocking deploys for users holding only data-plane roles. Remove the post-deploy role-assignment path and the now-false remote.agent-identity-roles doctor check, which enumerated ARM role assignments the service no longer creates and therefore reported false failures. Fixes #8940 Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
7021e1c to
b1cf34f
Compare
📋 Prioritization NoteThanks for the contribution! The linked issue isn't in the current milestone yet. |
There was a problem hiding this comment.
Pull request overview
This PR removes the redundant client-side assignment of the Azure AI User role to each hosted agent's per-agent managed identity in the azure.ai.agents extension's post-deploy step. The Microsoft Foundry service now grants the per-agent identity its required permissions internally, so the client-side assignment was both unnecessary and harmful — it failed noisily (blocking otherwise-successful deploys) when the deploying user lacked Microsoft.Authorization/roleAssignments/write. It also removes the now-always-false remote.agent-identity-roles doctor check that enumerated ARM role assignments the service no longer creates.
Changes:
- Dropped the post-deploy RBAC block in
listen.go(fetching agent version +EnsureAgentIdentityRBAC), keeping endpoint/credential setup for optimization reporting. - Removed
EnsureAgentIdentityRBAC/verifyRoleAssignmentand the entireagent_identity_query.go, while preserving shared helpers (parseAgentIdentityInfo,assignRoleToIdentity,extractSubscriptionID,roleAzureAIUser) still used by the developer pre-flight RBAC check. - Removed the
remote.agent-identity-rolesdoctor check and its wiring, updating the contract test (6 → 5 checks), the remediation switch, test seams, and stale doc comments.
I verified via repo-wide search that no references to the removed symbols (EnsureAgentIdentityRBAC, QueryAgentIdentityRoles, newCheckAgentIdentityRoles, probeAgentPrincipal, queryAgentIdentityRoles) remain, that all kept helpers/struct fields are still consumed, and that the agent_api import was correctly dropped from listen.go while toServiceKey remains used elsewhere.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
internal/cmd/listen.go |
Removed post-deploy agent-identity RBAC block and unused agent_api import; retained endpoint/credential setup for optimization reporting |
internal/project/agent_identity_rbac.go |
Removed EnsureAgentIdentityRBAC, ensureAgentIdentityRBACWithCred, ensureSingleAgentRBAC, verifyRoleAssignment, and now-unused imports/consts; kept shared helpers |
internal/project/agent_identity_query.go |
Deleted entirely (read-side role enumeration no longer needed) |
internal/cmd/doctor/checks_agent_identity_roles.go |
Deleted the remote.agent-identity-roles check implementation |
internal/cmd/doctor/checks_agent_identity_roles_test.go |
Deleted the corresponding tests |
internal/cmd/doctor/checks_remote.go |
Removed check registration; added explanatory note |
internal/cmd/doctor/checks_remote_test.go |
Updated contract test to expect 5 checks and new ordering |
internal/cmd/doctor/checks_local.go |
Removed probeAgentPrincipal / queryAgentIdentityRoles test seams from Dependencies |
internal/cmd/doctor/types.go |
Updated StatusInfo doc comment to drop the removed check as its example |
internal/cmd/doctor/shared_test.go |
Updated helper comment referencing the removed check |
internal/cmd/doctor_format.go |
Removed remote.agent-identity-roles from the remediation switch |
jongio
left a comment
There was a problem hiding this comment.
Clean removal of the redundant Azure AI User role assignment from the post-deploy path. The Foundry service handles these permissions internally now, so the client-side write (and the doctor check that enumerated them) are correctly removed.
Verified:
- Shared helpers (parseAgentIdentityInfo, �ssignRoleToIdentity, �xtractSubscriptionID,
oleAzureAIUser) are still used by the developer pre-flight RBAC check in developer_rbac_check.go - listen.go post-deploy flow remains correct: credential is still created for the optimization reporting that follows
- Remote checks contract test correctly updated (6 to 5 checks)
- No dead imports or unreferenced code left behind
- ACR role assignment behavior intentionally untouched (out of scope)
RickWinter
left a comment
There was a problem hiding this comment.
This drops the client-side Azure AI User role assignment for per-agent managed identities (the post-deploy step and the doctor remote.agent-identity-roles check), on the basis that Foundry now grants that permission server-side and the client assignment only produced noisy failures for data-plane-only users. The direction is right and the deletion is clean: the shared helpers (parseAgentIdentityInfo, assignRoleToIdentity, extractSubscriptionID, roleAzureAIUser) are correctly retained because developer_rbac_check.go still uses them, endpointResp and cred remain consumed by optimization reporting so nothing goes unused, and the remote-checks contract test is updated from 6 to 5.
One follow-up worth considering: with RBAC gone, the endpoint/tenant/credential resolution in postdeployHandler now feeds only best-effort optimization reporting, yet it still returns hard errors. That is a heavier failure mode than the remaining feature warrants. Not a blocker. The ACR scope note is correct to leave that path untouched.
…metry env Addresses PR #8941 review follow-up: with the client-side agent-identity RBAC assignment removed, the endpoint/tenant/credential resolution in postdeployHandler now feeds only best-effort optimization reporting (wrapped in recover, never propagated). Downgrade the FOUNDRY_PROJECT_ENDPOINT / AZURE_TENANT_ID / credential setup guards from hard errors to logged warnings + return nil, so missing optimization telemetry inputs can no longer fail an otherwise-successful deploy. Adds TestPostdeployHandler_MissingTelemetryEnv_ReturnsNil covering the missing/ empty endpoint and missing tenant cases.
Why
When deploying a hosted agent, the
azure.ai.agentsextension assigned theAzure AI Userrole to each hosted agent's per-agent managed identity in the post-deploy step. That client-side assignment is now redundant: the Microsoft Foundry service grants the per-agent identity its required permissions internally. Worse, it failed noisily when the deploying user lackedMicrosoft.Authorization/roleAssignments/write, blocking otherwise-successful deploys for users who only hold data-plane roles (for exampleFoundry User).This mirrors the equivalent change already made in the Foundry service tooling (microsoft/Skylight#4910).
What changed
listen.go): dropped the block that fetched the agent version, read its instance-identity principal ID, and calledEnsureAgentIdentityRBAC. The handler still sets up the project endpoint and credential and does optimization reporting.agent_identity_rbac.go): removedEnsureAgentIdentityRBACand its helpers. Kept the shared helpers (parseAgentIdentityInfo,assignRoleToIdentity,extractSubscriptionID,roleAzureAIUser) that are still used by the separate developer pre-flight RBAC check.remote.agent-identity-rolescheck (and its query moduleagent_identity_query.go). Because the service no longer creates ARM role assignments, that check enumerated nothing and folded every agent into a false aggregate failure. Updated the remote-checks contract test (6 -> 5 checks), the remediation switch, and stale doc comments.Scope note
This PR only removes the redundant Foundry
Azure AI Userassignment. ACR role assignment behavior (including ABAC-aware roles) is intentionally left untouched and out of scope.Verification
go build,go vet, and the fullgo test ./...suite pass for the extension.echo-dualdual-protocol hosted container agent against an existing Foundry project. The deploy produced no client-side RBAC banner, the agent became active and responded toazd ai agent invoke(responsesprotocol), andazd ai agent doctorno longer listsremote.agent-identity-roles.Fixes #8940