feat: generalize $ref resolver + shared $ref-aware azure.yaml edit helper by huimiu · Pull Request #8777 · Azure/azure-dev

huimiu · 2026-06-23T04:05:42Z

Part of #8775 · Design spec PR #8590 (docs/specs/unify-azure-yaml/spec.md §2.4).

The base branch is huimiu/foundry-azure-yaml (the Foundry azure.yaml integration branch). This PR is intentionally independent of #8675.

What

PR3 of the "unify Foundry config in azure.yaml" follow-up.

Resolver generalization (`includes.go`: docs + tests only)

The $ref resolver from #8627 already resolves $ref at any map or sequence node. The separate-services shapes therefore need no new resolution logic, only a generalized, documented contract and test coverage for:

Service-entry-level $ref (top-level inline map, beside host: / the service key, which core strips).
Deployment array-item $ref on the project service.
Per-file rebasing of relative project, instructions, and nested $ref paths (already implemented).
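The following is a minimal, stdlib-only sketch of the three shapes listed above. It is not the code in `includes.go`. It works on already-parsed generic values instead of YAML nodes, and a `files` map stands in for the filesystem. The names `resolve` and `files` are hypothetical and do not reproduce `ResolveFileRefs`' real signature. The sketch shows four behaviors: a map carrying `$ref` is replaced by the target's content, sibling keys overlay that content shallowly, nested `$ref` paths are rebased on the directory of the file that contains them, and a stack of in-flight files catches cycles.

```go
package main

import (
	"fmt"
	"path"
	"slices"
	"strings"
)

// refKey marks an include (assumed to mirror the resolver's refKey).
const refKey = "$ref"

// resolve is a hypothetical stand-in for the real resolver. files maps a
// cleaned path to its parsed content; baseDir is the directory of the file
// that contains node; stack lists files currently being resolved.
func resolve(node any, baseDir string, files map[string]any, stack []string) (any, error) {
	switch n := node.(type) {
	case map[string]any:
		out := map[string]any{}
		if raw, ok := n[refKey]; ok {
			ref, ok := raw.(string)
			if !ok {
				return nil, fmt.Errorf("%s must be a string", refKey)
			}
			// Relative refs are rebased on the including file's directory.
			target := path.Join(baseDir, strings.TrimSpace(ref))
			if slices.Contains(stack, target) {
				return nil, fmt.Errorf("$ref cycle: %s -> %s", strings.Join(stack, " -> "), target)
			}
			doc, ok := files[target]
			if !ok {
				return nil, fmt.Errorf("$ref target not found: %s", target)
			}
			// Nested $refs inside the included file resolve relative to that file.
			resolved, err := resolve(doc, path.Dir(target), files, append(stack, target))
			if err != nil {
				return nil, err
			}
			m, ok := resolved.(map[string]any)
			if !ok {
				return nil, fmt.Errorf("$ref target %s is not a map", target)
			}
			for k, v := range m {
				out[k] = v
			}
		}
		// Sibling keys overlay the included content shallowly: a sibling
		// value replaces the included value for that key wholesale.
		for k, v := range n {
			if k == refKey {
				continue
			}
			r, err := resolve(v, baseDir, files, stack)
			if err != nil {
				return nil, err
			}
			out[k] = r
		}
		return out, nil
	case []any:
		// Sequence items (e.g. deployments entries) may each carry a $ref.
		out := make([]any, len(n))
		for i, item := range n {
			r, err := resolve(item, baseDir, files, stack)
			if err != nil {
				return nil, err
			}
			out[i] = r
		}
		return out, nil
	default:
		return node, nil
	}
}
```

Because overlay is shallow, a sibling `project:` beside a service-entry `$ref` replaces the included `project` map wholesale rather than merging into it.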

Shared `$ref`-aware YAML edit helper (`includes_edit.go`, new)

A comment-preserving writer that the #8049 composition commands also use, so reads and writes of $ref entries agree:

YAMLDocument load / parse / save / bytes: round-trips through the braydonk YAML library with two-space indentation and block-scalar style, matching how azd core writes azure.yaml.
ServiceEntry(name, create), EntryRef(entry), SetServiceField(name, key, value, EditTarget).
EditInline overlays a key beside $ref (the §2.4 default); EditRefFile follows the $ref into the split file (path resolved with the resolver's shared logic), falling back to inline when the entry is not $ref-backed.
Reuses core pkg/yamlnode for node ops and the resolver's refKey / refTargetPath, so the writer and resolver agree on what a $ref entry is and where it points.

Out of scope

Wiring the resolver into each provider's parse path (rides with PR2), remote URL $ref fetching, and JSON-Schema validation of loaded entries. No azd-core changes; no dependency on #8675.

Tests

go test ./internal/project/...

Resolver shapes: service-entry top-level $ref (+ overlay); project deployments: array-item $ref (+ inline item, + overlay).
Edit helper: round-trip comment/order preservation, find / create / missing, $ref detection, inline overlay + read-back agreement via ResolveFileRefs, in-place update (comment preserved), EditRefFile split-file write, non-$ref fallback, and missing ref-file error.

gofmt -s, golangci-lint, and cspell are clean.

Copilot

Pull request overview

This PR extends the existing $ref include resolver contract to explicitly cover the “separate-services” azure.yaml shape, and adds a shared $ref-aware, comment-preserving YAML edit helper so read/write behavior stays consistent across Foundry-related commands and extensions.

Changes:

Document the generalized ResolveFileRefs contract to cover service-entry top-level $ref, deployments array-item $ref, and nested $ref recursion.
Add resolver test coverage for the new documented shapes (including shallow sibling overlay behavior).
Introduce includes_edit.go (YAMLDocument) for comment/order-preserving edits, with $ref-aware inline-vs-ref-file targeting and corresponding tests.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File	Description
cli/azd/extensions/azure.ai.agents/internal/project/includes.go	Updates resolver documentation to describe separate-services `$ref` shapes and contract.
cli/azd/extensions/azure.ai.agents/internal/project/includes_test.go	Adds tests for service-entry-level and deployments array-item `$ref` resolution + overlays.
cli/azd/extensions/azure.ai.agents/internal/project/includes_edit.go	Adds a `$ref`-aware YAML editing helper that preserves comments/key order and can write inline or into split files.
cli/azd/extensions/azure.ai.agents/internal/project/includes_edit_test.go	Adds tests for round-trip stability and for inline/ref-file edit behavior aligning with `ResolveFileRefs`.

Update the private networking guide with managed deploy/invoke guidance. Live validation: provisioned managed mode, converted the test ACR to ABAC, built the echo-dual image with `az acr build --source-acr-auth-id [caller]`, granted the Foundry project MI `Container Registry Repository Reader`, deployed successfully, and invoked the hosted agent successfully. * feat(agents): secure-by-default private networking for Foundry services Realign the azure.yaml `network:` surface to the natural Azure resource shape and make a network-bound Foundry account private in every mode. Reverses the prior managed-mode regression that flipped the account's publicNetworkAccess back to Enabled. Service sample 18 confirms managed mode supports a private data plane (customer private endpoint + the Microsoft-managed egress network), so declaring `network:` now always disables public data-plane access. Config: flat `network:` block with two orthogonal axes, no `mode` enum. - peSubnet (required) -> account private endpoint; omitting it while `network:` is declared is an error, never a silent public fallback. - agentSubnet (optional) -> present injects the agent into a customer subnet (BYO egress); absent uses the Microsoft-managed network (managed egress), where isolationMode becomes valid. Synthesizer/templates: - derive egress from agentSubnet presence (useManagedEgress); replace the networkMode param with a useManagedEgress bool. - disablePublicDataPlaneAccess = enableNetworkIsolation (always private). - add a managedNetworks/default child resource carrying isolationMode for managed egress. - validate peSubnet-required, isolationMode-managed-only, and single-VNet. Docs/tests/e2e: - rewrite docs/private-networking.md (host: azure.ai.agent, the value the provision provider actually accepts on this branch). - add synthesizer unit tests + a compiled-ARM regression guard. 
- add a live E2E harness (8-cell what-if matrix, BYO + managed-iso real provisions, eject idempotency) with an automatic jumpbox SOCKS tunnel so deploy/invoke can reach the private data plane; assert real account network-injection state rather than azd's echoed output. * docs(agents): use 'azd ai agent init --image' and assert real state in private-networking cheatsheet Managed-egress cheatsheet now scaffolds the agent via 'azd ai agent init --image' (writes agent.yaml) instead of assuming a hand-authored manifest, matching the BYO cheatsheet. Replace the env-output 'Expected outputs' block (azd echoing its own AZURE_FOUNDRY_* classification) with real resource-state validation: account publicNetworkAccess=Disabled and the managedNetworks/default isolationMode, with the invoke echo response as the end-to-end proof. * docs(agents): order init before azure.yaml in managed-egress cheatsheet 'azd ai agent init --image' scaffolds azure.yaml/agent.yaml, so it must come before the network: block the reader adds to the generated service. Matches the actual timing and the BYO cheatsheet ordering. * docs(agents): simplify private-networking cheatsheets - Drop the redundant 'azd env set AZD_AGENT_SKIP_ACR true': passing --image to 'azd ai agent init' already derives skipACR() and writes the env var into the environment init creates/selects. Reuse that env (no separate 'azd env new') so init and provision share one environment. - BYO cheatsheet: remove the export-variable indirection; inline placeholders directly into 'azd env set', matching the managed-egress cheatsheet. - Managed-egress cheatsheet: remove the weak 'validate via az show' / env-output block; a successful invoke over the private endpoint is the end-to-end proof. * test(agents): fix jumpbox fallback for DNS-reference deploy In peered fallback mode, populate JB_HOST and wait for SSH before writing /etc/hosts. 
DNS-reference accounts can also assign different PE IPs per privatelink zone, so pin each account FQDN from the private DNS A record instead of mapping every FQDN to the first PE NIC IP. * test(agents): exclude private-networking e2e harness from PR Remove the ad-hoc live validation harness from the committed diff while keeping it available locally for investigation runs. * test(agents): ignore bicep generator metadata in ARM drift check Compare normalized ARM JSON instead of raw bytes so CI/dev Bicep version metadata does not cause false stale-template failures while still catching semantic template drift. * feat(agents): validate distinct subnet names; drop debug log * feat(agents): refuse Terraform eject when network: is declared The Terraform module has no VNet/private-endpoint/DNS/networkInjections resources, so ejecting it for a network: service would silently drop the config and provision a public account. Fail fast with a clear error instead, preserving the network: secure-by-default contract. Bicep eject remains the supported path for private networking. Also fixes a merge artifact (single-arg ejectInfra call in a test). * docs+test(agents): document Bicep eject customization; verify full network param set - docs: add 'Advanced: eject the Bicep and customize it' section to the private-networking cheatsheet (eject -> manual edit -> provision -> deploy -> invoke), with two worked Bicep edits and a Terraform-unsupported note. - test: assert the complete network parameter set lands in the ejected main.parameters.json for BYO egress, and extend the managed-egress eject test with the full param contract. 
* docs(agents): merge networking Requirements/Known limitations into one Limitations section
* docs(agents): rename networking quick guides to parallel Scenario 1-3 headings
* feat: generalize $ref resolver + shared $ref-aware azure.yaml edit helper (#8777)
* feat(agents): split Foundry init into per-resource azure.yaml services (#8675)
* feat: carry Foundry agent and resource definitions in azure.yaml (#8779)
* feat(agents): move Foundry networking to project service (#8809)
* feat(agents): move Foundry networking to project service
* fix(agents): reject conflicting Foundry network schema
* fix(agents): align split Foundry schemas and ACR scan
* fix(agents): preserve pre-split Foundry provisioning fallback
* docs(agents): split complex Foundry schema example
* ci: add cspell words for ai agents extension
* ci: exclude gosec G204/G304 for agents test lint
* fix: apply go fix and gofmt in agents and skills exts
* test: add kind and name to resolveAgentProtocol fixtures
* fix: log foundry env fallback and document deploy-context guard
* fix: remove foundry agent legacy-config telemetry from azd core
* fix: default code-deploy dependency_resolution to remote_build
* fix: default rg name and emit AZURE_TENANT_ID in foundry provision
* fix: write endpoint on ai-project when reusing existing project
* fix: don't declare existing model deployments in azure.yaml
* feat: derive azure.yaml project service key from project name
* fix: resolve $ref includes when synthesizing foundry infra
* fix: harden foundry provisioning provider edge cases
* fix: make agent infra eject retryable and reject --image
* fix: seed used service names and slices.Sort keys
* fix: zip agent override source dir on code deploy
* fix: skip deploy upsert for non-key connection auth
* feat: keep deprecated agent config shape in schema
* fix: debug logs, preserve uses: on optimize apply, acr network isolation
* feat: provision model deployments on existing Foundry project
* fix(test): update recording tests for inline agent definition

  The functional recording tests still asserted that agent.yaml exists on disk, but the PR moved agent definitions inline into azure.yaml service entries. Update both Test_AIAgent_Init_NoPrompt_Defer and Test_AIAgent_Init_NoPrompt_WithProject to:
  - Assert .agentignore exists instead of agent.yaml
  - Cross-check resolved model name in azure.yaml instead of agent.yaml
  - Update layout comments to reflect the new structure

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Wei Meng <wemeng@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Zhijie Huang <zhihuan@microsoft.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: trangevi <trangevi@microsoft.com>
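The output-casing bug in the `fix(agents): correct network bicep preflight + network-mode output casing` commit (ARM returning `azurE_..._MODE` for an output declared as `AZURE_..._MODE`) comes down to a case-insensitive restore list. Below is a minimal Go sketch, assuming outputs arrive as a `map[string]any`. `repairOutputKeys` is a hypothetical helper, and the slice only reuses the two output names mentioned in these commits; it is not the provider's actual `canonicalOutputNames` code.

```go
package main

import "strings"

// canonicalOutputNames (hypothetical contents) lists the exact casing each
// output should have in the azd environment. Only names on this list are repaired.
var canonicalOutputNames = []string{
	"AZURE_FOUNDRY_NETWORK_MODE",
	"AZURE_FOUNDRY_MANAGED_ISOLATION_MODE",
}

// repairOutputKeys returns a copy of outputs in which any key that matches a
// canonical name case-insensitively is rewritten to the canonical casing.
// Keys with no canonical counterpart are copied through unchanged.
func repairOutputKeys(outputs map[string]any) map[string]any {
	repaired := make(map[string]any, len(outputs))
	for key, value := range outputs {
		for _, canonical := range canonicalOutputNames {
			if strings.EqualFold(key, canonical) {
				key = canonical
				break
			}
		}
		repaired[key] = value
	}
	return repaired
}
```

Because unlisted keys pass through untouched, every new output that ARM may mangle (such as AZURE_FOUNDRY_MANAGED_ISOLATION_MODE) has to be added to the list explicitly, which matches the follow-up commit that extended the restore list.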

huimiu added 2 commits June 23, 2026 12:04

feat: generalize $ref resolver for separate-services azure.yaml

ffb505f

feat: add $ref-aware azure.yaml edit helper for Foundry services

89a5d3a

huimiu requested review from JeffreyCA, glharper, therealjohn, trangevi and trrwilson as code owners June 23, 2026 04:05

microsoft-github-policy-service Bot assigned huimiu Jun 23, 2026

github-actions Bot added the ext-agents azure.ai.agents extension label Jun 23, 2026

huimiu changed the title ~~Generalize $ref resolver + shared $ref-aware azure.yaml edit helper (PR3)~~ feat: generalize $ref resolver + shared $ref-aware azure.yaml edit helper Jun 23, 2026

huimiu linked an issue Jun 23, 2026 that may be closed by this pull request

[ext-foundry]: wire $ref file includes + overlay overrides for separate-services azure.yaml #8775

Open

huimiu requested a review from Copilot June 23, 2026 04:41

Copilot started reviewing on behalf of huimiu June 23, 2026 04:41

Copilot AI reviewed Jun 23, 2026


Comment thread cli/azd/extensions/azure.ai.agents/internal/project/includes_edit_test.go Outdated

Comment thread cli/azd/extensions/azure.ai.agents/internal/project/includes_edit_test.go Outdated

huimiu added 2 commits June 23, 2026 12:50

test: add EntryRef edge-case tests for empty/nil/numeric \

6399807

test: guard substring order assertions

8529134
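The EntryRef edge-case commit title only names the inputs (empty, nil, numeric), not the expected results. Here is a hedged Go sketch of one plausible contract, assuming all of them mean "not `$ref`-backed". `entryRef` is a hypothetical stand-in that works on decoded maps, whereas the real `EntryRef(entry)` receives YAML nodes.

```go
package main

// entryRef returns the $ref path of a decoded service entry and whether the
// entry is $ref-backed. Missing, nil, empty, and non-string values (for
// example a numeric $ref) all count as "not $ref-backed". Indexing a nil map
// is safe in Go and yields nil, so a nil entry needs no special case.
func entryRef(entry map[string]any) (string, bool) {
	ref, ok := entry["$ref"].(string)
	if !ok || ref == "" {
		return "", false
	}
	return ref, true
}

// entryRefCases is a table of the edge cases named in the commit title,
// plus the ordinary string case.
var entryRefCases = []struct {
	name   string
	entry  map[string]any
	want   string
	wantOK bool
}{
	{"nil entry", nil, "", false},
	{"no $ref key", map[string]any{"host": "azure.ai.agent"}, "", false},
	{"nil $ref", map[string]any{"$ref": nil}, "", false},
	{"empty $ref", map[string]any{"$ref": ""}, "", false},
	{"numeric $ref", map[string]any{"$ref": 42}, "", false},
	{"string $ref", map[string]any{"$ref": "agents/agent.yaml"}, "agents/agent.yaml", true},
}
```

Treating every malformed `$ref` as "not a ref" keeps callers simple: an edit that targets the ref file can fall back to an inline write instead of failing.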

huimiu mentioned this pull request Jun 23, 2026

[ext-foundry]: make project/connection/toolbox Foundry resources real service targets #8774

Open

fix: clarify ref helper docs for review

27f2163

This was referenced Jun 23, 2026

feat: per-host azure.yaml schemas and service targets for Foundry AI resources #8781

Merged

feat: complete Foundry azure.yaml unification (sibling targets, endpoint reuse) #8783

Merged

trangevi approved these changes Jun 23, 2026


huimiu merged commit 20bbdb9 into huimiu/foundry-azure-yaml Jun 24, 2026
20 checks passed

Copilot AI mentioned this pull request Jun 25, 2026

Prepare azd 1.26.0 core release #8807

Merged

huimiu mentioned this pull request Jun 25, 2026

feat: unify azure yaml, agent & resource definitions, $ref includes, and IaC-less init #8818

Merged


feat: generalize $ref resolver + shared $ref-aware azure.yaml edit helper #8777
huimiu merged 5 commits into huimiu/foundry-azure-yaml from hui/foundry-ref-includes

huimiu commented Jun 23, 2026


Copilot AI left a comment



3 participants


huimiu commented Jun 23, 2026

What

Resolver generalization (`includes.go` — docs + tests only)

Shared `$ref`-aware YAML edit helper (`includes_edit.go` — new)

Out of scope

Tests
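For the shared `$ref`-aware edit helper outlined above, the core decision is where a write lands. The PR description names two targets: `EditInline` overlays the key beside `$ref`, and `EditRefFile` writes into the referenced split file, falling back to inline when the entry is not `$ref`-backed. This Go sketch models only that routing on in-memory maps. `setServiceField`, its parameters, and the `files` map are hypothetical; the real helper edits comment-preserving YAML nodes and resolves paths with the resolver's shared logic.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// EditTarget selects where a field write lands for a $ref-backed entry.
// The constant names echo the PR description; the rest is a sketch.
type EditTarget int

const (
	EditInline  EditTarget = iota // overlay the key beside $ref in azure.yaml
	EditRefFile                   // follow $ref and write into the split file
)

// setServiceField writes key=value for the named service. With EditRefFile it
// writes into the referenced file (path joined onto baseDir); when the entry
// has no usable string $ref it falls back to an inline write.
func setServiceField(services map[string]any, files map[string]map[string]any,
	baseDir, name, key string, value any, target EditTarget) error {
	entry, ok := services[name].(map[string]any)
	if !ok {
		return fmt.Errorf("service %q not found", name)
	}
	if target == EditRefFile {
		if ref, isRef := entry["$ref"].(string); isRef && ref != "" {
			path := filepath.Join(baseDir, ref)
			file, ok := files[path]
			if !ok {
				return fmt.Errorf("$ref target %q not loaded", path)
			}
			file[key] = value
			return nil
		}
	}
	entry[key] = value // inline overlay, or fallback for non-$ref entries
	return nil
}
```

Because an inline key beside `$ref` wins over the file's value when the resolver overlays siblings, defaulting writes to `EditInline` keeps the split file untouched while still changing the effective configuration.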
