Unify Foundry agent configuration in azure.yaml #7962

Description

@therealjohn

This proposes two changes to how the azure.ai.agents extension models a Foundry agent project in azure.yaml:

  1. Consolidate all hosted-agent config into azure.yaml, retiring agent.yaml and agent.manifest.yaml.
  2. Restructure so that a single azure.ai.project service owns all Foundry data-plane state -- toolboxes, connections, model deployments, and future project-scoped resources. Agents reference it via uses. No Bicep files are required by default; developers can opt in to Bicep on disk when they need full IaC reproducibility.

Today, azure.ai.agent packs every project-scoped resource inside one hosted agent service, and ships two extra config files alongside azure.yaml. That made sense as a starting point, but it conflates the agent runtime with the Foundry project around it, and it makes sharing resources across agents awkward. The Foundry Toolkit for VS Code will move to reading azure.yaml directly once these changes land.

Current Problems

  1. Three files, overlapping data. The agent name appears in three places, container resources in two, the model deployment name in three. Two templating syntaxes ({{param}} and ${ENV}) overlap.
  2. Scope conflation. services.<agent>.config mixes things that genuinely belong to one agent (container resources, env, startup command) with project-scoped resources (model deployments, connections, toolboxes). Some of these are ARM resources that should live in Bicep; the rest are Foundry data-plane resources that don't belong nested under any single agent.
  3. No sharing across agents. Because project-scoped resources are nested under an agent today, a second agent that wants the same toolbox has to redeclare it. There is nowhere to say "this toolbox belongs to the project; these agents reference it."
  4. Divergent tooling. The Foundry Toolkit parses agent.yaml (AgentDefinition) directly, whereas azd ai agent uses an AgentManifest to generate an AgentDefinition and also has to mix in orchestration from azure.yaml. They feel like separate experiences.
  5. The manifest layer carries no weight. agent.manifest.yaml was designed for an agent catalog that didn't get built. The templating it adds isn't paying for itself.
  6. No real ability to share. AgentDefinitions were intended to be concrete definitions, but in practice any real values get abstracted behind azd environment variables (${ENV_VAR}), so the file effectively becomes a templated definition. That blurs the purpose of an AgentManifest.

Solution Hypothesis

The shape we want is:

  • A single host: azure.ai.project service owns all Foundry data-plane state that can't be modeled in ARM/Bicep. Today that includes toolboxes and connections; future additions (eval datasets, vector indexes, fine-tunes) go here too. The "project" maps directly to the Foundry entity these resources belong to. No per-resource-type host proliferation (no azure.ai.toolbox, azure.ai.connection, etc.).
  • host: azure.ai.agent describes the agent runtime. The config: block maps to the Foundry create-agent API (kind, description, metadata, protocols, container resources, env, startupCommand). Agents reference the project service via uses: [foundry-project].
  • Deploy mode is explicit: a docker: block selects container mode; a runtime: block selects code-deploy mode. Having neither is a validation error, and so is having both. No silent defaults.
  • The runtime: block follows the existing azure.yaml schema precedent (runtime: { stack: python, version: "3.13" }), not a bare string.
  • No Bicep files in the repo by default. The extension carries built-in Bicep templates internally (like AZD compose) and generates them in memory during azd provision. Developers can opt into Bicep on disk via azd infra gen or equivalent. The composition mechanism for adding and removing resources in the YAML is tracked in a separate RFC (not yet filed).
  • Service ordering uses azd's existing uses field. uses is the inter-service dependency primitive in ServiceConfig today; no schema addition needed.
  • agent.yaml and agent.manifest.yaml go away.

The mental model shifts from "one big agent blob with everything inside it" to "a Foundry project that owns shared resources, plus agent services that reference it."

Required azure.yaml Schema Changes

azd would need to recognize one new host kind under services.<name>.host: azure.ai.project. The extension would then handle two host kinds:

| Host kind | Owns | Provisioning verb | Deploy verb |
| --- | --- | --- | --- |
| azure.ai.agent (already exists) | Agent runtime (container or code-deploy) | (none -- needs Foundry project to exist) | Push agent definition + container/zip |
| azure.ai.project (new) | All Foundry data-plane state (toolboxes, connections, model deployments, future resources) | Create Foundry project (ARM) | Create/update data-plane resources via Foundry APIs |

Each host kind owns its own JSON schema for the config block. The schemas would live in the azure.ai.agents extension, as the existing one does.

azure.ai.project is a service without source code -- it has no project: directory, no build step, no artifact. Its config: block declaratively describes the Foundry data-plane state. provision creates the ARM-level Foundry project; deploy creates or updates toolboxes, connections, and other data-plane resources via Foundry APIs.

Final Shape: azure.ai.agent.json (after the change)

The agent runtime schema shrinks substantially once the project-scoped fields move to azure.ai.project:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Azure AI Agent Runtime",
  "description": "Configuration for a hosted agent runtime in a Foundry project.",
  "type": "object",
  "properties": {
    "kind":         { "type": "string", "enum": ["hosted", "prompt"] },
    "description":  { "type": "string" },
    "metadata":     { "type": "object", "additionalProperties": { "type": "string" } },
    "protocols": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "protocol": { "type": "string" },
          "version":  { "type": "string" }
        },
        "required": ["protocol", "version"]
      }
    },
    "container": {
      "type": "object",
      "description": "Container resources. Only relevant when the service has a docker: block (container mode).",
      "properties": {
        "resources": {
          "type": "object",
          "properties": {
            "cpu":    { "type": "string", "pattern": "^[0-9]+(\\.[0-9]+)?m?$" },
            "memory": { "type": "string", "pattern": "^[0-9]+(\\.[0-9]+)?(Ki|Mi|Gi|Ti|Pi|Ei|k|M|G|T|P|E)?$" }
          }
        }
      }
    },
    "env": {
      "type": "object",
      "additionalProperties": { "type": "string" }
    },
    "startupCommand": { "type": "string" }
  },
  "additionalProperties": false
}

Removed from this schema: deployments[], resources[], toolConnections[], toolboxes[], connections[]. These all move to the azure.ai.project service. Also removed: runtime and entrypoint as config-level fields -- runtime mode is now expressed at the service level via a typed runtime: block (see below).
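To make the container.resources patterns concrete, here is a minimal sketch that applies the two regular expressions from the schema to candidate values. The helper name check_container_resources is hypothetical and not part of azd or the extension; a real implementation would presumably rely on a JSON Schema validator instead.

```python
import re

# Patterns copied verbatim from the azure.ai.agent.json sketch.
CPU_PATTERN = re.compile(r"^[0-9]+(\.[0-9]+)?m?$")
MEMORY_PATTERN = re.compile(r"^[0-9]+(\.[0-9]+)?(Ki|Mi|Gi|Ti|Pi|Ei|k|M|G|T|P|E)?$")


def check_container_resources(resources: dict) -> list[str]:
    """Return a list of problems; an empty list means both values are acceptable.

    Hypothetical helper for illustration only.
    """
    problems = []
    for field, pattern in (("cpu", CPU_PATTERN), ("memory", MEMORY_PATTERN)):
        value = resources.get(field)
        if value is None:
            continue  # both fields are optional in the schema
        # The schema types both fields as strings, so an unquoted YAML
        # number such as cpu: 0.25 is rejected here as well.
        if not isinstance(value, str) or not pattern.fullmatch(value):
            problems.append(f"{field} {value!r} does not match the schema pattern")
    return problems
```

One practical consequence of the string typing: cpu: "0.25" needs quotes in YAML, while memory: 0.5Gi is already read as a string.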

The config: block intentionally maps closely to the Foundry create-agent API contract -- it describes what the agent IS to Foundry. Deploy mode is determined at the service level:

  • Container mode -- the service has a docker: block. azd builds and pushes via the existing docker.path and docker.remoteBuild fields on ServiceConfig. Same packaging flow as any other containerized azd service.
  • Code-deploy mode -- the service has a runtime: block at the service level (not inside config:). This follows the existing runtime definition in azure.yaml schema:
    runtime:
      stack: python
      version: "3.13"
    azd zips the project directory and Foundry schedules it on the appropriate managed base image.
  • Validation rules: docker: and runtime: are mutually exclusive (both present = validation error). Neither present = validation error. No silent defaults -- they cause debugging nightmares at deploy time.
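The discrimination rule above can be sketched in a few lines. This is an illustration under assumptions, not the extension's actual code: resolve_deploy_mode is a hypothetical name, the service is assumed to be an already-parsed mapping from azure.yaml, and requiring a stack key is one reading of "typed block, not a bare string".

```python
def resolve_deploy_mode(service: dict) -> str:
    """Return "container" or "code-deploy" for one services.<name> entry.

    Hypothetical helper: the name and error messages are illustrative only.
    """
    has_docker = "docker" in service
    has_runtime = "runtime" in service

    # Both present and neither present are errors; there is no default mode.
    if has_docker and has_runtime:
        raise ValueError("docker: and runtime: are mutually exclusive")
    if not has_docker and not has_runtime:
        raise ValueError("one of docker: or runtime: is required")

    if has_runtime:
        runtime = service["runtime"]
        # The proposal calls for a typed block, not a bare string like "python".
        if not isinstance(runtime, dict) or "stack" not in runtime:
            raise ValueError("runtime: must be a mapping with at least a stack field")
        return "code-deploy"
    return "container"
```

Failing early in validation, rather than picking a mode implicitly, is what keeps the error close to the azure.yaml line that caused it.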

New Sibling Schema: azure.ai.project.json (sketch)

azure.ai.project -- one Foundry project's data-plane state. Owns all project-scoped resources that can't be modeled in ARM/Bicep:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Azure AI Foundry Project",
  "description": "Project-scoped Foundry data-plane resources.",
  "type": "object",
  "properties": {
    "toolboxes": {
      "type": "object",
      "description": "Named toolboxes. Each key is the toolbox name.",
      "additionalProperties": {
        "type": "object",
        "properties": {
          "tools": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "type":       { "type": "string", "description": "e.g., web_search, mcp, code_interpreter" },
                "connection": { "type": "string", "description": "Connection name or env var ref for connection-backed tools like mcp." }
              },
              "required": ["type"]
            }
          }
        }
      }
    },
    "connections": {
      "type": "array",
      "description": "Foundry project connections.",
      "items": {
        "type": "object",
        "properties": {
          "name":     { "type": "string" },
          "category": { "type": "string" },
          "target":   { "type": "string" },
          "authType": { "type": "string" }
        },
        "required": ["name", "category", "target"]
      }
    },
    "deployments": {
      "type": "array",
      "description": "Model deployments.",
      "items": {
        "type": "object",
        "properties": {
          "name":  { "type": "string" },
          "model": { "type": "object" },
          "sku":   { "type": "object" }
        },
        "required": ["name", "model"]
      }
    }
  },
  "additionalProperties": true
}

additionalProperties: true leaves room for future project-scoped resources (eval datasets, vector indexes, knowledge sources) without schema-breaking changes.
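As a rough illustration of what the required arrays in this sketch enforce, the following checks a parsed config: mapping for missing fields. The function and constant names are hypothetical, and the check covers only the required fields, not types or the open-ended additional properties.

```python
# Required fields taken from the azure.ai.project.json sketch.
REQUIRED_CONNECTION_FIELDS = ("name", "category", "target")
REQUIRED_DEPLOYMENT_FIELDS = ("name", "model")


def check_project_config(config: dict) -> list[str]:
    """Return one message per missing required field (hypothetical helper)."""
    problems = []
    for i, conn in enumerate(config.get("connections", [])):
        for field in REQUIRED_CONNECTION_FIELDS:
            if field not in conn:
                problems.append(f"connections[{i}]: missing {field}")
    for i, dep in enumerate(config.get("deployments", [])):
        for field in REQUIRED_DEPLOYMENT_FIELDS:
            if field not in dep:
                problems.append(f"deployments[{i}]: missing {field}")
    # Toolboxes are keyed by name; each tool only requires a type.
    for name, toolbox in config.get("toolboxes", {}).items():
        for i, tool in enumerate(toolbox.get("tools", [])):
            if "type" not in tool:
                problems.append(f"toolboxes.{name}.tools[{i}]: missing type")
    return problems
```

Unknown top-level keys pass through untouched, which mirrors the additionalProperties: true choice.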

Example azure.yaml After the Change

services:
  # Project-scoped: all Foundry data-plane resources in one place.
  # No source directory, no build artifact -- pure declarative state.
  foundry-project:
    host: azure.ai.project
    config:
      deployments:
        - name: gpt-4.1-mini
          model: { format: OpenAI, name: gpt-4.1-mini, version: "2025-04-14" }
          sku: { name: GlobalBatch, capacity: 10 }
      connections:
        - name: github-mcp-conn
          category: CustomKeys
          target: https://api.githubcopilot.com/mcp
          authType: ApiKey
      toolboxes:
        agent-toolbox:
          tools:
            - { type: web_search }
            - { type: code_interpreter }
            - { type: mcp, connection: "${GITHUB_MCP_CONN}" }

  # Agent-scoped: the runtime, references the project via uses
  my-agent:
    project: src/my-agent
    host: azure.ai.agent
    uses: [foundry-project]
    # Code-deploy mode: runtime block present -> zip-deploy
    runtime:
      stack: python
      version: "3.13"
    config:
      kind: hosted
      description: A basic agent hosted by Foundry.
      protocols:
        - { protocol: responses, version: 1.0.0 }
      env:
        AZURE_AI_MODEL_DEPLOYMENT_NAME: gpt-4.1-mini
      startupCommand: python main.py

  # Container mode variant: replace runtime with docker
  #
  #   my-agent:
  #     project: src/my-agent
  #     host: azure.ai.agent
  #     uses: [foundry-project]
  #     docker: { path: Dockerfile, remoteBuild: true }
  #     config:
  #       kind: hosted
  #       protocols: [{ protocol: responses, version: 1.0.0 }]
  #       container:
  #         resources: { cpu: "0.25", memory: 0.5Gi }
  #       env:
  #         AZURE_AI_MODEL_DEPLOYMENT_NAME: gpt-4.1-mini

# No infra: block needed by default.
# azd provision uses built-in Bicep templates internally (like AZD compose).
# Opt in to Bicep on disk via azd infra gen or equivalent (separate RFC).

A second agent that wants the same toolbox is just another host: azure.ai.agent entry with uses: [foundry-project] -- nothing duplicated. The project service is the single source of truth for shared resources.

Dependency Flow

The pattern uses azd's existing mechanisms; no new wiring is needed:

  1. azure.ai.project provisions first. azd provision creates the Foundry project (ARM). azd deploy then deploys the foundry-project service, which creates data-plane resources (toolboxes, connections, model deployments) via Foundry APIs. These resources are declared in the project's config: block.
  2. uses orders services. The agent declares uses: [foundry-project], so the project service deploys before any agent. uses is azd's existing service-to-service dependency primitive on ServiceConfig -- it controls deploy ordering and surfaces the dependency's outputs as env vars on the dependent service.
  3. Env var expansion uses ${VAR} syntax. Same mechanism azure.yaml already supports. Connection references in toolbox configs (e.g., connection: ${GITHUB_MCP_CONN}) are resolved from the azd environment at deploy time.
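For step 3, a minimal sketch of ${VAR} substitution over a parsed config: block is shown here. It is not azd's implementation: the expand helper is hypothetical, the azd environment is modeled as a plain dict, and raising on an unset variable is an assumption of this sketch rather than documented behavior.

```python
import re

# Matches ${NAME} where NAME looks like an environment variable.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand(value, env: dict):
    """Recursively replace ${VAR} references in strings, lists, and mappings.

    Hypothetical helper; env stands in for the azd environment values.
    """
    if isinstance(value, str):
        def replace(match):
            name = match.group(1)
            if name not in env:
                # Assumption of this sketch: unset variables are an error.
                raise KeyError(f"environment value {name} is not set")
            return env[name]
        return _VAR.sub(replace, value)
    if isinstance(value, dict):
        return {key: expand(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [expand(item, env) for item in value]
    return value  # numbers, booleans, None pass through unchanged
```

With GITHUB_MCP_CONN set to github-mcp-conn in the environment, the mcp tool entry resolves to the connection declared in the same project service.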

So the chain is: provision (ARM resources) -> project deploy (data-plane resources) -> agent deploy (agent definition + code/container). Each step uses primitives azd already supports.
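The ordering that uses implies can be sketched as a topological sort over the services map. This is only an illustration of the idea, not how azd schedules deployments: deploy_order is a hypothetical name, and the sketch assumes every uses entry names another service in the same file.

```python
from graphlib import TopologicalSorter


def deploy_order(services: dict) -> list[str]:
    """Return service names so that every dependency precedes its dependents.

    Hypothetical helper; services is the parsed services: mapping.
    """
    # Map each service to the set of services it depends on.
    graph = {name: set(svc.get("uses", [])) for name, svc in services.items()}

    unknown = {dep for deps in graph.values() for dep in deps} - set(graph)
    if unknown:
        raise ValueError(f"uses references unknown services: {sorted(unknown)}")

    # static_order() yields predecessors first and raises CycleError on cycles.
    return list(TopologicalSorter(graph).static_order())
```

For the example file, foundry-project always comes out ahead of every agent that lists it in uses; the relative order of sibling agents is not specified.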

No Bicep files are required in the repo. The extension handles ARM provisioning internally, generating Bicep from the azure.yaml state (similar to AZD compose). Developers who need explicit Bicep on disk can eject via azd infra gen or equivalent -- that mechanism is covered by a separate RFC (not yet filed).

Criteria

Belongs in services.<agent>.config (host: azure.ai.agent): kind, description, metadata, protocols, container resources, env, startupCommand. The config: block maps to the Foundry create-agent API contract.

Belongs at the service level (existing ServiceConfig fields, no additions): docker: (container packaging via docker.path and docker.remoteBuild), runtime: (code-deploy via typed { stack, version } block), uses: (service ordering plus env-var injection from dependencies), and project: (source directory). Container vs. code-deploy is discriminated by the presence of docker: vs. runtime: -- mutually exclusive, and at least one is required (validation error otherwise).

Belongs in services.<project>.config (host: azure.ai.project): All Foundry data-plane resources that are project-scoped: toolboxes, connections, model deployments, and future resources like eval datasets and vector indexes. One azure.ai.project service per project. This is a service without source code -- no build, no artifact. Its config: block is pure declarative state.

Does NOT belong in azure.yaml: model selection at request time, tool implementations, instructions and prompts (all in agent code); secrets (.env, Key Vault, or azd environment); cross-environment endpoint values (azd environment, .azure/{env}/).

The principle we're aiming for: azure.yaml describes what exists in the Foundry project and how the agent runs. Agent code defines what the agent does. The azd environment carries deployment-target values. Bicep is opt-in for developers who need full IaC reproducibility.

Downstream Impact

  • azure.ai.agents extension picks up the new azure.ai.project host kind alongside the existing azure.ai.agent. Its deploy hook reads project and agent definitions from azure.yaml, substitutes env vars for connection references, and calls the Foundry APIs. init stops emitting agent.yaml and agent.manifest.yaml, with a fallback to the old files for one deprecation window. No Bicep files are generated by default.
  • Foundry Toolkit for VS Code drops its agent.yaml parser and reads/writes azure.yaml instead.
  • Samples no longer rely on an AgentManifest. Project-level resources (models, connections, toolboxes) are declared in the azure.ai.project service. No Bicep required for the default path.
  • Bicep-less provisioning requires a separate RFC (not yet filed) defining how the extension generates ARM templates internally from azure.yaml state, similar to AZD compose.

Alternatives Considered

Make agent.yaml the source of truth; introduce azure.yaml only on opt-in

The natural inversion: keep agent.yaml as the AgentDefinition (redesigned to absorb the runtime config this proposal puts in services.<name>.config) and have azd ai agent init / deploy / invoke operate directly on it via a --project-endpoint flag. azure.yaml shows up only when the developer opts into broader AZD features -- multi-service orchestration, Bicep, environments, CI/CD -- at which point an azure.yaml service block points at the existing agent.yaml. AgentManifest stays parked for a future catalog.

What it gets right:

  • A startup developer can go init -> deploy -> invoke without ever touching .azure/, azure.yaml, or infra. Lines up with the primary persona in framing.md.
  • Foundry Toolkit for VS Code cutover is cheap -- both CLI and Toolkit read the same per-agent file. No azure.yaml schema work for the Toolkit team.
  • Matches the dominant competitor mental model. AgentCore CLI is single-file standalone; Claude is API-only. Microsoft stops being the outlier asking developers to learn an orchestration framework first.
  • Schema ownership is cleaner. agent.yaml is "what the agent IS to Foundry"; azure.yaml is "how azd orchestrates." Less cross-team negotiation when either side evolves.

Why it doesn't hold up:

agent.yaml is per-agent by definition. Anything project-scoped -- toolboxes today, plus future shared concepts like knowledge indexes -- has nowhere good to live. The three sub-options are all bad:

  1. Inline in every agent.yaml. Each agent redeclares its shared toolbox. Reintroduces the "no sharing across agents" problem from Current Problems, just in a different file.
  2. Invent a higher-level Foundry config (e.g., foundry.yaml). Three files again -- agent.yaml, foundry.yaml, azure.yaml -- with overlap potential. Worse than today.
  3. Make project-scope an AZD-only capability. Forces developers with even two agents that share a toolbox to opt into AZD, which defeats the whole point of the alternative.

The agentcore comparison sharpens the mismatch. agentcore.yaml works as a single standalone surface because it IS the project-level container -- runtimes (multiple agents), mcpRuntimeTools, memories, credentials, gateways all sit at the same top level. agent.yaml's per-agent scope has no equivalent. In our world, azure.yaml already plays the project-level container role; pushing that responsibility down into a per-agent file doesn't fit. The azure.ai.project service is explicitly this project-level container -- it maps to the Foundry project entity and owns all shared state.

Hybrid: keep agent.yaml for the per-agent definition, azure.yaml for orchestration only

A softer variant: keep agent.yaml as the portable per-agent definition (what Foundry Toolkit reads) and let azure.yaml carry only project-level orchestration -- a thin service block that references agent.yaml and adds packaging and a uses list:

services:
  my-agent:
    project: src/my-agent
    host: azure.ai.agent
    docker: { path: Dockerfile }
    uses: [agent-toolbox]
    # No config: block. Agent definition lives in src/my-agent/agent.yaml.

This dodges the duplication failure mode by structurally separating concerns: agent.yaml carries Foundry-create-agent fields; the service block carries azd packaging and orchestration. Rejected because:

  • It keeps two parallel deploy code paths alive (standalone reads agent.yaml; AZD-mode reads azure.yaml + agent.yaml), each with its own schema discipline and edge cases.
  • Keeping the service block free of agent-definition fields requires permanent schema vigilance; easy to violate as new features land.
  • The win of "no azure.yaml in the root" is mostly perceptual. This proposal's standalone path is already azure.yaml + agent code -- no .azure/, no infra/. File count and learning curve match; only the filename differs.
  • Foundry Toolkit alignment is a real cost in this proposal, but bounded -- VS Code already understands azure.yaml for other azd features, and the parser change is a one-time migration.

Migration Path

  1. azd schema recognizes the new azure.ai.project host kind. All other primitives (docker:, runtime:, uses:, project:, language:) are reused from the existing ServiceConfig -- no new top-level service fields, no core schema change beyond the host kind.
  2. azure.ai.agents extension ships the two schemas (azure.ai.agent.json, azure.ai.project.json) and deploy logic for both host kinds.
  3. init generates the consolidated azure.yaml (with azure.ai.project + azure.ai.agent services) and stops emitting agent.yaml / agent.manifest.yaml. No Bicep files generated by default.
  4. Deploy hook reads from azure.yaml, falling back to agent.yaml during the deprecation window.
  5. Foundry Toolkit for VS Code switches its parser.
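The fallback in step 4 could look roughly like the sketch below. Everything here is an assumption made for illustration: the helper name, the rule that a config: block in azure.yaml takes precedence, and the legacy file location of agent.yaml inside the service's project directory.

```python
from pathlib import Path


def agent_definition_source(service: dict, repo_root: Path) -> str:
    """Decide where the deploy hook reads an agent definition from.

    Hypothetical helper. Returns "azure.yaml" or "agent.yaml".
    """
    # Preferred path: the definition lives in services.<name>.config.
    if "config" in service:
        return "azure.yaml"

    # Deprecation-window fallback: look for the legacy per-agent file
    # (assumed location: <project dir>/agent.yaml).
    legacy = repo_root / service.get("project", ".") / "agent.yaml"
    if legacy.is_file():
        return "agent.yaml"

    raise FileNotFoundError(
        "no config: block in azure.yaml and no legacy agent.yaml found"
    )
```

Once the deprecation window closes, the second branch would simply be removed.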

Metadata

Labels

enhancement (New feature or improvement), ext-agents (azure.ai.agents extension)
