Unify Foundry agent configuration in azure.yaml

This proposes two changes to how the `azure.ai.agents` extension models a Foundry agent project in `azure.yaml`:

1. **Consolidate** all hosted-agent config into `azure.yaml`, retiring `agent.yaml` and `agent.manifest.yaml`.
2. **Restructure** so that a single `azure.ai.project` service owns all Foundry data-plane state -- toolboxes, connections, model deployments, and future project-scoped resources. Agents reference it via `uses`. No Bicep files are required by default; developers can opt in to Bicep on disk when they need full IaC reproducibility.

Today, `azure.ai.agent` packs every project-scoped resource inside one hosted agent service, and ships two extra config files alongside `azure.yaml`. That made sense as a starting point, but it conflates the agent runtime with the Foundry project around it, and it makes sharing resources across agents awkward. The Foundry Toolkit for VS Code will move to reading `azure.yaml` directly once these changes land.

## Current Problems

1. **Three files, overlapping data.** The agent name appears in three places, container resources in two, the model deployment name in three. Two templating syntaxes (`{{param}}` and `${ENV}`) overlap.
2. **Scope conflation.** `services.<agent>.config` mixes things that genuinely belong to one agent (container resources, env, startup command) with project-scoped resources (model deployments, connections, toolboxes). Some of these are ARM resources that should live in Bicep; the rest are Foundry data-plane resources that don't belong nested under any single agent.
3. **No sharing across agents.** Because project-scoped resources are nested under an agent today, a second agent that wants the same toolbox has to redeclare it. There is nowhere to say "this toolbox belongs to the project; these agents reference it."
4. **Divergent tooling.** The Foundry Toolkit parses `agent.yaml` (AgentDefinition) directly, while `azd ai agent` starts from an AgentManifest, generates an AgentDefinition from it, and also has to mix in orchestration from `azure.yaml`. The two feel like separate experiences.
5. **The manifest layer carries no weight.** `agent.manifest.yaml` was designed for an agent catalog that didn't get built. The templating it adds isn't paying for itself.
6. **No real ability to share definitions.** AgentDefinitions were intended to be concrete definitions, but in practice any real values get abstracted into azd environment references (`${ENV_VAR}`), so the definition effectively becomes a templated one. That blurs the purpose of an AgentManifest.

## Solution Hypothesis

The shape we want is:

- A single `host: azure.ai.project` service owns all Foundry data-plane state that can't be modeled in ARM/Bicep. Today that includes toolboxes and connections; future additions (eval datasets, vector indexes, fine-tunes) go here too. The "project" maps directly to the Foundry entity these resources belong to. No per-resource-type host proliferation (no `azure.ai.toolbox`, `azure.ai.connection`, etc.).
- `host: azure.ai.agent` describes the agent runtime. The `config:` block maps to the Foundry create-agent API (kind, description, metadata, protocols, container resources, env, startupCommand). Agents reference the project service via `uses: [foundry-project]`.
- Deploy mode is explicit: a `docker:` block means container mode, and a `runtime:` block means code-deploy mode. Having neither is a validation error, and so is having both. No silent defaults.
- The `runtime:` block follows the existing azure.yaml schema precedent (`runtime: { stack: python, version: "3.13" }`), not a bare string.
- No Bicep files in the repo by default. The extension carries built-in Bicep templates internally (like AZD compose) and generates them in memory during `azd provision`. Developers can opt into Bicep on disk via `azd infra gen` or equivalent. The composition mechanism for adding and removing resources in the YAML is tracked in a separate RFC (not yet filed).
- Service ordering uses azd's existing `uses` field. `uses` is the inter-service dependency primitive in `ServiceConfig` today; no schema addition needed.
- `agent.yaml` and `agent.manifest.yaml` go away.

The mental model shifts from "one big agent blob with everything inside it" to "a Foundry project that owns shared resources, plus agent services that reference it."

## Required `azure.yaml` Schema Changes

azd would need to recognize one new host kind under `services.<name>.host`: `azure.ai.project`. The extension would then cover two host kinds:

| Host kind | Owns | Provisioning verb | Deploy verb |
|---|---|---|---|
| `azure.ai.agent` (already exists) | Agent runtime (container or code-deploy) | (none -- needs Foundry project to exist) | Push agent definition + container/zip |
| `azure.ai.project` (new) | All Foundry data-plane state (toolboxes, connections, model deployments, future resources) | Create Foundry project (ARM) | Create/update data-plane resources via Foundry APIs |

Each host kind owns its own JSON schema for the `config` block. The schemas would live in the `azure.ai.agents` extension, where the existing agent schema already lives.

`azure.ai.project` is a service without source code -- it has no `project:` directory, no build step, no artifact. Its `config:` block declaratively describes the Foundry data-plane state. `provision` creates the ARM-level Foundry project; `deploy` creates or updates toolboxes, connections, and other data-plane resources via Foundry APIs.

## Final Shape: `azure.ai.agent.json` (after the change)

The agent runtime schema shrinks substantially once the project-scoped fields move to `azure.ai.project`:

```jsonc
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Azure AI Agent Runtime",
  "description": "Configuration for a hosted agent runtime in a Foundry project.",
  "type": "object",
  "properties": {
    "kind":         { "type": "string", "enum": ["hosted", "prompt"] },
    "description":  { "type": "string" },
    "metadata":     { "type": "object", "additionalProperties": { "type": "string" } },
    "protocols": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "protocol": { "type": "string" },
          "version":  { "type": "string" }
        },
        "required": ["protocol", "version"]
      }
    },
    "container": {
      "type": "object",
      "description": "Container resources. Only relevant when the service has a docker: block (container mode).",
      "properties": {
        "resources": {
          "type": "object",
          "properties": {
            "cpu":    { "type": "string", "pattern": "^[0-9]+(\\.[0-9]+)?m?$" },
            "memory": { "type": "string", "pattern": "^[0-9]+(\\.[0-9]+)?(Ki|Mi|Gi|Ti|Pi|Ei|k|M|G|T|P|E)?$" }
          }
        }
      }
    },
    "env": {
      "type": "object",
      "additionalProperties": { "type": "string" }
    },
    "startupCommand": { "type": "string" }
  },
  "additionalProperties": false
}
```

Removed from this schema: `deployments[]`, `resources[]`, `toolConnections[]`, `toolboxes[]`, `connections[]`. These all move to the `azure.ai.project` service. Also removed: `runtime` and `entrypoint` as config-level fields -- runtime mode is now expressed at the service level via a typed `runtime:` block (see below).
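
As a quick way to see what the `cpu` and `memory` patterns in the schema accept, the sketch below exercises them with Python's `re` module. Only the two patterns come from the schema; the function name `check_container_resources` is hypothetical and this is not the extension's validator. Because the schema types both fields as strings, an unquoted YAML number such as `cpu: 0.25` fails the type check before the pattern is consulted.

```python
import re

# Patterns copied verbatim from the `container.resources` schema.
CPU_PATTERN = re.compile(r"^[0-9]+(\.[0-9]+)?m?$")
MEMORY_PATTERN = re.compile(r"^[0-9]+(\.[0-9]+)?(Ki|Mi|Gi|Ti|Pi|Ei|k|M|G|T|P|E)?$")


def check_container_resources(resources: dict) -> list:
    """Hypothetical helper: return error strings for values the schema would reject."""
    errors = []
    for field, pattern in (("cpu", CPU_PATTERN), ("memory", MEMORY_PATTERN)):
        value = resources.get(field)
        if value is None:
            continue  # both fields are optional in the schema
        if not isinstance(value, str):
            errors.append(f"{field} must be a string, got {type(value).__name__}")
        elif not pattern.match(value):
            errors.append(f"{field} {value!r} does not match the schema pattern")
    return errors
```

With these patterns, `{"cpu": "0.25", "memory": "0.5Gi"}` and `{"cpu": "500m"}` produce no errors, while `{"memory": "512MB"}` is rejected because `MB` is not one of the listed suffixes.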

The `config:` block intentionally maps closely to the Foundry create-agent API contract -- it describes what the agent IS to Foundry. Deploy mode is determined at the service level:

- **Container mode** -- the service has a `docker:` block. azd builds and pushes via the existing `docker.path` and `docker.remoteBuild` fields on `ServiceConfig`. Same packaging flow as any other containerized azd service.
- **Code-deploy mode** -- the service has a `runtime:` block at the service level (not inside `config:`). This follows the [existing `runtime` definition in azure.yaml schema](https://github.com/Azure/azure-dev/blob/991b9743b4ab14ee8561677ffa627735c875a796/schemas/v1.0/azure.yaml.json#L1468-L1492):
  ```yaml
  runtime:
    stack: python
    version: "3.13"
  ```
  azd zips the project directory and Foundry schedules it on the appropriate managed base image.
- **Validation rules**: `docker:` and `runtime:` are mutually exclusive (both present = validation error). Neither present = validation error. No silent defaults -- they cause debugging nightmares at deploy time.
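
The discrimination rule above is small enough to sketch directly. This is an illustration only: `resolve_deploy_mode` is a hypothetical name, the service is modeled as a plain dict parsed from `azure.yaml`, and the error messages are invented for the example.

```python
def resolve_deploy_mode(service: dict) -> str:
    """Return "container" or "code-deploy" for an azure.ai.agent service.

    Hypothetical helper: presence of the `docker` or `runtime` key decides
    the mode; having both or neither is a validation error.
    """
    has_docker = "docker" in service
    has_runtime = "runtime" in service
    if has_docker and has_runtime:
        raise ValueError("`docker:` and `runtime:` are mutually exclusive")
    if not has_docker and not has_runtime:
        raise ValueError("one of `docker:` or `runtime:` is required")
    return "container" if has_docker else "code-deploy"
```

A service with `runtime: { stack: python, version: "3.13" }` resolves to `"code-deploy"`, one with `docker: { path: Dockerfile }` resolves to `"container"`, and the other two combinations raise.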

## New Sibling Schema: `azure.ai.project.json` (sketch)

`azure.ai.project` -- one Foundry project's data-plane state. Owns all project-scoped resources that can't be modeled in ARM/Bicep:

```jsonc
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Azure AI Foundry Project",
  "description": "Project-scoped Foundry data-plane resources.",
  "type": "object",
  "properties": {
    "toolboxes": {
      "type": "object",
      "description": "Named toolboxes. Each key is the toolbox name.",
      "additionalProperties": {
        "type": "object",
        "properties": {
          "tools": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "type":       { "type": "string", "description": "e.g., web_search, mcp, code_interpreter" },
                "connection": { "type": "string", "description": "Connection name or env var ref for connection-backed tools like mcp." }
              },
              "required": ["type"]
            }
          }
        }
      }
    },
    "connections": {
      "type": "array",
      "description": "Foundry project connections.",
      "items": {
        "type": "object",
        "properties": {
          "name":     { "type": "string" },
          "category": { "type": "string" },
          "target":   { "type": "string" },
          "authType": { "type": "string" }
        },
        "required": ["name", "category", "target"]
      }
    },
    "deployments": {
      "type": "array",
      "description": "Model deployments.",
      "items": {
        "type": "object",
        "properties": {
          "name":  { "type": "string" },
          "model": { "type": "object" },
          "sku":   { "type": "object" }
        },
        "required": ["name", "model"]
      }
    }
  },
  "additionalProperties": true
}
```

`additionalProperties: true` leaves room for future project-scoped resources (eval datasets, vector indexes, knowledge sources) without schema-breaking changes.
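
Since a tool's `connection` can be either a connection name or an env var reference, one check a tool could run over this shape is that literal names resolve to a declared connection. The proposal does not specify such a check; the sketch below is an assumption for illustration, and `undeclared_connections` and `ENV_REF` are hypothetical names.

```python
import re

# A value that is exactly one ${VAR} reference; resolved later from the azd environment.
ENV_REF = re.compile(r"^\$\{[A-Za-z_][A-Za-z0-9_]*\}$")


def undeclared_connections(project_config: dict) -> list:
    """Hypothetical lint: list (toolbox, connection) pairs with no matching declaration."""
    declared = {conn["name"] for conn in project_config.get("connections", [])}
    missing = []
    for toolbox_name, toolbox in project_config.get("toolboxes", {}).items():
        for tool in toolbox.get("tools", []):
            ref = tool.get("connection")
            if ref is None or ENV_REF.match(ref):
                continue  # no connection needed, or deferred to deploy time
            if ref not in declared:
                missing.append((toolbox_name, ref))
    return missing
```

Env var references are skipped on purpose: they cannot be checked until the azd environment is known.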

## Example `azure.yaml` After the Change

```yaml
services:
  # Project-scoped: all Foundry data-plane resources in one place.
  # No source directory, no build artifact -- pure declarative state.
  foundry-project:
    host: azure.ai.project
    config:
      deployments:
        - name: gpt-4.1-mini
          model: { format: OpenAI, name: gpt-4.1-mini, version: "2025-04-14" }
          sku: { name: GlobalBatch, capacity: 10 }
      connections:
        - name: github-mcp-conn
          category: CustomKeys
          target: https://api.githubcopilot.com/mcp
          authType: ApiKey
      toolboxes:
        agent-toolbox:
          tools:
            - { type: web_search }
            - { type: code_interpreter }
            - { type: mcp, connection: "${GITHUB_MCP_CONN}" }

  # Agent-scoped: the runtime, references the project via uses
  my-agent:
    project: src/my-agent
    host: azure.ai.agent
    uses: [foundry-project]
    # Code-deploy mode: runtime block present -> zip-deploy
    runtime:
      stack: python
      version: "3.13"
    config:
      kind: hosted
      description: A basic agent hosted by Foundry.
      protocols:
        - { protocol: responses, version: 1.0.0 }
      env:
        AZURE_AI_MODEL_DEPLOYMENT_NAME: gpt-4.1-mini
      startupCommand: python main.py

  # Container mode variant: replace runtime with docker
  #
  #   my-agent:
  #     project: src/my-agent
  #     host: azure.ai.agent
  #     uses: [foundry-project]
  #     docker: { path: Dockerfile, remoteBuild: true }
  #     config:
  #       kind: hosted
  #       protocols: [{ protocol: responses, version: 1.0.0 }]
  #       container:
  #         resources: { cpu: "0.25", memory: 0.5Gi }
  #       env:
  #         AZURE_AI_MODEL_DEPLOYMENT_NAME: gpt-4.1-mini

# No infra: block needed by default.
# azd provision uses built-in Bicep templates internally (like AZD compose).
# Opt in to Bicep on disk via azd infra gen or equivalent (separate RFC).
```

A second agent that wants the same toolbox is just another `host: azure.ai.agent` entry with `uses: [foundry-project]` -- nothing duplicated. The project service is the single source of truth for shared resources.

## Dependency Flow

The pattern uses azd's existing mechanisms; no new wiring is needed:

1. **`azure.ai.project` provisions first.** `azd provision` creates the Foundry project (ARM). `azd deploy` then deploys the `foundry-project` service, which creates data-plane resources (toolboxes, connections, model deployments) via Foundry APIs. These resources are declared in the project's `config:` block.
2. **`uses` orders services.** The agent declares `uses: [foundry-project]`, so the project service deploys before any agent. `uses` is azd's existing service-to-service dependency primitive on `ServiceConfig` -- it controls deploy ordering and surfaces the dependency's outputs as env vars on the dependent service.
3. **Env var expansion uses `${VAR}` syntax.** Same mechanism azure.yaml already supports. Connection references in toolbox configs (e.g., `connection: ${GITHUB_MCP_CONN}`) are resolved from the azd environment at deploy time.
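
To make the `${VAR}` step concrete, here is a minimal sketch of expanding references inside a parsed config tree. It is not azd's implementation: `expand` and `VAR_REF` are hypothetical names, and raising on a missing variable is a choice made for this example, not a statement about how azd treats unset variables.

```python
import re

VAR_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand(value, env: dict):
    """Recursively replace ${VAR} in every string of a dict/list tree.

    Raises KeyError when a referenced variable is not in `env`.
    """
    if isinstance(value, str):
        return VAR_REF.sub(lambda match: env[match.group(1)], value)
    if isinstance(value, dict):
        return {key: expand(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [expand(item, env) for item in value]
    return value  # numbers, booleans, None pass through unchanged
```

Given an environment of `{"GITHUB_MCP_CONN": "github-mcp-conn"}`, the tool entry `{"type": "mcp", "connection": "${GITHUB_MCP_CONN}"}` expands to `{"type": "mcp", "connection": "github-mcp-conn"}`.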

So the chain is: **provision (ARM resources) -> project deploy (data-plane resources) -> agent deploy (agent definition + code/container)**. Each step uses primitives azd already supports.
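
The ordering that `uses` implies is a topological sort over services. The sketch below shows the idea with Python's standard `graphlib`; it is an illustration, not azd's scheduler. `deploy_order` is a hypothetical name, and the sketch assumes every `uses` entry names another service in the same file.

```python
from graphlib import TopologicalSorter


def deploy_order(services: dict) -> list:
    """Return service names so that each service comes after everything it `uses`."""
    graph = {name: set(svc.get("uses", [])) for name, svc in services.items()}
    unknown = {dep for deps in graph.values() for dep in deps} - set(graph)
    if unknown:
        raise ValueError(f"`uses` references undefined services: {sorted(unknown)}")
    # TopologicalSorter takes node -> predecessors and yields predecessors first.
    # A dependency cycle raises graphlib.CycleError.
    return list(TopologicalSorter(graph).static_order())
```

For the example file, `foundry-project` comes first and every agent that declares `uses: [foundry-project]` follows it; agents that do not depend on each other have no required order between them.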

No Bicep files are required in the repo. The extension handles ARM provisioning internally, generating Bicep from the azure.yaml state (similar to AZD compose). Developers who need explicit Bicep on disk can eject via `azd infra gen` or equivalent -- that mechanism is covered by a separate RFC (not yet filed).

## Criteria

**Belongs in `services.<agent>.config` (host: `azure.ai.agent`):** kind, description, metadata, protocols, container resources, env, startupCommand. The `config:` block maps to the Foundry create-agent API contract.

**Belongs at the service level (existing `ServiceConfig` fields, no additions):** `docker:` (container packaging via `docker.path` and `docker.remoteBuild`), `runtime:` (code-deploy via typed `{ stack, version }` block), `uses:` (service ordering plus env-var injection from dependencies), and `project:` (source directory). Container vs. code-deploy is discriminated by the presence of `docker:` vs. `runtime:` -- mutually exclusive, and at least one is required (validation error otherwise).

**Belongs in `services.<project>.config` (host: `azure.ai.project`):** All Foundry data-plane resources that are project-scoped: toolboxes, connections, model deployments, and future resources like eval datasets and vector indexes. One `azure.ai.project` service per project. This is a service without source code -- no build, no artifact. Its `config:` block is pure declarative state.

**Does NOT belong in `azure.yaml`:** model selection at request time, tool implementations, instructions and prompts (all in agent code); secrets (`.env`, Key Vault, or azd environment); cross-environment endpoint values (azd environment, `.azure/{env}/`).

The principle we're aiming for: `azure.yaml` describes what exists in the Foundry project and how the agent runs. Agent code defines what the agent does. The azd environment carries deployment-target values. Bicep is opt-in for developers who need full IaC reproducibility.
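
One way to apply these criteria mechanically is a check that flags project-scoped keys left under an agent's `config:` block. The key list is taken from the fields this proposal removes from the agent schema; the function name `misplaced_agent_config_keys` is hypothetical and the check itself is a sketch, not part of the proposal.

```python
# Fields this proposal moves from the agent schema to the azure.ai.project service.
PROJECT_SCOPED_KEYS = {"deployments", "resources", "toolConnections", "toolboxes", "connections"}


def misplaced_agent_config_keys(services: dict) -> dict:
    """Map each azure.ai.agent service to the project-scoped keys found in its config."""
    findings = {}
    for name, svc in services.items():
        if svc.get("host") != "azure.ai.agent":
            continue
        config = svc.get("config") or {}
        misplaced = sorted(PROJECT_SCOPED_KEYS & set(config))
        if misplaced:
            findings[name] = misplaced
    return findings
```

Only top-level keys of `config:` are inspected, so the nested `container.resources` block of the new schema is not flagged.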

## Downstream Impact

- **`azure.ai.agents` extension** picks up the new `azure.ai.project` host kind alongside the existing `azure.ai.agent`. Its deploy hook reads project and agent definitions from `azure.yaml`, substitutes env vars for connection references, and calls the Foundry APIs. `init` stops emitting `agent.yaml` and `agent.manifest.yaml`, with a fallback to the old files for one deprecation window. No Bicep files are generated by default.
- **Foundry Toolkit for VS Code** drops its `agent.yaml` parser and reads/writes `azure.yaml` instead.
- **Samples** no longer rely on an AgentManifest. Project-level resources (models, connections, toolboxes) are declared in the `azure.ai.project` service. No Bicep required for the default path.
- **Bicep-less provisioning** requires a separate RFC (not yet filed) defining how the extension generates ARM templates internally from azure.yaml state, similar to AZD compose.

## Alternatives Considered

### Make `agent.yaml` the source of truth; introduce `azure.yaml` only on opt-in

The natural inversion: keep `agent.yaml` as the AgentDefinition (redesigned to absorb the runtime config this proposal puts in `services.<name>.config`) and have `azd ai agent init / deploy / invoke` operate directly on it via a `--project-endpoint` flag. `azure.yaml` shows up only when the developer opts into broader AZD features -- multi-service orchestration, Bicep, environments, CI/CD -- at which point an `azure.yaml` service block points at the existing `agent.yaml`. AgentManifest stays parked for a future catalog.

What it gets right:

- A startup developer can go `init -> deploy -> invoke` without ever touching `.azure/`, `azure.yaml`, or infra. Lines up with the primary persona in `framing.md`.
- Foundry Toolkit for VS Code cutover is cheap -- both CLI and Toolkit read the same per-agent file. No `azure.yaml` schema work for the Toolkit team.
- Matches the dominant competitor mental model. AgentCore CLI is single-file standalone; Claude is API-only. Microsoft stops being the outlier asking developers to learn an orchestration framework first.
- Schema ownership is cleaner. `agent.yaml` is "what the agent IS to Foundry"; `azure.yaml` is "how azd orchestrates." Less cross-team negotiation when either side evolves.

Why it doesn't hold up:

`agent.yaml` is per-agent by definition. Anything project-scoped -- toolboxes today, plus future shared concepts like knowledge indexes -- has nowhere good to live. The three sub-options are all bad:

1. **Inline in every `agent.yaml`.** Each agent redeclares its shared toolbox. Reintroduces the "no sharing across agents" problem from Current Problems, just in a different file.
2. **Invent a higher-level Foundry config (e.g., `foundry.yaml`).** Three files again -- `agent.yaml`, `foundry.yaml`, `azure.yaml` -- with overlap potential. Worse than today.
3. **Make project-scope an AZD-only capability.** Forces developers with even two agents that share a toolbox to opt into AZD, which defeats the whole point of the alternative.

The AgentCore comparison sharpens the mismatch. `agentcore.yaml` works as a single standalone surface *because it IS the project-level container* -- `runtimes` (multiple agents), `mcpRuntimeTools`, `memories`, `credentials`, `gateways` all sit at the same top level. `agent.yaml`'s per-agent scope has no equivalent. In our world, `azure.yaml` already plays the project-level container role; pushing that responsibility down into a per-agent file doesn't fit. The `azure.ai.project` service is explicitly this project-level container -- it maps to the Foundry project entity and owns all shared state.

### Hybrid: keep `agent.yaml` for the per-agent definition, `azure.yaml` for orchestration only

A softer variant: keep `agent.yaml` as the portable per-agent definition (what Foundry Toolkit reads) and let `azure.yaml` carry only project-level orchestration -- a *thin* service block that references `agent.yaml` and adds packaging plus `uses`:

```yaml
services:
  my-agent:
    project: src/my-agent
    host: azure.ai.agent
    docker: { path: Dockerfile }
    uses: [agent-toolbox]
    # No config: block. Agent definition lives in src/my-agent/agent.yaml.
```

This dodges the duplication failure mode by structurally separating concerns: `agent.yaml` carries Foundry-create-agent fields; the service block carries azd packaging and orchestration. Rejected because:

- It keeps two parallel deploy code paths alive (standalone reads `agent.yaml`; AZD-mode reads `azure.yaml` + `agent.yaml`), each with its own schema discipline and edge cases.
- Keeping the service block free of agent-definition fields requires permanent schema vigilance; easy to violate as new features land.
- The win of "no `azure.yaml` in the root" is mostly perceptual. This proposal's standalone path is already `azure.yaml` + agent code -- no `.azure/`, no `infra/`. File count and learning curve match; only the filename differs.
- Foundry Toolkit alignment is a real cost in this proposal, but bounded -- VS Code already understands `azure.yaml` for other azd features, and the parser change is a one-time migration.

## Migration Path

1. azd schema recognizes the new `azure.ai.project` host kind. All other primitives (`docker:`, `runtime:`, `uses:`, `project:`, `language:`) are reused from the existing `ServiceConfig` -- no new top-level service fields, no core schema change beyond the host kind.
2. `azure.ai.agents` extension ships the two schemas (`azure.ai.agent.json`, `azure.ai.project.json`) and deploy logic for both host kinds.
3. `init` generates the consolidated `azure.yaml` (with `azure.ai.project` + `azure.ai.agent` services) and stops emitting `agent.yaml` / `agent.manifest.yaml`. No Bicep files generated by default.
4. Deploy hook reads from `azure.yaml`, falling back to `agent.yaml` during the deprecation window.
5. Foundry Toolkit for VS Code switches its parser.
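
The fallback in step 4 could look roughly like the sketch below. The precedence rule is an assumption made for illustration (an inline `config:` block wins; otherwise the legacy file next to the agent source is used), and `agent_config_source` is a hypothetical name. The proposal only states that the deploy hook falls back to `agent.yaml` during the deprecation window.

```python
from pathlib import Path


def agent_config_source(service: dict, repo_root: Path) -> Path:
    """Pick the file an agent definition would be read from (assumed precedence)."""
    if service.get("config"):
        # New shape: the definition is inline in azure.yaml.
        return repo_root / "azure.yaml"
    legacy = repo_root / service.get("project", ".") / "agent.yaml"
    if legacy.is_file():
        # Deprecation window: fall back to the old per-agent file.
        return legacy
    raise FileNotFoundError("no `config:` block in azure.yaml and no agent.yaml found")
```

Once the deprecation window closes, the legacy branch would be deleted and a missing `config:` block would simply be a validation error.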
