Learning about Multi-Agent AI - Specifically Foundry & becoming Production-Ready
Let's dig a little deeper into AI agents; it never hurts to understand what the pro-code experts can do!
I’ve seen a few AI demos that look brilliant… right up until you ask them to do something messy and real. (The Demo Gods are REAL) The truth is, a single large language model (LLM) answering questions isn’t the same thing as an agent getting work done. That’s where agents, and especially multi-agent systems, start to matter.
When you pair agents with a platform like Foundry, the conversation shifts from “can it generate text?” to “can it run safely, scale cleanly, and be operated like any other service?”
Quick caveat: I’m not an expert in agents or Foundry. I’m a curious, slightly sceptical critical thinker trying to learn out loud. I’m sharing how I’m thinking about multi-agent systems as I explore the space (I intend to apply this to security agents), with a focus on what feels practical and what still feels fuzzy. If you’re deeper in this world, I’d genuinely welcome your views, and if there are experts, papers, talks, or real-world deployments you think I should be reading, please point me in the right direction via the comments!
Most teams don’t need this level of platform thinking yet, and that’s fine. But architecturally, it’s genuinely fascinating.
Agents are more than chat: they reason, act, and remember
An agent isn’t just a prompt with a personality. It’s typically a combination of:
A model that can reason through a problem
Tools it can call to take action (query data, run code, trigger processes)
Memory or context so it can keep track of what matters across steps
That mix is what makes agents useful in day-to-day IT and security work. Instead of asking for an answer, you’re asking for an outcome.
Why one agent quickly hits a wall
Single agents work fine for simple tasks. But once a task involves multiple steps (pulling data, cleaning it up, analysing it, then writing it out clearly), one agent starts to behave like one person trying to do five jobs at the same time.
This is where multi-agent orchestration earns its keep. You split the workload into specialists, then coordinate them so the overall flow stays clear and repeatable. Typical roles look like:
A coordinator that plans and delegates
One or more retrieval agents that gather the right inputs
A preprocessing agent that turns messy data into clean structure
An analysis agent that does the heavy thinking (and sometimes generates code)
The big win isn’t “more AI”. It’s less chaos, because each agent has a clear job.
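To make the role split concrete, here's a minimal sketch in plain Python. All the class names are my own invention for illustration; a real build would wire LLM calls and tools into each role, but the shape is the point: one job per agent, with a coordinator delegating between them.

```python
from dataclasses import dataclass, field

# Hypothetical role split (names are mine, not from any SDK).

@dataclass
class RetrievalAgent:
    def run(self, query: str) -> list[str]:
        # In a real system this would call search tools or APIs.
        return [f"raw record for: {query}"]

@dataclass
class PreprocessingAgent:
    def run(self, records: list[str]) -> list[dict]:
        # Turn messy inputs into clean structure.
        return [{"text": r.strip()} for r in records]

@dataclass
class AnalysisAgent:
    def run(self, records: list[dict]) -> str:
        # The heavy thinking (an LLM call, in practice) happens here.
        return f"analysed {len(records)} record(s)"

@dataclass
class Coordinator:
    retriever: RetrievalAgent = field(default_factory=RetrievalAgent)
    preprocessor: PreprocessingAgent = field(default_factory=PreprocessingAgent)
    analyst: AnalysisAgent = field(default_factory=AnalysisAgent)

    def handle(self, query: str) -> str:
        # The coordinator plans and delegates; each specialist has one job.
        raw = self.retriever.run(query)
        clean = self.preprocessor.run(raw)
        return self.analyst.run(clean)

summary = Coordinator().handle("failed sign-ins last 24h")
```

Notice how each agent could be tested, swapped, or scaled on its own, which is exactly the "less chaos" benefit.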
The reliability trick: deterministic workflow, flexible intelligence
Here’s the part I keep coming back to: LLMs are non-deterministic. Ask the same thing twice and you may not get the exact same output. That’s normal, but it’s awkward when you’re trying to run operations.
A strong approach is to put the agent inside a deterministic workflow:
The workflow is predictable: step 1, step 2, step 3
The agents are flexible: they can reason and adapt inside each step
That combination gives you something that’s far easier to trust. You get the creativity and problem-solving of the model, without sacrificing the structure you need for production.
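A tiny sketch of that idea, in my own illustrative Python rather than any framework's API: the step order is hard-coded and auditable, while whatever runs inside each step (in practice, an agent with a model behind it) is free to reason.

```python
# Deterministic workflow, flexible intelligence: fixed step order,
# adaptable logic inside each step. Illustration only, not a real SDK.

def run_workflow(task: str, steps: list) -> dict:
    state = {"task": task}
    for name, step in steps:       # step 1, step 2, step 3: always this order
        state[name] = step(state)  # the "agent" reasons inside the step
    return state

# Each step could wrap an LLM call; here they're plain functions.
pipeline = [
    ("retrieve", lambda s: f"data for {s['task']}"),
    ("analyse", lambda s: f"analysis of {s['retrieve']}"),
    ("report", lambda s: f"report: {s['analyse']}"),
]
result = run_workflow("weekly sign-in review", pipeline)
```

Because the outer loop never changes, you can reason about, monitor, and rerun the workflow like any other piece of software, even though the interesting bits inside each step are non-deterministic.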
Microsoft Agent Framework: a single direction for building agents
Microsoft Agent Framework is the recommended path for building new pro-code agents. It’s also the result of converging the SDK work that previously lived across Semantic Kernel and AutoGen.
From reading the first page of the docs, this is geared around:
Building agents with reasoning, tools, and memory
Building orchestrated workflows that keep outcomes consistent
Agent Framework is in public preview. Use the link to take a read.
No-code AI development platforms
No-code AI platforms let you build and tailor AI agents without writing code. They usually come with a conversational interface and drag-and-drop building blocks. You can create an agent from scratch using agentic workflows, or start from a ready-made template designed for common scenarios.
To shape what the agent can do, you visually connect different blocks — for example natural language understanding, decision logic, and data integrations. This makes it easier to prototype and iterate quickly.
Templates go one step further for clearly defined tasks. In most cases you pick a template, provide your training data, and deploy the agent into the ecosystem you want, all without needing to code.
Pro-code AI agent development platforms
Pro-code platforms follow a traditional development approach, where developers handle the full build process in code. It’s more hands-on, but it gives you complete flexibility and control over how the agent behaves.
You typically use mainstream languages like C++, Python, or JavaScript (along with their libraries) to build agents from the ground up. The platform’s role is mainly to bring the tools, frameworks, and deployment resources together in one place.
Foundry Agent Service: where agents become something you can actually run
Can we call this a new acronym, FASAAS? Building an agent is one thing. Running it like a service is another.
Hosted Agents in Foundry Agent Service is positioned as the “run and operate” layer — aimed at deploying pro-code agents and managing them with the kind of controls you’d expect in an enterprise environment.
The capabilities are the sort of things ops teams ask for immediately:
Managed runtime with serverless auto-scale and dedicated execution environments
Context management via a Responses API
Versioning and lifecycle management
Integrated tools and data, including managed authentication for remote MCP servers
Observability: traces, metrics, debug logs, and cost analysis
Evaluation and governance: continuous evaluations and guardrails via the Foundry control plane
Interoperability: paths into broader Microsoft ecosystems (including activity protocol support)
Enterprise security and data sovereignty, including bring-your-own virtual network and state storage options
If you’ve ever had to explain to leadership why a “quick AI pilot” needs controls, this is the part of the story that helps.
Seeing the workflow matters more than most people expect
One of the most practical pieces of the agent experience is simply being able to see what happened.
A developer UI that visualises workflows — showing the agent map, the step timeline, and when human input is required — turns an “AI black box” into something you can reason about. It’s also much easier to troubleshoot, because you can trace:
Which step ran
Which tool was called
Which model was used
What succeeded, what failed, and where it got stuck
That’s not a “nice to have”. It’s how multi-agent systems become operable.
A simple example: planning work without drowning in details
A good way to picture multi-agent orchestration is a planning workflow.
You might have a coordinator agent that breaks a task into streams like venue, budget, catering, logistics — then delegates each stream to a specialist agent. The system can pause when it needs a human decision (like choosing a city), then continue once that input arrives.
Now imagine this for a SOC analyst!
It’s a small example, but it mirrors real work: most operational tasks aren’t hard because they’re complex — they’re hard because they’re multi-step and interdependent.
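The pause-for-a-human pattern from the planning example can be sketched in a few lines. This is a toy of my own making (no real SDK here): the workflow fans out into streams, but stops and says exactly what it's waiting on until the human decision arrives.

```python
# Toy sketch of the planning workflow: fan out into streams, but pause
# when a human decision (the city) hasn't been made yet. Illustration only.

def plan_event(decisions: dict) -> dict:
    if "city" not in decisions:
        # Pause and surface exactly what the workflow is waiting on.
        return {"status": "paused", "waiting_on": "city"}
    city = decisions["city"]
    streams = {
        "venue": f"shortlist venues in {city}",
        "budget": "draft a budget for approval",
        "catering": "collect catering quotes",
        "logistics": "plan travel and accommodation",
    }
    return {"status": "complete", "streams": streams}

first = plan_event({})                       # pauses for human input
second = plan_event({"city": "Manchester"})  # resumes once the decision lands
```

Swap "city" for "approve this containment action" and you have the SOC analyst version of the same shape.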
Resilience: durable agents that don’t lose their place
Multi-step work breaks. Services restart. Humans take time to reply.
That’s why the durable task extension matters. The idea is to create checkpoints as the workflow runs, so if something fails you restart from the last checkpoint rather than starting from scratch.
This becomes even more valuable when the workflow includes human-in-the-loop steps that might pause for minutes… or days.
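Here's the checkpoint-and-resume idea in miniature. To be clear, this is my own illustration of the concept, not the durable task extension's actual API: completed steps are written to disk, so a restart skips straight to the first unfinished step.

```python
import json
import os
import tempfile

# Sketch of checkpoint-and-resume: completed steps are persisted, so a
# restart resumes from the last checkpoint instead of starting over.

def run_with_checkpoints(steps, path):
    done = {}
    if os.path.exists(path):
        with open(path) as f:
            done = json.load(f)  # resume from the last checkpoint
    for name, fn in steps:
        if name in done:
            continue             # already completed before a restart
        done[name] = fn(done)
        with open(path, "w") as f:
            json.dump(done, f)   # checkpoint after every step
    return done

path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
steps = [("fetch", lambda d: "raw"),
         ("clean", lambda d: d["fetch"].upper())]
run_with_checkpoints(steps[:1], path)        # simulate stopping after step one
resumed = run_with_checkpoints(steps, path)  # restart: "fetch" is skipped
```

A real durable runtime does far more (replay, timers, external events), but the core promise is the same: a restart picks up where the work left off.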
Governance: bringing policy into the agent loop with Purview
If you’re serious about running agents at scale, you need to be serious about data.
A practical approach shown here is injecting Microsoft Purview policy middleware into agents so requests and responses can be inspected, logged, and blocked when sensitive data is involved. That activity can then surface in Purview’s Activity Explorer, including detections for sensitive information.
A blog post is coming on this once I understand exactly how it's possible!
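While I work that out, the middleware idea itself is easy to picture. This sketch is entirely mine (it does not use Purview's real API, and the regex is a toy placeholder for proper classifiers): wrap the agent handler so requests and responses are inspected, audited, and blocked when they match a sensitive pattern.

```python
import re

# Hedged sketch of policy middleware: inspect, audit, and block sensitive
# content on the way in and out. Pattern and API are illustrative only.

SENSITIVE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # toy SSN-style pattern

def with_policy(handler, audit_log: list):
    def wrapped(prompt: str) -> str:
        if SENSITIVE.search(prompt):
            audit_log.append({"direction": "request", "action": "blocked"})
            return "[blocked: sensitive data detected in request]"
        response = handler(prompt)
        if SENSITIVE.search(response):
            audit_log.append({"direction": "response", "action": "blocked"})
            return "[blocked: sensitive data detected in response]"
        audit_log.append({"direction": "response", "action": "allowed"})
        return response
    return wrapped

audit: list = []
agent = with_policy(lambda p: f"echo: {p}", audit)
safe = agent("summarise this ticket")
blocked = agent("customer SSN is 123-45-6789")
```

The audit log is the part that would surface in something like Activity Explorer: every decision, allowed or blocked, leaves a record.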
Seven practical considerations before you go anywhere near production with any sort of multi-agent flow!
1. Design agent roles first: split work into retrieval, preprocessing, analysis, and orchestration before you write prompts.
2. Keep the workflow predictable: let agents reason inside steps, but keep the step order consistent so outcomes stay steady.
3. Make it observable from day one: if you can't trace steps, tool calls, and decisions, you'll struggle to support it under pressure.
4. Treat data prep as a first-class problem: raw inputs are rarely ready. Plan for preprocessing as its own responsibility.
5. Bake policy into the flow: if sensitive data matters (and it does), use middleware and auditing early, not as an afterthought.
6. Design for failure and restart: checkpoints and durable orchestration turn outages into recoverable interruptions.
7. Own versioning and lifecycle: if you can't compare versions, roll forward safely, and manage deployments cleanly, you don't really have a platform.
Wow, that was a lot!
Agents are moving past “chat that sounds clever” and into “systems that do work”.
You probably don’t need this level of platform thinking yet — and that’s fine. But architecturally, it’s genuinely fascinating.
Foundry’s real value isn’t in flashy prompts or clever demos. It’s in the unglamorous stuff: making agent-driven work deployable, observable, governable, and repeatable. Those “boring bits” are exactly what turns AI from something you try once into something you can run every day without holding your breath.
Unless you’re already deep in this world, you’ll usually get further, faster by starting low-code or no-code. It’s the quickest way to prove value, learn what’s possible, and spot what actually matters in your environment. That said, it never hurts to understand what the pro-code experts can do — because sooner or later, you’ll hit a point where flexibility, control, or scale becomes the real requirement.
My takeaway
If you’re exploring this space, here’s the question I’d sit with: where is your team losing time turning messy inputs into something usable — and what would change if a multi-agent workflow handled that reliably, with clear visibility and the right controls built in? Do you need an agent or just a Logic App?
The Hype is real. Hopefully, some real problems will be solved with some of this new tech!
More thoughts soon,
Marcus
Tags #
#MicrosoftSecurity
#MicrosoftLearn
#CyberSecurity
#MicrosoftSecurityCopilot
#Microsoft
#MSPartnerUK
#msftadvocate
WordCloud #
Secure AI Foundry
Secure AI Agent Foundry
AI Security Foundry
Secure Agentic AI
Enterprise AI Agent Security
Secure AI Operations
Secure AI at Scale
AI Safety by Design
AI-SecOps
AI-native SecOps
Microsoft Defender for AI agents
Microsoft Defender for AI security
Microsoft Agent 365 security
Secure Microsoft AI agents
Microsoft AI agent posture management
Defender AI agents Ignite
Microsoft secure AI development
Azure AI Foundry security
Secure Azure AI Foundry patterns
Azure AI agent security best practices