Microsoft Foundry
Enterprise AI Upskilling

Microsoft
Foundry

One platform for enterprise AI: models, agents, tools, knowledge, and the control plane that governs them.

Jon-Paul Boyd · June 2026
00 · Orientation

Something for everyone

DEVELOPERS Builders
  • Choose and compare models: catalog, leaderboard, side by side
  • Prototype fast in model and agent playgrounds
  • Build agents with tools, MCP, memory, human in the loop, SDKs
  • Ground on Foundry IQ; evaluate, trace and fine-tune
IT ADMINS Platform owners
  • One Foundry resource, many projects: 1:1 and 1:N patterns
  • Hub and spoke routing through an APIM gateway
  • RBAC roles, regions, quota and tokens-per-minute limits
  • Cost overview and tagging; provision with Bicep and Terraform
SECURITY Risk & compliance
  • Guardrails: Prompt Shields, content filters, PII blocklists
  • Red teaming: AI Red Teaming Agent on PyRIT, attack success rate
  • Governance: Azure Policy deny rules on model deployments
  • Control plane: Entra Agent ID, Defender, Purview, continuous evals
1 What it is 2 Portal 3 Deploy 4 Inference 5 Agents 6 Foundry IQ 7 Control plane 8 Wrap

Build for business value outcomes, then govern it at scale.

01 · What it is

One platform

◆ Microsoft Foundry

Observability

Tracing · Monitoring Evaluation · Experimentation

Frameworks & SDKs

Foundry SDKAgent Framework LangGraphLangChain LlamaIndexCrewAI GitHub Copilot SDK

Foundry Agent Service the runtime

Managed orchestrationConversations + state Network isolationOBO auth

Models

Microsoft & partners Fine-tuned Bring your own

Knowledge & Tools

Azure AI SearchBing SharePointFabric FunctionsLogic Apps OpenAPIMCP

Agent integrations

Foundry SDKREST Responses APIA2A protocol

Guardrails, security & governance

Prompt ShieldsContent filters PII detectionCustom blocklists Protected materialPolicy & compliance Cost managementEntra · Defender · Purview
BUILD Multi-agent · 1,400+ tools · Agent memory · Foundry IQ
OPERATE Observability · Agent 365 identity · Fleet ops

Unified breadth: build with agents, tools, memory and knowledge, then operate them with identity, observability and fleet governance, all behind one resource.

02 · Portal

The portal is organised around jobs to be done

Microsoft Foundry / proj-foundry-core Search with AI (Ctrl + K) New Foundry HomeDiscoverBuildOperateDocs ☼ ✉ ☉
Welcome back
Start building ›
Project endpoint proj-foundry-core.services.ai.azure.com Region Sweden Central API key auth disabled
HomeLand, resume recent work, quick start.
DiscoverNetflix-style catalog of models and tools.
BuildCreate agents, apps and workflows.
OperateAdmin and fleet view across projects.
DocsDocumentation, in context, never leave.

Flip on New Foundry at ai.azure.com and you get five tabs by job. Everything sits inside a Foundry resource (a subscription + resource group: the billing and governance boundary).

03 · Discover

A model catalog you can compare, then deploy

Microsoft Foundry / proj-foundry-core Search with AI (Ctrl + K) New Foundry HomeDiscoverBuildOperateDocs ☼ ✉ ☉
Discover what's possible1,900+ models · explore by provider, collection, leaderboard
Azure OpenAIAnthropicMicrosoftMetaMistral AIxAIDeepSeek
Model leaderboardQualitySafetyThroughputEst. cost
gpt-5.2-codex 0.930.18%32 t/s$4.81
gpt-5.20.931.87%60 t/s$4.81
claude-opus-4-60.932.41%43 t/s$10.00
claude-sonnet-4-60.922.19%61 t/s$6.00
grok-40.911.41%48 t/s$5.20
Tools view: Foundry Tools catalog: remote & local MCP servers, OpenAPI and A2A. Configure once, add to any agent or workflow.

Optimise the choice across quality, safety, throughput and cost, compare models side by side, then quick-deploy (global standard) straight from the card.

Live demo

Compare and deploy from the catalog

Discover → leaderboard → compare → quick-deploy, then the Foundry Tools / MCP catalog.

in the portal · ai.azure.com → Discover
04 · Build

Where developers create and manage every asset

Microsoft Foundry / project-admin-c2676f Search with AI (Ctrl + K) New Foundry HomeDiscoverBuildOperateDocs ☼ ✉ ☉
Agents Deployments Fine-tune Tools Knowledge Memory Data Evaluations Guardrails
New Foundry supports V2 agents only. Classic agents and the Assistants API are not carried over: save them as new agents.
Browse templates Code agent Create agent
NameVersionTypeDescription
contoso-bank-agent1promptCustomer-service agent
contoso-pmo-agent1promptPMO knowledge-base agent
aria-rm-briefing-agent1promptPrivate-banking RM briefing

Build is where developers manage every asset they deploy or create, and it is the home of the model and agent playgrounds.

05 · Operate

And one tab to govern it all

Microsoft Foundry / fleet Search with AI (Ctrl + K) New Foundry HomeDiscoverBuildOperateDocs ☼ ✉ ☉
PILLAR 1ControlsRuntime guardrails on inputs, outputs and tool calls.
PILLAR 2ObservabilityTracing, continuous evaluation, cost.
PILLAR 3SecurityEntra Agent ID, Defender, Purview.
PILLAR 4Fleet opsOne view across every project and cloud.

Operate is the Foundry Control to see, govern and act on every agent across the fleet.

06 · Pattern

One resource, many projects, two ways to slice it

Subscription Resource group Foundry resource Project(s)
1 : 1 · dedicated one account, one team: a hard infrastructure boundary
◆ aif-spoke-alpha
project-alpha
For strict compliance or cost isolation, and to separate dev / test / prod.
1 : N · shared one account, many isolated project workspaces
◆ aif-spoke-multi
project-betaproject-deltaproject-gamma iqobscu
For teams on the same cost centre: cost-efficient, shared RBAC, per-project isolation.

The Foundry resource is the billing, networking and quota boundary.
The project is the isolated workspace (its own agents, data and connections).

07 · Architecture

Centralise inference behind one AI gateway

Models live only in the 'hub' Foundry accounts*
◆ aif-core
East US 2 · general
gpt-4.1-miniembeddings
◆ aif-research
Norway East · reasoning
o3-deep-research
◆ aif-oss
West US 3 · open-weights
Phi-4
managed identity · routed by URL (most specific wins)
⇉ APIM gateway · apim-foundry
swaps client key → managed-identity token · per-team rate limits & quota · routes by URL to hubs
each project: its own connection core-{team} + gateway key
Spoke 1 : 1 no models · deny policy
◆ aif-spoke-alpha
project-alpha
reaches models via the core-alpha connection
Spoke 1 : N no models · deny policy
◆ aif-spoke-multi
betadeltagammaiqobscu
each project: own key, own quota
* 'Hub' here means the central model-hosting Foundry accounts in this topology, not the legacy Foundry v1 'hub' resource type.

One Azure API Management gateway fronts every model. Spokes hold zero deployments: they call the hub through a per-team key, so cost, content filters and observability stay unified.

08 · Governance

Make the pattern impossible to break

// Azure Policy: deny-model-deployments
{
  "if": {
    "field": "type",
    "equals": "Microsoft.CognitiveServices/accounts/deployments"
  },
  "then": { "effect": "deny" }
}
rg-foundry-coreEXEMPT
rg-foundry-spoke-alphaDENIED
rg-foundry-multiDENIED
Deploy a model into a spoke and Azure returns
RequestDisallowedByPolicy. Spokes can still use models through the gateway, just never deploy their own.

A one-rule Azure Policy denies Microsoft.CognitiveServices/accounts/deployments in spoke resource groups. The hub stays exempt, so the architecture cannot drift.

09 · Deploy

Deploying a model: pick a type, get an endpoint

Standard pay per token
  • Elastic, no commitment, billed per token
  • The default: global standard, one click
  • Best for spiky or early workloads
Provisioned / PTU reserved capacity
  • Predictable latency and throughput
  • One PTU pool can be shared across different provisioned models
  • Best for consistently high utilisation, latency-sensitive production
Global
max throughput, data may leave region
Data Zone
stays within a geography (EU / US)
Regional
pinned to one region for residency
Defaults vs custom: accept global standard + default quota, or customise SKU, quota (TPM) and guardrails. Partner models (Llama, Claude) need an Azure Marketplace subscription; models sold directly by Azure do not.

A deployment is a model + a deployment type. The type sets the cost model (per-token vs reserved) and the data-residency and throughput guarantees; quota is tracked per region and subscription, as PayGo (standard) or PTU (provisioned).

Live demo

Model Deployment Playground

Set the system prompt, attach web search → Monitor for tokens, cost and latency.

in the portal · ai.azure.com → Build → Deployments → Playground
10 · Endpoints

One resource, three endpoint surfaces

OpenAI SDK
*.openai.azure.com/openai/v1
Full OpenAI API surface: chat completions, embeddings, Responses, fine-tuning. No agents or evaluations.
key or token
Foundry SDK
*.services.ai.azure.com/api/projects/*
Foundry-native: agents, evaluations, connections, tracing. Responses API on its /openai route.
token only · Entra
Foundry Tools SDKs
*.cognitiveservices.azure.com
The other AI services: Speech, Vision, Language, Content Safety (formerly Azure AI Services).
key or token
One Foundry resource exposes all three · an Azure OpenAI resource has only /openai/v1

A Foundry resource is multi-surface: the OpenAI endpoint for raw inference, the project endpoint for Foundry-native agents and evals (token-only via Entra), and Cognitive Services for the other AI tools. Match the SDK to the endpoint.

11 · Inference

Calling models through the gateway

Direct client gateway key
AzureOpenAI(base_url=gateway, api_key=key)
Chat completionsEmbeddingsDeep research
Full Azure OpenAI surface; the team holds the gateway key.
Foundry project client keyless
responses.create(model="core-alpha/gpt-4.1-mini")
Responses APIMulti-turnStreaming
Keyless via Entra; routes through the project's core-{team} connection, which speaks the Responses API only.
↓ both paths reach the models through one APIM gateway · managed identity onward ↓

Pick by surface and auth: the direct client (gateway key) for the full OpenAI surface, or the Foundry project client (keyless via Entra) whose core-{team} connection speaks the Responses API. Both reach the same models through the one gateway.

Live demo

Model inference, end to end

The direct AzureOpenAI client and the Foundry AIProjectClient → chat, embeddings, deep research → the Responses API, multi-turn and streaming.

12 · Agents

An agent is a model, instructions and tools

Input
User messages
System events
Agent messages
Agent
LLM
Instructions
Tools
Output
Agent messages
Structured output
↓ tool call     ↑ result
Tool calls
Retrieval
Actions
Memory

It takes unstructured input, reasons with the model under your instructions, calls tools mid-flight to retrieve or act, and returns a message or structured output.

13 · Lifecycle

Foundry is an assembly line for agents

Microsoft Foundry 1Models 2Customizability 3Knowledge and tools 4Orchestration 5Observability 6Trust

Six stages, secure and testable end to end: modelscustomizationknowledge and toolsorchestrationobservabilitytrust.

14 · Runtime

The Agent Service runs the loop, not your code

your code · client.responses.create(model, input, agent_reference)
Foundry Agent Service · on the Responses API
1Load the agent version: system prompt, tools, model binding
2Persist the conversation: resume via previous_response_id
3Call the model through the gateway, RBAC-scoped, no keys
4Dispatch tools: Code Interpreter, File Search, MCP server-side
5Stream and trace: output deltas plus OpenTelemetry spans
6Content safety on input and output
grounded result + citations · FunctionTool calls handed back to you for human-in-the-loop

For a prompt agent there is nothing to deploy: you submit a request and the service owns thread state, tool dispatch, retries and content safety server-side.

15 · Knowledge & tools

What an agent can know and do

Knowledge (grounding)

Azure AI Search
File search
Web / Bing grounding
Foundry IQ knowledge base
Work IQ (Microsoft 365)
Microsoft Fabric (Fabric IQ)
SharePoint

Tools & actions

Code Interpreter
Function calling
Azure Functions
Logic Apps connectors
Image generation

Custom & protocols

Model Context Protocol
OpenAPI tools
Agent-to-Agent (A2A)
Toolbox (curated set of tools)
Tool catalog
Skills (reusable instructions)
Agent memory

Grounding sources and actions plug in by configuration, not custom plumbing. MCP and OpenAPI let an agent call almost any tool server or API, and the IQ family (Foundry, Work, Fabric) grounds it in your data.

16 · Hosted

When you want to bring your own runtime

Prompt agent declarative, runs on the Agent Service
  • You define model + instructions + tools
  • Nothing to deploy, no container
  • The service owns the loop
Hosted agent bring your own code and runtime
  • Deploy from source (Foundry builds it, no Docker), or ship your own container
  • Foundry provisions compute and a dedicated endpoint
  • Per-session VM sandbox, scale to zero, OpenTelemetry
  • Gets its own Microsoft Entra agent identity
Bring your framework Microsoft Agent Framework LangGraph OpenAI Agents SDK Anthropic Agent SDK GitHub Copilot SDK your own code

Most agents need no container. For your own runtime, hosted agents run any framework with a managed identity and a sandbox: deploy straight from source (no Docker), or bring your own container.

17 · Hosted SDK

A whole agent harness inside one container

Foundry hosted runtime · your container
main.py + InvocationAgentServerHost · outer loop
POST /invocationsone user turnstream events out as SSE
spawned Copilot CLI subprocess · inner agentic loop
model calltool?run shell / pythonfeed result backidle
inference ◆ your Foundry gpt-5.x via managed identity · /openai/v1/responses · no secrets in the container

main.py is a thin invocations shell; the real reason-act-observe loop runs in a spawned CLI subprocess. Inference is bring-your-own-key to your own Foundry model, so no secrets ship in the image.

Live demo

Build and run an agent

Build a prompt agent in the playground.

portal Build → playground
18 · Foundry IQ

Grounded knowledge, as a managed layer

Knowledge sources
Azure Blob · OneLake
SharePoint
Existing search indexes
Web (Grounding with Bing)
◆ Knowledge base
One endpoint, shareable across many agents. Permission-aware: honours ACLs and Purview sensitivity labels under the caller's identity.
engine: Azure AI Search agentic retrieval
Agents attach via MCP
prompt agent
multi-agent system
Foundry IQ · enterprise knowledge Fabric IQ · analytics Work IQ · M365

Foundry IQ is a managed knowledge layer over Azure AI Search. You build a knowledge base from your sources once and any agent grounds on it, with citations and permissions enforced.

19 · Retrieval

The model lives inside the search, not just after it

Build the index
documents3k arxiv-nlp abstracts embedtext-embedding-3-large · 3072d indexHNSW vector + semantic + ACL trim
◆ Knowledge source registers the arxiv-nlp index
kb-fast
minimal effort · no LLM
direct index-driven retrieval
fastest, no model quota
kb (standard)
low effort · one LLM pass
gpt-4.1-mini (APIM) plans sub-queries
better relevance, multi-turn
Both return raw chunks (EXTRACTIVE_DATA) for the agent's own LLM; each KB auto-exposes an MCP endpoint. Effort: minimal → low → medium → high.

Two knowledge bases over one index: kb-fast (minimal effort, no LLM) for speed, kb (standard) (low effort, gpt-4.1-mini) for relevance. The effort knob puts the model inside the search, trading depth for latency and cost.

Live demo

Build a knowledge base, then retrieve

Index a corpus → build kb-fast and kb-standard → retrieve and compare the two reasoning efforts.

20 · Control plane

One agent is easy. A fleet is the hard part.

Data plane
Agents in action
Chatting, calling tools, retrieving data, generating responses. The work running.
Control plane
See, govern, act
One surface for identity, policies, security, observability and cost, across every project and cloud.
Risk: Prompt injection
Untrusted content in a tool result hijacks the agent's instructions.
Risk: Task drift
The agent quietly does something other than what it was asked.
Risk: Data leakage
Access plus a confused instruction plus an outbound channel equals exfiltration.

Agents add failure modes apps never had, and they compound as you add tools and data. Their underlying intelligence is a probabilistic, non-deterministic LLM, not deterministic code, so the same input can behave differently each time.

21 · The pillars

What the control plane brings together

CONTROLSGuardrailsPrompt Shields, content filters and blocklists, validated by red teaming.
OBSERVABILITYSee insideTracing, continuous evaluation on live traffic, per-agent cost.
SECURITYIdentity & dataEntra Agent ID, Microsoft Defender, Microsoft Purview.
FLEET OPSAt scaleOne inventory and to-do list across projects, frameworks and clouds.

The control plane brings four essentials into one surface: runtime guardrails, observability, agent security and fleet operations, across every project, framework and cloud.

22 · Controls

Guardrails on inputs, outputs and tool traffic

User input
INPUT Prompt Shields (jailbreak + indirect) · content filters · custom blocklists
Agent + model
TOOL CALL / RESPONSE task adherence on the call · indirect injection on the response
OUTPUT content filters · protected material · groundedness · sensitive data / PII · custom blocklists
Response

A guardrail is a set of controls at four points: input, tool call, tool response, and output. The tool-call and tool-response checks are the agent-specific part, catching indirect prompt injection before the agent acts.

Live demo

Trip the guardrails

Prompt Shields block a jailbreak.

23 · Observability

Trace it, evaluate it, cost it

Tracing OpenTelemetry traces, prompt to model to tool, into Application Insights.
  • Walk back any run, step by step
  • Auto-traced for Agent Framework, LangChain, LangGraph
Continuous evaluation Score live production traffic, not just a pre-ship test suite.
  • Set a threshold, alert when quality drops
  • Groundedness, task adherence, tool-call accuracy
Cost Per-agent token spend across the fleet, near real time.
  • Agents burn tokens fast: watch them
  • Sort the fleet by cost or error rate

You cannot human-review every step, so you evaluate live traffic against a threshold and trace what crosses it, while watching cost per agent.

Live demo

Show an eval run

24 · Security

Every agent gets a real identity

Entra Agent ID Publish an agent and it gets a Microsoft Entra identity, automatically.
  • Access control like any principal
  • Ownership: know who to call at 2am
  • Lineage across its lifecycle
Microsoft Defender AI security posture and threat detection extended to agents.
  • Attack-path analysis, gap recommendations
  • A jailbreak surfaces as a security alert
Microsoft Purview Agent interactions available for audit and compliance.
  • Org-wide content-safety policies
  • Sensitivity labels honoured
Controls = each developer configures
Opt in or out on your own agent, e.g. Prompt Shields, content filters, a PII blocklist.
Policies = the organisation mandates
"Every agent must have indirect-injection protection on", scanned continuously.

Publishing an agent issues an Entra identity (app + object id) for access, ownership and lineage.
Policies turn a developer's optional control into an org-wide mandate.

Live demo

Mandate a control across the fleet

Create a policy in Operate / Compliance → mandate a control across a subscription → scanned continuously across every agent.

25 · Red teaming

Attack your own agents, on purpose

Find the holes before someone else does
  • Probe: automated attacks, every category and strategy
  • Score: an Attack Success Rate baseline, by technique
  • Harden: add or tighten guardrails where attacks land
  • Re-scan: the same suite, and watch the ASR fall
  • Gate: re-run on every release, in CI/CD
Attack strategies (PyRIT)
Base64ROT13Unicode confusablesCrescendomulti-language
Risk categories
HateViolenceSexualSelf-harm

The AI Red Teaming Agent (built on PyRIT, preview) runs automated adversarial scans and scores an Attack Success Rate, so you can prove a guardrail policy actually moved the needle.

Live demo

Run the advanced attack suite

Mutate attacks with encodings (Base64, ROT13, Unicode) and other languages → add your own attack prompts → score the ASR per category and strategy.

26 · Fleet ops

One to-do list for every agent, any cloud

Fleet to-do

Jailbreak attempt blocked · contoso-bank-agent
Eval below threshold · aria-rm-briefing-agent
Policy gap: indirect injection off · 2 agents
Cost spike · contoso-pmo-agent, +180% today
Error rate 100% · contoso-bank-agent
2 agents in Unknown state · github-copilot

Assets · Agents

NameSourceProjectStatusErrorsTokensRuns
github-copilotFoundryproject-copilot-sdkUnknown0.00%-18
aria-rm-briefing-agentFoundryproject-admin-c2676fRunning1.92%153.8K52
contoso-bank-agentFoundryproject-admin-c2676fRunning100.00%-40
contoso-pmo-agentFoundryproject-admin-c2676fRunning0.00%15.3K4
SpaceExpertFoundryproject-alpha-meshRunning0.00%8603
arxiv-nlp-agentFoundryiq-projectRunning0.00%27K5
team-beta-agentFoundryproject-beta-c2676fRunning0.00%561
team-delta-agentFoundryproject-delta-c2676fRunning0.00%561
claims-triageAI GatewayLangGraphRunning0.00%4.1K12
legacy-helpdeskAI GatewayAWSBlocked--0

Operate is a to-do list for the fleet: blocked jailbreaks, evals below threshold, policy gaps. External agents (e.g., AWS) join the same view by routing through the AI Gateway.

Live demo

The admin fleet overview

Operate: the fleet overview and assets inventory.

portal · Operate
27 · Provisioning

The admin essentials

Four Foundry roles

Foundry User · build & call: models, agents, evals, data plane
Project Manager · manage projects, assign User; full data plane
Account Owner · deploy, connections, quota; no data plane
Owner · unrestricted: full control + data plane
tip: assign to groups, not people

Two-layer RBAC

Layer 1 Foundry plane · portal, projects, build
Layer 2 Cognitive Services plane · content filters, quota, keys
gotcha: the second layer is commonly missed

Regions & quota

Pick by model availability + residency
Quota is TPM per region + subscription

Cost & SDK

Platform is free; pay per deployment
No hard spend cutoffs · budget alerts
SDK: azure-ai-projects + OpenAI client

For platform owners: four roles (assigned to groups), a two-layer RBAC model, TPM quota by region and subscription, and a free platform you meter at the deployment level.

28 · Why Foundry

One platform, two jobs done well

Build

  • Deploy any model behind one governed gateway
  • Agents on the Responses API and one SDK
  • Knowledge and tools by config, MCP-native
  • Hosted runtimes in the framework you choose
  • Grounded, cited answers with Foundry IQ

Operate

  • A Microsoft Entra identity for every agent
  • Guardrails on inputs, outputs and tool traffic
  • Tracing and continuous evaluation on live traffic
  • Red teaming that measures attack success
  • One fleet view across projects and clouds

Build with rich primitives and govern them from the same place.