Unified breadth: build with agents, tools, memory and knowledge, then operate them with identity, observability and fleet governance, all behind one resource.
02 · Portal
The portal is organised around jobs to be done
◆ Microsoft Foundry/ proj-foundry-coreSearch with AI (Ctrl + K)New Foundry HomeDiscoverBuildOperateDocs☼ ✉ ☉
Welcome back
Start building ›
Project endpoint proj-foundry-core.services.ai.azure.comRegion Sweden CentralAPI key auth disabled
HomeLand, resume recent work, quick start.
DiscoverNetflix-style catalog of models and tools.
BuildCreate agents, apps and workflows.
OperateAdmin and fleet view across projects.
DocsDocumentation, in context, never leave.
Flip on New Foundry at ai.azure.com and you get five tabs by job. Everything sits inside a Foundry resource (a subscription + resource group: the billing and governance boundary).
03 · Discover
A model catalog you can compare, then deploy
◆ Microsoft Foundry/ proj-foundry-coreSearch with AI (Ctrl + K)New Foundry HomeDiscoverBuildOperateDocs☼ ✉ ☉
Discover what's possible1,900+ models · explore by provider, collection, leaderboard
Tools view: Foundry Tools catalog: remote & local MCP servers, OpenAPI and A2A. Configure once, add to any agent or workflow.
Optimise the choice across quality, safety, throughput and cost, compare models side by side, then quick-deploy (global standard) straight from the card.
Live demo
Compare and deploy from the catalog
Discover → leaderboard → compare → quick-deploy, then the Foundry Tools / MCP catalog.
in the portal · ai.azure.com → Discover
04 · Build
Where developers create and manage every asset
◆ Microsoft Foundry/ project-admin-c2676fSearch with AI (Ctrl + K)New Foundry HomeDiscoverBuildOperateDocs☼ ✉ ☉
1 : 1 · dedicated one account, one team: a hard infrastructure boundary
◆ aif-spoke-alpha
project-alpha
For strict compliance or cost isolation, and to separate dev / test / prod.
1 : N · shared one account, many isolated project workspaces
◆ aif-spoke-multi
project-betaproject-deltaproject-gammaiqobscu
For teams on the same cost centre: cost-efficient, shared RBAC, per-project isolation.
The Foundry resource is the billing, networking and quota boundary. The project is the isolated workspace (its own agents, data and connections).
07 · Architecture
Centralise inference behind one AI gateway
Models live only in the 'hub' Foundry accounts*
◆ aif-core
East US 2 · general
gpt-4.1-miniembeddings
◆ aif-research
Norway East · reasoning
o3-deep-research
◆ aif-oss
West US 3 · open-weights
Phi-4
↑ managed identity · routed by URL (most specific wins)
⇉ APIM gateway · apim-foundry
swaps client key → managed-identity token · per-team rate limits & quota · routes by URL to hubs
↑ each project: its own connection core-{team} + gateway key
Spoke 1 : 1 no models · deny policy
◆ aif-spoke-alpha
project-alpha
reaches models via the core-alpha connection
Spoke 1 : N no models · deny policy
◆ aif-spoke-multi
betadeltagammaiqobscu
each project: own key, own quota
* 'Hub' here means the central model-hosting Foundry accounts in this topology, not the legacy Foundry v1 'hub' resource type.
One Azure API Management gateway fronts every model. Spokes hold zero deployments: they call the hub through a per-team key, so cost, content filters and observability stay unified.
Deploy a model into a spoke and Azure returns RequestDisallowedByPolicy. Spokes can still use models through the gateway, just never deploy their own.
A one-rule Azure Policy denies Microsoft.CognitiveServices/accounts/deployments in spoke resource groups. The hub stays exempt, so the architecture cannot drift.
09 · Deploy
Deploying a model: pick a type, get an endpoint
Standard pay per token
Elastic, no commitment, billed per token
The default: global standard, one click
Best for spiky or early workloads
Provisioned / PTU reserved capacity
Predictable latency and throughput
One PTU pool can be shared across different provisioned models
Best for consistently high utilisation, latency-sensitive production
Global
max throughput, data may leave region
Data Zone
stays within a geography (EU / US)
Regional
pinned to one region for residency
Defaults vs custom: accept global standard + default quota, or customise SKU, quota (TPM) and guardrails. Partner models (Llama, Claude) need an Azure Marketplace subscription; models sold directly by Azure do not.
A deployment is a model + a deployment type. The type sets the cost model (per-token vs reserved) and the data-residency and throughput guarantees; quota is tracked per region and subscription, as PayGo (standard) or PTU (provisioned).
Live demo
Model Deployment Playground
Set the system prompt, attach web search → Monitor for tokens, cost and latency.
in the portal · ai.azure.com → Build → Deployments → Playground
10 · Endpoints
One resource, three endpoint surfaces
OpenAI SDK
*.openai.azure.com/openai/v1
Full OpenAI API surface: chat completions, embeddings, Responses, fine-tuning. No agents or evaluations.
key or token
Foundry SDK
*.services.ai.azure.com/api/projects/*
Foundry-native: agents, evaluations, connections, tracing. Responses API on its /openai route.
token only · Entra
Foundry Tools SDKs
*.cognitiveservices.azure.com
The other AI services: Speech, Vision, Language, Content Safety (formerly Azure AI Services).
key or token
One Foundry resource exposes all three · an Azure OpenAI resource has only /openai/v1
A Foundry resource is multi-surface: the OpenAI endpoint for raw inference, the project endpoint for Foundry-native agents and evals (token-only via Entra), and Cognitive Services for the other AI tools. Match the SDK to the endpoint.
11 · Inference
Calling models through the gateway
Direct client gateway key
AzureOpenAI(base_url=gateway, api_key=key)
Chat completionsEmbeddingsDeep research
Full Azure OpenAI surface; the team holds the gateway key.
Foundry project client keyless
responses.create(model="core-alpha/gpt-4.1-mini")
Responses APIMulti-turnStreaming
Keyless via Entra; routes through the project's core-{team} connection, which speaks the Responses API only.
↓ both paths reach the models through one APIM gateway · managed identity onward ↓
Pick by surface and auth: the direct client (gateway key) for the full OpenAI surface, or the Foundry project client (keyless via Entra) whose core-{team} connection speaks the Responses API. Both reach the same models through the one gateway.
Live demo
Model inference, end to end
The direct AzureOpenAI client and the Foundry AIProjectClient → chat, embeddings, deep research → the Responses API, multi-turn and streaming.
12 · Agents
An agent is a model, instructions and tools
Input
User messages
System events
Agent messages
→
Agent
LLM
Instructions
Tools
→
Output
Agent messages
Structured output
↓ tool call ↑ result
Tool calls
Retrieval
Actions
Memory
It takes unstructured input, reasons with the model under your instructions, calls tools mid-flight to retrieve or act, and returns a message or structured output.
13 · Lifecycle
Foundry is an assembly line for agents
Six stages, secure and testable end to end: models → customization → knowledge and tools → orchestration → observability → trust.
14 · Runtime
The Agent Service runs the loop, not your code
your code · client.responses.create(model, input, agent_reference)
↓
Foundry Agent Service · on the Responses API
1Load the agent version: system prompt, tools, model binding
2Persist the conversation: resume via previous_response_id
3Call the model through the gateway, RBAC-scoped, no keys
5Stream and trace: output deltas plus OpenTelemetry spans
6Content safety on input and output
↓
grounded result + citations · FunctionTool calls handed back to you for human-in-the-loop
For a prompt agent there is nothing to deploy: you submit a request and the service owns thread state, tool dispatch, retries and content safety server-side.
Grounding sources and actions plug in by configuration, not custom plumbing. MCP and OpenAPI let an agent call almost any tool server or API, and the IQ family (Foundry, Work, Fabric) grounds it in your data.
16 · Hosted
When you want to bring your own runtime
Prompt agent declarative, runs on the Agent Service
You define model + instructions + tools
Nothing to deploy, no container
The service owns the loop
Hosted agent bring your own code and runtime
Deploy from source (Foundry builds it, no Docker), or ship your own container
Foundry provisions compute and a dedicated endpoint
Per-session VM sandbox, scale to zero, OpenTelemetry
Gets its own Microsoft Entra agent identity
Bring your frameworkMicrosoft Agent FrameworkLangGraphOpenAI Agents SDKAnthropic Agent SDKGitHub Copilot SDKyour own code
Most agents need no container. For your own runtime, hosted agents run any framework with a managed identity and a sandbox: deploy straight from source (no Docker), or bring your own container.
17 · Hosted SDK
A whole agent harness inside one container
Foundry hosted runtime · your container
main.py + InvocationAgentServerHost · outer loop
POST /invocations→one user turn→stream events out as SSE
model call→tool?→run shell / python→feed result back↻idle
inference →◆ your Foundry gpt-5.x via managed identity · /openai/v1/responses · no secrets in the container
main.py is a thin invocations shell; the real reason-act-observe loop runs in a spawned CLI subprocess. Inference is bring-your-own-key to your own Foundry model, so no secrets ship in the image.
Live demo
Build and run an agent
Build a prompt agent in the playground.
portal Build → playground
18 · Foundry IQ
Grounded knowledge, as a managed layer
Knowledge sources
Azure Blob · OneLake
SharePoint
Existing search indexes
Web (Grounding with Bing)
→
◆ Knowledge base
One endpoint, shareable across many agents. Permission-aware: honours ACLs and Purview sensitivity labels under the caller's identity.
engine: Azure AI Search agentic retrieval
→
Agents attach via MCP
prompt agent
multi-agent system
Foundry IQ · enterprise knowledgeFabric IQ · analyticsWork IQ · M365
Foundry IQ is a managed knowledge layer over Azure AI Search. You build a knowledge base from your sources once and any agent grounds on it, with citations and permissions enforced.
19 · Retrieval
The model lives inside the search, not just after it
Both return raw chunks (EXTRACTIVE_DATA) for the agent's own LLM; each KB auto-exposes an MCP endpoint. Effort: minimal → low → medium → high.
Two knowledge bases over one index: kb-fast (minimal effort, no LLM) for speed, kb (standard) (low effort, gpt-4.1-mini) for relevance. The effort knob puts the model inside the search, trading depth for latency and cost.
Live demo
Build a knowledge base, then retrieve
Index a corpus → build kb-fast and kb-standard → retrieve and compare the two reasoning efforts.
20 · Control plane
One agent is easy. A fleet is the hard part.
Data plane
Agents in action
Chatting, calling tools, retrieving data, generating responses. The work running.
Control plane
See, govern, act
One surface for identity, policies, security, observability and cost, across every project and cloud.
Risk: Prompt injection
Untrusted content in a tool result hijacks the agent's instructions.
Risk: Task drift
The agent quietly does something other than what it was asked.
Risk: Data leakage
Access plus a confused instruction plus an outbound channel equals exfiltration.
Agents add failure modes apps never had, and they compound as you add tools and data. Their underlying intelligence is a probabilistic, non-deterministic LLM, not deterministic code, so the same input can behave differently each time.
21 · The pillars
What the control plane brings together
CONTROLSGuardrailsPrompt Shields, content filters and blocklists, validated by red teaming.
OBSERVABILITYSee insideTracing, continuous evaluation on live traffic, per-agent cost.
SECURITYIdentity & dataEntra Agent ID, Microsoft Defender, Microsoft Purview.
FLEET OPSAt scaleOne inventory and to-do list across projects, frameworks and clouds.
The control plane brings four essentials into one surface: runtime guardrails, observability, agent security and fleet operations, across every project, framework and cloud.
TOOL CALL / RESPONSE task adherence on the call · indirect injection on the response
↓
OUTPUT content filters · protected material · groundedness · sensitive data / PII · custom blocklists
↓
Response
A guardrail is a set of controls at four points: input, tool call, tool response, and output. The tool-call and tool-response checks are the agent-specific part, catching indirect prompt injection before the agent acts.
Live demo
Trip the guardrails
Prompt Shields block a jailbreak.
23 · Observability
Trace it, evaluate it, cost it
TracingOpenTelemetry traces, prompt to model to tool, into Application Insights.
Walk back any run, step by step
Auto-traced for Agent Framework, LangChain, LangGraph
Continuous evaluationScore live production traffic, not just a pre-ship test suite.
Set a threshold, alert when quality drops
Groundedness, task adherence, tool-call accuracy
CostPer-agent token spend across the fleet, near real time.
Agents burn tokens fast: watch them
Sort the fleet by cost or error rate
You cannot human-review every step, so you evaluate live traffic against a threshold and trace what crosses it, while watching cost per agent.
Live demo
Show an eval run
24 · Security
Every agent gets a real identity
Entra Agent IDPublish an agent and it gets a Microsoft Entra identity, automatically.
Access control like any principal
Ownership: know who to call at 2am
Lineage across its lifecycle
Microsoft DefenderAI security posture and threat detection extended to agents.
Attack-path analysis, gap recommendations
A jailbreak surfaces as a security alert
Microsoft PurviewAgent interactions available for audit and compliance.
Org-wide content-safety policies
Sensitivity labels honoured
Controls = each developer configures
Opt in or out on your own agent, e.g. Prompt Shields, content filters, a PII blocklist.
Policies = the organisation mandates
"Every agent must have indirect-injection protection on", scanned continuously.
Publishing an agent issues an Entra identity (app + object id) for access, ownership and lineage. Policies turn a developer's optional control into an org-wide mandate.
Live demo
Mandate a control across the fleet
Create a policy in Operate / Compliance → mandate a control across a subscription → scanned continuously across every agent.
25 · Red teaming
Attack your own agents, on purpose
Find the holes before someone else does
Probe: automated attacks, every category and strategy
Score: an Attack Success Rate baseline, by technique
Harden: add or tighten guardrails where attacks land
The AI Red Teaming Agent (built on PyRIT, preview) runs automated adversarial scans and scores an Attack Success Rate, so you can prove a guardrail policy actually moved the needle.
Live demo
Run the advanced attack suite
Mutate attacks with encodings (Base64, ROT13, Unicode) and other languages → add your own attack prompts → score the ASR per category and strategy.
26 · Fleet ops
One to-do list for every agent, any cloud
Fleet to-do
Jailbreak attempt blocked · contoso-bank-agent
Eval below threshold · aria-rm-briefing-agent
Policy gap: indirect injection off · 2 agents
Cost spike · contoso-pmo-agent, +180% today
Error rate 100% · contoso-bank-agent
2 agents in Unknown state · github-copilot
Assets · Agents
Name
Source
Project
Status
Errors
Tokens
Runs
github-copilot
Foundry
project-copilot-sdk
Unknown
0.00%
-
18
aria-rm-briefing-agent
Foundry
project-admin-c2676f
Running
1.92%
153.8K
52
contoso-bank-agent
Foundry
project-admin-c2676f
Running
100.00%
-
40
contoso-pmo-agent
Foundry
project-admin-c2676f
Running
0.00%
15.3K
4
SpaceExpert
Foundry
project-alpha-mesh
Running
0.00%
860
3
arxiv-nlp-agent
Foundry
iq-project
Running
0.00%
27K
5
team-beta-agent
Foundry
project-beta-c2676f
Running
0.00%
56
1
team-delta-agent
Foundry
project-delta-c2676f
Running
0.00%
56
1
claims-triage
AI Gateway
LangGraph
Running
0.00%
4.1K
12
legacy-helpdesk
AI Gateway
AWS
Blocked
-
-
0
Operate is a to-do list for the fleet: blocked jailbreaks, evals below threshold, policy gaps. External agents (e.g., AWS) join the same view by routing through the AI Gateway.
Live demo
The admin fleet overview
Operate: the fleet overview and assets inventory.
portal · Operate
27 · Provisioning
The admin essentials
Four Foundry roles
Foundry User · build & call: models, agents, evals, data plane
Project Manager · manage projects, assign User; full data plane
Account Owner · deploy, connections, quota; no data plane
For platform owners: four roles (assigned to groups), a two-layer RBAC model, TPM quota by region and subscription, and a free platform you meter at the deployment level.
28 · Why Foundry
One platform, two jobs done well
Build
Deploy any model behind one governed gateway
Agents on the Responses API and one SDK
Knowledge and tools by config, MCP-native
Hosted runtimes in the framework you choose
Grounded, cited answers with Foundry IQ
Operate
A Microsoft Entra identity for every agent
Guardrails on inputs, outputs and tool traffic
Tracing and continuous evaluation on live traffic
Red teaming that measures attack success
One fleet view across projects and clouds
Build with rich primitives and govern them from the same place.