Understanding AI Agents

AI agents combine language models with orchestrated tools and data sources to perceive real-world signals, plan multi-step solutions, and act autonomously on a user's behalf.

What Are AI Agents?

In practice, an agent's behavior breaks down into three capabilities, and it may act fully autonomously or semi-autonomously (with a human approving key steps) on a user's behalf.

Perception

Agents ingest structured and unstructured data, including natural language, to understand user intent and environmental context.

Reasoning & Planning

They evaluate goals, decompose tasks, and determine next steps using chain-of-thought prompting and planning heuristics.

Action & Learning

Agents execute tasks by calling external tools, invoking APIs, or generating content — and refine behavior based on feedback loops.
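
The three capabilities above can be read as a loop. The following is a minimal sketch, not a reference implementation: `ToyAgent`, `perceive`, `plan`, and `act` are hypothetical names, and the fixed rule in `plan` stands in for a language model call.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAgent:
    """Illustrative perceive -> plan -> act loop; all names are hypothetical."""
    feedback: list[str] = field(default_factory=list)

    def perceive(self, raw_input: str) -> dict:
        # Perception: turn raw text into a structured observation.
        text = raw_input.strip().lower()
        return {"text": text, "wants_summary": "summarize" in text}

    def plan(self, observation: dict) -> list[str]:
        # Reasoning & planning: decompose the goal into ordered steps.
        # A real agent would ask a language model; this is a fixed rule.
        if observation["wants_summary"]:
            return ["fetch_document", "summarize", "reply"]
        return ["reply"]

    def act(self, steps: list[str]) -> list[str]:
        # Action & learning: run each step and record the outcome as feedback.
        results = [f"done:{step}" for step in steps]
        self.feedback.extend(results)
        return results

    def run(self, raw_input: str) -> list[str]:
        return self.act(self.plan(self.perceive(raw_input)))


agent = ToyAgent()
outcome = agent.run("Please summarize the Q3 report")
```

Here `outcome` lists one result per planned step, and `agent.feedback` keeps the same records so later runs could, in principle, adjust the plan.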

How We Got Here

Modern agent design traces back to discrete prompt-engineering breakthroughs that made autonomous, multi-step reasoning possible.

January 28, 2022

Chain-of-Thought Prompting

Wei et al. showed that prompting models to work through intermediate reasoning steps improves performance on multi-step arithmetic, commonsense, and symbolic reasoning tasks, setting the stage for agents that explain and justify decisions.

February 21, 2023

The Persona Pattern

Dr. Jules White introduced the Persona pattern, a role-specific scaffold that unlocks tool selection and memory strategies. It is catalogued in the Prompt Pattern Catalogue and described in White et al.

August 2023

AI Planning Pattern

White formalized the AI Planning pattern in ChatGPT Advanced Data Analysis, encouraging practitioners to write out the goal decomposition explicitly before letting an agent operate autonomously.

March 14, 2024

LLM Agents for User Story Quality

Our team published findings showing how structured planning bridges prompt templates and autonomous agent stacks (Zhang et al.).

Architectural Elements

A production-grade AI agent is composed of four interlocking layers, each with distinct responsibilities.

Layer 1: Interface Layer

Handles conversations, user prompts, and other inputs while presenting results back to the user.

Layer 2: Orchestration Layer

Coordinates models, tools, and memory components so the agent can decide what to do next.

Layer 3: Foundation Models & Tools

Provide language understanding, content generation, and the ability to interact with enterprise systems or third-party services.

Layer 4: Governance & Safety

Enforces policies, monitors for misuse, and ensures responses stay aligned with organizational standards.

Spectrum of AI Agent Types

Compare the common maturity levels to scope agent capabilities before adding orchestration, new tools, or autonomy.

[Figure: Spectrum of AI agent types showing Retrieval, Task, and Autonomous panels]

Retrieval Agents

Tap grounding data, reason over snippets, summarize, and answer scoped questions.

Use when: Grounded answers from approved documents or APIs without exposing full tool access.
Core moves: Chunk and embed data, route questions through retrieval-augmented prompts, explain citations inline.
Design notes: Prioritize latency, caching, and guardrails that stop hallucination outside the grounding corpus.
Ideal scope: Knowledge bases, policies, product catalogs, support wikis.
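
The core moves of a retrieval agent can be sketched in a few lines. This is an illustration under loud assumptions: the bag-of-words `embed` function stands in for a real embedding model, and `CHUNKS`, `retrieve`, and `build_prompt` are hypothetical names with made-up corpus text.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: a bag-of-words term count.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical grounding corpus: chunk id -> chunk text.
CHUNKS = {
    "policy-1": "refunds are issued within 14 days of purchase",
    "policy-2": "support is available on weekdays from 9 to 5",
}


def retrieve(question: str, k: int = 1) -> list[str]:
    # Rank chunks by similarity to the question and keep the top k ids.
    q = embed(question)
    ranked = sorted(CHUNKS, key=lambda cid: cosine(q, embed(CHUNKS[cid])), reverse=True)
    return ranked[:k]


def build_prompt(question: str) -> str:
    # Put retrieved chunks in the prompt with their ids so answers can cite them.
    context = "\n".join(f"[{cid}] {CHUNKS[cid]}" for cid in retrieve(question))
    return f"Answer using only the context and cite chunk ids.\n{context}\nQuestion: {question}"


top = retrieve("when are refunds issued")
```

Carrying the chunk id into the prompt is what makes inline citations possible. A production system would replace `embed` with a real model and precompute the chunk vectors.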

Task Agents

Take direct actions, call tools, and automate multi-step jobs when asked.

Use when: Users expect execution of predictable workflows, such as creating tickets, summarizing calls, or updating sheets.
Core moves: Invoke deterministic tools, log every action, request clarifications when a parameter is missing.
Design notes: Pair prompts with checklists, expose safe defaults, and track completion metrics per task template.
Ideal scope: Workflow automation, CRM updates, support triage, creative drafts.
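
One way to combine the three core moves (deterministic tools, action logging, clarification requests) is a small wrapper around each tool call. The sketch below is an assumption-laden example: `create_ticket`, `run_tool`, and `ACTION_LOG` are hypothetical names, not part of any framework.

```python
import inspect

ACTION_LOG: list[dict] = []  # stand-in for a durable audit log


def create_ticket(title: str, priority: str = "normal") -> str:
    # Hypothetical deterministic tool with a safe default for priority.
    return f"ticket:{priority}:{title}"


def run_tool(tool, **params) -> dict:
    # Find required parameters (those without defaults) that were not supplied.
    signature = inspect.signature(tool)
    missing = [
        name for name, p in signature.parameters.items()
        if p.default is inspect.Parameter.empty and name not in params
    ]
    if missing:
        # Ask for clarification instead of guessing a value.
        entry = {"tool": tool.__name__, "status": "needs_clarification", "missing": missing}
    else:
        entry = {"tool": tool.__name__, "status": "ok", "result": tool(**params)}
    ACTION_LOG.append(entry)  # log every action, successful or not
    return entry


asked = run_tool(create_ticket)
done = run_tool(create_ticket, title="Reset password")
```

The first call is logged as a clarification request because `title` has no default. The second runs the tool with the safe default priority.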

Autonomous Agents

Plan dynamically, orchestrate other agents, learn from feedback, and escalate when needed.

Use when: Work requires multi-day plans, branching decision trees, or hand-offs across specialized agents.
Core moves: Maintain scratchpads, spawn or orchestrate sub-agents, decide when to escalate to humans.
Design notes: Treat planning graphs as infrastructure, layer evals for alignment/safety, and add memory policies.
Ideal scope: Mission-critical copilots, ops copilots, research pods, complex routing.
Note: Agents vary in complexity and capability; pick the level that matches your needs.

Execution Maturity Model

Execution maturity evolves from single-shot prompts to fully autonomous agents. The three stages here help teams pick the right operating model.

[Figure: Non-agentic workflow diagram]

Non-Agentic Workflow

A one-off response or lightweight generation with no memory or tool usage. Think of it as querying a smart autocomplete. Example: Asking ChatGPT a single question and pasting the reply into your doc.

[Figure: Agentic workflow diagram]

Agentic Workflow

A co-pilot that iterates with you, calls tools, or keeps short-lived context — IDE copilots, spreadsheet helpers, etc. Example: GitHub Copilot suggesting code while you type.

[Figure: AI agent workflow diagram]

AI Agents (Fully Autonomous)

Hand over the goal and let the system decide the plan, tools, and escalation path. Example: An AI executive assistant that reschedules meetings without step-by-step prompts.

Building an AI Agent

A copy-ready recipe for shipping production-grade agents that blend prompts, tools, and evaluations.

[Figure: Seven-step workflow for building an AI agent]

1. System Prompt: Goals · Role · Instructions

  • Capture the business goal, the agent's voice, and constraint lists inside a locked system prompt.
  • Mirror Persona/Role patterns from this repo so the assistant understands scope, prohibited actions, and escalation paths.
  • Version prompts in source control and add inline TODOs for future guardrails.
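
A locked, versioned system prompt could be modeled as an immutable record that renders to text. This is a sketch under stated assumptions: the `SystemPrompt` class, its fields, and the example values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen so the prompt cannot be mutated at runtime
class SystemPrompt:
    version: str
    goal: str
    role: str
    constraints: tuple[str, ...]
    escalation: str

    def render(self) -> str:
        # Produce the text that would be sent as the system message.
        rules = "\n".join(f"- {rule}" for rule in self.constraints)
        return (
            f"# prompt version: {self.version}\n"
            f"Role: {self.role}\n"
            f"Goal: {self.goal}\n"
            f"Constraints:\n{rules}\n"
            f"Escalation: {self.escalation}"
        )


# Hypothetical example values.
SUPPORT_PROMPT = SystemPrompt(
    version="1.0.0",
    goal="Resolve billing questions on first contact",
    role="Billing support assistant",
    constraints=("Never issue refunds directly", "Do not share account data"),
    escalation="Hand off to a human agent when the user disputes a charge",
)
rendered = SUPPORT_PROMPT.render()
```

Keeping the prompt in a source file like this means changes show up in code review, and the embedded version string makes it possible to tie a logged run to the prompt that produced it.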
2. LLM: Base Model, Parameters

  • Choose the base model (e.g., openai:gpt-5) and reason through cost, latency, and context window tradeoffs.
  • Pin sampling parameters (temperature, top_p, max_tokens) for more repeatable behavior, then expose overrides through orchestration.
  • Document fallback or cascade strategies if the primary provider rate-limits.
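
Pinned parameters and a fallback cascade can be expressed as plain data plus one loop. The sketch below makes several assumptions: `ModelConfig`, `RateLimited`, and `call_with_fallback` are hypothetical, `backup:small-model` is an invented identifier, and `fake_call` is a stub that replaces a real provider client so the example runs offline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    model: str
    temperature: float = 0.0
    top_p: float = 1.0
    max_tokens: int = 512


class RateLimited(Exception):
    """Hypothetical error raised by a provider client when throttled."""


def call_with_fallback(prompt, cascade, call_model):
    # Try each configured model in order; move on only when rate-limited.
    for config in cascade:
        try:
            return config.model, call_model(config, prompt)
        except RateLimited:
            continue
    raise RuntimeError("all providers in the cascade were rate-limited")


def fake_call(config, prompt):
    # Stub client: the primary model is always throttled in this sketch.
    if config.model == "openai:gpt-5":
        raise RateLimited(config.model)
    return f"{config.model} answered: {prompt}"


CASCADE = [ModelConfig("openai:gpt-5"), ModelConfig("backup:small-model", max_tokens=256)]
used, reply = call_with_fallback("ping", CASCADE, fake_call)
```

Returning the name of the model that actually answered makes fallbacks visible in logs, which matters when the backup model has different quality or cost.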
3. Tools: Local · API · MCP Server · Agent as a Tool

  • Start with native function calls (math, code execution) before wiring external APIs.
  • Register shared tools (search, RAG, CRMs) via Model Context Protocol servers so every agent reuses the same manifests.
  • Treat trustworthy agents as callable tools to compose nested workflows.
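
The idea of a shared tool registry, including an agent exposed as a tool, can be illustrated in-process. This sketch does not use the Model Context Protocol itself: the `TOOLS` dictionary is a simplified stand-in for a server manifest, and `register`, `call_tool`, and `SummaryAgent` are hypothetical names.

```python
from typing import Callable

TOOLS: dict[str, Callable[..., str]] = {}


def register(name: str):
    # Decorator that adds a callable to the shared registry under a name.
    def wrap(fn):
        TOOLS[name] = fn
        return fn
    return wrap


@register("add")
def add(a: float, b: float) -> str:
    # Local function tool: the simplest kind to start with.
    return str(a + b)


class SummaryAgent:
    """Hypothetical sub-agent; a real one would call a model."""

    def run(self, text: str) -> str:
        # Keep only the first sentence as a crude summary.
        return text.split(".")[0] + "."


# Agent as a tool: expose the sub-agent's entry point like any other tool.
register("summarize")(SummaryAgent().run)


def call_tool(name: str, **kwargs) -> str:
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)


total = call_tool("add", a=2, b=3)
summary = call_tool("summarize", text="Agents call tools. They also plan.")
```

Because callers only see a name and keyword arguments, a local function, a remote API wrapper, and a nested agent are interchangeable from the orchestrator's point of view.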
4. Memory: Episodic · Working · Vector DB · SQL · File Store

  • Episodic: store per-user transcripts for short-lived recall.
  • Working memory: chunk current task artifacts (draft docs, task lists) inside the conversation window.
  • Long-term: persist embeddings in a vector database, structured facts in SQL, and large binaries in an object/file store; set retention rules up front.
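
The memory tiers can be kept behind one small interface. The following is a rough sketch with assumptions called out: `AgentMemory` and its methods are hypothetical, an in-memory SQLite database stands in for a production SQL store, and the vector database and file store are omitted for brevity.

```python
import sqlite3
from collections import defaultdict, deque


class AgentMemory:
    def __init__(self, working_size: int = 3):
        # Episodic: per-user transcript of past turns.
        self.episodic = defaultdict(list)
        # Working: bounded buffer of current-task artifacts; oldest items drop off.
        self.working = deque(maxlen=working_size)
        # Long-term structured facts: in-memory SQLite stands in for a real SQL store.
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE facts (user_id TEXT, fact_key TEXT, fact_value TEXT)"
        )

    def remember_turn(self, user_id: str, message: str) -> None:
        self.episodic[user_id].append(message)

    def store_fact(self, user_id: str, key: str, value: str) -> None:
        self.db.execute("INSERT INTO facts VALUES (?, ?, ?)", (user_id, key, value))

    def recall_fact(self, user_id: str, key: str):
        # Return the stored value, or None when no matching fact exists.
        row = self.db.execute(
            "SELECT fact_value FROM facts WHERE user_id = ? AND fact_key = ?",
            (user_id, key),
        ).fetchone()
        return row[0] if row else None


memory = AgentMemory(working_size=2)
memory.remember_turn("ana", "I prefer email updates")
memory.store_fact("ana", "contact_channel", "email")
for artifact in ["outline", "draft v1", "draft v2"]:
    memory.working.append(artifact)
```

The bounded `deque` mimics a limited conversation window: once it is full, the oldest artifact is evicted, which is one simple form of retention rule.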
5. Orchestration: Routes · Triggers · Params · Queues · Agent-to-Agent

  • Define routing logic (state machines, LangGraph DAGs) that decides when each tool or sub-agent fires.
  • Expose typed parameters so triggers (webhooks, cron, human approvals) can launch runs safely.
  • Use queues or event buses for parallel tool calls and agent-to-agent collaboration.
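
Routing logic of the state-machine kind can be sketched without any framework. This is a hand-rolled illustration, not LangGraph code: `route`, `NODES`, and `run` are hypothetical names, and the approval node auto-approves where a real system would wait for a human trigger.

```python
from typing import Callable

State = dict


def route(state: State) -> str:
    # Routing logic: pick the next node from the current state.
    if state.get("needs_approval") and not state.get("approved"):
        return "await_approval"
    if "result" not in state:
        return "run_tool"
    return "done"


def run_tool(state: State) -> State:
    return {**state, "result": f"processed {state['task']}"}


def await_approval(state: State) -> State:
    # Placeholder for a human-approval trigger; auto-approves in this sketch.
    return {**state, "approved": True}


NODES: dict[str, Callable[[State], State]] = {
    "run_tool": run_tool,
    "await_approval": await_approval,
}


def run(state: State, max_steps: int = 10) -> tuple[State, list[str]]:
    visited = []
    for _ in range(max_steps):  # step cap guards against routing loops
        node = route(state)
        if node == "done":
            return state, visited
        visited.append(node)
        state = NODES[node](state)
    raise RuntimeError("step limit reached")


final, path = run({"task": "update CRM", "needs_approval": True})
```

Each node returns a new state instead of mutating the old one, which keeps runs easy to log and replay. The `path` list records which nodes fired and in what order.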
6. UI: Interface

  • Ship the smallest possible UX (CLI, chat widget, workflow form) that captures structured inputs and displays streaming outputs.
  • Log every interaction with trace IDs so support teams can replay context.
  • Offer inline feedback toggles that write directly into your evaluation store.
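
Trace IDs and inline feedback can be wired together with very little code. In this sketch the two lists are stand-ins for a log sink and an evaluation store, and `handle_message`, `record_feedback`, and `replay` are hypothetical helpers.

```python
import json
import uuid

INTERACTION_LOG: list[str] = []  # stand-in for a log sink
FEEDBACK_STORE: list[dict] = []  # stand-in for an evaluation store


def handle_message(user_input: str, respond) -> dict:
    # One trace id ties the input, the output, and any later feedback together.
    trace_id = uuid.uuid4().hex
    output = respond(user_input)
    record = {"trace_id": trace_id, "input": user_input, "output": output}
    INTERACTION_LOG.append(json.dumps(record))
    return record


def record_feedback(trace_id: str, helpful: bool) -> None:
    FEEDBACK_STORE.append({"trace_id": trace_id, "helpful": helpful})


def replay(trace_id: str) -> dict:
    # Support teams can look an interaction up again by its trace id.
    for line in INTERACTION_LOG:
        record = json.loads(line)
        if record["trace_id"] == trace_id:
            return record
    raise KeyError(trace_id)


turn = handle_message("status of order 42", lambda text: f"echo: {text}")
record_feedback(turn["trace_id"], helpful=True)
```

Because the feedback entry carries the same trace id as the logged interaction, an evaluation job can later join the two and see exactly which output a user rated.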
7. AI Evals: Analyze · Measure · Improve

  • Add promptfoo or custom regression suites that assert structure (contains-json) plus domain-specific KPIs (accuracy, safety, tone).
  • Review eval deltas before deployments and attach dashboards to orchestrator metrics (latency, tool failures).
  • Feed eval learnings back into prompt and tool updates to close the loop.
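
A custom regression suite of the kind described above might start as small as this. The sketch is not promptfoo and does not reproduce its implementation: `contains_json` is a hand-written structural check, and `CASES` and `run_suite` are hypothetical, with made-up model outputs.

```python
import json


def contains_json(text: str) -> bool:
    # True if some substring starting at "{" parses as a JSON object.
    decoder = json.JSONDecoder()
    for index, char in enumerate(text):
        if char == "{":
            try:
                decoder.raw_decode(text[index:])
                return True
            except json.JSONDecodeError:
                continue
    return False


# Hypothetical regression cases: (name, model output, required substring).
CASES = [
    ("ticket", 'Here you go: {"status": "open", "id": 7}', '"status"'),
    ("plain", "I could not find that order.", "order"),
]


def run_suite(cases, require_json: set[str]) -> dict[str, bool]:
    results = {}
    for name, output, must_include in cases:
        # Domain check: the output must mention the expected content.
        passed = must_include in output
        # Structural check: only for cases that must return JSON.
        if name in require_json:
            passed = passed and contains_json(output)
        results[name] = passed
    return results


report = run_suite(CASES, require_json={"ticket"})
pass_rate = sum(report.values()) / len(report)
```

Comparing `pass_rate` and the per-case `report` between two prompt versions gives the eval delta to review before a deployment.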

Quick-Start Checks

Clarify success metrics (response time, accuracy, compliance) before coding.
Mock the agent run with a single workflow recording; confirm each message and tool output matches expectations.
Keep every dependency swappable (models, MCP servers, vector stores) so the stack remains portable.

Common Agentic Frameworks

A comparison of major no-code and code-first agent frameworks by model support, MCP compatibility, tools, and orchestration style.

Framework          | LLM           | MCP           | Tools                                 | Agents Orchestration
-------------------|---------------|---------------|---------------------------------------|--------------------------
OpenAI Agents API  | OpenAI        | Remote        | Predefined (web, file/code)           | Threads
Google Vertex AI   | *             | Remote        | Predefined (search, vision, etc.)     | Flow-based, native A2A
Anthropic Agents API | *           | Remote        | Predefined (web, file/code)           | Tool-calling only
Microsoft AutoGen  | *             | None          | Predefined (REPL, code)               | Programmatic chaining
AutoGen Studio     | *             | None          | Predefined                            | Visual agent chaining
LangGraph          | Local, Remote | Local, Remote | LangChain Functions                   | Graph (DAG)-based flow
LangChain          | Local, Remote | Remote        | Functions (custom-defined)            | Manual (chains/agents)
CrewAI             | Local, Remote | Remote        | Predefined, 40+ integrations          | Flow- & role-based
n8n                | Local, Remote | Remote        | Predefined, 100+ integrations         | Workflows, sub-workflows

Sources & Further Reading

Academic papers, framework documentation, and guides referenced in this overview.