Agentic Engineering in the CLI

Lessons from Claude Code — applied to Gemini CLI

GDG Kolachi · Apr 25, 2026
Syed Umaid Ahmed

Syed Umaid Ahmed

Software Architect at disrupt.com

Education
PhD in Computer Vision and Artificial Intelligence
FAST NUCES — In progress
Bachelor’s in Electrical Engineering, Master’s in Mechatronics
NED University — 2018, 2021
Contributions
Founder of Animal Passport Pakistan
Featured in Shark Tank Pakistan 🇵🇰
Shayan Rais

Shayan Rais

Software Architect at disrupt.com

Education
Master’s in Computer Science
FAST NUCES — 2019
Bachelor’s in Computer Information Systems
NED University — 2014
Contributions
Creator of claude-code-best-practice
The most starred Pakistani AI repo on GitHub 🇵🇰 — almost 50,000 stars
claude-code-best-practice GitHub stars growth

🚨 Problem Statement

Fetch the weather of Karachi

  1. Single source of truth
  2. Repeatability

By solving this problem statement, we’ll learn different concepts of agentic engineering along the way.

Jargon you'll hear

I'll unpack each of these as we go — for now, just let them wash over you.

agentic engineering · progressive disclosure · orchestration · dumb zone · agentic workflows · harness · compaction · context window · ralph wiggum loop · MCP · hooks · context rot · prompt engineering · AI slop · inference · hallucination · context bloat · one-shot prompting · token burn · vibe coding
scroll for glossary

progressive disclosure — feeding the AI only what it needs right now, not everything upfront

orchestration — coordinating several AI agents like a conductor leads a band

dumb zone — the stretch where AI has too much context and starts thinking worse, not better

agentic workflows — AI that plans, acts, checks its work, and adapts — multi-step on its own

harness — the scaffolding around the model — files, terminal, tools — that turns a chatbot into a worker

compaction — auto-summarizing old chat so the AI keeps going without hitting its memory ceiling

context window — the AI’s working memory — how much it can “see” at once before older details fall off

ralph wiggum loop — when the AI repeats the same broken step in circles, like a confused kid who can’t stop

MCP — a universal adapter letting AI talk to your tools (GitHub, Slack, databases)

hooks — auto-triggers that run your rules before or after the AI does anything

context rot — quality decay as the conversation drags on and earlier details blur

prompt engineering — the craft of phrasing requests so the AI understands exactly what you mean

AI slop — low-effort, generic AI output that looks polished but says nothing

inference — the moment the model actually runs to produce an answer. Training is when the model learned (once, long ago); inference is the model answering you, right now

hallucination — when AI confidently makes up facts that sound true but aren’t

context bloat — overstuffing the AI’s memory so it slows down and loses focus

one-shot prompting — giving the AI one example and asking it to follow the same pattern

token burn — wasting expensive AI “words” on unnecessary back-and-forth or bloated prompts

vibe coding — describing what you want in plain English and hoping the AI nails it

agentic engineering — building guardrails so AI acts like a reliable teammate, not a gamble

Difference between GPT and ChatGPT?

discuss

Boris Cherny slides showing the spectrum of Claude Code usage styles
Boris Cherny (creator of Claude Code) — different teams use Claude Code completely differently.
There is no single “correct” way. But there are patterns worth understanding.

What is Claude Code?

Model (Brain 🧠 — e.g. Opus, GPT) + Harness (Body 💪 — e.g. tools, MCP, memory)

🦜

“Stochastic Parrots”

Stochastic means random/probabilistic — models don’t know the answer, they sample from a probability distribution.

Prompt:  “The sky is ___”

blue
62%
clear
14%
dark
8%
falling
0.3%

Each run the model samples — temperature controls how widely it samples.
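The sampling step above can be sketched in a few lines of Python. The vocabulary and logit values below are toy numbers chosen to roughly match the slide's percentages, not real model outputs:

```python
import math
import random

def sample_next_token(logits, temperature=1.0, rng=random):
    """Sample one token from a logit distribution, scaled by temperature.

    Lower temperature sharpens the distribution (more deterministic);
    higher temperature flattens it (more varied).
    """
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    # Softmax over the scaled logits (subtract max for numeric stability)
    m = max(scaled.values())
    exps = {tok: math.exp(v - m) for tok, v in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}
    # Weighted random draw: each run can give a different answer
    r = rng.random()
    cumulative = 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        cumulative += p
        if r <= cumulative:
            return tok, probs
    return tok, probs

# "The sky is ___" with toy logits
logits = {"blue": 5.0, "clear": 3.5, "dark": 3.0, "falling": -0.3}
token, probs = sample_next_token(logits, temperature=1.0)
```

At temperature 0.1 the same logits put nearly all probability on "blue"; at temperature 2.0 even "falling" gets a noticeably larger share. That is the whole dial.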

Bender, Gebru, McMillan-Major, Mitchell — On the Dangers of Stochastic Parrots (2021)

stochastic · sycophantic · hallucinating · stateless · biased · confident · pattern-matching

🧠 Models — e.g. Opus, GPT, Gemini

✅ Can answer

“What is the capital of Japan?”

“When did Pakistan gain independence?”

“Who wrote Romeo and Juliet?”

❌ Can’t answer

“Who won yesterday’s match?”

“What’s today’s USD → PKR rate?”

“What did Anthropic release yesterday?”

Every model — no matter how new — has a knowledge cut-off. Events after that date simply do not exist inside the model.

Opus 4.7 — Anthropic

Knowledge cut-off: January 2026

Released 2026-04-17

GPT-5.5 — OpenAI

Knowledge cut-off: December 1, 2025

Released 2026-04-23 — brand-new, but still has a cut-off.

Gemini 3.1 Pro — Google

Knowledge cut-off: January 2025

Released 2026-02-19

🧠 Limitations

The raw model has no real-time access — no internet, no files, no clock.

Anthropic Workbench screenshot showing the model refusing a real-time weather query
Anthropic Workbench screenshot showing the model refusing a real-time weather query

Control loop — decides the next call (Reins)
Tool allowlist — constrains what it can do (Bit)
Context management — narrows what it sees (Blinders)
Evaluator — decides when done (Driver)
State persistence — carries work forward (Traces)

A horse harness. A model harness.

The model is the horse. Raw power, no direction. The harness is everything else.

The origin is Old French harneis — gear, equipment, armor.

💪 Harness — the body around the brain

🧑‍💼

Agents — the specialists

A dedicated Claude worker — own context, tools, focus.

✅ fresh working memory per run

🎯

Skills — the know-how

What the specialist (or Claude) can actually do.

✅ progressive disclosure — loaded on demand

📘

Workflows — the instruction manual

Repeatable step-by-step recipes — like an AC install guide.

✅ reproducible recipes

📝

CLAUDE.md — your memory

Knowledge you provide to the model.

⚠️ 200-line problem

💭

Context — the working memory

What Claude holds in its head right now — fresh every new chat session.

20% of 1M tokens

⚠️ dumb-zone problem

💪 Harness — the body around the brain

Tools — the hands

Built-in: Read, Edit, Bash, WebSearch.

🔌

MCP — the adapters

USB-C for AI — plug in external tools (databases, browsers, APIs).

e.g. 👁️ Claude in Chrome

🛡️

Permissions — the guardrails

Allow / ask / deny for tool use.

🪝

Hooks — the reflexes

Deterministic scripts that fire on events.
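A hook is just configuration. Here is a sketch in .claude/settings.json, following Claude Code's documented hooks shape (event name, matcher, command); the formatter command itself is an illustrative choice, not from the deck:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "black . --quiet" }
        ]
      }
    ]
  }
}
```

Every time Claude edits or writes a file, the formatter fires — deterministically, with no model judgment involved.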

📝

System Prompt — Claude’s memory

Knowledge Anthropic bakes in.

e.g. identity · tone · safety

✅ always on

🎉 Yayyyyy! Problem solved with harness

The harness reaches out via WebSearch and fetches a real answer from live sources.

Claude Code using Web Search across AccuWeather, Weather25, and TimeAndDate to answer "what is the weather in Karachi?"
?

Really?

💪 Non-determinism — Doesn’t always use its tools

Similar prompt — but this time the model decided not to use the tool.

Claude Code refusing to check live weather and asking permission to use WebFetch

💪 Non-determinism — Tools can fail

The model first tried one source — it failed (403) — so it fell back to another.

Claude Code showing a 403 error from timeanddate.com and falling back to wttr.in

🚨 Problem Statement

  1. ❌ Single source of truth

    The model does not always follow the same path.

  2. ❌ Repeatability

    The model sometimes failed or forgot to use the tool.

Vibe Coding

Andrej Karpathy's Feb 3 2025 tweet coining 'vibe coding' — 'fully give in to the vibes, embrace exponentials, and forget that the code even exists'

Andrej Karpathy — OpenAI founding team · former Director of AI at Tesla · founder of Eureka Labs.

Vibe Coding vs Agentic Engineering

# Plain TodoApp
todoapp/
├── backend/
│   ├── main.py
│   ├── routes/
│   │   ├── todos.py
│   │   └── users.py
│   ├── models/
│   │   ├── todo.py
│   │   └── user.py
│   └── tests/
│       └── test_todos.py
└── frontend/
    ├── components/
    │   ├── TodoList.tsx
    │   └── Sidebar.tsx
    ├── pages/
    │   ├── index.tsx
    │   └── todos.tsx
    └── lib/
        └── api.ts
# With Claude Code Best Practices
todoapp/
├── .claude/                  # Claude Code config
│   ├── agents/               # Custom subagents
│   ├── skills/               # Domain knowledge
│   ├── commands/             # Slash commands
│   ├── hooks/                # Lifecycle scripts
│   ├── rules/                # Modular instructions
│   ├── settings.json         # Team settings
│   └── settings.local.json   # Personal settings
├── backend/
│   ├── main.py
│   ├── routes/
│   │   ├── todos.py
│   │   └── users.py
│   ├── models/
│   │   ├── todo.py
│   │   └── user.py
│   ├── tests/
│   │   └── test_todos.py
│   └── CLAUDE.md             # Backend instructions
├── frontend/
│   ├── components/
│   │   ├── TodoList.tsx
│   │   └── Sidebar.tsx
│   ├── pages/
│   │   ├── index.tsx
│   │   └── todos.tsx
│   ├── lib/
│   │   └── api.ts
│   └── CLAUDE.md             # Frontend instructions
├── .mcp.json                 # Managed MCP servers
└── CLAUDE.md                 # Project instructions

👤 Agents

/agents

Examples: weather reporter, front-end engineer, QA engineer.

✅ Fresh working memory per run 🎯 Scoped tools, model, permission ⚡ Parallelizable
root/
├── .claude/
│   └── agents/
│       └── weather-agent.md
└── README.md

Create your first agent — /agents

The Command

Type /agents.

Opens an interactive menu — pick "Create new agent" and the CLI drafts the agent file for you.

Creates .claude/agents/<name>.md — a plain markdown file anyone can edit.

Demo

🎉 Yayyyyy! Problem solved with agents

Agent created
?

Not so fast...

🚨 Problem Statement

  1. ✅ Single source of truth

    The agent will always call the Open-Meteo API — no more source drift.

  2. ❌ Repeatability

    It’s not guaranteed that Claude will always call this agent.
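That single source of truth, the Open-Meteo API, boils down to two endpoints: geocode the city name, then fetch the forecast. The endpoints below are Open-Meteo's real public API, but the helper functions (names, parameters) are an illustrative sketch, not the deck's actual agent code:

```python
import json
import urllib.parse
import urllib.request

GEOCODE_URL = "https://geocoding-api.open-meteo.com/v1/search"
FORECAST_URL = "https://api.open-meteo.com/v1/forecast"

def build_forecast_url(lat, lon):
    """Build the Open-Meteo forecast URL for one coordinate pair."""
    query = urllib.parse.urlencode({
        "latitude": lat,
        "longitude": lon,
        "current_weather": "true",
    })
    return f"{FORECAST_URL}?{query}"

def fetch_weather(city):
    """Resolve a city name to coordinates, then fetch current weather.

    These are live network calls; run only when online.
    """
    geo_query = urllib.parse.urlencode({"name": city, "count": 1})
    with urllib.request.urlopen(f"{GEOCODE_URL}?{geo_query}") as resp:
        place = json.load(resp)["results"][0]
    url = build_forecast_url(place["latitude"], place["longitude"])
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["current_weather"]
```

Pinning the agent to one API like this is exactly what removes source drift: same city in, same endpoint out, every run.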

Agent config fields with frontmatter

Field · Description · Enforced by

name (recommended) · Unique identifier — e.g. weather-agent · harness
description (required) · When to invoke. Use "PROACTIVELY" for auto-invocation · prompt
model · haiku, sonnet, opus, or inherit (default). weather-agent uses sonnet · harness
tools · Allowlist of tools. weather-agent allows WebFetch, Read, Write, etc. · harness
skills · Skills preloaded into agent context at startup — weather-agent preloads weather-fetcher · harness
maxTurns · Maximum agentic turns before the agent stops. weather-agent uses 5 · harness
memory · Persistent memory: user, project, or local. weather-agent uses project · harness
color · Visual color in task list. weather-agent uses green · harness

The skills: field is what makes the agent special. It preloads the listed skills directly into the agent’s brain at startup — before the agent has received a single instruction.
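Assembled from those fields, a weather-agent.md might look like this. The frontmatter keys come straight from the table above; the description wording, tool list, and body text are illustrative sketches, not the deck's actual file:

```markdown
---
name: weather-agent
description: >
  Use PROACTIVELY whenever the user asks about weather.
  Always fetch from the Open-Meteo API, never any other source.
model: sonnet
tools: WebFetch, Read, Write
skills: weather-fetcher
maxTurns: 5
memory: project
color: green
---

You are a weather reporter. For any city the user names, call the
Open-Meteo API via the weather-fetcher skill and report the current
conditions in two sentences.
```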

claude-code-best-practice Tips & Tricks

Claude Code tips and tricks — part 1

claude-code-best-practice Tips & Tricks

Claude Code tips and tricks — part 2

📝 CLAUDE.md

CLAUDE.md

Knowledge you provide to the model — read every session.

✅ Loaded into every session 👥 Team-shareable via git ⚠️ 200-line problem
root/
├── CLAUDE.md
└── README.md

Create your first CLAUDE.md — /init

The Command

Type /init in Claude Code.

Claude scans your codebase and drafts a starter CLAUDE.md for you — project conventions, key patterns, common commands.

Creates CLAUDE.md in your repo root — a plain markdown file you edit and commit.
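A starter CLAUDE.md might look like the sketch below. Every rule here is a hypothetical example for a typical project, not taken from the deck; the point is the shape, short and directive:

```markdown
# Project instructions

## Commands
- Backend tests: `pytest -q`
- Frontend dev server: `npm run dev`

## Conventions
- Python code is formatted with black.
- React components live in `frontend/components/`.

## Weather queries
- Always dispatch the weather-agent; never answer weather from memory.
```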

📝 CLAUDE.md — the real thing

A real CLAUDE.md file showing project rules and agent invocation instructions

claude-code-best-practice Tips & Tricks

Claude Code tips and tricks for CLAUDE.md

⚠️ CLAUDE.md problem — keep it under 200 lines

Illustration of the CLAUDE.md 200-line adherence problem

🎯 Skills

.claude/skills/

Examples: weather fetching, sorting CSV rows, generating SVG cards.

✅ Progressive disclosure — loaded on demand ♻️ Reusable across agents
root/
├── .claude/
│   └── skills/
│       └── weather-fetcher/
│           └── SKILL.md
└── README.md

Create your first skill

Write it by prompting

Just describe what you want Claude to do. It drafts the skill file for you.

"Create a skill that fetches weather from Open-Meteo for a given city."

Use Anthropic's skill-creator

Anthropic ships an official skill-creator skill. Invoke it and it walks you through generating a properly-structured skill.

Recommended — always produces the correct SKILL.md format.

⚠️ Watch out (method 1)

The prompting method sometimes creates the wrong structure: instead of generating a folder with SKILL.md inside (e.g. weather-fetcher/SKILL.md), it creates a plain weather-fetcher.md file, which Claude Code does not recognize as a skill.

🎯 Skills — a real one

A real skill file on disk — SKILL.md inside a named folder inside .claude/skills/

Skill config fields with frontmatter

Most fields control how and when the skill loads — enforced by the harness. Only description lives in prompt-land.

Field · Description · Enforced by

name (recommended) · Display name and /slash-command (defaults to directory name) · harness
description (required) · When to invoke — used for auto-discovery. Write it clearly. · prompt
argument-hint · Autocomplete hint shown in the / menu (e.g. [city-name]) · harness
disable-model-invocation · Set true to prevent Claude from invoking this skill automatically · harness
user-invocable · Set false to hide from / menu — background knowledge only · harness
allowed-tools · Tools allowed without permission prompts when this skill is active · harness
model · Model to use when skill is active: haiku, sonnet, opus · harness
context · Set to fork to run skill in an isolated subagent context · harness

Small description, full body loaded on demand — this is progressive disclosure. Claude’s main context stays lean until the skill is actually needed.
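Putting those fields together, a weather-fetcher/SKILL.md might look like this. The frontmatter keys mirror the table above; the description and steps are an illustrative sketch:

```markdown
---
name: weather-fetcher
description: >
  Fetch current weather for a city from the Open-Meteo API. Use whenever
  the user asks about weather conditions anywhere in the world.
argument-hint: [city-name]
allowed-tools: WebFetch
model: haiku
---

1. Geocode the city via https://geocoding-api.open-meteo.com/v1/search
2. Fetch https://api.open-meteo.com/v1/forecast with the coordinates
3. Report temperature and conditions in one sentence.
```

Only the description above lives in Claude's main context at startup; the numbered steps load when the skill fires.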

💭 Context

/context

The model's working memory — what it can see in this moment.

✅ Fresh every chat ⚠️ Dumb-zone problem ⚠️ /compact
20% of 1M tokens

🧠 Claude's Brain

Think of it like this

Imagine Claude has a brain that holds everything it's aware of right now — your question, every file it's opened, every tool result, every word it's said back to you. If a thought isn't in the brain, Claude can't use it. Simple as that.

Context window diagram showing how the 1M token limit is divided across system prompt, tools, files, and conversation
💬
Your messages Every question, clarification, and follow-up
📄
Every file Claude opens Read a 500-page PDF? All of it is in the brain now.
🔧
Every tool result Web searches, command output, screenshots — all held in memory

Two Things to Know

  1. The brain is finite. It can hold about 1 million tokens — roughly 750,000 words. Big, but not infinite.

  2. The brain empties at the end of every chat. When you start a new conversation, Claude remembers nothing from the last one unless you tell it again.

What Loads at Session Start

The moment you open Claude Code, certain things land in Claude's brain before you've typed a word. The rest waits in the wings — only loaded when you actually need it. This is called progressive disclosure.

Diagram showing what loads into Claude's context window at session start
📋
CLAUDE.md — full content Standing instructions, every line. Always pinned in the brain at startup.
🎓
Skills — descriptions only Claude knows which skills exist and what each does — the full content loads when the skill is actually invoked.
👤
Agents — descriptions only The roster of specialists. The full system prompt loads when the agent is dispatched.

The One-Liner

Only descriptions of skills and agents are loaded at startup — the rest is fetched on-demand. That's progressive disclosure. It keeps the brain light.

📄 Lost in the Middle

by Nelson F. Liu et al. · Stanford University · 2023

Lost in the Middle paper illustration showing U-shaped attention curve across context positions
  1. Primacy & recency work. Information at the start of the context (what Claude sees first) and at the end (what Claude saw most recently) gets strong attention. These are the safe zones.
  2. The middle is the dumb zone. Content buried in the middle of a long context effectively disappears. The model has the tokens, but doesn't weight them strongly during generation. It's in the working memory — but not really read.
  3. Practical implication. Keep critical instructions either at the very top (in CLAUDE.md) or near the user's most recent message. A bigger context window doesn't help if your payload lands in the middle.

This is the "dumb-zone problem" the deck has been warning about — now you know where it came from.

📘 Workflow

.claude/commands/

Repeatable step-by-step recipes — the instruction manual that makes Claude run the same playbook every time.

✅ Reproducible recipes 🔁 Same steps, every run 👥 Team-shareable via git

📋 Orchestration Workflow

Command → Agent → Skill architecture flow

📹 Workflow in action

Claude running the weather orchestration workflow end-to-end

⚖️ Comparison

Claude Code vs Gemini CLI — same pattern, different CLIs.

Claude Claude Code
Gemini Gemini CLI

📁 File structure

Claude Claude Code
Project memory
CLAUDE.md
Agents
.claude/agents/*.md
Skills
.claude/skills/*/SKILL.md
Slash commands
.claude/commands/*.md
Gemini Gemini CLI
Project memory
GEMINI.md
Agents
.gemini/agents/*.md
Skills / Tools
.gemini/commands/*.toml
Slash commands
.gemini/commands/*.toml

Same idea, different folder

Both CLIs use plain text files in a hidden project folder. Skills/commands in Gemini CLI share the same .toml format — check vendor docs for the latest spec.
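For a concrete taste of the TOML side, a Gemini CLI custom command might look like this. The description and prompt keys and the {{args}} placeholder follow Gemini CLI's documented command format; the file name and prompt text are illustrative:

```toml
# .gemini/commands/weather.toml
description = "Fetch current weather for a city via Open-Meteo"

prompt = """
Fetch the current weather for {{args}} using the Open-Meteo API
(geocoding, then forecast endpoint). Report temperature and
conditions in one sentence.
"""
```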

🧠 Model & context window

Claude Claude Code
Latest model
Claude Opus 4.7
Knowledge cut-off: January 2026 — Released 2026-04-17
Context window
1M tokens
~750,000 words — roughly 10 full novels in one session
Output tokens
8K standard — 64K extended thinking
Gemini Gemini CLI
Latest model
Gemini 3.1 Pro
Knowledge cut-off: January 2025 — Released 2026-02-19
Context window
1M+ tokens
Gemini 2.5 / 3 Pro extended — check vendor docs for exact figures
Output tokens
~8K standard
Extended output depends on model version — check vendor docs

Context window and output token figures reflect published specs as of 2026-04-24. Verify against vendor docs for the latest.

📋 Gemini Orchestration Workflow

Gemini CLI Command → Agent → Skill architecture flow

🙏 Thank you!

Thank you

Questions? Feedback?

GitHub github.com/shanraisshan