Claude Code Best Practice

Practical patterns for Claude Code

github.com/shanraisshan/claude-code-best-practice
Shayan Rais

Software Architect at disrupt.com

Education
Master’s in Computer Science
FAST NUCES — 2019
Bachelor’s in Computer Information Systems
NED University — 2014
Contributions
Creator of claude-code-best-practice
The most starred Pakistani 🇵🇰 AI repo on GitHub, with almost 50,000 stars
claude-code-best-practice GitHub stars growth

🚨 Problem Statement

Fetch weather of Karachi

  1. Single source of truth
  2. Repeatability

By solving this problem statement, we’ll learn different concepts of agentic engineering along the way.

agentic engineering progressive disclosure orchestration dumb zone agentic workflows harness compaction context window ralph wiggum loop MCP hooks context rot prompt engineering AI slop inference hallucination context bloat one-shot prompting token burn vibe coding
Glossary

progressive disclosure — feeding the AI only what it needs right now, not everything upfront

orchestration — coordinating several AI agents like a conductor leads an orchestra

dumb zone — the stretch where AI has too much context and starts thinking worse, not better

agentic workflows — AI that plans, acts, checks its work, and adapts — multi-step on its own

harness — the scaffolding around the model — files, terminal, tools — that turns a chatbot into a worker

compaction — auto-summarizing old chat so the AI keeps going without hitting its memory ceiling

context window — the AI’s working memory — how much it can “see” at once before older details fall off

ralph wiggum loop — when the AI repeats the same broken step in circles, like a confused kid who can’t stop

MCP — a universal adapter letting AI talk to your tools (GitHub, Slack, databases)

hooks — auto-triggers that run your rules before or after the AI does anything

context rot — quality decay as the conversation drags on and earlier details blur

prompt engineering — the craft of phrasing requests so the AI understands exactly what you mean

AI slop — low-effort, generic AI output that looks polished but says nothing

inference — the moment the model actually runs to produce an answer. Training is when the model learned (once, long ago). Inference is the model answering you, right now

hallucination — when AI confidently makes up facts that sound true but aren’t

context bloat — overstuffing the AI’s memory so it slows down and loses focus

one-shot prompting — giving the AI one example and asking it to follow the same pattern

token burn — wasting expensive AI “words” on unnecessary back-and-forth or bloated prompts

vibe coding — describing what you want in plain English and hoping the AI nails it

agentic engineering — building guardrails so AI acts like a reliable teammate, not a gamble

Difference between GPT and ChatGPT?

discuss

Boris Cherny slides showing the spectrum of Claude Code usage styles
Boris Cherny (creator of Claude Code) — different teams use Claude Code completely differently.
There is no single “correct” way. But there are patterns worth understanding.

Source: Boris Cherny on X — tweet 1 · tweet 2 · tweet 3

What is Claude Code?

Model (Brain 🧠 — e.g. Opus, GPT) + Harness (Body 💪 — e.g. tools, MCP, memory)

A horse. A model.

The model is the horse. Raw power, no direction.

How an LLM generates text (autoregressive)

Animated diagram showing autoregressive generation: prompt feeds into LLM, which predicts one token, feeds it back, and repeats until the full answer is produced.
Each predicted token is appended to the input, then fed back into the LLM.
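A toy version of that loop, as a hedged Python sketch: the three-entry vocabulary and its probabilities are invented, and unlike a real LLM this toy conditions only on the last token.

```python
import random

# Toy next-token table; the probabilities are invented for illustration.
VOCAB = {
    "the": {"sky": 1.0},
    "sky": {"is": 1.0},
    "is":  {"blue": 0.62, "clear": 0.14, "dark": 0.08, "<eos>": 0.16},
}

def predict_next(tokens: list[str]) -> str:
    # A real LLM conditions on the whole sequence; this toy only looks at
    # the last token. One call here stands in for one inference.
    dist = VOCAB.get(tokens[-1], {"<eos>": 1.0})
    return random.choices(list(dist), weights=list(dist.values()))[0]

def generate(prompt: list[str], max_new_tokens: int = 8) -> list[str]:
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        nxt = predict_next(tokens)   # predict one token...
        if nxt == "<eos>":
            break
        tokens.append(nxt)           # ...append it, feed it back, repeat
    return tokens

print(generate(["the", "sky"]))      # e.g. ['the', 'sky', 'is', 'blue']
```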
Screenshot of the OpenAI tokenizer showing the sentence about BPE split into 32 tokens across 105 characters, with tabs for GPT-5.x, GPT-4, and GPT-3 tokenizers.

How an LLM tokenizes input

Animated diagram combining tokenization and autoregressive generation: the BPE-tokenized prompt feeds into the LLM, which generates the answer token-by-token using the same shared vocabulary.
BPE chops words into subword tokens — same color = same word, gray = punctuation.

What the LLM actually sees: integer token IDs

Animated diagram showing the 32 integer token IDs the model receives: e.g. 28133 for 'Does', 17554 for ' Chat', 162016 for 'GPT', 97481 for ' Claude'. Generated tokens are also shown as IDs. Vocab size V ≈ 200,000.
Notice the comma is always ID 11 — the same punctuation mark maps to the same integer, everywhere, every time.
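You can poke at this yourself with OpenAI's open-source tiktoken library; its o200k_base encoding has roughly the 200,000-entry vocabulary shown above. Exact IDs vary per tokenizer (Anthropic's models use their own), but the mechanics are identical.

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("o200k_base")   # BPE with a ~200k-entry vocab
ids = enc.encode("Does ChatGPT know Claude?")

print(ids)                                  # a list of integer token IDs
print([enc.decode([i]) for i in ids])       # the subword behind each ID
print(enc.n_vocab)                          # vocabulary size V
```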

💬 Models are stateless

User → Model: "My name is Shayan."
Model → User: "Okay, your name is Shayan."
User → Model: "What is my name?"
Model → User: "I don't know your name — I have no memory of what you just said."

Every turn is a fresh API call.

Memory only exists if the harness replays the transcript.
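Here is what "the harness replays the transcript" looks like with the Anthropic Python SDK, as a hedged sketch (the model ID and max_tokens are placeholders):

```python
import anthropic

client = anthropic.Anthropic()   # reads ANTHROPIC_API_KEY from the environment
history = []                     # this list is the only "memory" there is

def ask(text: str) -> str:
    history.append({"role": "user", "content": text})
    reply = client.messages.create(
        model="claude-sonnet-4-5",   # placeholder: any current Claude model ID
        max_tokens=256,
        messages=history,            # the full transcript, replayed every call
    )
    answer = reply.content[0].text
    history.append({"role": "assistant", "content": answer})
    return answer

ask("My name is Shayan.")
print(ask("What is my name?"))   # answered only because history was replayed
```

Drop the history.append lines and the model forgets your name instantly.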

🦜 “Stochastic Parrots”

Stochastic means random/probabilistic — models don’t know the answer, they sample from a probability distribution.

Prompt:  "The sky is ___"

| Completion | Probability |
|---|---|
| blue | 62% |
| clear | 14% |
| dark | 8% |
| falling | 0.3% |

Each run the model samples — temperature controls how widely it samples.
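A hedged sketch of temperature sampling in Python; the logits are invented so that temperature = 1 lands near the distribution above:

```python
import math
import random

# Invented logits, chosen so temperature = 1 roughly matches the slide.
logits = {"blue": 5.0, "clear": 3.5, "dark": 3.0, "falling": -0.3}

def sample(logits: dict[str, float], temperature: float) -> str:
    t = max(temperature, 1e-6)   # guard against division by zero at T = 0
    m = max(logits.values())     # subtract the max for numerical stability
    weights = [math.exp((v - m) / t) for v in logits.values()]
    return random.choices(list(logits), weights=weights)[0]

print(sample(logits, 0.2))   # almost always "blue"
print(sample(logits, 1.5))   # "clear", "dark", even "falling" show up more
```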

stochastic sycophantic hallucinating stateless biased confident pattern-matching

Source: Bender, Gebru, McMillan-Major, Mitchell — On the Dangers of Stochastic Parrots (2021)

🌡️ Even temperature = 0 isn’t deterministic.

You set it to zero. You expect the same answer every time. You’re wrong.

The data point
- Without the fix: 80 unique completions out of 1,000 calls
- With the fix: 1 unique completion out of 1,000 calls

Qwen3-235B at temperature = 0 — first divergence at token 103 (“Queens, New York” vs “New York City”)

Why it happens

Server load varies → batch size varies → kernel reductions reorder → numerics shift. Not GPU randomness — arithmetic order.
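Two lines of Python show the root cause: floating-point addition is not associative, so a different reduction order yields different bits.

```python
a, b, c = 0.1, 1e16, -1e16
print((a + b) + c)   # 0.0: the tiny value is absorbed by the big one first
print(a + (b + c))   # 0.1: the big values cancel first, the tiny one survives
```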

The fix

Batch-invariant kernels → consistent reduction order → identical numerics every run.

Determinism is engineered in — at every layer.

Source: Thinking Machines — Defeating Nondeterminism in LLM Inference (2025)

🧠 Models — e.g. Opus, GPT, Gemini

✅ Can answer

“What is the capital of Japan?”

“When did Pakistan gain independence?”

“Who wrote Romeo and Juliet?”

❌ Can’t answer

“Who won yesterday’s match?”

“What’s today’s USD → PKR rate?”

“What did Anthropic release yesterday?”

Every model — no matter how new — has a knowledge cut-off. Events after that date simply do not exist inside the model.

Opus 4.7 — Anthropic

Knowledge cut-off: January 2026

Released 2026-04-17

GPT-5.5 — OpenAI

Knowledge cut-off: December 1, 2025

Released 2026-04-23 — brand-new, but still has a cut-off.

Gemini 3.1 Pro — Google

Knowledge cut-off: January 2025

Released 2026-02-19

🧠 Limitations

The raw model has no real-time access — no internet, no files, no clock.

Anthropic Workbench screenshot showing the model refusing a real-time weather query
- Reins — the control loop decides the next call
- Bit — the tool allowlist constrains what the model can do
- Blinders — context management narrows what it sees
- Driver — the evaluator decides when the work is done
- Traces — state persistence carries the work forward

A horse harness. A model harness.

The model is the horse. Raw power, no direction. The harness is everything else.

The origin is Old French harneis — gear, equipment, armor.

⚡ Tool Calling — how the harness reaches the world

Diagram showing how Claude sends a tool-call request to the harness, the harness executes it against real-world systems, and the result flows back into Claude’s context

In the diagram above:  Turn × 1  ·  Inference × 2

Turn — one round from the user’s view: you ask, the assistant answers. The entire flow above — your request, the assistant’s tool calls, and the final reply — is one turn.

Inference — one call to the language model. The model wakes up, reads the input it was given, writes a reply, then forgets everything. Every arrow touching the “Language Model” column above is a separate inference. One turn can contain many inferences.

Source: Anthropic — Claude Code in Action: What is a coding assistant?
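A hedged sketch of that two-inference turn using the Anthropic Messages API. The tool schema is illustrative, and run_get_weather is a hypothetical executor the harness would supply.

```python
import anthropic

client = anthropic.Anthropic()

# One tool the harness is willing to execute (schema is illustrative).
tools = [{
    "name": "get_weather",
    "description": "Get the current weather for a city",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

messages = [{"role": "user", "content": "What is the weather in Karachi?"}]

# Inference #1: the model reads the prompt and asks for the tool.
response = client.messages.create(
    model="claude-sonnet-4-5", max_tokens=1024, tools=tools, messages=messages,
)

if response.stop_reason == "tool_use":
    call = next(b for b in response.content if b.type == "tool_use")
    result = run_get_weather(call.input["city"])  # hypothetical executor

    # The harness appends the tool result and calls the model again.
    messages.append({"role": "assistant", "content": response.content})
    messages.append({"role": "user", "content": [{
        "type": "tool_result",
        "tool_use_id": call.id,
        "content": result,
    }]})

    # Inference #2: the model writes the final reply. One turn, two inferences.
    final = client.messages.create(
        model="claude-sonnet-4-5", max_tokens=1024, tools=tools, messages=messages,
    )
    print(final.content[0].text)
```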

💪 Harness — the body around the brain

🧑‍💼

Agents — the specialists

A dedicated Claude worker — own context, tools, focus.

✅ fresh working memory per run

🎯

Skills — the know-how

What the specialist (or Claude) can actually do.

✅ progressive disclosure — loaded on demand

📘

Workflows — the instruction manual

Repeatable step-by-step recipes — like an AC install guide.

✅ reproducible recipes

📝

CLAUDE.md — your memory

Knowledge you provide to the model.

⚠️ 200-line problem

💭

Context — the working memory

What Claude holds in its head right now — fresh every new chat session.

20% of 1M tokens

⚠️ dumb-zone problem

💪 Harness — the body around the brain

Tools — the hands

Built-in: Read, Edit, Bash, WebSearch.

🔌

MCP — the adapters

USB-C for AI — plug in external tools (databases, browsers, APIs).

e.g. 👁️ Claude in Chrome

🛡️

Permissions — the guardrails

Allow / ask / deny for tool use.

🪝

Hooks — the reflexes

Deterministic scripts that fire on events.

📝

System Prompt — Claude’s memory

Knowledge Anthropic bakes in.

e.g. identity · tone · safety

✅ always on
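The Permissions and Hooks cards above both map to .claude/settings.json. A hedged sketch (the matchers and commands are illustrative):

```json
{
  "permissions": {
    "allow": ["WebSearch", "Bash(npm test:*)"],
    "ask": ["Write"],
    "deny": ["Bash(curl:*)"]
  },
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [{ "type": "command", "command": "npx prettier --write ." }]
      }
    ]
  }
}
```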

🎉 Yayyyyy! Problem solved with harness

The harness reaches out via WebSearch and fetches a real answer from live sources.

Claude Code using Web Search across AccuWeather, Weather25, and TimeAndDate to answer "what is the weather in Karachi?"
?

Really?

💪 Non-determinism — Doesn’t always use its tools

Similar prompt — but this time the model decided not to use the tool.

Claude Code refusing to check live weather and asking permission to use WebFetch

💪 Non-determinism — Tools can fail

The model first tried one source — it failed (403) — so it fell back to another.

Claude Code showing a 403 error from timeanddate.com and falling back to wttr.in

🚨 Problem Statement

  1. ❌ Single source of truth

    The model does not always follow the same path.

  2. ❌ Repeatability

    The model sometimes fails or forgets to use the tool.

Vibe Coding — Andrej Karpathy

Andrej Karpathy's Feb 3 2025 tweet coining 'vibe coding' — 'fully give in to the vibes, embrace exponentials, and forget that the code even exists'

Andrej Karpathy — OpenAI founding team · former Director of AI at Tesla · founder of Eureka Labs.

Source: Andrej Karpathy on X (Feb 3, 2025)

Vibe Coding — Robert C. “Uncle Bob” Martin

Robert C. Martin (Uncle Bob) in his video critiquing vibe coding

Uncle Bob warns that “vibe coding” — generating code from prompts without understanding what the LLM produces — is hazardous for novices. LLMs are mathematical functions that predict the next most likely token via matrix multiplications, trained on internet text and GitHub code. They are powerful tools — but, as he puts it, “novices using power tools lose fingers.”

Robert C. “Uncle Bob” Martin — author of Clean Code · Clean Architecture · co-author of the Agile Manifesto.

Source: Robert C. Martin on X

Vibe Coding vs Agentic Engineering

# Plain TodoApp
todoapp/
├── backend/
│   ├── main.py
│   ├── routes/
│   │   ├── todos.py
│   │   └── users.py
│   ├── models/
│   │   ├── todo.py
│   │   └── user.py
│   └── tests/
│       └── test_todos.py
└── frontend/
    ├── components/
    │   ├── TodoList.tsx
    │   └── Sidebar.tsx
    ├── pages/
    │   ├── index.tsx
    │   └── todos.tsx
    └── lib/
        └── api.ts

# With Claude Code Best Practices
todoapp/
├── .claude/                  # Claude Code config
│   ├── agents/               # Custom subagents
│   ├── skills/               # Domain knowledge
│   ├── commands/             # Slash commands
│   ├── hooks/                # Lifecycle scripts
│   ├── rules/                # Modular instructions
│   ├── settings.json         # Team settings
│   └── settings.local.json   # Personal settings
├── backend/
│   ├── main.py
│   ├── routes/
│   │   ├── todos.py
│   │   └── users.py
│   ├── models/
│   │   ├── todo.py
│   │   └── user.py
│   ├── tests/
│   │   └── test_todos.py
│   └── CLAUDE.md             # Backend instructions
├── frontend/
│   ├── components/
│   │   ├── TodoList.tsx
│   │   └── Sidebar.tsx
│   ├── pages/
│   │   ├── index.tsx
│   │   └── todos.tsx
│   ├── lib/
│   │   └── api.ts
│   └── CLAUDE.md             # Frontend instructions
├── .mcp.json                 # Managed MCP servers
└── CLAUDE.md                 # Project instructions

👤 Agents

/agents

Examples: weather reporter, front-end engineer, QA engineer.

✅ Fresh working memory per run 🎯 Scoped tools, model, permissions ⚡ Parallelizable
root/
├── .claude/
│   └── agents/
│       └── weather-agent.md
└── README.md

Create your first agent — /agents

The Command

Type /agents.

Opens an interactive menu — pick "Create new agent" and the CLI drafts the agent file for you.

Creates .claude/agents/<name>.md — a plain markdown file anyone can edit.

Demo

🎉 Yayyyyy! Problem solved with agents

Agent created
?

Not so fast...

🚨 Problem Statement

  1. ✅ Single source of truth

    The agent will always call the Open-Meteo API — no more source drift.

  2. ❌ Repeatability

    It’s not guaranteed that Claude will always call this agent.

Agent config fields with frontmatter

| Field | Description | Enforced by |
|---|---|---|
| name | Recommended. Unique identifier — e.g. weather-agent | harness |
| description | Required. When to invoke. Use "PROACTIVELY" for auto-invocation | prompt |
| model | haiku, sonnet, opus, or inherit (default). weather-agent uses sonnet | harness |
| tools | Allowlist of tools. weather-agent allows WebFetch, Read, Write, etc. | harness |
| skills | Skills preloaded into agent context at startup — weather-agent preloads weather-fetcher | harness |
| maxTurns | Maximum agentic turns before the agent stops. weather-agent uses 5 | harness |
| memory | Persistent memory: user, project, or local. weather-agent uses project | harness |
| color | Visual color in the task list. weather-agent uses green | harness |

The skills: field is what makes the agent special. It preloads any listed skill directly into the agent’s brain at startup, before the agent has received a single instruction.
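Putting the table together, .claude/agents/weather-agent.md might look like this. The frontmatter values come from the table above; the system-prompt body is a hypothetical sketch.

```markdown
---
name: weather-agent
description: Fetches current weather for a city. Use PROACTIVELY whenever the user asks about weather.
model: sonnet
tools: WebFetch, Read, Write
skills: weather-fetcher
maxTurns: 5
memory: project
color: green
---

You are a weather reporter. For any city you are given, use the
weather-fetcher skill and report temperature, wind, and conditions.
Open-Meteo is your single source of truth; never use another source.
```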

claude-code-best-practice Tips & Tricks

Claude Code tips and tricks — part 1

claude-code-best-practice Tips & Tricks

Claude Code tips and tricks — part 2

📝 CLAUDE.md

CLAUDE.md

Knowledge you provide to the model — read every session.

✅ Loaded into every session 👥 Team-shareable via git ⚠️ 200-line problem
root/
├── CLAUDE.md
└── README.md

Create your first CLAUDE.md — /init

The Command

Type /init in Claude Code.

Claude scans your codebase and drafts a starter CLAUDE.md for you — project conventions, key patterns, common commands.

Creates CLAUDE.md in your repo root — a plain markdown file you edit and commit.
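A hypothetical starter CLAUDE.md, kept deliberately lean (see the 200-line problem below):

```markdown
# TodoApp project instructions

## Commands
- npm run dev: start the frontend
- pytest backend/tests: run backend tests

## Conventions
- Weather requests always go through the weather-agent.
- Never commit directly to main; open a PR.
```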

📝 CLAUDE.md — the real thing

A real CLAUDE.md file showing project rules and agent invocation instructions

claude-code-best-practice Tips & Tricks

Claude Code tips and tricks for CLAUDE.md

⚠️ CLAUDE.md problem — keep it under 200 lines

Illustration of the CLAUDE.md 200-line adherence problem

🎯 Skills

.claude/skills/

Examples: weather fetching, sorting CSV rows, generating SVG cards.

✅ Progressive disclosure — loaded on demand ♻️ Reusable across agents
root/
├── .claude/
│   └── skills/
│       └── weather-fetcher/
│           └── SKILL.md
└── README.md

Create your first skill

Write it by prompting

Just describe what you want Claude to do. It drafts the skill file for you.

"Create a skill that fetches weather from Open-Meteo for a given city."

Use Anthropic's skill-creator

Anthropic ships an official skill-creator skill. Invoke it and it walks you through generating a properly-structured skill.

Recommended — always produces the correct SKILL.md format.

⚠️ Watch out (method 1)

The prompting method sometimes creates the wrong structure. Instead of generating a folder with SKILL.md inside (e.g. weather-fetcher/SKILL.md), it creates a plain weather-fetcher.md file. Claude Code doesn’t recognize that form as a skill.

🎯 Skills — a real one

A real skill file on disk — SKILL.md inside a named folder inside .claude/skills/

Skill config fields with frontmatter

Most fields control how and when the skill loads — enforced by the harness. Only description lives in prompt-land.

| Field | Description | Enforced by |
|---|---|---|
| name | Recommended. Display name and /slash-command (defaults to directory name) | harness |
| description | Required. When to invoke — used for auto-discovery. Write it clearly. | prompt |
| argument-hint | Autocomplete hint shown in the / menu (e.g. [city-name]) | harness |
| disable-model-invocation | Set true to prevent Claude from invoking this skill automatically | harness |
| user-invocable | Set false to hide from the / menu — background knowledge only | harness |
| allowed-tools | Tools allowed without permission prompts when this skill is active | harness |
| model | Model to use when the skill is active: haiku, sonnet, opus | harness |
| context | Set to fork to run the skill in an isolated subagent context | harness |

Small description, full body loaded on demand — this is progressive disclosure. Claude’s main context stays lean until the skill is actually needed.
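A hedged sketch of .claude/skills/weather-fetcher/SKILL.md using the fields above. The step-by-step body is illustrative, though the Open-Meteo endpoints are real.

```markdown
---
name: weather-fetcher
description: Fetch current weather for a city from the Open-Meteo API. Use whenever the user asks about weather or temperature.
argument-hint: [city-name]
allowed-tools: WebFetch
---

1. Geocode the city:
   https://geocoding-api.open-meteo.com/v1/search?name={city}
2. Fetch current conditions:
   https://api.open-meteo.com/v1/forecast?latitude={lat}&longitude={lon}&current_weather=true
3. Report temperature, wind speed, and conditions. Nothing else.
```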

💭 Context

/context

The model's working memory — what it can see in this moment.

✅ Fresh every chat ⚠️ Dumb-zone problem ⚠️ /compact
20% of 1M tokens

🧠 Claude's Brain

Think of it like this

Imagine Claude has a brain that holds everything it's aware of right now — your question, every file it's opened, every tool result, every word it's said back to you. If a thought isn't in the brain, Claude can't use it. Simple as that.

Context window diagram showing how the 1M token limit is divided across system prompt, tools, files, and conversation
💬
Your messages Every question, clarification, and follow-up
📄
Every file Claude opens Read a 500-page PDF? All of it is in the brain now.
🔧
Every tool result Web searches, command output, screenshots — all held in memory

Two Things to Know

1. The brain is finite. It can hold about 1 million tokens — roughly 750,000 words. Big, but not infinite.
2. The brain empties at the end of every chat. When you start a new conversation, Claude remembers nothing from the last one unless you tell it again.

What Loads at Session Start

The moment you open Claude Code, certain things land in Claude's brain before you've typed a word. The rest waits in the wings — only loaded when you actually need it. This is called progressive disclosure.

Diagram showing what loads into Claude's context window at session start
📋
CLAUDE.md — full content Standing instructions, every line. Always pinned in the brain at startup.
🎓
Skills — descriptions only Claude knows which skills exist and what each does — the full content loads when the skill is actually invoked.
👤
Agents — descriptions only The roster of specialists. The full system prompt loads when the agent is dispatched.

The One-Liner

Only descriptions of skills and agents are loaded at startup — the rest is fetched on-demand. That's progressive disclosure. It keeps the brain light.

📄 Lost in the Middle

by Nelson F. Liu · Stanford University · 2023

Lost in the Middle paper illustration showing U-shaped attention curve across context positions
  1. Primacy & recency work. Information at the start of the context (what Claude sees first) and at the end (what Claude saw most recently) gets strong attention. These are the safe zones.
  2. The middle is the dumb zone. Content buried in the middle of a long context effectively disappears. The model has the tokens, but doesn't weight them strongly during generation. It's in the working memory — but not really read.
  3. Practical implication. Keep critical instructions either at the very top (in CLAUDE.md) or near the user's most recent message. A bigger context window doesn't help if your payload lands in the middle.

This is the "dumb-zone problem" the deck has been warning about — now you know where it came from.

Source: Liu et al. — Lost in the Middle: How Language Models Use Long Contexts (arXiv:2307.03172)

📘 Workflow

.claude/commands/

Repeatable step-by-step recipes — the instruction manual that makes Claude run the same playbook every time.

✅ Reproducible recipes 🔁 Same steps, every run 👥 Team-shareable via git
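A hypothetical .claude/commands/weather.md showing the shape of such a recipe ($ARGUMENTS is Claude Code's placeholder for whatever follows the slash command):

```markdown
---
description: Full weather-report workflow for a city
argument-hint: [city-name]
---

1. Dispatch the weather-agent to fetch the current weather for $ARGUMENTS
   (weather-fetcher skill, Open-Meteo only).
2. Write the result to reports/$ARGUMENTS.md.
3. Summarize the report back to the user in two sentences.
```

Typing /weather Karachi then runs the same steps every time: the repeatability the problem statement asked for.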

📋 Orchestration Workflow

Command → Agent → Skill architecture flow

📹 Workflow in action

Claude running the weather orchestration workflow end-to-end

🙏 Thank you!


Questions? Feedback?

GitHub github.com/shanraisshan