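Graff is one of AI Skill Hub's featured MCP tools for this issue. It has an overall score of 8.0, which indicates good overall quality. We strongly recommend adding it to your AI toolkit to improve productivity.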
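Graff is an AI tool extension that follows the MCP (Model Context Protocol) standard. Through MCP, it lets mainstream AI clients such as Claude and Cursor access and operate external tools, data sources, and services directly, extending what the AI can do. File operations, database queries, and API calls can all be triggered in natural language from within an AI conversation, which greatly improves productivity.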
# Option 1: one-step install via the Claude Code CLI
claude skill install https://github.com/justrach/codegraff
# Option 2: configure claude_desktop_config.json manually
{
"mcpServers": {
"graff": {
"command": "npx",
"args": ["-y", "codegraff"]
}
}
}
# Config file location
# macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
# Windows: %APPDATA%/Claude/claude_desktop_config.json
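# After installation, use it directly in a Claude conversation
# Example:
#   User: Please use Graff to carry out the following task...
#   Claude: [automatically calls the Graff MCP tool to handle the request]
# List the available tools
# In Claude, type: "List all available MCP tools"
// claude_desktop_config.json example configuration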
{
"mcpServers": {
"graff": {
"command": "npx",
"args": ["-y", "codegraff"],
"env": {
// "API_KEY": "your-api-key-here"
}
}
}
}
// Restart Claude Desktop after saving for the change to take effect
<p align="center"> <img src="codegraff.png" alt="codegraff" width="360"> </p>
<p align="center"> <strong>An AI that actually does the work. Not just talks about it.</strong> </p>
<p align="center"> Install it on your Mac or Linux machine, sign in with the AI subscription you <em>already have</em>, and hand it real tasks. graff writes and runs code, automates the boring stuff, digs through your files, researches the web, and runs its own experiments, on its own, until the job is done.<br/> <strong>You don't chat with it. You give it work.</strong> </p>
<p align="center"> <img alt="macOS · Linux" src="https://img.shields.io/badge/macOS%20·%20Linux-555"> <img alt="One binary, 1.7 MB" src="https://img.shields.io/badge/one%20binary-1.7%20MB-44cc11"> <img alt="Zero dependencies" src="https://img.shields.io/badge/dependencies-0-44cc11"> <img alt="Built in Zig 0.16" src="https://img.shields.io/badge/built%20in-Zig%200.16-f7a41d?logo=zig&logoColor=white"> </p>
curl -fsSL https://github.com/justrach/codegraff/releases/latest/download/install.sh | sh
<p align="center"><sub>Prefer a window? Grab the <a href="#install">desktop app</a>. Then just run <code>graff</code> and tell it what you need.</sub></p>
usage:
  graff [flags]                    start the REPL
  graff [-p] "prompt"              one-shot: run the prompt, print the answer, exit
  graff login                      get a codegraff key (device-code OAuth)
  graff login codex [--refresh]    ChatGPT/Codex OAuth login (PKCE)
  graff key set <provider> <key>   store a key (macOS Keychain, else 0600 file)
  graff key list                   show which providers have keys
  graff mcp add <name> -- <cmd>    add an MCP server to .mcp.json
  graff mcp list                   configured MCP servers
  graff --schema                   print the machine-readable interface (SDK codegen)
flags:
  --model <name>                   start on this model (same fuzzy resolution as /model)
  --yolo                           skip all permission prompts for the session
  -p, --print                      one-shot print mode (answer on stdout, tool progress on stderr)
  --timing                         show per-tool wall-clock on result lines (✓ (312ms) …)
  --cost                           show running session spend in the prompt ([model · 12k tok · $0.0042])
  --json                           structured stdio protocol (JSON in, JSONL events out, SDK transport)
  -h, --help                       usage
  -V, --version                    version
Unknown flags are an error (with a pointer to --help), a missing --model value is an error, and --help/--version are handled before subcommand dispatch, so graff login --help prints usage instead of starting an OAuth flow. With no key configured at all, startup fails with the three quickest fixes spelled out rather than a bare env-var list.
One-shot mode makes the harness scriptable without the SDK: `graff -p "how many TODOs in src/?"` runs a full agentic turn (tools included), prints only the final answer on stdout (progress lines go to stderr), and exits non-zero on failure. There's no human to ask, so the permission gate denies anything not already allowed. Pre-approve commands in .harness/settings.json or pass --yolo.
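As a rough illustration of scripting one-shot mode from another program, the sketch below builds the command line and captures the two output streams separately. The helper names (`build_one_shot_command`, `run_one_shot`) are hypothetical, and placing `--model`/`--yolo` before `-p` is an assumption; only the flags themselves come from the usage text above.

```python
import subprocess
from typing import List, Optional, Tuple


def build_one_shot_command(
    prompt: str,
    model: Optional[str] = None,
    yolo: bool = False,
    binary: str = "graff",
) -> List[str]:
    """Build the argv for a one-shot run (flag order is an assumption)."""
    argv = [binary]
    if model is not None:
        argv += ["--model", model]
    if yolo:
        argv.append("--yolo")  # skips all permission prompts for the session
    argv += ["-p", prompt]
    return argv


def run_one_shot(prompt: str, **options) -> Tuple[int, str, str]:
    """Run graff once and return (exit code, stdout, stderr)."""
    proc = subprocess.run(
        build_one_shot_command(prompt, **options),
        capture_output=True,
        text=True,
    )
    # Per the text: stdout holds only the final answer, stderr the progress
    # lines, and a non-zero exit code signals failure.
    return proc.returncode, proc.stdout, proc.stderr
```

A caller would typically check the exit code first and treat stdout as the answer only when it is zero.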
---
graff is scriptable from your own code. graff --json is a structured stdio protocol (JSON requests in, JSONL events out; ask_user is answered with a structured {"type":"answer","text":"...","cancelled":false} line), and graff --schema prints the machine-readable interface. The TypeScript and Python SDKs in sdk/ are auto-generated from that schema, so they never drift from the binary. On every release tag a GitHub Action rebuilds, regenerates, fails if the committed SDKs are stale, and publishes to npm (@graff-new/sdk) and PyPI (simple-harness-sdk).
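For readers who want to talk to the `--json` protocol without an SDK, here is a minimal sketch of the two pieces the text describes: producing the structured `ask_user` answer line and splitting a JSONL stream into events. The function names are hypothetical, and the shape of individual events is not specified above, so the parser treats each line as an opaque JSON object.

```python
import json
from typing import Any, Dict, List


def make_answer_line(text: str, cancelled: bool = False) -> str:
    """Serialize the ask_user reply in the shape quoted in the text."""
    payload = {"type": "answer", "text": text, "cancelled": cancelled}
    return json.dumps(payload) + "\n"


def parse_jsonl(stream_text: str) -> List[Dict[str, Any]]:
    """Parse JSONL output: one JSON object per non-empty line."""
    events = []
    for line in stream_text.splitlines():
        line = line.strip()
        if line:
            events.append(json.loads(line))
    return events
```

Run `graff --schema` for the authoritative event and request shapes; this sketch only fixes the framing.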
<details> <summary><strong>Tools & the permission gate</strong></summary>
<br/>
| Tool | Kind | Implementation |
|---|---|---|
| `bash` | built-in | std.process.run → /bin/sh -c, stdout+stderr+exit code |
| `read_file` | built-in | Io.Dir.cwd().readFileAlloc (256 KB cap) |
| `edit_file` | built-in | exact string replace; unique match required unless replace_all |
| `write_file` | built-in | Io.Dir.cwd().writeFile |
| `codedb` | built-in | shells out to [codedb](https://github.com/justrach/codedb): read-only code-intel (search/symbol/callers/outline/…) |
| `subagent` | built-in | this same agent loop, recursively (root agent only) |
| `workflow` | built-in | phases of parallel subagents; {{prev}} carries results forward (root only) |
| `todo_write`/`_read` | meta | mutate/read the agent's own task list |
| `ask_user` | meta | ask the human a question; their reply returns as the result |
| `attempt_completion` | meta | carry the final answer out; ends the turn |
| `mcp__<server>__*` | MCP | tools discovered from .mcp.json servers (see below) |
Meta tools act on the agent or the conversation, not the outside world, so the orchestrator handles them inline rather than on a pool thread. ask_user + attempt_completion make the human↔agent conversation fully tool-mediated: the agent asks via a tool, the person's reply comes back as that tool's result, and the agent finishes via another tool. In /strict mode the model is forced to call a tool every turn, so every message is a tool call or tool result.
Permission gate. The gate (gateTool) covers bash, write_file, edit_file, and MCP tool calls. A call that isn't pre-approved prompts at the REPL: [y]es once · [a]lways allow "<key>" (saved) · [n]o. The approval key is the command's first word for bash, the tool name for writes/MCP. "Always" persists: it's written to .harness/settings.json in the cwd ({"allow": ["touch", "write_file", …]}) and loaded back on every launch in that project. Edit the file by hand to revoke or pre-approve. A small seed allowlist (read-only basics like ls/cat/rg, plus zig build/zig fmt and git status/diff/log/show) never prompts; find is deliberately excluded (its -exec/-delete make it an exec tool). Commands containing chaining, pipes, redirection, substitution, or newlines never match a prefix: they always prompt. Approving an interpreter as a bash word (python3, node, …) prints a heads-up that it grants arbitrary code execution.
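The approval-key rule above can be sketched in a few lines. This is not the harness's gateTool: the function names are hypothetical, the list of shell metacharacters is an assumption (the text names the categories, not the exact characters), and the multi-word seed allowlist entries such as git status are left out.

```python
import json
from typing import List

# Assumed markers for chaining, pipes, redirection, substitution, and newlines.
COMPOUND_MARKERS = (";", "&", "|", ">", "<", "$(", "`", "\n")


def is_compound(command: str) -> bool:
    """True if the command contains any assumed shell metacharacter."""
    return any(marker in command for marker in COMPOUND_MARKERS)


def approval_key(tool: str, command: str = "") -> str:
    """First word of the command for bash, the tool name otherwise."""
    if tool == "bash":
        words = command.split()
        return words[0] if words else ""
    return tool


def is_preapproved(tool: str, command: str, allow: List[str]) -> bool:
    """Check a call against the "allow" list from .harness/settings.json."""
    if tool == "bash" and is_compound(command):
        return False  # compound commands always prompt
    return approval_key(tool, command) in allow


# Example settings content, in the shape quoted in the text.
settings = json.loads('{"allow": ["touch", "write_file"]}')
```

With these settings, `touch a.txt` would run without a prompt while `touch a.txt && rm b` would still ask.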
Path confinement. read_file/write_file/edit_file are confined to the working-directory subtree: no absolute paths, no `..` components. This is structural (not bypassed by /yolo): read_file /etc/shadow and write_file ../../x are refused with an error.
bash is cwd-locked by default too. A seed/approved command auto-runs only when all its path arguments stay in the cwd (escapesCwd rejects absolute, ~, and .. tokens). So cat local.txt runs free but cat /etc/passwd falls through to a prompt at the root (you can still approve it per-call) and is denied for subagents. /yolo lifts this.
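A minimal sketch of the cwd check described above, assuming simple whitespace tokenization (the real escapesCwd may split arguments differently). It flags any argument that is absolute, starts with `~`, or contains a `..` path component.

```python
def escapes_cwd(command: str) -> bool:
    """Return True if any argument could point outside the working directory."""
    tokens = command.split()
    for token in tokens[1:]:  # skip the command word itself
        if token.startswith("/") or token.startswith("~"):
            return True
        if ".." in token.split("/"):
            return True
    return False
```

Under this sketch `cat local.txt` passes, while `cat /etc/passwd` and `cat ../../x` are flagged and would fall through to a prompt (or a denial for subagents).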
Subagents have no stdin, so they're gated structurally, not by prompt: bash is allowlist-only (unapproved → denied), file writes are allowed but path-confined, and MCP isn't exposed to them at all. /yolo turns the prompt gate off (path confinement stays).
</details>
<details> <summary><strong>MCP servers</strong></summary>
<br/>
The harness is an MCP client (src/mcp.zig). Drop a .mcp.json in the working directory and it spawns each server, speaks JSON-RPC 2.0 over stdio, discovers their tools, and offers them to the model namespaced mcp__<server>__<tool>:
{
"mcpServers": {
"codedb": { "command": "codedb", "args": ["mcp", "."] },
"everything": { "command": "npx", "args": ["-y", "@modelcontextprotocol/server-everything"] }
}
}
Pointing it at codedb mcp . gives the agent 22 structural code-intelligence tools: pure-Zig client to pure-Zig server, zero dependencies on either side. /mcp lists what connected; /mcp add <name> <cmd> [args…] connects a server live and saves it to .mcp.json. From a shell, use the Codex-style form graff mcp add <name> [--env KEY=VALUE ...] -- <cmd> [args…]; for example, graff mcp add context7 -- npx -y @upstash/context7-mcp. graff mcp lists the servers already saved in .mcp.json. Workspace servers auto-connect only with --yolo (trusted) or per-session consent.
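To make the client side concrete, here is a small sketch of three steps an MCP client performs with such a config: turning an entry into a spawn command, asking a server for its tools, and namespacing the result. The helper names are hypothetical. `tools/list` is the MCP method for tool discovery; a real client sends it only after the `initialize` handshake, which is omitted here.

```python
import json
from typing import Any, Dict, List


def spawn_argv(config: Dict[str, Any], server: str) -> List[str]:
    """Build the argv for one server entry of a .mcp.json-style config."""
    entry = config["mcpServers"][server]
    return [entry["command"], *entry.get("args", [])]


def tools_list_request(request_id: int) -> str:
    """JSON-RPC 2.0 request asking a server to enumerate its tools."""
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "method": "tools/list"})


def namespaced_tool(server: str, tool: str) -> str:
    """Name under which a discovered tool is offered to the model."""
    return f"mcp__{server}__{tool}"


config = {
    "mcpServers": {
        "codedb": {"command": "codedb", "args": ["mcp", "."]},
    }
}
```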
One known companion is exempt from the workspace gate: if the muonry binary is on PATH (the fast code-intelligence suite), the harness auto-connects it in every workspace (it's a user-installed tool, not arbitrary repo config) and injects its usage note so the model prefers mcp__muonry__read/search over native navigation, falling back to the native tools whenever a call fails. Opt out with {"skills": {"muonry": false}} in .harness/settings.json.
</details>
<details> <summary><strong>ultracode & workflows: multi-agent fan-out</strong></summary>
<br/>
ultracode: the multi-agent codeword. Put the word ultracode anywhere in a message and the harness augments that turn with a note steering the model into multi-agent workflow mode: it prints `⚡ ultracode: multi-agent workflow mode engaged`, records an `ultracode` trace event, and asks the model to fan the work out across phases of parallel subagents (then synthesize) via the workflow tool rather than doing it solo. It's a per-turn toggle: no flag, no mode to remember, just the keyword.
Workflows. Dynamic workflows as data (inspired by pi-dynamic-workflows, minus the JS sandbox): the model calls the workflow tool with a JSON plan of up to 5 sequential phases, each holding up to 8 tasks that run in parallel as isolated subagents. From phase 2 on, {{prev}} in a task prompt is replaced with the labeled results of the previous phase (auto-appended if omitted), and the final phase's results return to the root agent. Good for fan-out + synthesis: audits, multi-perspective review, parallel research.
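The plan limits and the `{{prev}}` hand-off can be sketched as follows. The plan representation (a list of phases, each a list of task prompts) and the `[label] result` formatting are assumptions made for illustration; the text specifies only the limits, the placeholder, and the auto-append behavior.

```python
from typing import List, Tuple

MAX_PHASES = 5  # sequential phases per workflow
MAX_TASKS = 8   # parallel tasks per phase


def validate_plan(phases: List[List[str]]) -> None:
    """Raise ValueError if the plan exceeds the documented limits."""
    if not 1 <= len(phases) <= MAX_PHASES:
        raise ValueError("a workflow needs 1 to 5 phases")
    for tasks in phases:
        if not 1 <= len(tasks) <= MAX_TASKS:
            raise ValueError("each phase needs 1 to 8 tasks")


def render_prompt(prompt: str, phase_index: int,
                  prev_results: List[Tuple[str, str]]) -> str:
    """Fill in {{prev}} for phases after the first (phase_index is 0-based)."""
    if phase_index == 0:
        return prompt  # the first phase has no previous results
    labeled = "\n".join(f"[{label}] {result}" for label, result in prev_results)
    if "{{prev}}" in prompt:
        return prompt.replace("{{prev}}", labeled)
    return prompt + "\n\n" + labeled  # auto-append when the placeholder is omitted
```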
</details>
<details> <summary><strong>Subagents</strong></summary>
<br/>
A subagent is just a tool whose executor is the same Agent loop with a fresh history, its own arena, and a subagent-specific system prompt (execSubagent). Because tool calls already fan out via io.async, the model spawning three subagents in one response gets three agent loops running concurrently, each making its own HTTPS calls through the shared (thread-safe) std.http.Client. Subagents inherit the parent's provider, so deepseek subagents work the same as claude ones.
- Depth capped at one level: subagents don't get the subagent tool.
- Subagents don't share the root agent's context, so the orchestrator must put everything needed into the prompt (the tool description tells it so).
- Progress lines ([label] ⚙ bash …) go to stderr via std.debug.print, which locks stderr and is safe from pool threads.
</details>
<details> <summary><strong>Sessions & compaction</strong></summary>
<br/>
Session persistence. /save [name] writes the conversation (messages + provider + strict flag) to <name>.session.json in the cwd (default name last); /resume [name] restores it (provider, model, and full history) in any later run, and /sessions lists the saved ones. The stored message array is already the provider-native wire shape, so resume is a verbatim restore and works across providers (including codex's Responses-format items).
Compaction, client-side, provider-agnostic:
1. Every response's usage is recorded (input+output+cache tokens for Anthropic, total_tokens for OpenAI) and shown in the prompt.
2. Past the model's compaction threshold, 80% of its context window, from a comptime model table (/models prints it: 800k for the 1M-context models, 160k for claude-haiku-4-5, 160k fallback for unknown models), or on /compact, the harness sends the history plus a handoff instruction with no tools offered, so the model must reply with a text summary covering goals, decisions, file paths, code state, and pending work.
3. History is replaced by a single user message embedding that summary, and the token counter resets.
If the summary request fails, history is left untouched.
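The threshold arithmetic is simple enough to check by hand; the sketch below reproduces the two figures quoted above. The function names are hypothetical, and the 200k window used for the second case is inferred from the stated 160k threshold (160k / 0.8), not quoted from the model table.

```python
def compaction_threshold(context_window: int) -> int:
    """80% of the context window, in tokens."""
    return context_window * 80 // 100


def should_compact(used_tokens: int, context_window: int) -> bool:
    """True once recorded usage has passed the threshold."""
    return used_tokens > compaction_threshold(context_window)


one_million = compaction_threshold(1_000_000)  # 1M-context models
inferred_200k = compaction_threshold(200_000)  # window implied by the 160k figure
```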
</details>
<details> <summary><strong>KV-cache efficiency (Manus lessons)</strong></summary>
<br/>
Following Manus's context-engineering notes, the loop is built to keep the prompt prefix cacheable: the system prompt is stable (no per-request timestamps), history is strictly append-only, and tool definitions are rendered once at comptime so their order never shifts. On the real Anthropic API the harness also sets an explicit cache_control breakpoint. Cache reads are surfaced: recordUsage parses cache_read_input_tokens (Anthropic) and prompt_cache_hit_tokens / prompt_tokens_details.cached_tokens (OpenAI/DeepSeek), and every api trace line carries a cache_read_tokens field so you can see the hit rate in harness.trace.jsonl.
The one deliberate exception is set_system_prompt (--json protocol / SDK setSystemPrompt): the system prompt is the first token of the cached prefix, so mutating it, even appending, invalidates the KV-cache for the entire conversation and the next request re-reads everything at full input price. Treat it as a task-boundary operation: prefer the spawn-time --system-prompt/--append-system-prompt flags, and never flip the prompt back and forth inside an agent loop.
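As an illustration of how the cache-read fields named above might be normalized into one number, here is a small sketch. It is not the harness's recordUsage; the function name and the lookup order are assumptions, and only the field names come from the text.

```python
from typing import Any, Dict


def cache_read_tokens(usage: Dict[str, Any]) -> int:
    """Extract cached-prefix tokens from a provider usage object."""
    if "cache_read_input_tokens" in usage:  # Anthropic
        return int(usage["cache_read_input_tokens"])
    if "prompt_cache_hit_tokens" in usage:  # DeepSeek-style field
        return int(usage["prompt_cache_hit_tokens"])
    details = usage.get("prompt_tokens_details") or {}  # OpenAI-style nesting
    return int(details.get("cached_tokens", 0))
```

A usage object with none of these fields yields 0, which reads as "no cache hit reported" rather than an error.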
</details>
<details> <summary><strong>Tracing & telemetry</strong></summary>
<br/>
Tracing: the harness can debug itself. Every API round trip (latency, request/response bytes, context tokens) and every tool execution (duration, result size, errors, root-vs-subagent) is appended as one JSON line to harness.trace.jsonl in the cwd, truncated at startup so it always covers the current session. The system prompt tells the agent the file exists, so "profile yourself" or "why was that slow?" makes the agent read its own trace and answer from data. /trace toggles it.
Telemetry, pseudonymous, opt-out, on by default. Every build (release, source, and dev) bakes in a default OTLP endpoint (pass -Dtelemetry-endpoint="" to disable it at build time), so by default a session ships best-effort OTLP/HTTP JSON POSTs to <endpoint>/v1/logs (at exit, plus mid-session batches). Opt out any time with --no-telemetry or GRAFF_NO_TELEMETRY=1; setting OTEL_EXPORTER_OTLP_ENDPOINT (or GRAFF_OTEL_ENDPOINT) redirects it to your own collector instead.
It's pseudonymous, not anonymous: records carry a random per-install id (~/.simple-harness-install-id, generated with io.random, not derived from your name, host, or user) plus your request IP, version, OS, and arch. The payload is counts, hashes, and tool names: a session summary (duration, turns, API/tool call+error counts, models used, workflow/ultracode counts), per-workflow and per-error records, and per-turn/score records keyed by a one-way system-prompt fingerprint + prompt_sha hashes with a tool-name sequence (e.g. read_file, bash, edit_file). It does not send your prompts, your code, file contents, file paths, or tool arguments. Your input is never an argument to any telemetry call.
Fleet / evolution signals (fleet:propose|submit|elite_pull, the agent-evolution fitness loop) ride the same channel and have a separate opt-out: GRAFF_FLEET=off or /fleet off. They're hashes and labels, with one exception. fleet:propose sends an agent's system-prompt / persona text (≤8192 chars: the evolved "genome"; graff's own text for built-in agents, your text for a custom agent or inline override). Error details are capped at 200 chars. The SDKs tag their child harness with HARNESS_CLIENT=sdk-ts|sdk-py and a separate id (~/.simple-harness-sdk-id). A flush failure never disturbs the session.
</details>
<details> <summary><strong>Project instructions (AGENTS.md / CLAUDE.md)</strong></summary>
<br/>
At startup the harness reads the first of AGENTS.md, HARNESS.md, or CLAUDE.md it finds in the working directory and appends it to the root system prompt (subagents keep the lean prompt). It prints `loaded project instructions from AGENTS.md (N bytes)`. Because the system prompt stays frozen for the session, this is KV-cache-friendly. Drop conventions, codewords, or do/don't rules in AGENTS.md and the harness picks them up like any real coding agent.
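The lookup can be pictured as a first-match search over a fixed list. The sketch below assumes the priority order is the order in which the three file names are listed above; the function names are hypothetical, and the byte count mirrors the log line quoted in the text.

```python
from pathlib import Path
from typing import Iterable, Optional

# Assumed priority order, taken from the order the names are listed in.
CANDIDATES = ("AGENTS.md", "HARNESS.md", "CLAUDE.md")


def pick_instructions_file(present: Iterable[str]) -> Optional[str]:
    """Return the highest-priority candidate among the given file names."""
    names = set(present)
    for candidate in CANDIDATES:
        if candidate in names:
            return candidate
    return None


def load_project_instructions(cwd: str = ".") -> Optional[str]:
    """Read the first matching instructions file in cwd, if any."""
    present = [p.name for p in Path(cwd).iterdir() if p.is_file()]
    name = pick_instructions_file(present)
    if name is None:
        return None
    data = (Path(cwd) / name).read_bytes()
    print(f"loaded project instructions from {name} ({len(data)} bytes)")
    return data.decode("utf-8")
```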
</details>
<details> <summary><strong>Install details, keys & SDKs</strong></summary>
<br/>
install.sh compiles graff (ReleaseFast) and installs it to ~/bin (override with HARNESS_DIR=); it builds the current checkout, or clones the repo if run standalone. It detects the platform (Windows → WSL hint), checks for Zig 0.16, and ends with a PATH check. Alternatively, run in place:
zig build run # or: ./zig-out/bin/graff
zig build test # the test suite (also run by CI, .github/workflows/ci.yml)
Releases & verification. Tagged releases ship a prebuilt darwin-arm64 binary that is codesigned with a Developer ID certificate and notarized by Apple, so it runs without Gatekeeper prompts. Verify a download:
```sh
codesign --verify --strict --verbose=2 graff
# → valid on disk; satisfies its Designated Requirement
codesign -dv --verbose=4 graff 2>&1 | grep Authority
```
<p align="center"> <img src="comparison.png" alt="graff vs Claude Code vs Codex: ~20x cheaper ($0.022 vs $0.51 vs $0.42 per task), ~25 MB vs ~410 MB vs ~206 MB peak memory, 4.4s vs 8.9s one-shot gpt-5.5 latency" width="860"> </p>
Run the same job on graff, Claude Code, and Codex (three read-only questions about this repo, plus an 8-trial latency test), and here is what it means for you:
Your AI bill is a fraction. graff runs the same task on whatever model fits your budget. On deepseek-v4-pro it averaged $0.022 per task, against Claude Code's $0.51 (Opus 4.8) and Codex's $0.42 (gpt-5.5). That is roughly 20× cheaper, because Claude Code only runs Claude and Codex only runs GPT, while graff runs deepseek, kimi, glm, grok, minimax, gpt, claude, and more. On the same model the token usage is comparable, so the win is the freedom to pick a cheaper one, not a token trick.
It stays out of your way. graff is one 1.7 MB Zig binary. In these runs it used about 25 MB of memory for focused work (more when it reads a lot of code), against Claude Code's steady ~410 MB (Node) and Codex's ~206 MB (Rust). Leave it running next to everything else and your laptop won't notice.
Scripts and CI finish in half the time. For one-shot runs (graff -p, the SDKs, a CI step), graff completed a gpt-5.5 turn in 4.4 s versus Codex's 8.9 s on the identical ChatGPT endpoint, on every single trial. That is graff's near-instant startup beating a heavier per-call launch. In a long interactive session the startup amortizes and both settle to model latency, so this is a one-shot and automation win, not a blanket "graff is faster."
<sub><b>Method:</b> macOS, same machine, read-only code questions on this repo. Cost is each tool's own reported usage at <a href="https://codegraff.com/docs/models">codegraff gateway prices</a>; memory is peak RSS via <code>/usr/bin/time -l</code>; latency is 8 concurrent graff/Codex pairs on a tool-free prompt with reasoning effort matched. Your numbers will vary with the task, the model, and the network. Reproduce it yourself: <a href="benchmarks/">benchmarks/</a>.</sub>
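The headline ratios follow directly from the figures quoted above; this short check recomputes them from those numbers alone. The variable names are hypothetical.

```python
# Figures as quoted in the comparison above.
COST_PER_TASK = {"graff": 0.022, "claude_code": 0.51, "codex": 0.42}  # USD
ONE_SHOT_SECONDS = {"graff": 4.4, "codex": 8.9}

cost_vs_claude_code = round(COST_PER_TASK["claude_code"] / COST_PER_TASK["graff"], 1)
cost_vs_codex = round(COST_PER_TASK["codex"] / COST_PER_TASK["graff"], 1)
latency_vs_codex = round(ONE_SHOT_SECONDS["codex"] / ONE_SHOT_SECONDS["graff"], 2)

print(cost_vs_claude_code, cost_vs_codex, latency_vs_codex)  # 23.2 19.1 2.02
```

So "roughly 20×" covers a range of about 19× to 23× depending on the comparison, and "half the time" corresponds to a ratio of about 2.0.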
---
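A high-quality automated coding tool
This tool's license is listed as NOASSERTION. For commercial use, read the license terms carefully and seek legal advice where necessary.
AI Skill Hub is a third-party content aggregation platform. The information on this page is compiled from public data and does not constitute any legal endorsement of the tool's functionality or quality.
We recommend validating the tool thoroughly in a sandbox or test environment before deploying it to production, and carrying out the necessary security assessment.
📄 NOASSERTION: see the original license terms for the specific usage restrictions.
Based on our overall assessment, Graff performs solidly among MCP tools and is of excellent quality. If you already have a clear use case, you can start using it right away; if you are still evaluating, we suggest comparing it with similar tools before deciding.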
| Original name | codegraff |
| Original description | Open-source MCP tool: graff, a fast agentic coding harness in Zig: multi-provider, MCP, workflows, DG. ⭐8 · Zig |
| Topics | mcp, zig, agentic coding |
| GitHub | https://github.com/justrach/codegraff |
| License | NOASSERTION |
| Language | Zig |
Listed: 2026-06-29 · Updated: 2026-06-29 · License: NOASSERTION · AI Skill Hub makes no legal endorsement of the accuracy of third-party content.