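After a curated evaluation by AI Skill Hub, Fast Hybrid Code Search (快捷混合代码搜索) is rated "Recommended." This MCP tool scores well on feature completeness, community activity, and ease of use. It has an AI score of 7.5 and suits users with some technical background.
Fast Hybrid Code Search is an AI tool extension that implements the MCP (Model Context Protocol) standard. Through MCP, mainstream AI clients such as Claude and Cursor can directly access and operate external tools, data sources, and services, which extends what the AI can do. Operations such as file access, database queries, or API calls can then be triggered in natural language from within an AI conversation, which can substantially improve productivity.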
# Method 1: register the server with the Claude Code CLI (assumes ken-mcp is installed and on your PATH)
claude mcp add ken -- ken-mcp
# Method 2: edit claude_desktop_config.json manually
{
  "mcpServers": {
    "ken": {
      "command": "ken-mcp"
    }
  }
}
# Config file location
# macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
# Windows: %APPDATA%/Claude/claude_desktop_config.json
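# After installation, use it directly in a Claude conversation
# Example:
#   User: Please use Fast Hybrid Code Search to do the following...
#   Claude: [automatically calls the Fast Hybrid Code Search MCP tool to handle the request]
# To list the available tools, type in Claude: "List all available MCP tools"
// claude_desktop_config.json example with environment variables
{
  "mcpServers": {
    "ken": {
      "command": "ken-mcp",
      "env": {
        "KEN_MCP_DEFAULT_REPO": "/path/to/your/repo",
        "KEN_MCP_MODEL_DIR": "/path/to/model2vec-snapshot"
      }
    }
  }
}
// Restart Claude Desktop after saving for the change to take effect.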
Fast hybrid code search for agents. Pure Go, single static binary, drop-in MCP-compatible with MinishLab/semble — same tool schemas, same output format, same install steps swapped to a Go binary.
Built collaboratively: most of the Go implementation written by Claude, with constraints, architectural decisions, and review discipline from @townsendmerino. The verbatim-port rule and the corpus-scale parity harness — the things that make this a faithful port instead of an approximate one — came from the human side. See How this was built.
ken is a Go port of semble. The retrieval algorithm is ported verbatim from semble's search.py + ranking/*.py; ken adds two things on top: runtime properties (single-binary distribution, no Python interpreter import on cold start, no GIL on the indexing pipeline) and measured agent-input efficiency (~44× fewer tokens than grep+Read at recall@10 on semble's diverse-query benchmark; at corpus scale — CoIR-CSN-Python's 280K files — corpus-wide grep is functionally impossible and ken's 1,296-token result is the only workable path). The honest tradeoff: ken's recall caps at 82–91% vs grep's ~99%, so exhaustive enumeration (refactors, pre-rename audits) still belongs to grep — but for "find the chunk that answers this," ken wins by 1–2 orders of magnitude on tokens. Full table in docs/BENCH.md. If you already use semble in your agent, you can swap to ken-mcp without re-prompting; the wire format is the same string semble emits.
- GOOS/GOARCH cross-compiles for free; no libtokenizers.a to vendor per platform.
- Same search / find_related tool schemas, same markdown-string output format, install snippets adapted from semble's README.
- Full benchmark tables in docs/BENCH.md.
- Tokenizer parity checked against transformers.AutoTokenizer on an 11k-input adversarial+repo corpus (scripts/parity_dump.py + internal/embed/parity_test.go).
- Fast cold start (ken search from a tiny index returns in ~10–20 ms on a Mac).

The library form of ken-mcp lets SDK authors ship docs as a single static MCP server binary. Write ~20 lines of main.go, //go:embed your docs/ and the Model2Vec model, run go build, and push the binary to a GitHub release. Users brew install it, add one line to their agent config, and their coding agent gets high-quality local retrieval over your SDK's docs. There is no backend, no vector DB to operate, no network egress per query, and no "is the cache stale?" question: the binary IS the corpus, version-pinned by build artifact.
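package main

import (
	"context"
	"embed"
	"io/fs"
	"log"
	"os"

	"github.com/townsendmerino/ken/mcp"
	// Imported for its side effect, which registers the "markdown" chunker
	// named below. Go only allows internal/ packages to be imported from
	// inside the ken module itself (as cmd/ken-mcp-docs does).
	_ "github.com/townsendmerino/ken/internal/chunk/markdown"
)

// Docs corpus baked into the binary.
//go:embed docs/*.md
var docsFS embed.FS

// Model2Vec snapshot baked into the binary.
//go:embed model/tokenizer.json model/config.json model/model.safetensors
var modelFS embed.FS

func main() {
	// Re-root both filesystems so paths are relative to docs/ and model/.
	docsSub, err := fs.Sub(docsFS, "docs")
	if err != nil {
		log.Fatal(err)
	}
	modelSub, err := fs.Sub(modelFS, "model")
	if err != nil {
		log.Fatal(err)
	}
	// Logs go to stderr because stdout is the JSON-RPC channel.
	if err := mcp.Run(context.Background(), docsSub, mcp.Options{
		Mode:        "hybrid",
		ChunkerName: "markdown",
		ModelFS:     modelSub,
		LogWriter:   os.Stderr,
	}); err != nil {
		log.Fatal(err)
	}
}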
cmd/ken-mcp-docs/ is the canonical worked example — it bakes ken's own docs/*.md and the Model2Vec model into a 74 MB static binary built via scripts/build-docs-mcp.sh. Design and rationale: ADR-016.
```bash
go install github.com/townsendmerino/ken/cmd/ken@latest
go install github.com/townsendmerino/ken/cmd/ken-mcp@latest
```
| Variable | Default | Purpose |
|---|---|---|
| KEN_MCP_DEFAULT_REPO | (unset) | Pre-indexed source; lets tools omit the repo arg. |
| KEN_MCP_MODE | hybrid | bm25 / semantic / hybrid. Auto-downgrades to bm25 with a stderr warning if the model dir is unreachable. |
| KEN_MCP_MODEL_DIR | (unset) | Path to a Model2Vec snapshot containing model.safetensors. Empty ⇒ bm25-only. |
| KEN_MCP_CHUNKER | regex | regex / treesitter / line / markdown. See ["Choosing a chunker"](#choosing-a-chunker). |
| KEN_DB_DSN | (unset) | Database DSN: Postgres (postgres://... / postgresql://...), SQLite (sqlite:///abs/path.db, sqlite://./rel/path.db, sqlite3://...), or MySQL (mysql://user:pass@host:3306/db, native user:pass@tcp(host:3306)/db, or user:pass@unix(/sock)/db). Engine routing dispatches on the scheme (or on @tcp( / @unix( for the native MySQL form). Enables [Tier 2 DB indexing](#tier-2--live-postgres-introspection-ken_db_dsn). Requires KEN_MCP_DEFAULT_REPO to be a local path. |
| KEN_DB_SAMPLE_ROWS | 0 | Rows per table to sample. **Default 0 means schema-only.** See the [PII stance](#pii-stance-documentation--sane-defaults) before enabling. |
| KEN_DB_REINDEX_INTERVAL | (off) | Go duration (5m, 1h). Background refresh cadence. Off by default; restart or send SIGHUP to refresh. |
| KEN_DB_LISTEN | 0 | 1 / true / yes activates Postgres LISTEN/NOTIFY push notifications (v0.8.0). Requires the one-time setup script: ken-mcp print-listen-script \| psql $KEN_DB_DSN. Non-Postgres DSNs log at debug level and no-op. See [LISTEN/NOTIFY push notifications](#listennotify-push-notifications-v080-postgres-only). |
| KEN_DB_SCHEMAS | (unset) | Comma-separated allow-list of schema names (Postgres) / database names (MySQL). Example: public,billing. The default exclusions (pg_catalog, information_schema, mysql, performance_schema, sys) always still apply. SQLite ignores this. See [Filtering indexed schemas](#filtering-indexed-schemas). |
| KEN_DB_EXCLUDE_SCHEMAS | (unset) | Comma-separated deny-list. Extends (does not replace) the default exclusions. Example: audit,cron,legacy. When set alongside KEN_DB_SCHEMAS, the allow-list wins (with a stderr warning). SQLite ignores this. |
| KEN_SQL_NO_AUTO_MIGRATIONS | (off) | 1 / true / yes disables v0.7.1 Tier-1 migration-history folding (restores v0.7.0 per-file behavior). Useful when you maintain a canonical schema/current.sql and don't want migration history surfaced as folded chunks. |
| KEN_MCP_CACHE_SIZE | 16 | LRU bound on the repo→Index cache. |
| KEN_MCP_LOG_LEVEL | warn | debug / info / warn / error. All logs go to stderr; **stdout is the JSON-RPC channel** ([details](docs/DESIGN.md#hard-rule--stdoutstderr-contract)). |
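The KEN_DB_DSN row says engine routing dispatches on the DSN scheme, or on the @tcp( / @unix( markers for the native MySQL form. The following is a minimal sketch of that rule, assuming only what the table states. `engineForDSN` and the `Engine` constants are hypothetical names, not ken's API, and ken's real parser may handle more edge cases.

```go
package main

import (
	"fmt"
	"strings"
)

// Engine identifies which database backend a DSN routes to.
// Hypothetical type for illustration; not part of ken's public API.
type Engine string

const (
	Postgres Engine = "postgres"
	SQLite   Engine = "sqlite"
	MySQL    Engine = "mysql"
	Unknown  Engine = "unknown"
)

// engineForDSN sketches the routing rule described for KEN_DB_DSN:
// dispatch on the URL scheme, or on "@tcp(" / "@unix(" for the
// native (non-URL) MySQL DSN form.
func engineForDSN(dsn string) Engine {
	switch {
	case strings.HasPrefix(dsn, "postgres://"), strings.HasPrefix(dsn, "postgresql://"):
		return Postgres
	case strings.HasPrefix(dsn, "sqlite://"), strings.HasPrefix(dsn, "sqlite3://"):
		return SQLite
	case strings.HasPrefix(dsn, "mysql://"):
		return MySQL
	case strings.Contains(dsn, "@tcp("), strings.Contains(dsn, "@unix("):
		return MySQL
	default:
		return Unknown
	}
}

func main() {
	for _, dsn := range []string{
		"postgres://localhost/app",
		"sqlite:///abs/path.db",
		"user:pass@tcp(host:3306)/db",
	} {
		fmt.Println(dsn, "->", engineForDSN(dsn))
	}
}
```

Checking the scheme prefixes before the substring markers keeps a URL-form DSN from being misrouted if its password happens to contain "@tcp(".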
SDK authors using mcp.Run (the v0.6.0 embedded-corpus entrypoint) can wire Tier 2 DB support — schema introspection, optional LISTEN/NOTIFY, optional interval reindex, and the reindex_db MCP tool — via the new opt-in mcp/db package:
package main

import (
	"context"
	"embed"
	"io/fs"
	"log"
	"os"
	"time"

	"github.com/townsendmerino/ken/mcp"
	mcpdb "github.com/townsendmerino/ken/mcp/db"
)

//go:embed docs/*.md
var docsFS embed.FS

func main() {
	ctx := context.Background()

	// The embedded docs corpus (same pattern as the docs-only example).
	docs, err := fs.Sub(docsFS, "docs")
	if err != nil {
		log.Fatal(err)
	}

	// Opt-in: only SDK authors who want DB support import mcp/db.
	refresher, err := mcpdb.Setup(ctx, mcpdb.Config{
		DSN:             os.Getenv("MY_DB_DSN"),
		SampleRows:      0, // schema-only
		ReindexInterval: 5 * time.Minute,
		EnableListen:    true, // requires one-time `mcpdb.ListenNotifyScript | psql $DSN` setup
	})
	if err != nil {
		log.Fatal(err)
	}

	opts := mcp.Options{
		Mode:        "hybrid",
		ChunkerName: "markdown",
	}
	// refresher is nil when MY_DB_DSN is unset. Assign it only when non-nil:
	// storing a nil *mcpdb.Refresher in the interface-typed DB field would
	// make opts.DB a non-nil interface. With opts.DB left nil, the reindex_db
	// tool is NOT registered (the v0.6.0 docs-only behavior). When set,
	// mcp.Run calls refresher.Start internally and defers the returned cleanup.
	if refresher != nil {
		opts.DB = refresher // *mcpdb.Refresher satisfies mcp.DBIntegration
	}
	if err := mcp.Run(ctx, docs, opts); err != nil {
		log.Fatal(err)
	}
}
v0.6.0 binary-size contract preserved. SDK authors who DON'T import mcp/db get a binary identical in dep-tree shape to v0.7.2's mcp.Run use case — no pgx, no SQLite, no MySQL driver, no internal/db in the link graph. The opt-in package boundary is enforced at CI time by TestBinary_MCPPackageStaysDBFree, which shells out to go list -deps github.com/townsendmerino/ken/mcp and fails if any DB driver path appears.
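The CI guard described above can be sketched as a small Go helper: run `go list -deps` on the mcp package and reject any import path under a forbidden prefix. This is a hypothetical reconstruction, not ken's actual TestBinary_MCPPackageStaysDBFree. The forbidden prefixes (pgx, two common SQLite drivers, the go-sql-driver MySQL driver, and ken's internal/db) are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// forbiddenPrefixes lists import paths that must not appear in the mcp
// package's dependency graph. Assumed values, for illustration only.
var forbiddenPrefixes = []string{
	"github.com/jackc/pgx",
	"github.com/mattn/go-sqlite3",
	"modernc.org/sqlite",
	"github.com/go-sql-driver/mysql",
	"github.com/townsendmerino/ken/internal/db",
}

// findForbidden returns every dependency that equals a forbidden prefix
// or lives underneath it (prefix + "/").
func findForbidden(deps []string) []string {
	var hits []string
	for _, d := range deps {
		for _, p := range forbiddenPrefixes {
			if d == p || strings.HasPrefix(d, p+"/") {
				hits = append(hits, d)
				break
			}
		}
	}
	return hits
}

// listDeps shells out to `go list -deps <pkg>`, which prints one import path per line.
func listDeps(pkg string) ([]string, error) {
	out, err := exec.Command("go", "list", "-deps", pkg).Output()
	if err != nil {
		return nil, err
	}
	return strings.Fields(string(out)), nil
}

func main() {
	deps, err := listDeps("github.com/townsendmerino/ken/mcp")
	if err != nil {
		fmt.Println("go list failed:", err)
		return
	}
	if hits := findForbidden(deps); len(hits) > 0 {
		fmt.Println("DB dependencies leaked into mcp:", hits)
	}
}
```

Matching on `prefix + "/"` rather than a bare prefix keeps a sibling path such as internal/dbx from triggering a false positive.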
SDK authors who want print-listen-script in their own CLI can grab the embedded SQL script from mcpdb.ListenNotifyScript (a re-export of internal/db.ListenNotifyScript) without depending on the internal/ package:
// Inside main(), before calling mcp.Run; needs the "io", "os", and mcpdb imports.
if len(os.Args) > 1 && os.Args[1] == "print-listen-script" {
	_, _ = io.WriteString(os.Stdout, mcpdb.ListenNotifyScript)
	return
}
Chunk integration is end-to-end. Calling reindex_db from an agent against an mcp.Run + mcp/db.Setup binary runs the introspection AND makes the new DB chunks searchable in the agent's next search / find_related call. The pipeline: mcp.Run wraps the embedded *search.Index in atomic.Pointer[search.Index]; mcp/db.Refresher.Start (called by mcp.Run on startup) wires the swap callback to *search.Index.WithExtraChunks + atomic-pointer store; each refresh rebuilds against the original corpus + the latest DB chunks. cmd/ken-mcp continues to use *WatchedIndex.SetExtraChunks for its fsnotify-rooted path; the SDK-author + CLI surfaces converge on the same Refresher + reindex_db semantics. See ADR-020 Part 3 for the full design + the rejected alternatives.
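The swap described above follows a common Go pattern: readers load an immutable index through atomic.Pointer, and each refresh builds a fresh index from the original corpus plus the latest DB chunks before storing it. The sketch below assumes a toy Index type. `withExtraChunks`, `Chunk`, and `Server` are hypothetical stand-ins for *search.Index.WithExtraChunks and ken's wiring, whose real signatures may differ. It requires Go 1.19+ for atomic.Pointer.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Chunk is a toy stand-in for an indexed unit of text.
type Chunk struct{ ID, Text string }

// Index is immutable once built; refreshes create a new one.
type Index struct{ chunks []Chunk }

// withExtraChunks returns a new Index over base + extra, leaving the receiver untouched.
func (ix *Index) withExtraChunks(extra []Chunk) *Index {
	all := make([]Chunk, 0, len(ix.chunks)+len(extra))
	all = append(all, ix.chunks...)
	all = append(all, extra...)
	return &Index{chunks: all}
}

// Server holds the original corpus and the currently served index.
type Server struct {
	base    *Index               // embedded corpus; never mutated
	current atomic.Pointer[Index] // what search calls read
}

func NewServer(base *Index) *Server {
	s := &Server{base: base}
	s.current.Store(base)
	return s
}

// onDBRefresh rebuilds against the ORIGINAL corpus, so DB chunks from an
// earlier refresh are replaced rather than accumulated.
func (s *Server) onDBRefresh(dbChunks []Chunk) {
	s.current.Store(s.base.withExtraChunks(dbChunks))
}

// Len reports how many chunks the served index holds.
func (s *Server) Len() int { return len(s.current.Load().chunks) }

func main() {
	s := NewServer(&Index{chunks: []Chunk{{"doc1", "intro"}, {"doc2", "api"}}})
	s.onDBRefresh([]Chunk{{"db:users", "table users"}})
	s.onDBRefresh([]Chunk{{"db:users", "table users"}, {"db:orders", "table orders"}})
	fmt.Println(s.Len()) // 4: two docs + the latest two DB chunks
}
```

Because readers only ever Load a fully built index, a search running during a refresh sees either the old or the new index, never a half-updated one.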
This section reports a single, externally reproducible NDCG@10 number on CoIR's CodeSearchNet-python task, independent of semble's own benchmark, which gives readers a comparable anchor against published code-IR baselines.
Result (v0.2.0, 1000-query subsample, regex chunker):
| Mode | NDCG@10 |
|---|---|
| bm25 | 0.8743 |
| semantic | 0.7405 |
| **hybrid (default)** | **0.7839** |
Reproduce:
python scripts/bench_coir.py # ~45 s download + 280k corpus files
KEN_COIR_QUERY_LIMIT=1000 go test -tags=bench ./bench/ndcg/ -run TestCoIR -v # ~13 min
A nuance worth surfacing up front: on CSN-Python, BM25 beats hybrid by 0.09 — opposite of what semble's bench shows. CSN-Python's queries (as CoIR re-hosts the dataset) are full Python function sources, and the relevant document for each query is the docstring extracted from that same function. Because the docstring lives inside the function source as a literal substring (the function's own """...""" block), any lexical retriever with identifier-aware tokenization wins — BM25 has the answer string as input. ken's α=0.5 RRF fusion then drags the hybrid number down by averaging in the weaker semantic ranking. Not a ken bug; it's a structural artifact of how CoIR reframed CodeSearchNet for retrieval, and doesn't generalize to natural NL-to-code distributions. Detailed empirical findings and the comparison to potion-code-16M's published aggregate are in docs/BENCH.md.
| Property | semble | ken |
|---|---|---|
| Language | Python | Go |
| Distribution | uvx / pip install | single static binary |
| Cold start | (Python interpreter + import numpy + model load: ~500 ms per [semble README](https://github.com/MinishLab/semble#benchmarks)) | ~10–20 ms ken search over a tiny index (measured, M2 Mac) |
| Index this repo (542 chunks, hybrid w/ model) | (not measured locally) | **0.45 s** (measured) |
| Index /tmp/semble checkout (hybrid w/ model) | (not measured locally) | **1.80 s** (measured) |
| Index this repo (BM25 only) | (not measured locally) | **0.06 s** (measured) |
| Retrieval algorithm | reference implementation | verbatim port (constants and pipeline order ported from search.py + ranking/*.py) |
| NDCG@10 on semble's benchmark | 0.854 ([semble README](https://github.com/MinishLab/semble#benchmarks)) | **0.842 hybrid** (gap 0.012, full corpus 63 repos × 1251 queries)† |
| NDCG@10 on CoIR-CSN-Python (external) | (not measured; semble doesn't run this bench) | **0.8743 bm25 / 0.7839 hybrid** ([see why](#benchmarks--external-reference-coir-csn-python))†† |
| Median tokens to recall@10 on agent queries | (not measured; semble doesn't run this bench) | **4,269 tok @ 82% recall** on semble NL queries — vs grep+Read's 189,591 tok @ 99.9% (44× cheaper at 17 pp lower recall)††† |
| MCP server | yes | yes — drop-in compatible (same tool schemas, same wire format) |
| Binary size | n/a (Python env) | ken ~32 MB · ken-mcp ~36 MB (tree-sitter grammars dominate — see [Choosing a chunker](#choosing-a-chunker)) |
| Requires huggingface-cli for model | yes | **no**: ken download-model fetches directly from HF (or skip it and use --mode bm25) |
† Measured at v0.1.0 / v0.2.0 against semble's published benchmark (63 repos, 1251 queries, semble's own benchmarks.metrics.ndcg_at_k + target_rank). Reproduce: see docs/BENCH.md. Ablation breakdown vs semble's published raw retrieval numbers:

| Mode | semble (raw) | ken regex (default) | ken treesitter (opt-in) |
|---|---:|---:|---:|
| Semantic only (potion-code-16M) | 0.650 | 0.647 | n/a |
| BM25 only | 0.675 | 0.624 | 0.621 |
| Hybrid (full ranker) | 0.854 | 0.842 | 0.838 |

The semantic-raw match within 0.003 isolates and validates the embedding + tokenizer + ANN port. The BM25 tokenizer was also re-aligned to a verbatim port of semble's tokens.py (snake-case compound preservation, ASCII-only identifier extraction, compound-first emission order). The v0.2.0 tree-sitter chunker (--chunker=treesitter via gotreesitter) trades NDCG per language without net movement: clear wins on Kotlin / Zig / TypeScript / Java / PHP, losses on Python / Rust / C / Lua / Scala. The default chunker therefore stays regex, and treesitter is opt-in. See "Choosing a chunker" for the per-language recommendation and docs/DECISIONS.md ADR-011 for the full rationale.
†† CoIR-CSN-Python numbers reported separately because they tell a different story than semble's bench: on CSN, BM25 beats hybrid by ~0.09 due to a substring-leak artifact in how CoIR reframes the CodeSearchNet dataset (queries are Python function sources; documents are docstrings extracted from those same functions, so the answer is a literal substring of the query). See the "Benchmarks — external reference" section and docs/BENCH.md for the corrected explanation. semble's bench is the verbatim-port confirmation; CoIR-CSN is the externally-reproducible anchor against published code-IR baselines but is read as a dataset-construction case study, not as evidence about ken's hybrid retrieval on natural NL-to-code queries.
††† Measured at v0.3.0 against semble's 63-repo benchmark (914 NL queries from semble's 1,251-query corpus, ranked by ken's regex chunker, K=10). The honest framing: ken trades ~17 percentage points of recall for ~44× fewer agent-input tokens. Exhaustive enumeration (refactors, pre-rename audits) still belongs to grep — ken is for "find the chunk that answers this." Full per-query-class table (symbol + NL) and the methodology + caveats are in docs/BENCH.md.
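The ~44× headline figure follows from the two median token counts quoted in the comparison table (a quick arithmetic check, not an additional measurement):

$$
\frac{189{,}591\ \text{tokens (grep+Read)}}{4{,}269\ \text{tokens (ken)}} \approx 44.4
$$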
semble timings cited above are from semble's own README "Benchmarks" section; ken's are measured on the included testdata/repo polyglot fixture and on a sibling shallow clone of /tmp/semble. Cold-start was timed by /usr/bin/time -p ken search testdata/repo "validate" -k 1 --mode bm25 over three trials (M2 MacBook Air, Go 1.26.3, darwin/amd64 build under Rosetta).
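A high-performance code search tool worth watching.
AI Skill Hub is a third-party content aggregator. The information on this page is compiled from public sources and does not constitute any legal endorsement of the tool's functionality or quality.
We recommend validating it thoroughly in a sandbox or test environment, along with an appropriate security assessment, before deploying to production.
✅ MIT License: one of the most permissive open-source licenses; free for commercial use, modification, and distribution, requiring only that the copyright notice be retained.
AI Skill Hub review: Fast Hybrid Code Search's core functionality is complete and of good quality. For Claude Desktop / Claude Code users, it is worth adding to a personal toolkit. Try it in a non-production environment first, then roll it out gradually.
| Field | Value |
|---|---|
| Original name | ken |
| Original description | Open-source MCP tool: Fast hybrid code search for agents. Pure Go, drop-in MCP-compatible with semble. ⭐14 · Go |
| Topics | mcp, agents, bm25, code-search, embeddings, go |
| GitHub | https://github.com/townsendmerino/ken |
| License | MIT |
| Language | Go |
Listed: 2026-05-25 · Updated: 2026-05-30 · License: MIT · AI Skill Hub makes no legal guarantee as to the accuracy of third-party content.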