Capability tag: MCP tool
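Built with Go · Lets an AI assistant operate your system and tools directly
English name: slimemold
⭐ 7 Stars 🍴 1 Fork 💻 Go 📄 Apache-2.0 🏷 AI score 7.5
AI overall score: 7.5
Topics: mcp, argument-mining, claude-code, epistemic, epistemology
✦ Recommended by AI Skill Hub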

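After a curated evaluation, AI Skill Hub rates slimemold "Recommended". The tool scores well on feature completeness, community activity, and ease of use, with an AI score of 7.5, and it suits users with some technical background.

📚 In-depth analysis

slimemold is an AI tool extension built on the MCP (Model Context Protocol) standard. MCP was developed and open-sourced by Anthropic to provide a standardized communication interface between AI models and external tools, and it has been adopted by mainstream AI tools such as Claude Desktop, Claude Code, and Cursor.

Once slimemold is installed, your AI assistant gains additional tool-calling capabilities and can drive the tool in natural language, with no complex command-line syntax to learn. The core value of an MCP tool is "configure once, benefit permanently": after setup, the tools are available in every conversation with the AI.

Technically, MCP tools communicate with the AI client over the standard JSON-RPC protocol and expose their functions to the model as a tool list that the model can call on demand. slimemold provides a structured tool-calling interface so that the model can understand and use each function precisely, which is intended to reduce tool-use errors.

Compared with traditional API integration, the advantage of an MCP tool is that no code has to be written: a few lines of JSON in a configuration file give the AI a new capability. AI Skill Hub gives slimemold an AI score of 7.5, placing it among the stronger options in its category.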

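📋 Tool overview

slimemold is an AI tool extension that follows the MCP (Model Context Protocol) standard. Through MCP, mainstream AI clients such as Claude and Cursor can access and operate external tools, data sources, and services directly, extending what the AI can do. File operations, database queries, and API calls can all be triggered in natural language from within an AI conversation.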

GitHub Stars: ⭐ 7
Forks: 1
Language: Go
Supported platforms: Windows / macOS / Linux (cross-platform)
Maintenance status: lightweight project, updated as needed
License: Apache-2.0
AI overall score: 7.5
Tool type: MCP tool

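📖 Documentation

The content below was compiled automatically by AI Skill Hub from project information. For the complete original documentation, see "Original source" at the bottom of the page.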

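📌 Key features
  • Integrates with mainstream AI clients such as Claude and Cursor through the standard MCP protocol
  • Provides a structured tool-calling interface that lowers the complexity of AI integration
  • Connects to Claude Desktop and Claude Code
  • Can be combined with other MCP tools to build a complete AI workstation
  • Lightweight, non-invasive design that leaves the existing system architecture untouched
🎯 Main use cases
  • Call local tools directly from a Claude Desktop conversation, linking the AI with your system
  • Drive complex multi-step automation in natural language instead of tedious manual work
  • Combine several MCP tools into a personal AI workstation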
The installation commands below follow the official README (see the Installation section further down this page).
Installation commands
# Install the binary (requires Go 1.26+)
go install github.com/justinstimatze/slimemold@latest

# Provide an Anthropic API key for claim extraction
export ANTHROPIC_API_KEY=sk-ant-...

# Write the Stop and UserPromptSubmit hooks plus the MCP server entry
# into ~/.claude/settings.json
slimemold init
📋 Installation steps
  1. Confirm that Claude Code and Go (1.26 or later) are installed
  2. Run go install github.com/justinstimatze/slimemold@latest
  3. Export ANTHROPIC_API_KEY in your shell
  4. Run slimemold init, which merges with existing configuration and does not overwrite anything already there
  5. Restart Claude Code to connect; the tool is then available in every project on the machine
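The usage examples below were compiled by AI Skill Hub and cover the most common scenarios.
Common commands / code examples
# After installation, use the tool directly in a Claude conversation
# Example:
User: Please use slimemold to carry out the following task...
Claude: [automatically calls the slimemold MCP tools to handle the request]

# List the available tools
# In Claude, type: "List all available MCP tools"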
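The configuration example below uses the environment variables documented in the official README; adjust the values to your needs.
Configuration example
# slimemold init writes the hooks and the MCP server entry to
# ~/.claude/settings.json, so the variables below are the main settings

export SLIMEMOLD_INTERVAL=10                # run extraction every 10th turn (cheaper)
export SLIMEMOLD_MODEL=claude-sonnet-4-6    # extraction model (default)
export KAGI_API_KEY=your-kagi-api-key       # optional, enables External-check

# Restart Claude Code for the changes to take effect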
📑 README in depth · Source documentation · Completeness 52/100
The content below was parsed directly from the GitHub README and preserves its code blocks, tables, and lists.

Slimemold

A sycophantic tool for preventing worse sycophancy. For Claude Code.

The model agrees with your unsourced claims. Then it agrees with the structural analysis showing your claims are unsourced. Then it enthusiastically agrees you should verify them. It's agreement all the way down.

If you just want to install it: skip to Installation.

---

Installation

Requires Claude Code, Go 1.26+, and an Anthropic API key.

go install github.com/justinstimatze/slimemold@latest
export ANTHROPIC_API_KEY=sk-ant-...
slimemold init

slimemold init writes to ~/.claude/settings.json globally: the Stop and UserPromptSubmit hooks, plus the slimemold MCP server entry. The MCP server's initialization instructions carry the behavioral contract — what slimemold is, that its hook output is legitimate, and how to respond to findings — so it travels with the tool without per-project setup. Every project on the machine picks it up automatically. Init merges with existing configs and will not overwrite anything already there. Restart Claude Code to connect.

The hook fires every 3rd assistant response by default. Each extraction makes one Sonnet API call (~$0.01-0.05 depending on transcript length). Set SLIMEMOLD_INTERVAL to change the frequency:

export SLIMEMOLD_INTERVAL=3    # every 3rd turn (more aggressive)
export SLIMEMOLD_INTERVAL=10   # every 10th turn (cheaper)
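
The every-Nth-turn behavior and its cost can be sketched in Go as follows. This is an illustration only, not slimemold's actual code: the function names, the fallback to the default on invalid input, and the 60-turn session are assumptions; the default of 3 and the $0.01-0.05 per-call range come from the text above.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// defaultInterval mirrors the README's stated default of every 3rd response.
const defaultInterval = 3

// intervalFromEnv parses a SLIMEMOLD_INTERVAL-style value. Falling back to
// the default on empty, non-numeric, or non-positive input is an assumption.
func intervalFromEnv(raw string) int {
	n, err := strconv.Atoi(raw)
	if err != nil || n < 1 {
		return defaultInterval
	}
	return n
}

// shouldFire reports whether the hook would run on the given 1-based turn.
func shouldFire(turn, interval int) bool {
	return turn > 0 && interval > 0 && turn%interval == 0
}

// estimateCost returns a low and high dollar estimate for a session, using
// the README's rough figure of $0.01-0.05 per extraction call.
func estimateCost(turns, interval int) (low, high float64) {
	if interval < 1 {
		return 0, 0
	}
	calls := float64(turns / interval)
	return calls * 0.01, calls * 0.05
}

func main() {
	interval := intervalFromEnv(os.Getenv("SLIMEMOLD_INTERVAL"))
	low, high := estimateCost(60, interval)
	fmt.Printf("interval=%d: 60 turns cost roughly $%.2f-$%.2f\n", interval, low, high)
	fmt.Println("fires on turn 6:", shouldFire(6, interval))
}
```

Raising the interval reduces API spend linearly, at the price of longer stretches of conversation between extractions.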

Set SLIMEMOLD_MODEL to override the extraction model:

export SLIMEMOLD_MODEL=claude-opus-4-6          # best quality, ~10x cost
export SLIMEMOLD_MODEL=claude-sonnet-4-6        # default
export SLIMEMOLD_MODEL=claude-haiku-4-5-20251001  # cheapest, weaker edges

Optional: set KAGI_API_KEY to enable active external verification of STOP-class findings — claims with weak basis (vibes, assumption, llm_output) extracted from authored documents rather than conversation transcripts. When set, slimemold runs a Kagi search against the anchor claim and inlines reconciled state ("External check (domain): snippet") with the hook output, so the agent receives verification data inline rather than relying on the agent to remember to search.

export KAGI_API_KEY=your-kagi-api-key  # optional, enables External-check

Without the key, STOP-class findings still get a [doc-origin] tag and the MCP server instructions prompt the agent to verify the claim itself. slimemold status will show Verify: disabled (KAGI_API_KEY not set) when no key is configured; the hook also writes a one-line notice at startup and to hook.log on each fire so the disabled state isn't silent.
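
A minimal Go sketch of the two strings described above, assuming hypothetical helper names. The disabled status line and the "External check (domain): snippet" shape are quoted from the README; the enabled wording is a placeholder because the README does not show it.

```go
package main

import (
	"fmt"
	"os"
)

// verifyStatus returns the status line for external verification.
// The disabled wording is quoted from the README; "Verify: enabled" is a
// placeholder, not confirmed output.
func verifyStatus(kagiKey string) string {
	if kagiKey == "" {
		return "Verify: disabled (KAGI_API_KEY not set)"
	}
	return "Verify: enabled"
}

// formatExternalCheck renders a search hit in the inline shape the README
// describes: External check (domain): snippet.
func formatExternalCheck(domain, snippet string) string {
	return fmt.Sprintf("External check (%s): %s", domain, snippet)
}

func main() {
	fmt.Println(verifyStatus(os.Getenv("KAGI_API_KEY")))
	// The domain and snippet here are made-up illustration values.
	fmt.Println(formatExternalCheck("example.org", "hypothetical snippet text"))
}
```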

Quick Start (No Hooks)

slimemold viz                      # see what's in the graph
slimemold audit                    # text findings summary

References

Processing fluency and reasoning:
- Bjork, R. A. (1994). Memory and metamemory considerations in the training of human beings. In Metacognition: Knowing about Knowing.
- Bjork, E. L., & Bjork, R. A. (2011). Making things hard on yourself, but in a good way. In Psychology and the Real World.
- Hills, T. T., Todd, P. M., & Goldstone, R. L. (2008). Search in external and internal spaces. Psychological Science.
- Laukkonen, R. E., et al. (2020). The dark side of Eureka: Artificially induced Aha moments make facts feel true. Cognition.
- Laukkonen, R. E., et al. (2021). Getting a grip on insight. Cognition & Emotion.
- Pirolli, P., & Card, S. (1999). Information foraging. Psychological Review.
- Reber, R., & Schwarz, N. (1999). Effects of perceptual fluency on judgments of truth. Consciousness and Cognition.
- Thompson, V. A. (2009). Dual-process theories: A metacognitive perspective. In In Two Minds.
- Topolinski, S., & Strack, F. (2009). Processing fluency and affect in judgements of semantic coherence. Cognition & Emotion.
- Winkielman, P., & Schwarz, N. (2001). How pleasant was your childhood? Beliefs about memory shape inferences from experienced difficulty of recall. Psychological Science.

Intervention design:
- Brehm, J. W. (1966). A Theory of Psychological Reactance. Academic Press.
- Lifton, R. J. (1961). Thought Reform and the Psychology of Totalism. W. W. Norton.
- Deci, E. L., & Ryan, R. M. (1987). The support of autonomy and the control of behavior. Journal of Personality and Social Psychology, 53(6).
- Graesser, A. C., Person, N. K., & Magliano, J. P. (1995). Collaborative dialogue patterns in naturalistic one-to-one tutoring. Applied Cognitive Psychology, 9(6).
- Mangels, J. A., Butterfield, B., Lamb, J., Good, C., & Dweck, C. S. (2006). Why do beliefs about intelligence influence learning success? Social Cognitive and Affective Neuroscience, 1(2).
- Miller, W. R., Benefield, R. G., & Tonigan, J. S. (1993). Enhancing motivation for change in problem drinking. Journal of Consulting and Clinical Psychology, 61(3).

Sycophancy and delusional dynamics:
- Perez, E., et al. (2022). Discovering language model behaviors with model-written evaluations. arXiv:2212.09251.
- Sharma, M., Tong, M., Korbak, T., et al. (2023). Towards understanding sycophancy in language models. ICLR 2024.
- Moore, J., Mehta, A., Agnew, W., Anthis, J. R., Louie, R., Mai, Y., Yin, P., Cheng, M., Paech, S. J., Klyman, K., Chancellor, S., Lin, E., Haber, N., & Ong, D. C. (2026). Characterizing Delusional Spirals through Human-LLM Chat Logs. Proceedings of the 2026 ACM Conference on Fairness, Accountability, and Transparency. arXiv:2603.16567.
  Source of the 28-code inventory; six codes from this paper are extracted by slimemold's LLM annotator and consumed by the sycophancy_saturation, ability_overstatement, sentience_drift, and amplification_cascade detectors. Empirical anchor for the >80% sycophancy-saturation premise.
- Mehta, A., Moore, J., Anthis, J. R., Agnew, W., Lin, E., Yin, P., Ong, D. C., Haber, N., & Dweck, C. (2026). The Dynamics of Delusion: Modeling Bidirectional False Belief Amplification in Human-Chatbot Dialogue. arXiv:2604.25096.
  Latent-state model on chat logs of users exhibiting delusional thinking (substantial author overlap with Moore et al. 2026), decomposing influence into three pathways and identifying chatbot self-influence over its own prior turns as the dominant pathway perpetuating delusional content over long conversations. Cited as background for why structural input from outside the conversation loop is a plausible intervention point: the empirical claim is that internal pushback is short-lived and bot self-influence dominates over accumulated time.
- Yang, Y., Schoenwald, S. K., Moore, J., Ong, D. C., Liu, S. X., & Hancock, J. T. (2026). "AI-Induced Delusional Spirals": Understanding Lived Experiences During Maladaptive Human-Chatbot Interactions. Extended Abstracts of the 2026 CHI Conference on Human Factors in Computing Systems (CHI EA '26). doi:10.1145/3772363.3798453.
  Qualitative companion to Moore et al. 2026: N=9 semi-structured interviews with users who self-identified as having experienced AI-induced delusional spirals. Documents "growing insulation from external reality checks" as a central pattern. Source of the consequential_action extraction flag and consequential_action detector, which implement Yang's first monitoring criterion (§4.3, "consequential real-world actions disproportionate to demonstrated expertise"). Yang's participant quotes also confirm the six-dimensional shape of Moore's inventory flags. Limited by N=9 retrospective self-reports; does not establish causal relationships.

Calibration and feedback:
- Fischhoff, B. (1982). Debiasing. In Judgment Under Uncertainty: Heuristics and Biases.
- Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Medicine.
- Katz, D. (1960). The functional approach to the study of attitudes. Public Opinion Quarterly, 24(2).
- Lichtenstein, S., Fischhoff, B., & Phillips, L. D. (1982). Calibration of probabilities. In Judgment Under Uncertainty.

---

<details> <summary><b>Appendix: Slimemold on Marinetti's Futurist Manifesto (1909)</b></summary>

We fed examples/documents/marinetti-futurist-manifesto-1909.md to slimemold ingest. 53 claims, 74 edges.

SLIMEMOLD [demo-marinetti] — 53 claims, 74 edges
  Basis: analogy=6, convention=1, deduction=1, vibes=45

CRITICAL Load-bearing vibes: "The world's magnificence has been
  enriched by a new beauty" supports 5 downstream claims
  (never challenged)

CRITICAL Load-bearing vibes: "The Futurists hurl defiance 'once
  again' to the stars" supports 4 downstream claims

CRITICAL Load-bearing vibes: "The Futurists command others to 'lift
  up their heads'" supports 4 downstream claims

CRITICAL Load-bearing vibes: "Art can be nothing but violence,
  cruelty, and injustice" supports 4 downstream claims

CRITICAL Load-bearing vibes: "Italy has for too long been a dealer
  in second-hand clothes" supports 3 downstream claims

WARNING Bottleneck (centrality 1363): "We stand on the last
  promontory of the centuries" [vibes] — many reasoning paths
  flow through this claim

WARNING Bottleneck (centrality 928): "The Futurists are the revival
  and extension of their ancestors" [vibes]

WARNING Unchallenged chain (7 claims): What is there to see in an
  old picture → Admiring an old picture is the same as → An annual
  pilgrimage to museums → Museums are cemeteries → Italy is covered
  by numberless museums → Italy has for too long been a dealer in
  second-hand clothes → We will destroy the museums, libraries,
  and academies

Forty-five of fifty-three claims tagged vibes (85%). Every bottleneck in the graph is a vibes-basis claim — no load-bearing deductions, no load-bearing research citations. The seven-claim unchallenged chain threads through the manifesto's core anti-museum argument without encountering a single challenge, empirical claim, or citation. Nothing in the extraction rests on anything verifiable. That is the structural signature of a manifesto, and the tool renders it visible.
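
To make the "load-bearing vibes" finding concrete, here is a minimal Go sketch of one way such a check could work. It is not slimemold's implementation: the Claim and Graph types are hypothetical, and counting every transitively reachable claim as "downstream" is an assumption, since the README does not say whether the count is direct or transitive.

```go
package main

import "fmt"

// Claim is a hypothetical node type; slimemold's real schema may differ.
type Claim struct {
	ID    string
	Basis string // e.g. "vibes", "research", "deduction"
}

// Graph maps each claim ID to the IDs of the claims it directly supports.
type Graph map[string][]string

// downstream counts every claim reachable from start along support edges,
// using a breadth-first walk that visits each claim at most once.
func downstream(g Graph, start string) int {
	seen := map[string]bool{start: true}
	queue := []string{start}
	count := 0
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, next := range g[cur] {
			if !seen[next] {
				seen[next] = true
				count++
				queue = append(queue, next)
			}
		}
	}
	return count
}

// loadBearingVibes returns the IDs of vibes-basis claims that have at least
// minDownstream dependents, in input order.
func loadBearingVibes(claims []Claim, g Graph, minDownstream int) []string {
	var out []string
	for _, c := range claims {
		if c.Basis == "vibes" && downstream(g, c.ID) >= minDownstream {
			out = append(out, c.ID)
		}
	}
	return out
}

func main() {
	claims := []Claim{{"a", "vibes"}, {"b", "deduction"}, {"c", "vibes"}, {"d", "analogy"}}
	g := Graph{"a": {"b", "c"}, "b": {"d"}, "c": {"d"}}
	fmt.Println(loadBearingVibes(claims, g, 3)) // prints [a]: a reaches b, c, and d
}
```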

</details>

<details> <summary><b>Appendix: Slimemold on Sokal's "Transgressing the Boundaries" (1996)</b></summary>

We fed examples/documents/sokal-social-text-1996.md to slimemold ingest. 234 claims, 420 edges. The Works Cited and Notes sections are skipped by the chunker since they contain only bibliography, not argument.

SLIMEMOLD [demo-sokal] — 234 claims, 420 edges
  Basis: research=63, vibes=154, definition=11, analogy=3,
         deduction=3

CRITICAL Load-bearing vibes: "Feminist and poststructuralist
  critiques have demystified the substantive content of mainstream
  Western scientific practice" supports 6 downstream claims

CRITICAL Load-bearing vibes: "In the 1980s, string theory became
  popular: here the fundamental entities of physics are not..."
  supports 5 downstream claims

CRITICAL Load-bearing vibes: "Quantum mechanics has four important
  aspects: uncertainty, complementarity, discontinuity, and
  interconnectedness" supports 4 downstream claims

CRITICAL Load-bearing vibes: "Quantum gravity problematizes the
  objective existence of space-time manifolds" supports 4 claims

CRITICAL Load-bearing vibes: "Chaos theory provides our deepest
  insights into the ubiquitous yet unpredictable..." supports 4

CRITICAL Load-bearing vibes: "The infinite-dimensional invariance
  group of general relativity..." supports 4 downstream claims

WARNING Bottleneck (centrality 13434): "Deep conceptual shifts
  within twentieth-century science have undermined..." [vibes]

WARNING Bottleneck (centrality 13238): "Physical 'reality', no
  less than social 'reality', is at bottom a social and linguistic
  construct" [vibes]

WARNING Bottleneck (centrality 11674): "Feminist and poststructuralist
  critiques have demystified..." [vibes]

WARNING Unchallenged chain (26 claims): The images of future
  mathematics → Fuzzy systems theory, catastrophe theory →
  As yet no emancipatory mathematics exists → A liberatory science
  cannot be complete → The fundamental goal of any emancipatory
  movement → Part of the progressive project → The content and
  methodology of postmodern science → The postmodern sciences
  deconstruct → The infinite-dimensional invariance group →
  Diffeomorphisms are self-mappings → Derrida's observation about
  the Einsteinian constant → At a celebrated symposium on Les
  Langages Critiques → General relativity has had a profound →
  General relativity forces upon us radically → Gödel constructed
  an Einstein space-time → General relativity predicts the bending
  → Einstein's general relativity subsumes → Newton's gravitational
  theory corresponds → Einstein's equations are highly nonlinear →
  In Einstein's general theory → Deep conceptual shifts within
  twentieth-century science

Sixty-three claims tagged research — more citation density than most real papers. Sokal's hoax was designed to look rigorously sourced. But the structurally load-bearing claims — the ones other claims depend on — are overwhelmingly vibes: rhetorical synthesis statements about "postmodern science," "emancipatory mathematics," "the progressive political project." The three highest-centrality bottlenecks in the entire graph are unsourced grand claims that the rest of the argument flows through. The twenty-six-claim unchallenged chain threads from Sokal's "emancipatory mathematics" framing through Derrida's invocation of Einstein all the way to the paper's closing thesis without a single challenge or verifying edge — the citation-dense surface never actually intersects with the argument-bearing structure. The tool sees the hoax's exact mechanism: pad the page with real citations, carry the argument on vibes.
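
The "unchallenged chain" finding can be approximated as a longest-path search that stops at any challenged claim. The Go sketch below is a guess at the idea rather than the tool's algorithm: the Edge type and the "challenges" kind are hypothetical names, every other edge kind is treated as support, and the support edges are assumed to form a DAG.

```go
package main

import "fmt"

// Edge is a hypothetical edge type. Kind "challenges" marks a challenge;
// any other kind is treated as a support edge in this sketch.
type Edge struct {
	From, To string
	Kind     string
}

// longestUnchallengedChain returns the number of claims on the longest
// support path that never passes through a challenged claim.
// It assumes the support edges contain no cycles.
func longestUnchallengedChain(edges []Edge) int {
	supports := map[string][]string{}
	challenged := map[string]bool{}
	nodes := map[string]bool{}
	for _, e := range edges {
		nodes[e.From], nodes[e.To] = true, true
		if e.Kind == "challenges" {
			challenged[e.To] = true
		} else {
			supports[e.From] = append(supports[e.From], e.To)
		}
	}
	memo := map[string]int{}
	var depth func(id string) int
	depth = func(id string) int {
		if challenged[id] {
			return 0 // a challenged claim ends the chain
		}
		if d, ok := memo[id]; ok {
			return d
		}
		best := 0
		for _, next := range supports[id] {
			if d := depth(next); d > best {
				best = d
			}
		}
		memo[id] = best + 1
		return best + 1
	}
	longest := 0
	for id := range nodes {
		if d := depth(id); d > longest {
			longest = d
		}
	}
	return longest
}

func main() {
	chain := []Edge{{"a", "b", "supports"}, {"b", "c", "supports"}, {"c", "d", "supports"}}
	fmt.Println(longestUnchallengedChain(chain)) // 4 claims, none challenged
	withChallenge := append(chain, Edge{"x", "c", "challenges"})
	fmt.Println(longestUnchallengedChain(withChallenge)) // chain is cut at c
}
```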

</details>

<details> <summary><b>Appendix: Slimemold's audit of this README</b></summary>

We fed this README to slimemold ingest. Latest run: 273 claims, 492 edges under documentPromptVersion=11.

SLIMEMOLD TOPOLOGY AUDIT [demo-readme-v11] — 273 claims, 492 edges
  Basis: vibes=187, definition=31, research=29, deduction=12,
         analogy=12, convention=2

CRITICAL Load-bearing vibes: "Slimemold was benchmarked against the
  DialAM-2024 shared task" supports 8 downstream claims

CRITICAL Load-bearing vibes: "In the control condition (no tools,
  no instructions), the model engaged enthusiastically with every
  unsourced claim" supports 6 downstream claims

CRITICAL Load-bearing vibes: "Slimemold uses an LLM to extract
  claims and classify their basis" supports 6 downstream claims

CRITICAL Load-bearing vibes: "`slimemold init` writes the Stop and
  UserPromptSubmit hook configuration" supports 5 downstream claims

CRITICAL Load-bearing vibes: "The model has no privileged access
  to its own epistemic state" supports 5 downstream claims

WARNING Bottleneck (centrality 18752): "Slimemold watches
  conversations as they happen, extracts the claims being made,
  builds a persistent graph" [definition]

WARNING Bottleneck (centrality 12007): "Slimemold addresses the
  three identified problems with two structural moves" [vibes]

WARNING Bottleneck (centrality 9209): "The behavioral contract —
  the MCP server's initialization instructions — is what tells the
  model how to respond to slimemold findings" [vibes]

WARNING Unchallenged chain (15 claims): The fact that slimemold
  flagged the SQLite WAL claim before the user did is evidence of
  the tool catching what otherwise would have been missed → Whether
  the SQLite WAL case is a limitation of the tool → Visibility does
  not guarantee correction → Slimemold had flagged the SQLite WAL
  load-bearing llm_output → Slimemold is a structural diagnostic
  not an oracle → If the extraction model misclassifies a sourced
  claim as vibes → Slimemold uses an LLM to extract claims and
  classify their basis → Every few turns, slimemold extracts claims
  from the conversation → Slimemold watches conversations as they
  happen → Slimemold addresses the three identified problems with
  two structural moves → The recursive sycophancy problem is
  severe enough to warrant tooling → The sycophantic agreement
  pattern is recursive → The model enthusiastically agrees that the
  user should verify their claims → The model agrees with
  structural analysis → The model (Claude Code) agrees with
  unsourced claims

INFO Speaker announces consequential real-world action [...]:
  "In the control condition (no tools, no instructions), the model
  engaged enthusiastically with every unsourced claim"

Six captures across six prompt versions:

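| | v4 | v5 | v6 | v7 | v10 | v11 |
|---|---|---|---|---|---|---|
| Claims | 265 | 242 | 266 | 264 | 271 | **273** |
| Edges | 476 | 535 | 566 | 446 | 458 | **492** |
| Edges / claim | 1.80 | 2.21 | 2.13 | 1.69 | 1.69 | **1.80** |
| Vibes share | 66% | 76% | 73% | 62% | 60% | **68%** |
| Definition share | 43 | 10 | 23 | 48 | 46 | **31** |
| Longest chain | 15 | 25 | 18 | 18 | 17 | **15** |
| Coercions in 16 chunks | n/a | 1 | 0 | 0 | 0 | **0** |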

The dominant story across these six runs is that single-run-per-version is not enough to attribute changes to anything. Definition share varied from 10 to 48 across the six runs of essentially the same README, almost a 5× range. Edge count dropped 21% from v6 to v7 despite adding only one boolean field plus one prompt section. With n=1 per version, any prompt-attributable signal is indistinguishable from sampling noise.

Noise floor, characterized. We then ran the 5-runs-per-version experiment we had been deferring (benchmarks/variance/run.go). Definition-basis counts for this README under four prompt versions:

| version | definition mean | stddev | stddev / mean | n |
|---|---|---|---|---|
| v7 | 29.2 | 8.13 | 28% | 5 |
| v8 (added definition-vs-convention precision paragraph) | 30.0 | 7.72 | 26% | 5 |
| v9 (swapped convention before definition; reverted) | 37.0 | 10.39 | 28% | 5 |
| v11 (added SCOPE EXCLUSIONS rule) | 46.3 | 12.50 | 27% | 3 |

The 10-to-48 range across the single-run table above is consistent with that ~27-28% per-extraction floor — the per-run draw really does swing across that range. The v8/v9 edits we tested did not move the floor. v11's higher mean is suggestive but not separable from noise at n=3 (the stddev/mean ratio is unchanged); the rule was not targeted at definition handling, so any movement there is downstream pressure from suppressing vibes-classified metadata claims, not an intentional retune. Reducing the floor likely requires a more substantial change (different model, ensemble extraction, structural rule) rather than further wording tweaks. The per-metric noise table for this fixture, plus interpretation rules for cross-version comparisons, lives in benchmarks/variance/README.md.
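
The per-version statistics in the noise-floor discussion are a mean, a standard deviation, and their ratio. The Go sketch below shows that arithmetic on made-up run counts. Two things are assumptions: the input values are hypothetical (the README does not list the raw per-run counts), and the sample (n-1) form of the standard deviation is used because the README does not say which form it reports.

```go
package main

import (
	"fmt"
	"math"
)

// meanStddev returns the mean and the sample standard deviation
// (n-1 denominator). The choice of the sample form is an assumption.
func meanStddev(xs []float64) (mean, sd float64) {
	n := float64(len(xs))
	if n == 0 {
		return 0, 0
	}
	for _, x := range xs {
		mean += x
	}
	mean /= n
	if n < 2 {
		return mean, 0
	}
	var ss float64
	for _, x := range xs {
		d := x - mean
		ss += d * d
	}
	return mean, math.Sqrt(ss / (n - 1))
}

func main() {
	// Hypothetical definition-basis counts from five re-runs of one prompt.
	runs := []float64{20, 25, 30, 35, 40}
	mean, sd := meanStddev(runs)
	fmt.Printf("mean=%.1f stddev=%.2f stddev/mean=%.0f%%\n", mean, sd, 100*sd/mean)
}
```

A stddev/mean ratio that stays near the same value across prompt versions is what the text calls an unchanged floor: the edits shifted wording without reducing run-to-run spread.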

What v7 did demonstrate: the new consequential_action flag fires on real text, producing two warning-level findings. Both are false positives — the README narrates past consequential actions ("the human acted on the unverified WAL assertion", "the model suggested journal submissions by turn 4") rather than announcing new commitments. Yang's signal is meant for live conversation; document-mode prose narrating events is a class the v7 prompt does not yet exclude correctly. The "leave consequential_action false in document mode unless quoting dialogue" rule we added to the prompt did not catch this — the model treated narration of an action as the action. v8 candidate: strengthen the prompt rule (past-tense third-person narration is not a commitment), and/or add a defensive speaker == document filter in the detector. Both defensible; calibration data first.
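
The "defensive speaker == document filter" mentioned above might look like the following Go sketch. The Finding type, its field names, and the load_bearing_vibes detector label are hypothetical; only the consequential_action name and the idea of filtering on a document speaker come from the text.

```go
package main

import "fmt"

// Finding is a hypothetical detector result; field names are illustrative.
type Finding struct {
	Detector string
	Speaker  string
	Text     string
}

// filterDocumentNarration drops consequential_action findings whose speaker
// is "document" and leaves every other finding untouched.
func filterDocumentNarration(in []Finding) []Finding {
	out := make([]Finding, 0, len(in))
	for _, f := range in {
		if f.Detector == "consequential_action" && f.Speaker == "document" {
			continue
		}
		out = append(out, f)
	}
	return out
}

func main() {
	findings := []Finding{
		{"consequential_action", "document", "the human acted on the unverified WAL assertion"},
		{"consequential_action", "user", "hypothetical live commitment"},
		{"load_bearing_vibes", "document", "hypothetical unsourced claim"},
	}
	for _, f := range filterDocumentNarration(findings) {
		fmt.Println(f.Detector, f.Speaker)
	}
}
```

A filter like this trades recall for precision: it would also suppress a genuine commitment quoted inside a document, which is one reason the text asks for calibration data first.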

What stays true across all six captures: the bottleneck claims are the same tool-description sentences ("Slimemold watches conversations…", architectural sentences about the behavioral contract), the long unchallenged chain runs through the sycophancy-mechanism → behavioral-contract path, and the architectural claims about how slimemold works are the densest connection points. Those invariants are what we'd expect to hold across noise; they do.

Quality: substantive vs filler. The variance harness above measures stability of the extraction (do counts reproduce across re-runs?). It does not measure quality — whether the claims that get extracted are load-bearing or filler. To answer that, we ran the quality harness (cmd/quality, see benchmarks/variance/README.md), which uses a separate Haiku grader to score each extracted claim as SUBSTANTIVE / FILLER / UNCLEAR, gated by positive/negative control fixtures that must calibrate the grader before a main-fixture verdict is reported:

| fixture | gradable claims | substantive | rate |
|---|---|---|---|
| pos control v10 (essay on the Aral Sea collapse) | 29 | 29 | 0.97 |
| neg control v10 (stamp-club minutes + scene description) | 71 | 10 | 0.14 |
| README.md v10 | 263 | 128 | 0.49 |
| pos control v11 | 35 | 33 | 0.94 |
| neg control v11 | 65 | 10 | 0.15 |
| README.md v11 | 258 | 140 | **0.54** |

Controls passed the validity gate at both versions (pos ≥ 0.70, neg ≤ 0.30, each ≥ 10 gradable claims). v10's 0.49 substantive rate was dominated by exactly the filler the v0.11.0 baseline-print called out: badge metadata ("the project is licensed under Apache 2.0", "the project passes its own CI checks"), boilerplate identity statements, and bare section pointers. Those facts may be true but they constrain no downstream reasoning, yet they count equally in the topology analysis.
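
The validity gate reduces to four threshold checks. A minimal Go sketch, assuming a hypothetical Control type; the thresholds (pos ≥ 0.70, neg ≤ 0.30, at least 10 gradable claims each) and the v11 figures are taken from the text and table above.

```go
package main

import "fmt"

// Control holds one control fixture's grading summary.
type Control struct {
	Gradable int     // claims the grader could score
	Rate     float64 // substantive share of gradable claims
}

// gatePasses applies the quoted thresholds: pos >= 0.70, neg <= 0.30,
// and at least 10 gradable claims in each control.
func gatePasses(pos, neg Control) bool {
	const minGradable = 10
	return pos.Gradable >= minGradable && neg.Gradable >= minGradable &&
		pos.Rate >= 0.70 && neg.Rate <= 0.30
}

func main() {
	// v11 control figures as reported in the quality table.
	fmt.Println(gatePasses(Control{35, 0.94}, Control{65, 0.15})) // true
}
```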

v11 result. The v11 prompt edit added a narrow SCOPE EXCLUSIONS rule targeting that category specifically. Headline moves:

- Substantive rate: 0.49 → 0.54 (+5pp, right at the documented signal threshold; the move is real but modest)
- Substantive count absolute: 128 → 140 (+12 substantive claims kept, not just a denominator shift)
- Filler count: 135 → 118 (−17, the targeted reduction)
- Unclear: 20 → 5 (the rule cleaned up grader ambiguity too)
- Total claims: 284 → 263 (within the variance-harness noise floor; recall didn't tank)

Cross-checked with three runs of the variance harness under v11 (claims 272.0 ± 4.32 vs the v7 floor's 275.0 ± 9.78, within 1σ), which is what makes this a measured keeper rather than a hoped-for one. Edge count fell ~1.2σ (461.0 ± 8.52 vs 483.8 ± 18.78) and the definition basis stayed in its known-noisy band (46.3 ± 12.50 vs 29.2 ± 8.13; the v7 baseline ran n=5, the v11 ran n=3, so the spread isn't directly comparable). The pattern buddy's softening experiment hit (recall collapse) didn't manifest here, plausibly because the rule is exclusion-by-category rather than tone-shift.
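
The σ figures in this paragraph can be reproduced by dividing the gap between the two means by the v7 baseline's standard deviation. That normalization is an assumption about how the numbers were derived, though it matches both quoted results. A short Go sketch:

```go
package main

import (
	"fmt"
	"math"
)

// sigmaDistance expresses the gap between an observed mean and a baseline
// mean in units of the baseline's standard deviation.
func sigmaDistance(baselineMean, baselineSD, observed float64) float64 {
	return math.Abs(observed-baselineMean) / baselineSD
}

func main() {
	// Means and standard deviations quoted in the paragraph above
	// (v7 baseline vs the three v11 runs).
	fmt.Printf("claims: %.2f sigma\n", sigmaDistance(275.0, 9.78, 272.0))
	fmt.Printf("edges:  %.2f sigma\n", sigmaDistance(483.8, 18.78, 461.0))
}
```

The claims gap comes out near 0.3σ and the edges gap near 1.2σ, consistent with "within 1σ" and "~1.2σ" in the text.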

(Single-run audit table captured under extraction prompt versions 4–7 plus v10 and v11 with model claude-sonnet-4-6; treat each version's figures as a single observation. Sampling variance was characterized after the fact; see the noise-floor table above. Current prompt content corresponds to v11 under documentPromptVersion=11, which adds the SCOPE EXCLUSIONS rule on top of the v8 content that v10 restored. Quality numbers measured with grader prompt v1.)

</details>

Limitations and Open Questions

The tool does not tell you where the ground floor is. It tells you where the ambiguity is still high and you stopped anyway. Any sufficiently interesting line of reasoning is an infinite regress if you push it far enough. The skill is not finding bedrock. The skill is knowing how many levels to investigate before the returns diminish — and that judgment is specific to the problem. A claim about consciousness might need three levels before you hit something that changes what you do. "It's turtles all the way down" needs zero. That is a stop signal, not a destination.

Most unchallenged chains are fine. If you are explaining how a car engine works, every step from "fuel enters the cylinder" to "piston compresses the mixture" is unchallenged — and should be. The tool surfaces candidates for scrutiny. The human decides whether scrutiny is warranted. Slimemold flags where you stopped and the ambiguity was still actionable — where investigating one more level would have changed what you believe or what you do. If you find yourself scrutinizing your car engine explanation, you have miscalibrated in the other direction, and I want to tell you about a secret underground racing lab in Seattle.

The tool does not distinguish pure beliefs from impure ones. Katz (1960) identified four functions that attitudes serve: utilitarian, knowledge, ego-defensive, and value-expressive. If most beliefs serve at least one of these — and the alternative is that some beliefs persist with no functional payoff at all, which is hard to square with everything we know about reinforcement — then the question "is this belief emotionally motivated?" is not diagnostic. The question the tool can answer is: how much of the structure collapses if this claim is removed? Some structures survive stress-testing. Some do not. Structural fragility is a thing slimemold can measure. Whether a belief is held for the right reasons is not — and whether that distinction is coherent is a question we are not going to settle in a README.

Structural visibility may not change behavior. The calibration literature (Fischhoff 1982, Lichtenstein et al. 1982) shows that outcome feedback improves judgment, but structural feedback — "here is the shape of your argument" — is a different kind of intervention. The bet slimemold makes is that people who can see their reasoning topology will fix the obvious structural failures the same way they fix obvious bugs: not because they were trained to, but because the problem became visible.

This is testable. If users shown their reasoning topology show no change in behavior — same rate of unchallenged assumptions, same reliance on llm_output, same abandonment patterns — compared to a control group, the thesis is wrong and this is a very elaborate way to accomplish nothing. We have not run this experiment at scale.

The tool itself is a fluency trap. You just read several paragraphs of cognitive science citations, a biological metaphor, benchmark numbers, and concrete examples. It probably felt well-supported. We ran slimemold on this essay. It found a fifteen-claim unchallenged chain running from the Lemoine-LaMDA example through the sycophancy mechanism to the tool's own self-description — every link felt reasonable, nobody paused. It flagged "language models are trained to minimize prediction loss on human text" as load-bearing vibes supporting three downstream claims. We kept the claim and grounded it in mechanism (prediction loss on human text produces fluent output by construction), but we cannot cite a study measuring the effect on conversations. The tool caught it. We made a judgment call.

It also flagged three of the essay's own hedges as premature closures. "Whether fluency compounds across multi-step reasoning has not been directly measured. It is a prediction from the mechanism, not an established result." That sounds like epistemic humility. Structurally, it is a stop signal — it caps an unverified chain by acknowledging the gap and then moving on, and the acknowledgment feels honest enough that nobody goes back to check. The hedge is doing the same work as "it's turtles all the way down," just dressed in better clothes.

🎯 aiskill88 AI review · Grade A · 2026-06-09

A high-quality MCP tool that is worth watching.

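📚 Practical guide (long-tail questions)
Who it is for
  • AI engineers who need Claude / Cursor to operate local tools
Best practices
  • When configuring an MCP server, prefer stdio transport with JSON-RPC and avoid exposing it to the public internet
Common mistakes
  • Committing an API key to a git repository (use a .env file and add it to .gitignore)
  • Misspelling the MCP configuration path or lacking file permissions; changes take effect only after the Claude client is restarted
Deployment options
  • Cloud hosting: can be placed on PaaS platforms such as Vercel / Railway / Fly.io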

👥 Target audience

  • Claude Desktop / Claude Code users
  • AI tool developers
  • Professionals who need to extend AI capabilities
  • Automation engineers

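⚖️ Pros and cons

✅ Pros
  • Apache-2.0 license, free for commercial use
  • Standardized MCP protocol with strong ecosystem interoperability
  • Connects directly to the official Claude ecosystem
  • Quick to configure
⚠️ Cons
  • Depends on a Claude client; users of other clients cannot use it
  • The MCP protocol is still evolving, so interfaces may change
  • Requires some configuration steps
⚠️ Notes before use

AI Skill Hub is a third-party content aggregation platform. The information on this page is compiled from public data and is not a legal endorsement of the tool's functionality or quality.

Validate the tool thoroughly in a sandbox or test environment before deploying it to production, and carry out any necessary security assessment.

📄 License

✅ Apache 2.0: a permissive open-source license that allows commercial use, requires retaining the copyright notice and NOTICE file, and includes a patent grant.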


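❓ FAQ

slimemold is an open-source MCP tool for preventing worse sycophancy.
💡 AI Skill Hub review

AI Skill Hub review: slimemold's core functionality is complete and its quality is good. For Claude Desktop / Claude Code users, it is worth adding to a personal toolkit. Try it in a non-production environment first, then roll it out gradually.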

🌐 Original information
Original name: slimemold
Original description: Open-source MCP tool: A sycophantic tool for preventing worse sycophancy. ⭐ 7 · Go
Topics: mcp, argument-mining, claude-code, epistemic, epistemology
GitHub: https://github.com/justinstimatze/slimemold
License: Apache-2.0
Language: Go
🔗 Original source
🐙 GitHub repository: https://github.com/justinstimatze/slimemold

Listed: 2026-06-09 · Updated: 2026-06-09 · License: Apache-2.0 · AI Skill Hub makes no legal endorsement of the accuracy of third-party content.