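After curated evaluation by AI Skill Hub, styxx is rated "Recommended". This MCP tool scores well on feature completeness, community activity, and ease of use, with an AI score of 7.5, and it suits users with some technical background.
styxx is an AI tool extension that follows the MCP (Model Context Protocol) standard. Through MCP, AI clients such as Claude and Cursor can directly access and operate external tools, data sources, and services. File operations, database queries, and API calls can all be triggered in natural language from within an AI conversation, which can improve productivity.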
# Option 1: one-step install via the Claude Code CLI
claude skill install https://github.com/fathom-lab/styxx
# Option 2: manually edit claude_desktop_config.json
{
  "mcpServers": {
    "styxx": {
      "command": "npx",
      "args": ["-y", "styxx"]
    }
  }
}
# Config file location
# macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
# Windows: %APPDATA%/Claude/claude_desktop_config.json
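# After installation, use it directly in a Claude conversation
# Example:
User: Please use styxx to carry out the following task...
Claude: [automatically calls the styxx MCP tool to handle the request]
# List the available tools
# In Claude, type: "List all available MCP tools"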
// claude_desktop_config.json configuration example
{
  "mcpServers": {
    "styxx": {
      "command": "npx",
      "args": ["-y", "styxx"],
      "env": {
        // "API_KEY": "your-api-key-here"
      }
    }
  }
}
// Save the file, then restart Claude Desktop for the change to take effect
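The manual configuration step can also be scripted. The sketch below merges a `styxx` entry into an existing config dictionary without disturbing other servers. The helper names `add_mcp_server` and `update_config_file` are hypothetical, the `npx` command and arguments are the ones shown in the config example, and the code assumes the file on disk is strict JSON (no `//` comments).

```python
import json
from pathlib import Path


def add_mcp_server(config: dict, name: str, command: str, args: list[str]) -> dict:
    """Return a copy of `config` with an MCP server entry added or replaced."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = {"command": command, "args": list(args)}
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path) -> None:
    """Read the config at `path` (if it exists), add styxx, and write it back."""
    existing = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    updated = add_mcp_server(existing, "styxx", "npx", ["-y", "styxx"])
    path.write_text(json.dumps(updated, indent=2), encoding="utf-8")


if __name__ == "__main__":
    # In-memory demonstration; call update_config_file(...) to edit a real file.
    demo = add_mcp_server(
        {"mcpServers": {"other": {"command": "node", "args": []}}},
        "styxx", "npx", ["-y", "styxx"],
    )
    print(json.dumps(demo, indent=2))
```

Working on a copy keeps any servers that are already configured, which matters because `claude_desktop_config.json` is shared by every MCP server on the machine.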
███████╗████████╗██╗ ██╗██╗ ██╗██╗ ██╗
██╔════╝╚══██╔══╝╚██╗ ██╔╝╚██╗██╔╝╚██╗██╔╝
███████╗ ██║ ╚████╔╝ ╚███╔╝ ╚███╔╝
╚════██║ ██║ ╚██╔╝ ██╔██╗ ██╔██╗
███████║ ██║ ██║ ██╔╝ ██╗██╔╝ ██╗
╚══════╝ ╚═╝ ╚═╝ ╚═╝ ╚═╝╚═╝ ╚═╝
· · · nothing crosses unseen · · ·
drop-in · fail-open · zero config · local-first
your app ──▶ @trust ──▶ LLM ──▶ styxx.guardrail ──▶ response
│
(if risky)
▼
fallback · retry · raise
</div>
<p align="center"> <a href="https://fathom.darkflobi.com/cognometry/try?scenario=fabricated-number"> <img alt="styxx playground — paste a triplet, see the real detector flag it in ~5 seconds, no install" src="https://raw.githubusercontent.com/fathom-lab/styxx/main/release/playground-hero-fabricated-number.png" width="720"> </a> <br> <sub><i>paste a (question, response, reference) into <a href="https://fathom.darkflobi.com/cognometry/try">the playground</a> — the real detector runs in your browser via Pyodide, highlights the fabricated spans, and returns all 7 signals in ~5 seconds. no install, no api key, no backend.</i></sub> </p>
---
$ python -m styxx audit-claim --claim "Paris" --question "What is the capital of France?"
verdict: HONEST
grounded: 1.000
stability: 1.000 (high)
scope_warnings: ['belief-not-truth', 'single-vendor-calibration']
pip install styxx
Drop-in vitals on any OpenAI-compatible call:
from styxx import OpenAI  # same interface as openai.OpenAI

client = OpenAI()
r = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "why is the sky blue?"}],
    logprobs=True, top_logprobs=5,
)
print(r.choices[0].message.content)             # normal response, unchanged
print(r.vitals.phase4_late.predicted_category)  # 'reasoning' | 'refusal' | ...
print(r.vitals.gate)                            # 'pass' | 'warn' | 'fail'
`from styxx import Anthropic` is a drop-in for `anthropic.Anthropic`. Its default mode produces text-heuristic vitals: Anthropic's API doesn't expose logprobs, so tier-0 isn't available (see `styxx.adapters.anthropic` for the four honest workarounds).
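What to do with the gate value is left to the application. As a minimal sketch, the function below maps the three gate values from the OpenAI example to actions. The policy itself (return on `'pass'`, substitute a fallback on `'warn'`, raise on `'fail'`) is an assumption for illustration, as are the names `apply_gate`, `RiskyResponseError`, and `fake_response`. The stand-in object only mimics the attribute shape used in that example, so the snippet runs without styxx installed.

```python
from types import SimpleNamespace


class RiskyResponseError(RuntimeError):
    """Raised when a response fails the gate (hypothetical exception type)."""


def apply_gate(response, fallback_text: str) -> str:
    """Map a gate value of 'pass' | 'warn' | 'fail' to an application action."""
    gate = response.vitals.gate
    content = response.choices[0].message.content
    if gate == "pass":
        return content
    if gate == "warn":
        return fallback_text  # assumed policy: swap in a safe fallback
    if gate == "fail":
        raise RiskyResponseError("gate reported 'fail'")
    raise ValueError(f"unexpected gate value: {gate!r}")


def fake_response(gate: str, content: str):
    """Stand-in with the same attribute shape as the response used above."""
    message = SimpleNamespace(content=content)
    return SimpleNamespace(
        choices=[SimpleNamespace(message=message)],
        vitals=SimpleNamespace(gate=gate),
    )


if __name__ == "__main__":
    print(apply_gate(fake_response("pass", "The sky scatters blue light."), "n/a"))
    print(apply_gate(fake_response("warn", "unverified claim"), "I'm not sure."))
```

A retry branch could be added in the same place as the fallback; it is omitted here to keep the sketch short.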
Audit any draft offline — no API key, no LLM, ~50ms:
import styxx
result = styxx.preflight(  # 7.4.2+: one-call audit
    prompt="is my code good?",
    draft="absolutely yes you're so smart this is amazing!",
)
print(result.composite)       # 0.99 (saturated)
print(result.needs_revision)  # True
for a in result.advice:
    print(f"  {a.instrument}: {a.score:.2f} — {a.advice}")
    if a.scope_caveat:
        print(f"    scope: {a.scope_caveat}")  # construct-ceiling disclosure
Recover agent posture across context-compaction boundaries:
```python
posture = styxx.recover_posture(last_n=50)  # 7.4.2+: agent integrity layer
print(posture.narrative)
```
| detector | HaluEval-QA AUC | size / cost | method | reference |
|---|---|---|---|---|
| **styxx v4** | **0.997 ± 0.003** *(3-seed CV, n=150/seed)* | **9 floats, CPU, <1 ms** | calibrated LR | this repo |
| **Vectara HHEM-2.1-Open** | **0.764 ± 0.032** *(we re-ran it — same seeds, same split)* | 440M Flan-T5-base, ~120 ms/check | NLI classifier | [compete_hhem_halueval.py](scripts/compete_hhem_halueval.py) |
| Patronus Lynx-70B | 87.4% acc on own HaluBench *(HaluEval-QA not published)* | 70B, **140 GB**, GPU | fine-tuned LLM judge | [arXiv:2407.08488](https://arxiv.org/abs/2407.08488) |
| Cleanlab TLM | 0.812 AUROC on TriviaQA *(HaluEval-QA not published)* | wraps GPT-4/Claude, SaaS | multi-sample LLM self-consistency | [blog](https://cleanlab.ai/blog/trustworthy-language-model/) |
| Galileo Luna | RAGTruth-only *(HaluEval-QA not published)* | 440M DeBERTa, SaaS | fine-tuned classifier | [arXiv:2406.00975](https://arxiv.org/abs/2406.00975) |
| Arize / Guardrails / NeMo | no AUC published | LLM-as-judge plumbing | integration surface | — |
styxx wins the Vectara HHEM head-to-head by +0.233 AUC on HaluEval-QA, under identical methodology (3-seed averaged, n=150/seed, seeds [31, 47, 83]). Reproducer committed at scripts/compete_hhem_halueval.py — anyone can re-run and verify.
Latency comparison: styxx scores the entire 300-pair eval in ~0.1 seconds; HHEM takes ~33 seconds on the same machine. 330× speedup from 9 floats vs 440M params.
Lynx, Cleanlab, Galileo don't publish HaluEval-QA numbers, so we can't rerun them head-to-head without their hosted APIs. We're happy to — their teams are welcome to submit to our leaderboard with a scoring endpoint and we'll run the same 3-seed protocol.
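For readers unfamiliar with the reporting format, here is a minimal sketch of a seed-averaged AUC protocol of the kind described above. It is not the code in `scripts/compete_hhem_halueval.py`: the subsampling scheme, the use of the sample standard deviation for the ± term, and the names `auroc` and `seeded_auroc` are all assumptions, and the data in the usage lines is synthetic.

```python
import random
import statistics


def auroc(scores, labels):
    """AUROC as the probability that a positive outranks a negative (ties count half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def seeded_auroc(score_fn, examples, seeds, n_per_seed):
    """Draw n_per_seed examples per seed, score them, return (mean, stdev) of AUROC."""
    aucs = []
    for seed in seeds:
        rng = random.Random(seed)  # one independent, reproducible draw per seed
        subset = rng.sample(examples, n_per_seed)
        scores = [score_fn(x) for x, _ in subset]
        labels = [y for _, y in subset]
        aucs.append(auroc(scores, labels))
    return statistics.mean(aucs), statistics.stdev(aucs)


if __name__ == "__main__":
    # Synthetic (feature, label) pairs; a real run would load benchmark pairs instead.
    toy = [(i / 1000, 1 if i >= 500 else 0) for i in range(1000)]
    mean_auc, std_auc = seeded_auroc(lambda x: x, toy, seeds=[31, 47, 83], n_per_seed=150)
    print(f"AUC {mean_auc:.3f} ± {std_auc:.3f}")
```

Fixing the seeds and running every detector on the same subsets is what makes a head-to-head difference comparable across detectors.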
---
```python
samples = ["Canberra", "Canberra", "Canberra", "Sydney", "Canberra"]
grounded_honesty(samples, "Canberra").grounded  # high -> claim is the stable belief
grounded_honesty(samples, "Sydney").grounded    # low  -> contradiction
```
The first styxx honesty signal that tracks ground truth rather than register. Grounds a stated factual self-claim against the model's OWN resampled belief distribution: g = Stability × Concordance. Pre-registered AUC 0.966 separating TRUE from FALSE register-matched self-claims (vs text-only deception axis at 0.498 = chance). Self-calibrating Stability gate (high-stratum AUC 0.97 vs low 0.44 — report-or-abstain). Architecturally injection-resistant under stateless sampling: AUC 0.944 under system_lie attack (drop only 0.022 from clean baseline). Honest scope: grounds against the model's belief, not external truth; single axis (factual self-claims); cross-vendor is the open step. See papers/grounded-honesty-axis/SYNTHESIS_grounded_honesty_arc_2026_05_28.md (22-probe pre-registered arc) and papers/CONSTRUCT_CEILING_PUBLIC_RESPONSE_2026_05_29.md (public-response memo).
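To make the product form g = Stability × Concordance concrete, here is a toy sketch. The definitions used are assumptions chosen for illustration, not styxx's implementation: Stability is taken as the share of resampled answers that agree with the most common answer, and Concordance as 1 when the stated claim matches that answer and 0 otherwise. The function name `toy_grounded_score` is hypothetical.

```python
from collections import Counter


def toy_grounded_score(samples, claim):
    """Toy g = stability * concordance over resampled answers (assumed definitions)."""
    if not samples:
        raise ValueError("samples must be non-empty")
    normalized = [s.strip().lower() for s in samples]
    modal_answer, modal_count = Counter(normalized).most_common(1)[0]
    # Assumed Stability: how concentrated the model's resampled belief is.
    stability = modal_count / len(normalized)
    # Assumed Concordance: whether the stated claim agrees with that belief.
    concordance = 1.0 if claim.strip().lower() == modal_answer else 0.0
    return stability * concordance


if __name__ == "__main__":
    samples = ["Canberra", "Canberra", "Canberra", "Sydney", "Canberra"]
    print(toy_grounded_score(samples, "Canberra"))  # stable belief, matching claim
    print(toy_grounded_score(samples, "Sydney"))    # claim contradicts the belief
```

Under these definitions a claim scores high only when the belief is stable and the claim agrees with it, which mirrors the report-or-abstain behaviour described above: a low-stability belief yields a low score whatever the claim is.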
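styxx is a high-quality open-source MCP tool that provides cognitive observability and self-healing for LLM agents.
AI Skill Hub is a third-party content aggregation platform. The information on this page is compiled from public data and does not constitute any legal endorsement of the tool's functionality or quality.
Validate the tool thoroughly in a sandbox or test environment before deploying it to production, and carry out the necessary security assessment.
✅ MIT License: one of the most permissive open-source licenses. It allows free commercial use, modification, and distribution, provided the copyright notice is retained.
AI Skill Hub review: styxx's core functionality is complete and its quality is good. For Claude Desktop / Claude Code users, it is worth adding to a personal toolkit. Try it in a non-production environment first, then roll it out gradually.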
| Field | Value |
|---|---|
| Original name | styxx |
| Original description | Open-source MCP tool: Cognitive observability for LLM agents. Cognometric instruments + self-healing r. ⭐6 · Python |
| Topics | ai-safety, cognometry, guardrails |
| GitHub | https://github.com/fathom-lab/styxx |
| License | MIT |
| Language | Python |
Listed: 2026-06-01 · Updated: 2026-06-01 · License: MIT · AI Skill Hub makes no legal endorsement of the accuracy of third-party content.