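Local VoiceMode LLM is one of AI Skill Hub's featured AI tools in this issue, with an overall score of 7.5 and high overall quality. We recommend adding it to your AI toolkit to help improve productivity.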
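Local VoiceMode LLM is an open-source tool written in Shell that gives AI agents offline voice input and output (speech detection, transcription and synthesis). As a GitHub open-source project it has community support and ongoing releases. Its code is fully transparent and auditable, and it runs locally to keep data private. It can be used on its own or integrated into enterprise workflows.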
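```bash
# Clone the repository
git clone https://github.com/groxaxo/Local-VoiceMode-LLM
cd Local-VoiceMode-LLM
# Read the installation instructions
cat README.md
# Per the README, one command installs the voice venv, the STT/TTS backends and the agent skills
./setup.sh
```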
```bash
# Show help
local-voicemode-llm --help
# Basic run
local-voicemode-llm [options] <input>
# For detailed usage, see the documentation:
# https://github.com/groxaxo/Local-VoiceMode-LLM
```
```bash
# local-voicemode-llm configuration
# Write out the example configuration
local-voicemode-llm --config-example > config.yml
# Common settings
# output_dir: ./output
# log_level: info
# workers: 4
# Environment variable (overrides the config file)
export LOCAL_VOICEMODE_LLM_CONFIG="/path/to/config.yml"
```
<p align="center"> <img src="img/banner.png" alt="Local VoiceMode LLM — talk to your AI, on CPU" width="100%"> </p>
<p align="center"> <strong>Give your AI agent a voice — and ears — that run entirely on your CPU.</strong> </p>
-----
A complete, local voice pipeline for AI agents. One command installs everything: Silero VAD for detecting when you speak, Parakeet TDT 0.6B for transcription, and Supertonic TTS 3 for synthesis. No cloud, no API keys, no GPU required.
It drops a talk skill into Claude Code, OpenCode CLI, OpenClaw, Hermes Agent, and Codex, then installs and starts the speech backends for you. Pick your agent, run the installer, start talking.
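- Auto-listen: listen runs right after speak (`TALK_AUTO_LISTEN=1`)
- Barge-in: speaking interrupts TTS playback (`TALK_BARGE_IN=1`)
- Web dashboard on `:7862`, no npm, no build step
- `setup.sh` is safe: existing plists/tasks are only overwritten with `--force`

| Component | Location | Port | Auto-start |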
|---|---|---|---|
| Voice venv (VAD + ONNX) | ~/.config/opencode/tts-venv/ | — | — |
| **Parakeet STT** | ~/.config/opencode/parakeet-stt/ | **5093** | launchd / systemd / Task Scheduler |
| **Supertonic TTS** | ~/.config/opencode/supertonic-tts/ | **8766** | launchd / systemd / Task Scheduler |
| Supertonic 2 *(opt-in)* | ~/.config/opencode/supertonic2-tts/ | **8880** | integrations/supertonic2/install.sh |
| **Web dashboard** | frontend/ (repo) | **7862** | manual (bash frontend/start.sh) |
| `talk` skill | per-agent (see below) | — | — |
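You can check which of the components above are actually listening by probing their documented ports. The sketch below makes several assumptions:

- It requires bash, because it relies on bash's `/dev/tcp` redirection.
- It treats an accepted TCP connection as "up". The project does not document a dedicated health endpoint.
- The short names `stt`, `tts`, `supertonic2` and `dashboard` are hypothetical labels.

```bash
# Map a hypothetical short component name to its documented port.
service_port() {
  case "$1" in
    stt)         echo 5093 ;;  # Parakeet STT
    tts)         echo 8766 ;;  # Supertonic TTS 3
    supertonic2) echo 8880 ;;  # Supertonic 2 (opt-in)
    dashboard)   echo 7862 ;;  # Web dashboard
    *)           return 1 ;;
  esac
}

# Succeed if something on 127.0.0.1 accepts a TCP connection on the port.
# The subshell closes fd 3 on exit; connection errors are silenced.
port_open() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# Print one up/down line per component.
check_services() {
  local name port
  for name in stt tts supertonic2 dashboard; do
    port=$(service_port "$name")
    if port_open "$port"; then
      printf '%-12s :%s up\n' "$name" "$port"
    else
      printf '%-12s :%s down\n' "$name" "$port"
    fi
  done
}
```

Source the snippet, then run `check_services` after `setup.sh` finishes. Supertonic 2 is expected to show as down unless you installed it.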
Optional: Supertonic 2. Supertonic Express 2 (model `onnx-community/Supertonic-TTS-2-ONNX`) is a 66M-parameter, CPU-only, multilingual ONNX TTS with the same OpenAI-compatible API. Add it with `bash integrations/supertonic2/install.sh`, then select it with `TTS_ENGINE=supertonic2`. It runs on `:8880` alongside Supertonic 3 and falls back to it automatically. See `integrations/supertonic2/`.
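Both Supertonic engines share an OpenAI-compatible API, so a client can try Supertonic 2 first and fall back to Supertonic 3. This is a minimal sketch under stated assumptions:

- The request uses the OpenAI-style `POST /v1/audio/speech` path with a JSON body of `model`, `input` and `voice`.
- The path, the `model` value `supertonic` and the helper names are hypothetical.
- The project's own automatic fallback may be implemented differently.
- `CURL` is an illustrative hook so the HTTP call can be stubbed.

```bash
# Escape text for a JSON string literal: backslash, double quote, \n, \r, \t.
# (Other control characters are not handled in this sketch.)
json_escape() {
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      '\') out+='\\' ;;
      '"') out+='\"' ;;
      $'\n') out+='\n' ;;
      $'\r') out+='\r' ;;
      $'\t') out+='\t' ;;
      *) out+="$c" ;;
    esac
  done
  printf '%s' "$out"
}

# Hypothetical OpenAI-style speech request against one base URL.
# The /v1/audio/speech path and the "model" value are assumptions.
tts_request() {
  local base="$1" text="$2" out="$3" body
  body=$(printf '{"model":"supertonic","input":"%s","voice":"%s"}' \
    "$(json_escape "$text")" "$(json_escape "${SUPERTONIC_VOICE:-F4}")")
  "${CURL:-curl}" -sS -f --max-time 60 \
    -H 'Content-Type: application/json' \
    -d "$body" -o "$out" \
    "${base}/v1/audio/speech"
}

# With TTS_ENGINE=supertonic2, try :8880 first; on failure (or otherwise),
# use Supertonic 3 at SUPERTONIC_URL (default :8766).
speak_to_file() {
  local text="$1" out="$2"
  if [ "${TTS_ENGINE:-supertonic}" = "supertonic2" ] &&
     tts_request "http://127.0.0.1:8880" "$text" "$out"; then
    return 0
  fi
  tts_request "${SUPERTONIC_URL:-http://127.0.0.1:8766}" "$text" "$out"
}
```

For example, `TTS_ENGINE=supertonic2 speak_to_file "Hello there" /tmp/reply.audio` writes whatever audio the server returns. The container format depends on the server.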
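VAD sensitivity and end-of-turn silence can also be overridden per run with environment variables:

```bash
VAD_THRESHOLD=0.65 VAD_MIN_SILENCE_MS=800 talk.sh listen
```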
All values are also adjustable live in the Web Dashboard (saved to frontend-config.json).
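Per-run overrides like these usually follow the standard shell-default pattern. An environment value wins; otherwise the documented defaults (`VAD_THRESHOLD=0.5`, `VAD_MIN_SILENCE_MS=700`) apply.

The sketch below shows that resolution. The helper name `vad_settings` is hypothetical, and the range checks are an illustrative assumption rather than documented validation in `talk.sh`.

```bash
# Resolve effective VAD settings: environment overrides, else documented defaults.
vad_settings() {
  local threshold="${VAD_THRESHOLD:-0.5}"
  local silence_ms="${VAD_MIN_SILENCE_MS:-700}"
  # Assumed sanity check: threshold must lie strictly between 0 and 1.
  if ! awk -v t="$threshold" 'BEGIN { exit !(t > 0 && t < 1) }'; then
    echo "VAD_THRESHOLD out of range: $threshold" >&2
    return 1
  fi
  # Assumed sanity check: silence window must be a whole number of ms.
  case "$silence_ms" in
    ''|*[!0-9]*)
      echo "VAD_MIN_SILENCE_MS must be an integer: $silence_ms" >&2
      return 1 ;;
  esac
  printf '%s %s\n' "$threshold" "$silence_ms"
}
```

With no overrides this prints `0.5 700`. With the overrides from the command above it prints `0.65 800`.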
Supertonic 2 is an optional backend (bash integrations/supertonic2/install.sh, then TTS_ENGINE=supertonic2). Measured back-to-back on the same i7-12700KF, both CPU-only (median of 5, voice F4), it synthesizes ~3.2× faster than Supertonic 3 at the default 8 steps:
| Reply | Audio | Supertonic 3 (8 steps) | Supertonic 2 (8 steps) | Speed-up |
|---|---|---|---|---|
| short (10 words) | 2.4 s | 1.98 s · 0.82 RTF | **0.78 s · 0.29 RTF** | 2.6× |
| medium (22 words) | 6.6 s | 3.12 s · 0.48 RTF | **0.99 s · 0.15 RTF** | 3.2× |
| long (45 words) | 13.4 s | 5.44 s · 0.41 RTF | **1.44 s · 0.10 RTF** | 3.8× |
At high quality (20 steps) the gap widens to ~3.4× (mean RTF 0.31 vs 1.07). Both engines share the same OpenAI-compatible API and voices (F1–F5 / M1–M5), so switching is just TTS_ENGINE; Supertonic 2 runs on :8880 and coexists with Supertonic 3 (:8766), falling back to it automatically. Full numbers and the reproduce script: benchmarks/TTS_BACKENDS.md · python benchmarks/compare_tts_backends.py.
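In the table, RTF (real-time factor) is synthesis time divided by the duration of the audio produced, so values below 1 mean faster than real time. The speed-up is the ratio of the two engines' synthesis times.

The small helpers below (hypothetical names) recompute these figures from the table's own numbers:

- `rtf 5.44 13.4` gives `0.41`, matching the long-reply Supertonic 3 entry.
- `speedup 5.44 1.44` gives `3.8`, matching that row's speed-up.

```bash
# Real-time factor: seconds spent synthesizing / seconds of audio produced.
rtf() {
  awk -v synth="$1" -v audio="$2" 'BEGIN { printf "%.2f\n", synth / audio }'
}

# Speed-up of engine B over engine A for the same reply: time_A / time_B.
speedup() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}
```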
```bash
# Optional: key for the xAI cloud TTS fallback (last resort)
export XAI_API_KEY=xai-...
```

Backends are installed and running when `setup.sh` finishes. That’s the whole setup.
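Once the backends are running, a quick way to exercise the STT service is to post a WAV file to `STT_URL`. This sketch makes several assumptions:

- The request uses OpenAI-style multipart fields `file` and `model`.
- `STT_MODEL` and its default `parakeet` are hypothetical placeholders, not documented settings.
- `CURL` is an illustrative hook for stubbing the request.

```bash
# Send a WAV file to the local Parakeet endpoint and print the raw response.
transcribe() {
  local wav="$1"
  [ -f "$wav" ] || { echo "no such file: $wav" >&2; return 1; }
  "${CURL:-curl}" -sS -f --max-time 120 \
    -F "file=@${wav}" \
    -F "model=${STT_MODEL:-parakeet}" \
    "${STT_URL:-http://127.0.0.1:5093/v1/audio/transcriptions}"
}
```

Run `transcribe sample.wav` to print the server's response body as-is. An OpenAI-compatible server would typically return JSON with a `text` field, but check the actual output.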
| Variable | Default | Description |
|---|---|---|
| `STT_ENGINE` | `local` | STT backend: Parakeet on `:5093` (ONNX/CPU on Linux, CoreML on macOS) |
| `STT_URL` | `http://127.0.0.1:5093/v1/audio/transcriptions` | Local Parakeet endpoint |
| `TTS_ENGINE` | `supertonic` | `supertonic` (local ONNX) → `neutts` (local GGUF) → `xai` (cloud, last resort) |
| `SUPERTONIC_URL` | `http://127.0.0.1:8766` | Supertonic endpoint |
| `SUPERTONIC_VOICE` | `F4` | `F1`–`F5` / `M1`–`M5` |
| `TTS_QUALITY` | `normal` | `normal` = 8 steps (fast) · `high` = 20 steps (best) |
| `SUPERTONIC_STEPS` | (from quality) | Denoising steps 1–20; overrides the preset |
| `XAI_API_KEY` | (env) | Bearer token for xAI cloud fallback |
| `XAI_TTS_VOICE` | `eve` | `ara` · `eve` · `leo` · `rex` · `sal` |
| `TALK_AUTO_LISTEN` | `1` | Run listen after speak |
| `TALK_BARGE_IN` | `0` | Interrupt TTS on speech |
| `TALK_IDLE_TIMEOUT_S` | `300` | Session-silence window: end listen after N s of no speech (`0` = off) |
| `VAD_THRESHOLD` | `0.5` | Speech sensitivity: lower catches softer speech, higher ignores background noise/speech (also in dashboard) |
| `VAD_MIN_SILENCE_MS` | `700` | End-of-turn silence: 700 ms tolerates mid-sentence pauses; lower it (~500) for snappier turns (also in dashboard) |
| `MIC_QUERY` | _(empty)_ | Mic name substring; empty = auto-detect (honors the OS system-default input, skips virtual adapters) |
| `PORT` | `7862` | Dashboard port |
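Two rules in the table are worth spelling out:

- `TTS_QUALITY` picks a step preset, and `SUPERTONIC_STEPS` overrides that preset.
- `TTS_ENGINE` names an engine in the `supertonic` → `neutts` → `xai` chain.

The sketch below implements both rules under stated assumptions:

- The chain is tried in order starting at `TTS_ENGINE`, and only those three values are handled.
- The availability predicate passed to `pick_tts_engine` is a hypothetical hook, such as a port or API-key check. The project does not expose such a hook.
- Both function names are illustrative.

```bash
# Denoising steps: normal -> 8, high -> 20; SUPERTONIC_STEPS (1-20) overrides.
resolve_steps() {
  local steps
  case "${TTS_QUALITY:-normal}" in
    normal) steps=8 ;;
    high)   steps=20 ;;
    *) echo "unknown TTS_QUALITY: $TTS_QUALITY" >&2; return 1 ;;
  esac
  if [ -n "${SUPERTONIC_STEPS:-}" ]; then
    case "$SUPERTONIC_STEPS" in
      *[!0-9]*) echo "SUPERTONIC_STEPS must be an integer" >&2; return 1 ;;
    esac
    if [ "$SUPERTONIC_STEPS" -lt 1 ] || [ "$SUPERTONIC_STEPS" -gt 20 ]; then
      echo "SUPERTONIC_STEPS must be 1-20" >&2
      return 1
    fi
    steps="$SUPERTONIC_STEPS"
  fi
  echo "$steps"
}

# Print the first engine, from TTS_ENGINE onward in the documented order,
# for which the predicate command "$1 <engine>" succeeds.
pick_tts_engine() {
  local is_up="$1" started=0 engine
  for engine in supertonic neutts xai; do
    [ "$engine" = "${TTS_ENGINE:-supertonic}" ] && started=1
    [ "$started" -eq 1 ] || continue
    if "$is_up" "$engine"; then
      echo "$engine"
      return 0
    fi
  done
  return 1
}
```

For example, `TTS_QUALITY=high SUPERTONIC_STEPS=12 resolve_steps` prints `12`. With a predicate that only accepts `xai`, `pick_tts_engine` prints `xai`.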
The installer copies the talk skill into each selected agent’s skill directory. Every agent gets the same `SKILL.md` descriptor. It tells the agent when to invoke voice (talk, voice, speak, habla, audio, tts), how to run the VAD → STT → TTS loop, and where the services live.
| Agent | Skill path | Activation |
|---|---|---|
| **Claude Code** | ~/.claude/skills/talk/ | skill("talk") or auto-detected |
| **OpenCode CLI** | ~/.config/opencode/skills/talk/ | skill("talk") |
| **OpenClaw** | ~/.openclaw/skills/talk/ | skill("talk") |
| **Hermes Agent** | ~/.hermes/skills/talk/ | skill("talk") |
| **Codex** | ~/.codex/skills/talk/ | auto-detected via symlink |
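To see which agents already have the skill, you can check the paths in the table directly. The sketch below assumes that a present `SKILL.md` marks a complete install. The `-f` test follows symlinks, so the Codex symlink is covered. `list_talk_skills` is a hypothetical helper name.

```bash
# Report, per agent, whether the talk skill's SKILL.md is present.
list_talk_skills() {
  local entry agent dir
  for entry in \
    "Claude Code:$HOME/.claude/skills/talk" \
    "OpenCode CLI:$HOME/.config/opencode/skills/talk" \
    "OpenClaw:$HOME/.openclaw/skills/talk" \
    "Hermes Agent:$HOME/.hermes/skills/talk" \
    "Codex:$HOME/.codex/skills/talk"; do
    agent="${entry%%:*}"   # text before the first colon
    dir="${entry#*:}"      # text after the first colon
    if [ -f "$dir/SKILL.md" ]; then
      echo "$agent: installed ($dir)"
    else
      echo "$agent: missing"
    fi
  done
}
```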
More installer options:
```bash
./setup.sh --venv-only           # only create the voice venv
./setup.sh --skip-voices         # skip reference voice generation
./setup.sh --no-integrations     # skip all agent integrations
./setup.sh --force               # overwrite existing plists/tasks (destructive)
./setup.sh --uninstall           # stop services, remove plists
./setup.sh --uninstall --force   # also remove installed dirs
```
A high-quality open-source AI voice tool, worth keeping an eye on.
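This tool does not explicitly declare an open-source license. Before commercial use, contact the original author to confirm the scope of the license and avoid infringement risk.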
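AI Skill Hub is a third-party content aggregation platform. The information on this page is compiled from public data and does not constitute any legal endorsement of the tool's functionality or quality.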
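We recommend validating the tool thoroughly in a sandbox or test environment before deploying it to production, and carrying out the necessary security assessment.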
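Overall, Local VoiceMode LLM performs solidly among AI tools and is of good quality. If you already have a clear use case, you can try it right away. If you are still evaluating, compare it with similar tools before deciding.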
| Field | Value |
|---|---|
| Original name | Local-VoiceMode-LLM |
| Original description | Open-source AI tool: Local ears and mouth for your LLM — offline, private, safe, free & open source. ⭐29 · Shell |
| Topics | AI · Voice · Offline · Open source |
| GitHub | https://github.com/groxaxo/Local-VoiceMode-LLM |
| Language | Shell |
Indexed: 2026-06-13 · Updated: 2026-06-13 · License: not published