What is Chatterbox-TTS-Server?

Chatterbox-TTS-Server is an AI tool written in Python for self-hosting the Chatterbox TTS model. The server offers a user-friendly Web UI, flexible API endpoints (including an OpenAI-compatible one), predefined voices, voice cloning, and audiobook-scale text processing. It runs accelerated on NVIDIA (CUDA) and AMD (ROCm) GPUs, and on CPU.

How do I install and start using Chatterbox-TTS-Server?

Visit the Chatterbox-TTS-Server GitHub repository and follow the steps in the README to install the dependencies and run the server. The README requires Python 3.10; Python 3.11+ is not supported.

Is Chatterbox-TTS-Server free? What is its license?

Chatterbox-TTS-Server is free and open source under the MIT license, so anyone can use, modify, and distribute it at no cost.

Who is Chatterbox-TTS-Server for?

Chatterbox-TTS-Server is mainly aimed at users with some technical background, such as developers, data analysts, and AI engineers.

How active is the community, and how well is the project maintained?

Chatterbox-TTS-Server has 1,233 stars on GitHub and is under active development, with a growing community.


Capability tags

🤖 Agent 🔄 Workflow 🐳 Docker 💻 CLI 🔗 REST API 🔊 TTS ✨ GPT


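Chatterbox-TTS-Server: AI Speech Synthesis Tool Documentation

Python-based · open source and free, deployed locally, with full control over your data

English name: Chatterbox-TTS-Server

⭐ 1.2k Stars 🍴 300 Forks 💻 Python 📄 MIT 🏷 AI score 8.2

Text-to-speech · API service · Open-source project · Local deployment · CUDA acceleration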


✦ Recommended by AI Skill Hub

Chatterbox-TTS-Server is one of AI Skill Hub's featured AI tools for this issue. It has 1.2k GitHub stars and an overall score of 8.2. We recommend adding it to your AI toolkit.

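📚 In-depth analysis

Chatterbox-TTS-Server is an open-source tool written in Python with 1k+ stars on GitHub, covering text-to-speech, API serving, and local deployment. Because the code is open, you can audit it for security and adapt it to your own needs.

**Why use an open-source tool instead of a commercial SaaS?**
For individual developers and privacy-conscious users, a locally deployed open-source tool keeps data on your own machine and outside any third-party provider's data policy. Open-source tools also usually have no usage caps or monthly fees: install once and keep using it, so for heavy use the total cost of ownership (TCO) can be far lower than a subscription service.

**Installation and environment setup**
Chatterbox-TTS-Server depends on a Python runtime. Use a version manager such as pyenv to select the Python version (the README requires 3.10) and avoid polluting the global environment. New users should first create a virtual environment (python -m venv venv && source venv/bin/activate) and then install the dependencies; if something goes wrong, you can delete the virtual environment and start over without affecting the system.

**Community and maintenance**
GitHub Issues and Discussions are the fastest way to get help. Before asking, check the closed issues, since most common questions already have answers. When reporting a bug, include the output of pip list, the full stack trace, and a minimal reproducible example to help maintainers respond quickly. AI Skill Hub will keep tracking Chatterbox-TTS-Server releases and report important feature changes.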

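📋 Tool overview

A self-hosted, high-quality text-to-speech solution that provides a Web UI, an OpenAI-compatible API, a predefined voice library, and voice cloning. It supports large-scale audiobook generation and suits content creators, developers, and enterprise users who want to run a local TTS service.

Chatterbox-TTS-Server is an open-source tool written in Python that focuses on text-to-speech and API serving. As a GitHub project it has an active community and ongoing releases, its code is open to audit, and it supports local deployment to protect data privacy. It can be used on its own or integrated into an enterprise workflow.

GitHub Stars: ⭐ 1.2k
Forks: 300
Language: Python
Supported platforms: Windows / macOS / Linux
Maintenance status: Actively maintained, community-driven
License: MIT
AI overall score: 8.2
Tool type: AI tool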

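📖 Documentation

The content below was compiled automatically by AI Skill Hub from project information. For the complete original documentation, see the original source.

📌 Key features

Open source and free, with local deployment and full control over your data
Active GitHub open-source community with ongoing updates
Detailed documentation and usage examples, friendly to newcomers
Customizable configuration that adapts to different environments
Can be integrated into an existing stack as a base component or extended through custom development

🎯 Main use cases

Run locally to protect data privacy and meet compliance requirements
Integrate into existing systems to extend your stack
Build commercial products on top of it as an open-source base component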

The commands below follow the installation paths described in the project's README; check the README for hardware-specific options.

Installation commands

# Option 1: automated launcher (recommended by the README)
git clone https://github.com/devnen/Chatterbox-TTS-Server.git
cd Chatterbox-TTS-Server
python start.py

# Option 2: manual CPU-only install in a virtual environment (Python 3.10)
python -m venv venv
source venv/bin/activate  # Windows: .\venv\Scripts\activate
pip install --upgrade pip
pip install -r requirements.txt
pip install --no-deps git+https://github.com/devnen/chatterbox-v2.git@master

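📋 Installation steps

Visit the GitHub repository page
Follow the README to install the dependencies
Complete the initial configuration for your system
Start from the official examples or documentation
If you run into problems, search GitHub Issues for answers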

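The usage examples below were compiled by AI Skill Hub and cover the most common scenarios.

Common commands

# Start the server with the automated launcher
python start.py

# Update or rebuild the environment
python start.py --upgrade
python start.py --reinstall

# Once the server is running, open the Web UI in a browser
# (the Colab walkthrough uses localhost:8004) and the interactive
# API documentation at /docs. Programmatic generation goes through
# POST /tts or the OpenAI-compatible POST /v1/audio/speech.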

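The configuration example below uses illustrative values; refer to the official documentation for the actual defaults.

Configuration example

# config.yaml (project root; created automatically on first run)
server:
  host: "0.0.0.0"   # illustrative value
  port: 8004        # illustrative value
model:
  repo_id: "ResembleAI/chatterbox"
tts_engine:
  device: "auto"    # 'auto', 'cuda', 'mps', or 'cpu'

# The server reads all runtime settings from config.yaml;
# the same values can be edited in the Web UI's Server Configuration section.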

📑 README analysis (documentation completeness: 92/100)

The content below was parsed directly from the GitHub README, preserving its code blocks, tables, and list structure.

🗣️ Overview: Enhanced Chatterbox TTS Generation

The Chatterbox TTS model by Resemble AI provides capabilities for generating high-quality speech. This project builds upon that foundation by providing a robust FastAPI server that makes Chatterbox significantly easier to use and integrate.

🚀 Want to try it instantly? Launch the live demo in Google Colab - no installation needed!

The server expects plain text input for synthesis, and it removes the complexity of setting up and running the model by offering:

A modern Web UI for easy experimentation, preset loading, reference audio management, and generation parameter tuning.
Multi-engine support (Original + Turbo): Choose the TTS engine directly in the Web UI, then generate via the same UI/API surface.
Paralinguistic prompting (Turbo): Native tags like [laugh], [cough], and [chuckle] for natural non-speech reactions inside the same generated voice.
Original Chatterbox strengths: High quality English output plus unique "emotion exaggeration control" and 0.5B LLaMA backbone.
Multi-Platform Acceleration: Full support for NVIDIA (CUDA), AMD (ROCm), and Apple Silicon (MPS) GPUs, with an automatic fallback to CPU, ensuring you can run on any hardware.
Large Text Handling: Intelligently splits long plain text inputs into manageable chunks based on sentence structure, processes them sequentially, and seamlessly concatenates the audio.
📚 Audiobook Generation: Perfect for creating complete audiobooks - simply paste an entire book's text and the server automatically processes it into a single, seamless audio file with consistent voice quality throughout.
Predefined Voices: Select from curated, ready-to-use synthetic voices for consistent and reliable output without cloning setup.
Voice Cloning: Generate speech using a voice similar to an uploaded reference audio file.
Consistent Generation: Achieve consistent voice output across multiple generations or text chunks by using the "Predefined Voices" or "Voice Cloning" modes, optionally combined with a fixed integer Seed.
Docker support for easy, reproducible containerized deployment on any platform.

This server is your gateway to leveraging Chatterbox's TTS capabilities seamlessly, with enhanced stability, voice consistency, and large text support for plain text inputs.
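The sentence-based chunking described above can be pictured with a small sketch. This is not the server's actual chunker: it assumes a naive regex sentence splitter, and the function name `split_into_chunks` is hypothetical. Only the idea of an approximate `chunk_size` comes from the README.

```python
import re


def split_into_chunks(text: str, chunk_size: int = 120) -> list[str]:
    """Group whole sentences into chunks of roughly chunk_size characters."""
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > chunk_size:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in split_into_chunks("One. Two! Three? Four.", chunk_size=10):
        print(repr(chunk))
```

A sentence longer than `chunk_size` is kept whole in this sketch, so sentences are never cut in the middle; a real chunker would also need to handle abbreviations, quotes, and lists.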

🆕 What's New

🚀 v2.0.0 highlights (new)

v2.0 ships the complete Chatterbox family on every major GPU stack behind one OpenAI-compatible API and Web UI. The headline themes:

DGX Spark / sm_121 support via the new docker-compose-cu130.yml (CUDA 13.0, PyTorch 2.10). RTX 30/40/50 keep using cu121 / cu128.
AMD Strix Halo support via docker-compose-strixhalo.yml (ROCm 7.2, HSA_OVERRIDE_GFX_VERSION=11.0.0).
Streaming /tts endpoint — opt-in stream: true parameter returns a StreamingResponse that flushes WAV bytes per chunk with 20 ms crossfades. Default behavior unchanged.
Voice conditioning cache — repeated requests against the same reference voice skip re-encoding. Real latency win for batch / OpenAI-endpoint workflows.
Opt-in BF16 inference — TTS_BF16=on (or =auto) converts T3 to bfloat16 and runs under autocast for ~40% throughput on bf16-capable GPUs. Default off to preserve existing behavior on upgrade.
HTTPS / SSL — optional ssl_certfile and ssl_keyfile in config.yaml for direct HTTPS without a reverse proxy.
Security: CWE-22 path traversal fixed on /tts and /v1/audio/speech voice file parameters. Traversal attempts return HTTP 400.
New endpoints — /api/unload (release GPU memory without restart) and /v1/audio/voices (OpenAI-compatible voice listing).
Dynamic language selector — the UI populates the language dropdown from SUPPORTED_LANGUAGES exposed by the multilingual engine.
Chunker fix — stray dashes in narrative text no longer get treated as bullet items that swallow the rest of the paragraph (#144).

See the v2.0.0 release notes for the full list with contributor credits.
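To make the "20 ms crossfades" in the streaming note concrete, here is a minimal sketch of a linear crossfade between two audio chunks. It is an illustration under stated assumptions, not the server's streaming code: samples are plain Python floats, the function names are hypothetical, and the 24000 Hz sample rate in the example is only an illustrative value.

```python
def overlap_samples(sample_rate: int, fade_ms: float = 20.0) -> int:
    """Number of samples covered by a fade of fade_ms milliseconds."""
    return int(sample_rate * fade_ms / 1000)


def crossfade(a: list[float], b: list[float], overlap: int) -> list[float]:
    """Join two chunks, linearly blending the last `overlap` samples of `a`
    with the first `overlap` samples of `b`."""
    overlap = min(overlap, len(a), len(b))
    if overlap == 0:
        return a + b
    head, tail = a[:-overlap], a[-overlap:]
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of the incoming chunk rises toward 1
        blended.append(tail[i] * (1 - w) + b[i] * w)
    return head + blended + b[overlap:]


if __name__ == "__main__":
    n = overlap_samples(24000)  # illustrative sample rate
    print(n, len(crossfade([1.0] * 1000, [0.0] * 1000, n)))
```

The joined signal is shorter than the two chunks combined by exactly the overlap length, which is why a streaming implementation has to hold back the tail of each chunk until the next one arrives.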

✨ Key Features of This Server

🔥 Live Demo Available: 🚀 One-Click Google Colab Demo: Try the full server with voice cloning and audiobook generation instantly in your browser - no local installation required!

This server application enhances the underlying chatterbox-tts engine with the following:

🚀 Core Functionality:

Multi-Engine Support:
Choose between Original Chatterbox, Chatterbox Multilingual, and Chatterbox‑Turbo via a hot-swappable engine selector in the Web UI.
Original Chatterbox provides high-quality English output with emotion exaggeration control (0.5B parameters).
Chatterbox Multilingual offers 23-language support with voice cloning and emotion control (0.5B parameters).
Chatterbox Turbo delivers significantly faster inference with a streamlined 350M-parameter architecture and paralinguistic tags.
All three models are hot-swappable—simply select from the dropdown without restarts or config changes.
Paralinguistic Tags (Turbo):
Write native tags like [laugh], [cough], and [chuckle] directly in your text when using Chatterbox‑Turbo.
New presets demonstrate paralinguistic prompting for agent-style scripts and expressive narration.
Large Text Processing (Chunking):
Automatically handles long plain text inputs by intelligently splitting them into smaller chunks based on sentence boundaries.
Processes each chunk individually and seamlessly concatenates the resulting audio, overcoming potential generation limits of the TTS engine.
Ideal for audiobook generation - paste entire books and get professional-quality audiobooks with consistent narration.
Configurable via UI toggle ("Split text into chunks") and chunk size slider.
Predefined Voices:
Allows usage of curated, ready-to-use synthetic voices stored in the ./voices directory.
Selectable via UI dropdown ("Predefined Voices" mode).
Provides reliable voice output without manual cloning setup.
Voice Cloning:
Supports voice cloning using a reference audio file (.wav or .mp3).
The server processes the reference audio for the engine.
Generation Seed: Added seed parameter to UI and API for influencing generation results. Using a fixed integer seed in combination with Predefined Voices or Voice Cloning helps maintain consistency.
API Endpoint (/tts):
The primary API endpoint, offering fine-grained control over TTS generation.
Supports parameters for text, voice mode (predefined/clone), reference/predefined voice selection, chunking control (split_text, chunk_size), generation settings (temperature, exaggeration, CFG weight, seed, speed factor, language), and output format.
UI Configuration Management: Added UI section to view/edit config.yaml settings (server, model, paths) and save generation defaults.
Configuration System: Uses config.yaml for all runtime configuration, managed via config.py (YamlConfigManager). If config.yaml is missing, it's created with default values from config.py.
Audio Post-Processing (Optional): Includes utilities for silence trimming, internal silence reduction, and (if parselmouth is installed) unvoiced segment removal to improve audio quality. These are configurable.
UI State Persistence: Web UI now saves/restores text input, voice mode selection, file selections, and generation parameters (seed, chunking, sliders) in config.yaml (ui_state section).
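As a rough illustration of calling the /tts endpoint described above, the sketch below builds a JSON request with the standard library. The parameter names `text`, `split_text`, `chunk_size`, `temperature`, `exaggeration`, `cfg_weight`, `seed`, `speed_factor`, and `language` follow the README; the field names `voice_mode`, `predefined_voice_id`, and `output_format`, the voice file name, the port, and all values are assumptions, so check the Swagger UI at /docs for the real schema.

```python
import json
import urllib.request


def build_tts_request(text, base_url="http://localhost:8004", **overrides):
    """Build (but do not send) a POST request for the /tts endpoint."""
    payload = {
        "text": text,
        "voice_mode": "predefined",            # assumed field name
        "predefined_voice_id": "example.wav",  # hypothetical voice file
        "output_format": "wav",                # assumed field name
        "split_text": True,
        "chunk_size": 120,
        "temperature": 0.8,                    # illustrative values below
        "exaggeration": 0.5,
        "cfg_weight": 0.5,
        "seed": 42,
        "speed_factor": 1.0,
        "language": "en",
    }
    payload.update(overrides)
    return urllib.request.Request(
        f"{base_url}/tts",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_tts_request("Hello from Chatterbox.", seed=7)
    print(req.get_method(), req.full_url)
    # To actually send it against a running server:
    # with urllib.request.urlopen(req) as resp:
    #     open("output.wav", "wb").write(resp.read())
```

Keeping the seed fixed while reusing the same predefined or cloned voice is what the README suggests for consistent output across runs.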

🔧 General Enhancements:

Easy Installation & Management:
🚀 Automated Launcher (start.bat / start.sh) - One-command setup with automatic hardware detection
🔧 Multiple GPU Support - NVIDIA CUDA 12.1, NVIDIA CUDA 12.8 (Blackwell), AMD ROCm, Apple MPS
🔄 Easy Updates - Simple --upgrade and --reinstall commands
📦 Portable Mode (Windows) - Self-contained, movable installation — copy to USB, share as zip, run anywhere without Python
🎯 Skip Menu Options - Direct installation with --cpu, --nvidia, --nvidia-cu128, --rocm, --portable flags
Performance: Optimized for speed and efficient VRAM usage on GPU.
Web Interface: Modern, responsive UI for plain text input, parameter adjustment, preset loading, reference/predefined audio management, and audio playback.
Model Loading: Uses ChatterboxTTS.from_pretrained() for robust model loading from Hugging Face Hub, utilizing the standard HF cache.
Dependency Management: Clear requirements.txt.
Utilities: Comprehensive utils.py for audio processing, text handling, and file management.

✅ Features Summary

Core Chatterbox Capabilities (via Resemble AI Chatterbox):
🗣️ High-quality single-speaker voice synthesis from plain text.
🎤 Perform voice cloning using reference audio prompts.
🎯 Complete model family: Original Chatterbox (English, emotion control), Chatterbox Multilingual (23 languages), and Chatterbox‑Turbo (fastest, paralinguistic tags).
🔄 Hot-swappable engines: Switch between all three models instantly via dropdown—no restarts needed.
Enhanced Server & API:
⚡ Built with the high-performance FastAPI framework.
⚙️ Custom API Endpoint (/tts) as the primary method for programmatic generation, exposing all key parameters.
📄 Interactive API documentation via Swagger UI (/docs).
🩺 Health check endpoint (/api/ui/initial-data also serves as a comprehensive status check).
Advanced Generation Features:
🔁 Hot-Swappable Engines: Switch between Original Chatterbox, Chatterbox Multilingual, and Chatterbox‑Turbo directly in the Web UI—no restarts required.
🌍 Multilingual Support: 23 languages including Arabic, Chinese, French, German, Japanese, Spanish, and more via Chatterbox Multilingual.
🎭 Paralinguistic Tags (Turbo): Native support for [laugh], [cough], [chuckle] and other expressive tags.
📚 Large Text Handling: Intelligently splits long plain text inputs into chunks based on sentences, generates audio for each, and concatenates the results seamlessly. Configurable via split_text and chunk_size.
📖 Audiobook Creation: Perfect for generating complete audiobooks from full-length texts with consistent voice quality and automatic chapter handling.
🎤 Predefined Voices: Select from curated synthetic voices in the ./voices directory.
✨ Voice Cloning: Simple voice cloning using an uploaded reference audio file.
🌱 Consistent Generation: Use Predefined Voices or Voice Cloning modes, optionally with a fixed integer Seed, for consistent voice output.
🔇 Audio Post-Processing: Optional automatic steps to trim silence, fix internal pauses, and remove long unvoiced segments/artifacts (configurable via config.yaml).
Intuitive Web User Interface:
🖱️ Modern, easy-to-use interface.
🔁 Engine Selector: Hot-swap between Original Chatterbox, Chatterbox Multilingual, and Chatterbox‑Turbo with a simple dropdown—no restarts needed.
💡 Presets: Load example text and settings dynamically from ui/presets.yaml.
🎤 Reference/Predefined Audio Upload: Easily upload .wav/.mp3 files.
🗣️ Voice Mode Selection: Choose between Predefined Voices or Voice Cloning.
🎛️ Parameter Control: Adjust generation settings (Temperature, Exaggeration, CFG Weight, Speed Factor, Seed, etc.) via sliders and inputs.
💾 Configuration Management: View and save server settings (config.yaml) and default generation parameters directly in the UI.
💾 Session Persistence: Remembers your last used settings via config.yaml.
✂️ Chunking Controls: Enable/disable text splitting and adjust approximate chunk size.
⚠️ Warning Modals: Optional warnings for chunking voice consistency and general generation quality.
🌓 Light/Dark Mode: Toggle between themes with preference saved locally.
🔊 Audio Player: Integrated waveform player (WaveSurfer.js) for generated audio with download option.
⏳ Loading Indicator: Shows status during generation.
Flexible & Efficient Model Handling:
☁️ Downloads models automatically from Hugging Face Hub using ChatterboxTTS.from_pretrained().
🔄 Easily specify model repository via config.yaml.
📄 Optional download_model.py script available to pre-download specific model components to a local directory (this is separate from the main HF cache used at runtime).
Performance & Configuration:
💻 GPU Acceleration: Automatically uses NVIDIA CUDA, Apple MPS, or AMD ROCm if available, falls back to CPU.
⚙️ All configuration via config.yaml.
📦 Uses standard Python virtual environments.
📦 Portable Mode (Windows): Self-contained installation that can be copied, moved, or shared — no Python needed on the target machine.
Docker Support:
🐳 Containerized deployment via Docker and Docker Compose.
🔌 NVIDIA GPU acceleration with Container Toolkit integration.
💾 Persistent volumes for models (HF cache), custom voices, outputs, logs, and config.
🚀 One-command setup and deployment (docker compose up -d).
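The automatic device selection mentioned under Performance & Configuration could look roughly like the sketch below. The function name `resolve_device` and the CUDA-then-MPS-then-CPU priority are assumptions made for illustration; the README only states that the server uses an available accelerator and falls back to CPU. ROCm builds of PyTorch report through the same `torch.cuda` API, so they count as "cuda" here.

```python
def resolve_device(requested: str, cuda_available: bool, mps_available: bool) -> str:
    """Map a config value ('auto', 'cuda', 'mps', 'cpu') to a usable device."""
    requested = requested.lower()
    if requested == "auto":
        if cuda_available:
            return "cuda"
        if mps_available:
            return "mps"
        return "cpu"
    # Assumed behaviour: an unavailable explicit choice falls back to CPU.
    if requested == "cuda" and not cuda_available:
        return "cpu"
    if requested == "mps" and not mps_available:
        return "cpu"
    return requested


if __name__ == "__main__":
    print(resolve_device("auto", cuda_available=False, mps_available=True))
```

With PyTorch installed, the two flags would come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`, the same calls used in the verification commands in this document.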

🔩 System Prerequisites

Operating System: Windows 10/11 (64-bit) or Linux (Debian/Ubuntu recommended).
Python: Version 3.10 required. Python 3.10 is the only version with pre-built wheels for all dependencies (torch, torchvision, ONNX, ONNXRuntime). Python 3.11+ is not supported because key dependencies lack pre-built wheels, causing build failures. On Windows, the launcher's Portable Mode automatically uses an embedded Python 3.10 runtime regardless of your system Python version. When using Portable Mode, Python is only needed on the machine where you first set up the application.
Git: For cloning the repository.
Internet: For downloading dependencies and models from Hugging Face Hub.
Disk Space: 10GB+ recommended (for dependencies and model cache).
(Optional but HIGHLY Recommended for Performance):
NVIDIA GPU (CUDA 12.1): CUDA-compatible (Maxwell architecture or newer, RTX 20/30/40 series). Check NVIDIA CUDA GPUs.
NVIDIA GPU (CUDA 12.8): RTX 5090 or other Blackwell-based GPUs, driver version 570+.
NVIDIA Drivers: Latest version for your GPU/OS (Download).
AMD GPU: ROCm-compatible (e.g., RX 6000/7000 series). Check AMD ROCm GPUs.
AMD Drivers: Latest ROCm-compatible drivers for your GPU/OS (Linux only).
Apple Silicon: M1, M2, M3, M4, or newer Apple Silicon chips with macOS 12.3+ for MPS acceleration.
(Linux Only):
libsndfile1: Audio library needed by soundfile. Install via package manager (e.g., sudo apt install libsndfile1).
ffmpeg: For robust audio operations (optional but recommended). Install via package manager (e.g., sudo apt install ffmpeg).
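Because the prerequisites pin Python to 3.10, a quick pre-flight check can save a failed install. The snippet below is a small sketch; the helper name `is_supported_python` is hypothetical and not part of the project.

```python
import sys

REQUIRED = (3, 10)  # the only version the README lists as supported


def is_supported_python(version=None) -> bool:
    """Return True when the interpreter's major.minor version is 3.10."""
    major_minor = tuple((version or sys.version_info)[:2])
    return major_minor == REQUIRED


if __name__ == "__main__":
    if is_supported_python():
        print("Python version OK")
    else:
        found = ".".join(map(str, sys.version_info[:3]))
        print(f"Python 3.10 is required, found {found}")
```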

# Reinstall with fresh dependencies
python start.py --reinstall

# Step 1: Install dependencies with PyTorch 2.9.0+cu128 (includes sm_120 support)
pip install -r requirements-nvidia-cu128.txt

# Step 2: Install chatterbox without dependencies (prevents PyTorch downgrade)
pip install --no-deps git+https://github.com/devnen/chatterbox-v2.git@master


⚠️ **Critical:** The `--no-deps` flag is required to prevent PyTorch from being downgraded to a version that doesn't support Blackwell GPUs.

**After installation, verify that PyTorch supports sm_120:**

```bash
python -c "import torch; print(f'PyTorch: {torch.__version__}'); print(f'CUDA: {torch.cuda.is_available()}'); print(f'GPU: {torch.cuda.get_device_name(0)}'); print(f'Architectures: {torch.cuda.get_arch_list()}')"
```

You should see sm_120 in the architectures list!

NVIDIA's Blackwell GPUs (RTX 5060 Ti, 5070, 5070 Ti, 5080, 5090) use compute capability sm_120. PyTorch 2.9.0 with CUDA 12.8 includes support for this architecture. Earlier versions (including CUDA 12.1) will fail with the error: CUDA error: no kernel image is available for execution on the device.

See README_CUDA128.md for detailed setup instructions and troubleshooting.
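The compute-capability rules in this document (sm_120 needs the CUDA 12.8 stack, sm_121 needs CUDA 13.0, older cards use CUDA 12.1) can be summarised in a small lookup. This is only a sketch: the function name `install_target` is hypothetical, and the mapping is an interpretation of the README's notes rather than logic taken from `start.py`.

```python
def install_target(capability: tuple[int, int]) -> str:
    """Pick the install artifact named in the README for a GPU's
    (major, minor) compute capability."""
    if capability == (12, 1):
        # DGX Spark / GB10 (sm_121): CUDA 13.0 Docker stack
        return "docker-compose-cu130.yml"
    if capability == (12, 0):
        # Blackwell RTX 50 series (sm_120): CUDA 12.8 requirements
        return "requirements-nvidia-cu128.txt"
    # Assumed default for older NVIDIA GPUs: the CUDA 12.1 requirements
    return "requirements-nvidia.txt"


if __name__ == "__main__":
    print(install_target((8, 9)))   # e.g. an RTX 40 series card
    print(install_target((12, 0)))  # e.g. an RTX 5090
```

With PyTorch available, the tuple can be read from `torch.cuda.get_device_capability(0)`.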

---

# Step 2: Install remaining dependencies
pip install -r requirements-rocm.txt

# Step 3: Install chatterbox without dependencies (prevents ROCm torch overwrite)
pip install --no-deps git+https://github.com/devnen/chatterbox-v2.git@master


⚠️ **Critical:** The `--no-deps` flag on chatterbox-tts is required to prevent pip from replacing the ROCm PyTorch wheels with CPU-only versions from PyPI. The `start.py` launcher handles this automatically.

**After installation, verify that PyTorch can see your GPU:**

```bash
python -c "import torch; print(f'PyTorch version: {torch.__version__}'); print(f'ROCm available: {torch.cuda.is_available()}'); print(f'Device name: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else None}')"
```

If `ROCm available:` shows `True`, your setup is correct!

<details> <summary>💡 How This Works</summary>

ROCm installation uses a two-step dependency install, followed by a separate Chatterbox install:
1. requirements-rocm-init.txt installs PyTorch from the official ROCm 6.1 wheel index (torch==2.5.1+rocm6.1), ensuring you get AMD GPU-accelerated builds.
2. requirements-rocm.txt installs remaining server dependencies without touching PyTorch.
3. Chatterbox is installed with --no-deps to prevent pip's dependency resolver from replacing the ROCm torch with a CPU-only version.

For APU/iGPU users: If you encounter "HIP error: invalid device function", you may need to set HSA_OVERRIDE_GFX_VERSION. See AMD ROCm Support Details below. </details>

---

# Install chatterbox-tts without its dependencies to avoid conflicts
pip install --no-deps git+https://github.com/devnen/chatterbox-v2.git@master

# Install core server dependencies
pip install fastapi 'uvicorn[standard]' librosa safetensors soundfile pydub audiotsm praat-parselmouth python-multipart requests aiofiles PyYAML watchdog unidecode inflect tqdm

# Install missing chatterbox dependencies
pip install conformer==0.3.2 diffusers==0.29.0 resemble-perth==1.0.1 transformers==4.46.3

# Install s3tokenizer without its problematic dependencies
pip install --no-deps s3tokenizer

Then upgrade dependencies using the launcher

🖥️ Installation fixes across all platforms

All platforms: Chatterbox is now installed with --no-deps across all installation paths (CPU, NVIDIA, cu128, ROCm). This eliminates ONNX source build failures, torch version conflicts, and CMake errors that affected many users. Chatterbox's dependencies (conformer, diffusers, transformers, s3tokenizer, etc.) are now listed explicitly in each requirements file with onnx==1.16.0 pinned to guarantee pre-built wheels.
Apple Silicon / MPS: Fixed Turbo model crash ("Cannot convert a MPS Tensor to float64 dtype") by forcing float32 in s3tokenizer and voice_encoder. Fix applied in the chatterbox-v2 fork and also as an automatic post-install patch in start.py for users of other chatterbox versions. Thanks to @jonas3245 (#93).
Docker CPU: New lightweight Dockerfile.cpu based on python:3.10-slim instead of the 4GB+ NVIDIA CUDA base image. docker-compose-cpu.yml now uses this smaller image. Removed deprecated version tags from all docker-compose files.
config.yaml: Default device changed from cuda to auto for correct auto-detection on all hardware (CUDA, MPS, CPU).
Python version: Python 3.10 is required — it is the only version with pre-built wheels for all dependencies (torch, torchvision, ONNX). Python 3.11+ may fail due to missing wheels. The Windows launcher's Portable Mode handles this automatically by using an embedded Python 3.10 runtime.
Blackwell (CUDA 12.8): Fixed requirements-nvidia-cu128.txt to properly install PyTorch 2.9.0 with CUDA 12.8 (sm_120 support) for RTX 5060 Ti, 5070, 5070 Ti, 5080, and 5090 GPUs. The Dockerfile.cu128 now correctly installs chatterbox with --no-deps to prevent PyTorch downgrade.
AMD ROCm: Fixed ROCm installation by switching to PyTorch's official ROCm 6.1 wheel index (torch==2.5.1+rocm6.1), which resolves the previous torch==2.6.0 / torchaudio==2.5.1 version conflict. A new requirements-rocm-init.txt installs the ROCm PyTorch stack before other dependencies. Both Dockerfile.rocm and start.py now use a two-step install to prevent pip from replacing ROCm torch wheels with CPU-only versions.
Thanks to community contributors in issues #20, #23, #44, #58, #64, #79, #89, #92, #93, #98, #105, #107, #109, #113, #114, #121, and #122 for testing and reporting solutions.

💻 Installation and Setup

This project uses specific dependency files to ensure a smooth installation for your hardware. You can choose between the automated launcher (recommended for most users) or manual installation (for advanced users).

1. Clone the Repository

git clone https://github.com/devnen/Chatterbox-TTS-Server.git
cd Chatterbox-TTS-Server

---

# Install in portable mode (Windows) - skip prompt
python start.py --portable

📋 Manual Installation

For users who prefer manual control over the installation process.

2. Create a Python Virtual Environment

Using a virtual environment is crucial to avoid conflicts with other projects.

* Windows (PowerShell):

    python -m venv venv
    .\venv\Scripts\activate

* Linux (Bash):

    python3 -m venv venv
    source venv/bin/activate

Your command prompt should now start with (venv).

3. Choose Your Installation Path

Pick one of the following commands based on your hardware. This single command will install all necessary dependencies with compatible versions.

---

Option 1: CPU-Only Installation

This is the most straightforward option and works on any machine without a compatible GPU.


Option 2: NVIDIA GPU Installation (CUDA 12.1)

For users with NVIDIA GPUs. This provides the best performance for RTX 20/30/40 series.

Prerequisite: Ensure you have the latest NVIDIA drivers installed. Python 3.10 required (3.11+ is not supported — pre-built wheels for torchvision and ONNX are unavailable).

# Build and start with CUDA 12.8 support
docker compose -f docker-compose-cu128.yml up -d

Option 3: AMD GPU Installation (ROCm)

For users with modern, ROCm-compatible AMD GPUs.

Prerequisite: Ensure you have the latest ROCm drivers installed on a Linux system.

```bash
# Step 1: Install ROCm PyTorch stack first
pip install -r requirements-rocm-init.txt
```

Option 4: Apple Silicon (MPS) Installation

For users with Apple Silicon Macs (M1, M2, M3, M4, etc.).

Prerequisite: Ensure you have macOS 12.3 or later for MPS support.

Step 1: Install PyTorch with MPS support first

# Install a compatible version of ONNX and audio codec
pip install onnx==1.16.0 descript-audio-codec


**After installation, verify that PyTorch can see your GPU:**

```bash
python -c "import torch; print(f'PyTorch version: {torch.__version__}'); print(f'MPS available: {torch.backends.mps.is_available()}'); print(f'Device will use: {\"mps\" if torch.backends.mps.is_available() else \"cpu\"}')"
```

If `MPS available:` shows `True`, your setup is correct!

<details>
<summary><strong>💡 Why This Process Is Different</strong></summary>
Apple Silicon requires a specific installation sequence due to dependency conflicts between the pinned PyTorch versions in chatterbox-tts and the latest PyTorch versions that support MPS. By installing PyTorch first with MPS support, then carefully installing dependencies while avoiding version conflicts, we ensure MPS acceleration works properly. The server's automatic device detection will use MPS when configured and available.
</details>

---

Option 5: AMD Strix Halo (Ryzen AI MAX+) Installation

Note: Use this for AMD Strix Halo APUs (Ryzen AI MAX+ 395 / "Ryzen AI Max" with integrated Radeon 8060S, GFX 11.5.0). Standard discrete Radeon GPUs use Option 3 (ROCm).

Prerequisites:
- AMD Strix Halo APU on Linux
- ROCm 7.2+ stack installed on the host

Using Docker (recommended):
```bash
docker compose -f docker-compose-strixhalo.yml up -d
```

Option 6: AMD RDNA4 (RX 9070 / RX 9070 XT / R9700) Installation

Note: For AMD RDNA4 discrete GPUs (RX 9070, RX 9070 XT, Radeon AI PRO R9700, gfx1201). ROCm 6.1 wheels do not support gfx1201 — ROCm 7.2 is required. Standard discrete Radeon GPUs (RX 6000/7000) use Option 3.

Prerequisites:
- AMD RDNA4 GPU (RX 9070 / 9070 XT / R9700) on Linux
- ROCm 7.2+ stack installed on the host (see ROCm install guide)
- User in video and render groups: sudo usermod -aG video,render $USER

Using Docker (recommended):
```bash
docker compose -f docker-compose-rdna4.yml up -d
```

Method 2: Stash and Pop (Recommended for Manual Installation)

If you installed manually without using the launcher, this is the standard and safest way to update using Git. It automatically handles your local changes (like to config.yaml) without needing to manually copy files.

First, activate your virtual environment:


🚀 Quick Start with Automated Launcher (Recommended)

The automated launcher handles virtual environment creation, hardware detection, dependency installation, and server startup - all in one step.

Windows


Quick Start:

Click the badge above to open the notebook in Google Colab
Select GPU runtime: Runtime → Change runtime type → T4 GPU → Save
Run Cell 1: Click the play button to install dependencies (~1-5 minutes)
Run Cell 2: Start the server and access the Web UI via the provided links
Wait for "Server ready! Click below" message: Locate the "localhost:8004" link and click. This starts the Web UI in your browser
Generate speech: Use the web interface to create high-quality TTS audio

🚀 Live Demo - Try It Now! (Google Colab)

Want to test Chatterbox TTS Server immediately without any installation?

Why Try the Demo?

✅ Full Web UI with all controls and features
✅ Voice cloning with uploaded audio files
✅ Predefined voices included
✅ Large text processing with chunking (perfect for audiobooks)
✅ Free GPU acceleration (T4 GPU)
✅ No installation or setup required
✅ Works on any device with a web browser

# Force standard virtual environment (skip portable prompt)
python start.py --no-portable



Subsequent Runs

After the first installation, simply run the launcher again to start the server.

CPU-only manual install (Option 1):

```bash
# Make sure your (venv) is active
pip install --upgrade pip
pip install -r requirements.txt
pip install --no-deps git+https://github.com/devnen/chatterbox-v2.git@master
```

<details> <summary>💡 How This Works</summary> The requirements.txt file installs CPU PyTorch and all server dependencies. Chatterbox is installed separately with --no-deps to prevent pip from pulling in conflicting torch versions or triggering ONNX source builds. </details>

---

# Make sure your (venv) is active
pip install --upgrade pip
pip install -r requirements-nvidia.txt
pip install --no-deps git+https://github.com/devnen/chatterbox-v2.git@master


**After installation, verify that PyTorch can see your GPU:**

```bash
python -c "import torch; print(f'PyTorch version: {torch.__version__}'); print(f'CUDA available: {torch.cuda.is_available()}'); print(f'Device name: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else None}')"
```

If `CUDA available:` shows `True`, your setup is correct!

<details> <summary>💡 How This Works</summary> The requirements-nvidia.txt file installs PyTorch with CUDA 12.1 support plus all server dependencies. Chatterbox is installed separately with --no-deps to prevent pip from downgrading the CUDA torch to a CPU version or triggering ONNX source builds. </details>

---

Option 2b: NVIDIA GPU with CUDA 12.8 (RTX 5090 / Blackwell)

Note: Only use this if you have a Blackwell-based GPU (RTX 5060 Ti, 5070, 5070 Ti, 5080, 5090). For RTX 2000/3000/4000 series, use Option 2 above.

For users with NVIDIA Blackwell architecture GPUs that require CUDA 12.8 and sm_120 support.

Prerequisites:
- NVIDIA RTX 5060 Ti, 5070, 5070 Ti, 5080, 5090 or other Blackwell-based GPU
- CUDA 12.8+ drivers (driver version 570+)

Using Docker (Recommended for RTX 5090):
```bash
docker compose -f docker-compose-cu128.yml up -d
```

# Make sure your (venv) is active
pip install --upgrade pip

Option 2c: NVIDIA GPU with CUDA 13.0 (DGX Spark / sm_121)

Note: Use this for NVIDIA DGX Spark / GB10 hardware (compute capability sm_121). RTX 5090 stays on cu128 (Option 2b); RTX 30/40 stays on cu121 (Option 2).

For users on the very latest NVIDIA stack who need CUDA 13.0 and PyTorch 2.10.

Prerequisites:
- NVIDIA DGX Spark / GB10 or other sm_121-capable hardware
- CUDA 13.0+ drivers (driver version 580+)

Using Docker (recommended):
```bash
docker compose -f docker-compose-cu130.yml up -d
```

# Make sure your (venv) is active
pip install --upgrade pip

# Make sure your (venv) is active
pip install --upgrade pip
pip install torch torchvision torchaudio


**Step 2: Configure the server to use MPS**
Update your `config.yaml` to use MPS instead of CUDA:

```yaml
tts_engine:
  device: mps  # Changed from 'cuda' to 'mps'
```

**Step 3: Install remaining dependencies**

⚙️ Configuration

The server relies exclusively on config.yaml for runtime configuration.

- `config.yaml`: Located in the project root. This file stores all server settings, model paths, generation defaults, and UI state. It is created automatically on the first run (using defaults from `config.py`) if it doesn't exist. This is the main file to edit for persistent configuration changes.
- UI Configuration: The "Server Configuration" and "Generation Parameters" sections in the Web UI allow direct editing and saving of values into `config.yaml`.

Key Configuration Areas (in `config.yaml` or UI):

- `server`: `host`, `port`, logging settings.
- `model`: `repo_id` (e.g., `"ResembleAI/chatterbox"`).
- `tts_engine`: `device` (`'auto'`, `'cuda'`, `'mps'`, `'cpu'`), `predefined_voices_path`, `reference_audio_path`, `default_voice_id`.
- `paths`: `model_cache` (for `download_model.py`), `output`.
- `generation_defaults`: Default UI values for `temperature`, `exaggeration`, `cfg_weight`, `seed`, `speed_factor`, `language`.
- `audio_output`: `format`, `sample_rate`, `max_reference_duration_sec`.
- `ui_state`: Stores the last used text, voice mode, file selections, etc., for UI persistence.
- `ui`: `title`, `show_language_select`, `max_predefined_voices_in_dropdown`.
- `debug`: `save_intermediate_audio`.

⭐ Remember: Changes made to server, model, tts_engine, or paths sections in config.yaml (or via the UI's Server Configuration section) require a server restart to take effect. Changes to generation_defaults or ui_state are applied dynamically or on the next page load.
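The restart rule can be expressed as a small check over the keys that changed in `config.yaml`. This is only a sketch of the rule as stated above, not code from the server; the function name and the dotted-key convention (`section.setting`) are assumptions made for illustration.

```python
from typing import Iterable

# Sections whose changes only take effect after a server restart.
RESTART_SECTIONS = {"server", "model", "tts_engine", "paths"}


def needs_restart(changed_keys: Iterable[str]) -> bool:
    """Return True if any changed dotted key belongs to a restart-only section."""
    return any(key.split(".", 1)[0] in RESTART_SECTIONS for key in changed_keys)


print(needs_restart(["server.port", "generation_defaults.seed"]))
```

A deployment script could call this after editing the file and restart the server only when the result is true.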

Model selection: set model.repo_id to one of:

- `chatterbox` (or `original`): Original 0.5B English model with emotion exaggeration.
- `chatterbox-turbo` (or `turbo`): 350M Turbo model with paralinguistic tags (`[laugh]`, `[cough]`, `[chuckle]`).
- `chatterbox-multilingual` (or `multilingual`): 0.5B multilingual model with 23-language support.

All three are hot-swappable from the Web UI engine dropdown without a server restart.
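The short aliases can be normalized to the full selector names with a lookup table. The sketch below is illustrative: the function name is hypothetical, and the trimming and case-insensitive matching are assumptions, since the server's own parsing may be stricter.

```python
# Selector names and aliases as listed above.
MODEL_ALIASES = {
    "chatterbox": "chatterbox",
    "original": "chatterbox",
    "chatterbox-turbo": "chatterbox-turbo",
    "turbo": "chatterbox-turbo",
    "chatterbox-multilingual": "chatterbox-multilingual",
    "multilingual": "chatterbox-multilingual",
}


def normalize_model_selector(value: str) -> str:
    """Return the full selector name for a name or alias, or raise ValueError."""
    key = value.strip().lower()  # assumption: tolerate stray whitespace and case
    try:
        return MODEL_ALIASES[key]
    except KeyError:
        raise ValueError(f"unknown model selector: {value!r}") from None


print(normalize_model_selector("turbo"))
```

Validating the value before writing it to `config.yaml` catches typos early instead of at model load time.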

Chatterbox TTS Server: OpenAI-Compatible API with Web UI, Large Text Handling & Built-in Voices

Self-host Resemble AI's Chatterbox open-source TTS family (Original + Multilingual + Turbo) behind an OpenAI‑compatible API and a modern Web UI. The complete lineup includes the original high-quality model, multilingual support for 23 languages, and Chatterbox‑Turbo—a streamlined 350M-parameter model with dramatically improved throughput and native paralinguistic tags like [laugh], [cough], and [chuckle] for more expressive voice agents and narration. Features voice cloning, large text processing via intelligent chunking, audiobook generation, and consistent, reproducible voices using built-in ready-to-use voices and a generation seed feature.
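To make the idea of chunking concrete, here is a minimal sentence-based sketch. It is not the server's implementation: the function name `chunk_text`, the `max_chars` limit, and the sentence-splitting regular expression are all assumptions chosen for illustration.

```python
import re
from typing import List


def chunk_text(text: str, max_chars: int = 300) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut mid-word.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            chunks.append(current)  # current chunk is full; start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


print(chunk_text("One. Two. Three.", max_chars=9))
```

Splitting on sentence boundaries keeps each synthesized segment a natural unit of speech, so the generated audio pieces can be concatenated without cutting words.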

🚀 Try it now! Test the full TTS server with voice cloning and audiobook generation in Google Colab - no installation required! To use it, please run cells 1 through 4 one at a time. After running cell 4, click on the "https://localhost:8004" link that appears in the output, and your web browser will open the UI from the .colab.dev domain. Read the instructions here.

This server is based on the architecture and UI of our Dia-TTS-Server project but uses the distinct chatterbox-tts engine. Runs accelerated on NVIDIA (CUDA), AMD (ROCm), and Apple Silicon (MPS) GPUs, with a fallback to CPU. Make sure you also check our Kitten-TTS-Server project.
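The CPU fallback can be pictured as a small resolution step between the configured `device` value and what the machine actually offers. The following is a sketch under stated assumptions, not the server's code: the function name is hypothetical, and the preference order for `auto` (CUDA, then MPS, then CPU) is a guess.

```python
def resolve_device(requested: str, cuda_available: bool, mps_available: bool) -> str:
    """Resolve a configured device name to one that is usable on this machine."""
    requested = requested.lower()
    if requested not in {"auto", "cuda", "mps", "cpu"}:
        raise ValueError(f"unsupported device setting: {requested!r}")
    # ROCm builds of PyTorch report through the same 'cuda' device name.
    if requested in ("auto", "cuda") and cuda_available:
        return "cuda"
    if requested in ("auto", "mps") and mps_available:
        return "mps"
    return "cpu"  # fallback when the requested accelerator is missing


print(resolve_device("auto", cuda_available=False, mps_available=True))
```

With PyTorch installed, the two flags could come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.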

Chatterbox TTS Server Web UI - Dark Mode

Chatterbox TTS Server Web UI - Light Mode

📦 Portable Mode (Windows): This application supports a fully portable installation — the entire folder, including Python and all dependencies, is self-contained. Copy it to a USB drive, share it as a zip, or move it anywhere. Just double-click start.bat — no Python installation needed on the target machine. Learn more →

---

Method 3: Manual Backup (Alternative)

This method involves manually backing up and restoring your configuration file.

First, activate your virtual environment:

```bash
# Install with verbose output for troubleshooting
python start.py --reinstall --nvidia --verbose
```

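🇨🇳 Chinese documentation mirror · AI translation · 2026-05-23

The English source sections were machine-translated into Chinese summaries for quick reading. For the full original text, see "📑 README Deep Dive" above.

📌 Introduction

Built on Resemble AI's Chatterbox TTS model, this project provides a FastAPI server that simplifies using and integrating Chatterbox. A live demo in Google Colab lets you try it right away.

⚡ Features

Version v2.0.0 adds support for DGX Spark / sm_121 and AMD Strix Halo, along with real-time speech generation.

📋 Requirements

The project requires Windows 10/11 (64-bit) or Linux (Debian/Ubuntu), with Python 3.10.

🛠 Installation (Docker / pip / source)

The project supports several installation routes, including Docker, pip, and deployment from source.

🚀 Usage

The project ships an automated launcher that creates the virtual environment, detects hardware, installs dependencies, and starts the server.

⚙️ Configuration

Runtime configuration is read from config.yaml, which can be edited directly or through the Web UI.

🔌 API

The project provides an OpenAI-compatible API and a modern Web UI, covering the Original, Multilingual, and Turbo Chatterbox models.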
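As a hedged illustration of calling the OpenAI-compatible API, the sketch below builds (but does not send) a speech request using only the standard library. None of the following is confirmed by this page: the route is assumed to follow OpenAI's `/v1/audio/speech` convention, the port 8004 is taken from the Colab note, and the `model` and `voice` values are placeholders.

```python
import json
import urllib.request


def build_speech_request(
    text: str,
    voice: str = "example_voice.wav",        # placeholder voice name
    base_url: str = "http://localhost:8004",  # assumed host and port
) -> urllib.request.Request:
    """Build a POST request in the shape of OpenAI's audio speech API."""
    payload = {
        "model": "chatterbox",  # placeholder; the server may ignore or remap it
        "input": text,
        "voice": voice,
        "response_format": "wav",
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/speech",  # assumed route
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_speech_request("Hello from Chatterbox.")
print(request.get_method(), request.full_url)
```

If the assumptions hold, passing the request to `urllib.request.urlopen` against a running server would return the audio bytes, which can then be written to a file.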

❓ FAQ Summary

The project's FAQ covers common problems and their solutions.

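🎯 aiskill88 AI review · Grade A · 2026-05-23

A mature TTS server framework that integrates several advanced features. Its API is designed for OpenAI compatibility, the community is active and the project is well maintained, and it is suitable for production deployment.

📚 Practical Guide (long-tail questions)

Who it is for

Agent developers building multi-agent collaboration systems
Developers building voice AI products

Best practices

For production, prefer Docker Compose to isolate dependencies, and mount volumes to persist data
For agent tasks, dry-run the tool-call chain first, then enable autonomous execution

Common mistakes

Committing API keys to the git repository (use .env and add it to .gitignore)
Containers cannot reach the host's localhost; use host.docker.internal
Python dependency conflicts: isolate environments with venv or uv

Deployment options

Docker: the project ships Docker Compose files; start it with docker compose up
CLI: run the launcher from the command line with python start.py
Cloud hosting: can be deployed on PaaS platforms such as Vercel / Railway / Fly.io

⚡ Core Features

Open source and free, with local deployment and full control over your data
An active open-source community on GitHub with continuous updates
Detailed documentation and usage examples, friendly to beginners
Custom configuration that adapts to different environments
Can be integrated into an existing stack as a base component or extended through custom development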

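👥 Target Audience

AI enthusiasts · Researchers and students · Developers and engineers · Tech entrepreneurs

🎯 Use Cases

Run locally to protect data privacy and meet compliance requirements
Integrate into existing systems to extend your stack
Use as an open-source base component for commercial custom development

⚖️ Pros and Cons

✅ Pros

+ MIT license, free for commercial use
+ Fully open source and free, with no license fees
+ Local deployment with full control over your data
+ Developer community support for looking up and asking about problems

⚠️ Cons

− Installation and initial configuration may require some technical background
− Feature completeness is usually below that of mature commercial products
− Technical support relies mainly on the open-source community, so response times vary

⚠️ Usage Notes

AI Skill Hub is a third-party content aggregation platform. The information on this page is compiled from public data and carries no legal endorsement of the tool's features or quality.

Validate the tool thoroughly in a sandbox or test environment before deploying to production, and carry out the necessary security assessment.

📄 License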


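❓ FAQ

Does it support running locally and offline?

Yes. It is fully self-hosted, does not depend on cloud services, and can run locally offline.

Can it clone voices?

What background do I need to install this tool?

What should I do about dependency conflicts during installation?

The tool installed successfully but fails at runtime. How do I handle that?

Does this tool pose a data privacy risk?

Will updating the tool affect existing configuration and data?

💡 AI Skill Hub Review

After an overall evaluation, Chatterbox-TTS-Server performs solidly among AI tools and is of high quality. If you already have a clear use case, you can start using it right away; if you are still evaluating, compare it with similar tools before deciding.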

🌐 Original Information

Original name	`Chatterbox-TTS-Server`
Original description	Self-host the powerful Chatterbox TTS model. This server offers a user-friendly Web UI, flexible API endpoints (incl. OpenAI compatible), predefined voices, voice cloning, and large audiobook-scale text processing. Runs accelerated on NVIDIA (CUDA), AMD (ROCm), and CPU.
Topics	`text-to-speech, API service, open-source project, local deployment, CUDA acceleration`
GitHub	https://github.com/devnen/Chatterbox-TTS-Server
License	MIT
Language	Python

🔗 Original Sources

🐙 GitHub repository: https://github.com/devnen/Chatterbox-TTS-Server
🌐 Official website: https://colab.research.google.com/github/devnen/Chatterbox-TTS-Server/blob/main/Chatterbox_TTS_Colab_Demo.ipynb

Listed: 2026-05-22 · Updated: 2026-05-30 · License: MIT · AI Skill Hub does not legally vouch for the accuracy of third-party content.
