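Chatterbox-TTS-Server (AI speech synthesis tool, Chinese documentation) is one of the AI tools featured in this issue of AI Skill Hub. It has 1.2k GitHub stars and an overall score of 8.5, indicating high overall quality. We strongly recommend adding it to your AI toolkit to improve your productivity.
Chatterbox-TTS-Server is an open-source tool written in Python, focused on ai, api-server, audio-generation, and related areas. As an open-source GitHub project, it has active community support and continuous releases, its code is fully transparent and auditable, and it supports local deployment to protect data privacy. Whether for personal use or integration into enterprise workflows, it offers a stable and reliable solution.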
# Option 1: install with pip (recommended)
pip install chatterbox-tts-server
# Option 2: install in a virtual environment (recommended for production)
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install chatterbox-tts-server
# Option 3: install from source (to get the latest features)
git clone https://github.com/devnen/Chatterbox-TTS-Server
cd Chatterbox-TTS-Server
pip install -e .
# Verify the installation
python -c "import chatterbox_tts_server; print('Installation successful')"
# Command-line usage
chatterbox-tts-server --help
# Basic usage
chatterbox-tts-server input_file -o output_file
# Calling from Python code
import chatterbox_tts_server
# Example
result = chatterbox_tts_server.process("input")
print(result)
# chatterbox-tts-server example configuration file (config.yml)
app:
  name: "chatterbox-tts-server"
  debug: false
  log_level: "INFO"
# Specify the configuration file at runtime
chatterbox-tts-server --config config.yml
# Or configure through environment variables
export CHATTERBOX_TTS_SERVER_API_KEY="your-key"
export CHATTERBOX_TTS_SERVER_OUTPUT_DIR="./output"
The Chatterbox TTS model by Resemble AI provides capabilities for generating high-quality speech. This project builds upon that foundation by providing a robust FastAPI server that makes Chatterbox significantly easier to use and integrate.
🚀 Want to try it instantly? Launch the live demo in Google Colab - no installation needed!
The server expects plain text input for synthesis and we solve the complexity of setting up and running the model by offering:
* [laugh], [cough], and [chuckle] for natural non-speech reactions inside the same generated voice.
This server is your gateway to leveraging Chatterbox's TTS capabilities seamlessly, with enhanced stability, voice consistency, and large text support for plain text inputs.
v2.0 ships the complete Chatterbox family on every major GPU stack behind one OpenAI-compatible API and Web UI. The headline themes:
* docker-compose-cu130.yml (CUDA 13.0, PyTorch 2.10). RTX 30/40/50 keep using cu121 / cu128.
* docker-compose-strixhalo.yml (ROCm 7.2, HSA_OVERRIDE_GFX_VERSION=11.0.0).
* /tts endpoint: opt-in stream: true parameter returns a StreamingResponse that flushes WAV bytes per chunk with 20 ms crossfades. Default behavior unchanged.
* TTS_BF16=on (or =auto) converts T3 to bfloat16 and runs under autocast for ~40% throughput on bf16-capable GPUs. Default off to preserve existing behavior on upgrade.
* ssl_certfile and ssl_keyfile in config.yaml for direct HTTPS without a reverse proxy.
* /tts and /v1/audio/speech voice file parameters. Traversal attempts return HTTP 400.
* /api/unload (release GPU memory without restart) and /v1/audio/voices (OpenAI-compatible voice listing).
* SUPPORTED_LANGUAGES exposed by the multilingual engine.
See the v2.0.0 release notes for the full list with contributor credits.
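As an illustration of the opt-in streaming mode listed above, here is a minimal client sketch using only the Python standard library. The `stream` parameter and the `/tts` path come from the notes above; the base URL, the `text` field name, and the helper names `build_tts_request` and `save_streamed_audio` are assumptions made for this example, so check the server's `/docs` page for the actual request schema.

```python
import json
import urllib.request

# Assumed local address; adjust host and port to match your config.yaml.
BASE_URL = "http://localhost:8004"


def build_tts_request(text: str, stream: bool = True) -> urllib.request.Request:
    """Build a JSON POST request for /tts (the "text" field name is an assumption)."""
    payload = {"text": text, "stream": stream}
    return urllib.request.Request(
        f"{BASE_URL}/tts",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_streamed_audio(text: str, out_path: str, block_size: int = 4096) -> int:
    """Write response bytes to disk as they arrive and return the byte count."""
    written = 0
    with urllib.request.urlopen(build_tts_request(text)) as resp, open(out_path, "wb") as f:
        while True:
            block = resp.read(block_size)
            if not block:  # an empty read means the server closed the stream
                break
            f.write(block)
            written += len(block)
    return written
```

Reading in fixed-size blocks lets a caller start handling audio before generation finishes, which is the point of the streaming option.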
🔥 Live Demo Available: * 🚀 One-Click Google Colab Demo: Try the full server with voice cloning and audiobook generation instantly in your browser - no local installation required!
This server application enhances the underlying chatterbox-tts engine with the following:
🚀 Core Functionality:
* [laugh], [cough], and [chuckle] directly in your text when using Chatterbox‑Turbo.
* ./voices directory.
* .wav or .mp3).
* seed parameter to UI and API for influencing generation results. Using a fixed integer seed in combination with Predefined Voices or Voice Cloning helps maintain consistency.
* /tts):
* split_text, chunk_size), generation settings (temperature, exaggeration, CFG weight, seed, speed factor, language), and output format.
* config.yaml settings (server, model, paths) and save generation defaults.
* config.yaml for all runtime configuration, managed via config.py (YamlConfigManager). If config.yaml is missing, it's created with default values from config.py.
* parselmouth is installed) unvoiced segment removal to improve audio quality. These are configurable.
* config.yaml (ui_state section).
🔧 General Enhancements:
* start.bat / start.sh) - One-command setup with automatic hardware detection
* --upgrade and --reinstall commands
* --cpu, --nvidia, --nvidia-cu128, --rocm, --portable flags
* ChatterboxTTS.from_pretrained() for robust model loading from Hugging Face Hub, utilizing the standard HF cache.
* requirements.txt.
* utils.py for audio processing, text handling, and file management.
* /tts) as the primary method for programmatic generation, exposing all key parameters.
* /docs).
* /api/ui/initial-data also serves as a comprehensive status check).
* [laugh], [cough], [chuckle] and other expressive tags.
* split_text and chunk_size.
* ./voices directory.
* config.yaml).
* ui/presets.yaml.
* .wav/.mp3 files.
* config.yaml) and default generation parameters directly in the UI.
* config.yaml.
* ChatterboxTTS.from_pretrained().
* config.yaml.
* download_model.py script available to pre-download specific model components to a local directory (this is separate from the main HF cache used at runtime).
* config.yaml.
* docker compose up -d).
* libsndfile1: Audio library needed by soundfile. Install via package manager (e.g., sudo apt install libsndfile1).
* ffmpeg: For robust audio operations (optional but recommended). Install via package manager (e.g., sudo apt install ffmpeg).
python start.py --reinstall
pip install -r requirements-nvidia-cu128.txt
pip install --no-deps git+https://github.com/devnen/chatterbox-v2.git@master
⚠️ **Critical:** The `--no-deps` flag is required to prevent PyTorch from being downgraded to a version that doesn't support Blackwell GPUs.
**After installation, verify that PyTorch supports sm_120:**
```bash
python -c "import torch; print(f'PyTorch: {torch.__version__}'); print(f'CUDA: {torch.cuda.is_available()}'); print(f'GPU: {torch.cuda.get_device_name(0)}'); print(f'Architectures: {torch.cuda.get_arch_list()}')"
```
You should see sm_120 in the architectures list!
<details> <summary><strong>💡 Why CUDA 12.8?</strong></summary>
NVIDIA's Blackwell GPUs (RTX 5060 Ti, 5070, 5070 Ti, 5080, 5090) use compute capability sm_120. PyTorch 2.9.0 with CUDA 12.8 includes support for this architecture. Earlier versions (including CUDA 12.1) will fail with the error: CUDA error: no kernel image is available for execution on the device.
See README_CUDA128.md for detailed setup instructions and troubleshooting. </details>
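The architecture check can also be scripted. The sketch below separates the pure membership test from the PyTorch query so the logic can be exercised on machines without a GPU; `supports_arch` and `installed_arch_list` are hypothetical helper names for this example, not part of the project.

```python
def supports_arch(arch_list, required="sm_120"):
    """Return True if the required compute architecture is in the list."""
    return required in arch_list


def installed_arch_list():
    """Return the architectures the installed PyTorch build was compiled for.

    Returns an empty list if PyTorch is missing or CUDA is unavailable.
    """
    try:
        import torch
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    return torch.cuda.get_arch_list()


if __name__ == "__main__":
    archs = installed_arch_list()
    print("Architectures:", archs)
    print("Blackwell (sm_120) supported:", supports_arch(archs))
```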
---
pip install -r requirements-rocm.txt
pip install --no-deps git+https://github.com/devnen/chatterbox-v2.git@master
⚠️ **Critical:** The `--no-deps` flag on chatterbox-tts is required to prevent pip from replacing the ROCm PyTorch wheels with CPU-only versions from PyPI. The `start.py` launcher handles this automatically.
**After installation, verify that PyTorch can see your GPU:**
```bash
python -c "import torch; print(f'PyTorch version: {torch.__version__}'); print(f'ROCm available: {torch.cuda.is_available()}'); print(f'Device name: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else None}')"
```
If `ROCm available:` shows `True`, your setup is correct!
<details> <summary><strong>💡 How This Works</strong></summary>
ROCm installation uses a two-step process:
1. requirements-rocm-init.txt installs PyTorch from the official ROCm 6.1 wheel index (torch==2.5.1+rocm6.1), ensuring you get AMD GPU-accelerated builds.
2. requirements-rocm.txt installs remaining server dependencies without touching PyTorch.
3. Chatterbox is installed with --no-deps to prevent pip's dependency resolver from replacing the ROCm torch with a CPU-only version.
For APU/iGPU users: If you encounter "HIP error: invalid device function", you may need to set HSA_OVERRIDE_GFX_VERSION. See AMD ROCm Support Details below. </details>
---
pip install --no-deps git+https://github.com/devnen/chatterbox-v2.git@master
pip install fastapi 'uvicorn[standard]' librosa safetensors soundfile pydub audiotsm praat-parselmouth python-multipart requests aiofiles PyYAML watchdog unidecode inflect tqdm
pip install conformer==0.3.2 diffusers==0.29.0 resemble-perth==1.0.1 transformers==4.46.3
pip install --no-deps s3tokenizer
* --no-deps across all installation paths (CPU, NVIDIA, cu128, ROCm). This eliminates ONNX source build failures, torch version conflicts, and CMake errors that affected many users. Chatterbox's dependencies (conformer, diffusers, transformers, s3tokenizer, etc.) are now listed explicitly in each requirements file with onnx==1.16.0 pinned to guarantee pre-built wheels.
* start.py for users of other chatterbox versions. Thanks to @jonas3245 (#93).
* Dockerfile.cpu based on python:3.10-slim instead of the 4GB+ NVIDIA CUDA base image. docker-compose-cpu.yml now uses this smaller image. Removed deprecated version tags from all docker-compose files.
* cuda to auto for correct auto-detection on all hardware (CUDA, MPS, CPU).
* requirements-nvidia-cu128.txt to properly install PyTorch 2.9.0 with CUDA 12.8 (sm_120 support) for RTX 5060 Ti, 5070, 5070 Ti, 5080, and 5090 GPUs. The Dockerfile.cu128 now correctly installs chatterbox with --no-deps to prevent PyTorch downgrade.
* torch==2.5.1+rocm6.1), which resolves the previous torch==2.6.0 / torchaudio==2.5.1 version conflict. A new requirements-rocm-init.txt installs the ROCm PyTorch stack before other dependencies. Both Dockerfile.rocm and start.py now use a two-step install to prevent pip from replacing ROCm torch wheels with CPU-only versions.
This project uses specific dependency files to ensure a smooth installation for your hardware. You can choose between the automated launcher (recommended for most users) or manual installation (for advanced users).
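To make the per-hardware install order concrete, the following sketch maps each target to the requirements files named in this document and appends the `--no-deps` Chatterbox install as the final step. It only builds the command lists and does not run them. The target keys mirror the launcher flags, and `install_commands` is a hypothetical helper for illustration, not how `start.py` is necessarily implemented.

```python
# Chatterbox source used by the manual install commands in this document.
CHATTERBOX_SOURCE = "git+https://github.com/devnen/chatterbox-v2.git@master"

# Requirements files per hardware target, in the order they must be installed.
REQUIREMENT_FILES = {
    "cpu": ["requirements.txt"],
    "nvidia": ["requirements-nvidia.txt"],
    "nvidia-cu128": ["requirements-nvidia-cu128.txt"],
    # ROCm installs the PyTorch stack first, then the remaining dependencies.
    "rocm": ["requirements-rocm-init.txt", "requirements-rocm.txt"],
}


def install_commands(target: str) -> list[list[str]]:
    """Return the pip commands for a target, ending with the --no-deps install."""
    if target not in REQUIREMENT_FILES:
        raise ValueError(f"unknown target: {target!r}")
    commands = [["pip", "install", "-r", name] for name in REQUIREMENT_FILES[target]]
    # Chatterbox goes last, without dependencies, so pip cannot swap the torch build.
    commands.append(["pip", "install", "--no-deps", CHATTERBOX_SOURCE])
    return commands


if __name__ == "__main__":
    for command in install_commands("rocm"):
        print(" ".join(command))
```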
1. Clone the Repository
git clone https://github.com/devnen/Chatterbox-TTS-Server.git
cd Chatterbox-TTS-Server
---
python start.py --portable
For users who prefer manual control over the installation process.
2. Create a Python Virtual Environment
Using a virtual environment is crucial to avoid conflicts with other projects.
* Windows (PowerShell):
python -m venv venv
.\venv\Scripts\activate
* Linux (Bash):
python3 -m venv venv
source venv/bin/activate
Your command prompt should now start with (venv).
3. Choose Your Installation Path
Pick one of the following commands based on your hardware. This single command will install all necessary dependencies with compatible versions.
---
This is the most straightforward option and works on any machine without a compatible GPU.
```bash
For users with NVIDIA GPUs. This provides the best performance for RTX 20/30/40 series.
Prerequisite: Ensure you have the latest NVIDIA drivers installed. Python 3.10 required (3.11+ is not supported — pre-built wheels for torchvision and ONNX are unavailable).
```bash
docker compose -f docker-compose-cu128.yml up -d
For users with modern, ROCm-compatible AMD GPUs.
Prerequisite: Ensure you have the latest ROCm drivers installed on a Linux system.
```bash
pip install -r requirements-rocm-init.txt
For users with Apple Silicon Macs (M1, M2, M3, M4, etc.).
Prerequisite: Ensure you have macOS 12.3 or later for MPS support.
Step 1: Install PyTorch with MPS support first ```bash
pip install onnx==1.16.0 descript-audio-codec
**After installation, verify that PyTorch can see your GPU:**
```bash
python -c "import torch; print(f'PyTorch version: {torch.__version__}'); print(f'MPS available: {torch.backends.mps.is_available()}'); print(f'Device will use: {\"mps\" if torch.backends.mps.is_available() else \"cpu\"}')"
```
If `MPS available:` shows `True`, your setup is correct!
<details>
<summary><strong>💡 Why This Process Is Different</strong></summary>
Apple Silicon requires a specific installation sequence due to dependency conflicts between the pinned PyTorch versions in chatterbox-tts and the latest PyTorch versions that support MPS. By installing PyTorch first with MPS support, then carefully installing dependencies while avoiding version conflicts, we ensure MPS acceleration works properly. The server's automatic device detection will use MPS when configured and available.
</details>
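The device selection described above can be sketched as a small pure function. The accepted values ('auto', 'cuda', 'mps', 'cpu') come from this document; the CUDA-then-MPS-then-CPU preference order and the name `resolve_device` are assumptions for illustration and may differ from the server's actual logic.

```python
VALID_DEVICES = ("auto", "cuda", "mps", "cpu")


def resolve_device(configured: str, cuda_ok: bool, mps_ok: bool) -> str:
    """Pick the device to run on, falling back to CPU when needed."""
    choice = configured.strip().lower()
    if choice not in VALID_DEVICES:
        raise ValueError(f"unsupported device setting: {configured!r}")
    available = {"cuda": cuda_ok, "mps": mps_ok, "cpu": True}
    if choice == "auto":
        # Assumed preference order: CUDA first, then MPS, then CPU.
        for name in ("cuda", "mps", "cpu"):
            if available[name]:
                return name
    # An explicit choice is honored only if that backend is usable.
    return choice if available[choice] else "cpu"
```

In practice the two flags would come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`, the same calls used in the verification commands in this document.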
---
Note: Use this for AMD Strix Halo APUs (Ryzen AI MAX+ 395 / "Ryzen AI Max" with integrated Radeon 8060S, GFX 11.5.0). Standard discrete Radeon GPUs use Option 3 (ROCm).
Prerequisites:
* AMD Strix Halo APU on Linux
* ROCm 7.2+ stack installed on the host
Using Docker (recommended):
```bash
docker compose -f docker-compose-strixhalo.yml up -d
```
If you installed manually without using the launcher, this is the standard and safest way to update using Git. It automatically handles your local changes (such as edits to config.yaml) without needing to manually copy files.
First, activate your virtual environment:
```bash
The automated launcher handles virtual environment creation, hardware detection, dependency installation, and server startup - all in one step.
```bash
Want to test Chatterbox TTS Server immediately without any installation?
python start.py --no-portable
#### Subsequent Runs
After the first installation, simply run the launcher again to start the server:
```bash
pip install --upgrade pip
pip install -r requirements.txt
pip install --no-deps git+https://github.com/devnen/chatterbox-v2.git@master
```
<details> <summary><strong>💡 How This Works</strong></summary> The requirements.txt file installs CPU PyTorch and all server dependencies. Chatterbox is installed separately with --no-deps to prevent pip from pulling in conflicting torch versions or triggering ONNX source builds. </details>
---
pip install --upgrade pip
pip install -r requirements-nvidia.txt
pip install --no-deps git+https://github.com/devnen/chatterbox-v2.git@master
**After installation, verify that PyTorch can see your GPU:**
```bash
python -c "import torch; print(f'PyTorch version: {torch.__version__}'); print(f'CUDA available: {torch.cuda.is_available()}'); print(f'Device name: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else None}')"
```
If `CUDA available:` shows `True`, your setup is correct!
<details> <summary><strong>💡 How This Works</strong></summary> The requirements-nvidia.txt file installs PyTorch with CUDA 12.1 support plus all server dependencies. Chatterbox is installed separately with --no-deps to prevent pip from downgrading the CUDA torch to a CPU version or triggering ONNX source builds. </details>
---
Note: Only use this if you have a Blackwell-based GPU (RTX 5060 Ti, 5070, 5070 Ti, 5080, 5090). For RTX 2000/3000/4000 series, use Option 2 above.
For users with NVIDIA Blackwell architecture GPUs that require CUDA 12.8 and sm_120 support.
Prerequisites:
* NVIDIA RTX 5060 Ti, 5070, 5070 Ti, 5080, 5090 or other Blackwell-based GPU
* CUDA 12.8+ drivers (driver version 570+)
Using Docker (Recommended for RTX 5090): ```bash
pip install --upgrade pip
Note: Use this for NVIDIA DGX Spark / GB10 hardware (compute capability sm_121). RTX 5090 stays on cu128 (Option 2b); RTX 30/40 stays on cu121 (Option 2).
For users on the very latest NVIDIA stack who need CUDA 13.0 and PyTorch 2.10.
Prerequisites:
* NVIDIA DGX Spark / GB10 or other sm_121-capable hardware
* CUDA 13.0+ drivers (driver version 580+)
Using Docker (recommended):
```bash
docker compose -f docker-compose-cu130.yml up -d
```
pip install --upgrade pip
pip install --upgrade pip
pip install torch torchvision torchaudio
**Step 2: Configure the server to use MPS**
Update your `config.yaml` to use MPS instead of CUDA:
```yaml
tts_engine:
  device: mps # Changed from 'cuda' to 'mps'
```
**Step 3: Install remaining dependencies**
The server relies exclusively on config.yaml for runtime configuration.
* config.yaml: Located in the project root. This file stores all server settings, model paths, generation defaults, and UI state. It is created automatically on the first run (using defaults from config.py) if it doesn't exist. This is the main file to edit for persistent configuration changes.
* config.yaml.
Key Configuration Areas (in config.yaml or UI):
* server: host, port, logging settings.
* model: repo_id (e.g., "ResembleAI/chatterbox").
* tts_engine: device ('auto', 'cuda', 'mps', 'cpu'), predefined_voices_path, reference_audio_path, default_voice_id.
* paths: model_cache (for download_model.py), output.
* generation_defaults: Default UI values for temperature, exaggeration, cfg_weight, seed, speed_factor, language.
* audio_output: format, sample_rate, max_reference_duration_sec.
* ui_state: Stores the last used text, voice mode, file selections, etc., for UI persistence.
* ui: title, show_language_select, max_predefined_voices_in_dropdown.
* debug: save_intermediate_audio.
⭐ Remember: Changes made to server, model, tts_engine, or paths sections in config.yaml (or via the UI's Server Configuration section) require a server restart to take effect. Changes to generation_defaults or ui_state are applied dynamically or on the next page load.
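The restart rule above can be expressed as a simple comparison of two configuration snapshots. This is a minimal sketch that treats each top-level section as a plain dictionary; the section names come from this document, while `changed_sections` and `needs_restart` are hypothetical helpers, not functions from config.py.

```python
# Sections whose changes require a server restart, per the note above.
RESTART_SECTIONS = {"server", "model", "tts_engine", "paths"}


def changed_sections(old: dict, new: dict) -> set:
    """Return the names of top-level sections that differ between two configs."""
    return {key for key in set(old) | set(new) if old.get(key) != new.get(key)}


def needs_restart(old: dict, new: dict) -> bool:
    """Return True if any changed section is one that requires a restart."""
    return bool(changed_sections(old, new) & RESTART_SECTIONS)


if __name__ == "__main__":
    before = {"tts_engine": {"device": "cuda"}, "generation_defaults": {"seed": 0}}
    after = {"tts_engine": {"device": "mps"}, "generation_defaults": {"seed": 42}}
    print(sorted(changed_sections(before, after)))
    print("Restart needed:", needs_restart(before, after))
```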
Model selection: set model.repo_id to one of:
* chatterbox (or original): Original 0.5B English model with emotion exaggeration.
* chatterbox-turbo (or turbo): 350M Turbo model with paralinguistic tags ([laugh], [cough], [chuckle]).
* chatterbox-multilingual (or multilingual): 0.5B multilingual model with 23-language support.
All three are hot-swappable from the Web UI engine dropdown without a server restart.
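As a small illustration of the selector names above, the sketch below maps each short alias to its full selector. It covers only the three selectors listed here, not full Hugging Face repo ids such as "ResembleAI/chatterbox". The case-insensitive matching and the name `canonical_model` are assumptions for this example.

```python
# Short alias -> full selector, taken from the list above.
MODEL_ALIASES = {
    "chatterbox": "chatterbox",
    "original": "chatterbox",
    "chatterbox-turbo": "chatterbox-turbo",
    "turbo": "chatterbox-turbo",
    "chatterbox-multilingual": "chatterbox-multilingual",
    "multilingual": "chatterbox-multilingual",
}


def canonical_model(name: str) -> str:
    """Resolve a selector or alias to its full selector name."""
    key = name.strip().lower()  # case-insensitive matching is an assumption
    if key not in MODEL_ALIASES:
        raise ValueError(f"unknown model selector: {name!r}")
    return MODEL_ALIASES[key]
```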
Self-host Resemble AI's Chatterbox open-source TTS family (Original + Multilingual + Turbo) behind an OpenAI‑compatible API and a modern Web UI. The complete lineup includes the original high-quality model, multilingual support for 23 languages, and Chatterbox‑Turbo—a streamlined 350M-parameter model with dramatically improved throughput and native paralinguistic tags like [laugh], [cough], and [chuckle] for more expressive voice agents and narration. Features voice cloning, large text processing via intelligent chunking, audiobook generation, and consistent, reproducible voices using built-in ready-to-use voices and a generation seed feature.
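To show what sentence-aware chunking of long text can look like, here is a minimal sketch that packs whole sentences into chunks of at most `chunk_size` characters. It illustrates the general technique only and is not the server's actual algorithm; the function name, the default size, and the regex-based sentence split are assumptions.

```python
import re


def split_into_chunks(text: str, chunk_size: int = 120) -> list[str]:
    """Group whole sentences into chunks no longer than chunk_size characters.

    A single sentence longer than chunk_size is kept intact as its own chunk.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > chunk_size:
            chunks.append(current)  # close the current chunk at a sentence boundary
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Cutting only at sentence boundaries keeps each synthesized segment prosodically complete, which matters when the segments are later joined into one audio file.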
🚀 Try it now! Test the full TTS server with voice cloning and audiobook generation in Google Colab - no installation required! To use it, please run cells 1 through 4 one at a time. After running cell 4, click on the "https://localhost:8004" link that appears in the output, and your web browser will open the UI from the .colab.dev domain. Read the instructions here.
This server is based on the architecture and UI of our Dia-TTS-Server project but uses the distinct chatterbox-tts engine. Runs accelerated on NVIDIA (CUDA), AMD (ROCm), and Apple Silicon (MPS) GPUs, with a fallback to CPU. Make sure you also check our Kitten-TTS-Server project.
📦 Portable Mode (Windows): This application supports a fully portable installation — the entire folder, including Python and all dependencies, is self-contained. Copy it to a USB drive, share it as a zip, or move it anywhere. Just double-click start.bat — no Python installation needed on the target machine.
---
This method involves manually backing up and restoring your configuration file.
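The backup and restore step can be sketched with the standard library. The `.bak` suffix and the helper names are assumptions for this example; any copy method that preserves the file works equally well.

```python
import shutil
from pathlib import Path


def backup_config(config_path: str = "config.yaml", suffix: str = ".bak") -> Path:
    """Copy the config file next to itself (e.g. config.yaml.bak) and return the copy."""
    source = Path(config_path)
    backup = source.with_name(source.name + suffix)
    shutil.copy2(source, backup)  # copy2 also preserves timestamps
    return backup


def restore_config(config_path: str = "config.yaml", suffix: str = ".bak") -> Path:
    """Overwrite the config file with its backup and return the restored path."""
    target = Path(config_path)
    backup = target.with_name(target.name + suffix)
    shutil.copy2(backup, target)
    return target
```

Call `backup_config()` before updating and `restore_config()` afterwards, from the project root where config.yaml lives.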
First, activate your virtual environment:
```bash
python start.py --reinstall --nvidia --verbose
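AI Skill Hub is a third-party content aggregation platform. The information on this page is compiled from public data and does not constitute any legal endorsement of the tool's functionality or quality.
We recommend validating the tool thoroughly in a sandbox or test environment before deploying it to production, and carrying out the necessary security assessment.
✅ MIT License: one of the most permissive open-source licenses. It allows free commercial use, modification, and distribution, provided the copyright notice is retained.
Based on our overall assessment, Chatterbox-TTS-Server performs solidly among AI tools and is of excellent quality. If you already have a clear use case, you can start using it right away; if you are still evaluating, we suggest comparing it with similar tools before deciding.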
| Field | Value |
| --- | --- |
| Original name | Chatterbox-TTS-Server |
| Original description | Self-host the powerful Chatterbox TTS model. This server offers a user-friendly Web UI, flexible API endpoints (incl. OpenAI compatible), predefined voices, voice cloning, and large audiobook-scale text processing. Runs accelerated on NVIDIA (CUDA), AMD (ROCm), and CPU. |
| Topics | ai, api-server, audio-generation, chatterbox, chatterbox-tts, cuda, tts |
| GitHub | https://github.com/devnen/Chatterbox-TTS-Server |
| License | MIT |
| Language | Python |
Listed: 2026-05-22 · Updated: 2026-05-22 · License: MIT · AI Skill Hub makes no legal endorsement of the accuracy of third-party content.