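Following a curated evaluation by AI Skill Hub, TTS-WebUI (Chinese documentation for the AI speech synthesis tool) is rated "Strongly Recommended". With 3.1k GitHub stars, the tool scores well on feature completeness, community activity, and ease of use, earning an AI score of 8.8. It is best suited to users with some technical background.
TTS-WebUI is an open-source tool, listed on GitHub as a TypeScript project, covering topics such as ace-step, ai, and audio-generation. As a GitHub open-source project, it has an active community and ongoing releases, its code is fully open to audit, and it can be deployed locally to keep data private. It can be used by individuals or integrated into enterprise workflows.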
# Option 1: global npm install
npm install -g tts-webui
# Option 2: run directly with npx (no installation needed)
npx tts-webui --help
# Option 3: install as a project dependency
npm install tts-webui
# Option 4: run from source
git clone https://github.com/rsxdalv/TTS-WebUI
cd TTS-WebUI
npm install
npm start
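# Command-line usage
tts-webui --help
# Basic usage
tts-webui [options] <input>
// Usage from Node.js code
const tts_webui = require('tts-webui');
// `await` is only valid inside an async function in CommonJS modules
async function main(options) {
  const result = await tts_webui.run(options);
  console.log(result);
}
main({ /* options */ }).catch(console.error);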
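# tts-webui configuration
# Write the example configuration to a file
tts-webui --config-example > config.yml
# Common configuration keys
# output_dir: ./output
# log_level: info
# workers: 4
# Environment variable (overrides the config file)
export TTS_WEBUI_CONFIG="/path/to/config.yml"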
https://github.com/rsxdalv/tts-webui/discussions/186#discussioncomment-7291274
The codebase is licensed under MIT. However, it's important to note that when installing the dependencies, you will also be subject to their respective licenses. Although most of these licenses are permissive, there may be some that are not. Therefore, it's essential to understand that the permissive license only applies to the codebase itself, not the entire project.
That being said, the goal is to maintain MIT compatibility throughout the project. If you come across a dependency that is not compatible with the MIT license, please feel free to open an issue and bring it to our attention.
Known non-permissive dependencies:

| Library | License | Notes |
|---|---|---|
| encodec | CC BY-NC 4.0 | Newer versions are MIT, but need to be installed manually |
| diffq | CC BY-NC 4.0 | Optional in the future, not necessary to run, can be uninstalled, should be updated with demucs |
| lameenc | GPL License | Future versions will make it LGPL, but need to be installed manually |
| unidecode | GPL License | Not mission critical, can be replaced with another library, issue: https://github.com/neonbjb/tortoise-tts/issues/494 |
Current base installation size is around 10.7 GB. Each model will require 2-8 GB of space in addition.
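As a rough planning aid, the figures above can be turned into a small disk-space estimate. This is only a sketch: the function name `estimate_disk_gb` is made up for illustration, and it assumes every model falls within the quoted 2-8 GB range.

```python
# Back-of-the-envelope disk estimate based on the sizes quoted above.
BASE_INSTALL_GB = 10.7        # approximate base installation size
MODEL_GB_RANGE = (2.0, 8.0)   # approximate additional space per model


def estimate_disk_gb(num_models: int) -> tuple[float, float]:
    """Return a (low, high) total disk estimate in GB for num_models models."""
    if num_models < 0:
        raise ValueError("num_models must be non-negative")
    low = BASE_INSTALL_GB + num_models * MODEL_GB_RANGE[0]
    high = BASE_INSTALL_GB + num_models * MODEL_GB_RANGE[1]
    return round(low, 1), round(high, 1)


if __name__ == "__main__":
    low, high = estimate_disk_gb(3)
    print(f"Base install plus 3 models: roughly {low} to {high} GB")
```

Actual usage depends on which models are downloaded, so treat the result as a range rather than a guarantee.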
Prerequisites:
- git
- Python 3.10 or 3.11 (3.12 not supported yet)
- PyTorch
- ffmpeg (with vorbis support)
- (Optional) NodeJS 22.9.0 for React UI
- SQLite (bundled with Python) for database support
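Before a manual install, the prerequisites above can be checked with a short script. The sketch below is not part of the project; the helper names are hypothetical, and it only checks that the tools are on `PATH` and that `torch` is importable. It does not verify that ffmpeg was built with vorbis support or that the NodeJS version matches.

```python
import importlib.util
import shutil
import sys

SUPPORTED_PYTHON = {(3, 10), (3, 11)}   # versions listed in the prerequisites
REQUIRED_TOOLS = ["git", "ffmpeg"]
OPTIONAL_TOOLS = ["node"]               # only needed for the React UI


def python_supported(version: tuple[int, int]) -> bool:
    """True if (major, minor) is one of the supported Python versions."""
    return version in SUPPORTED_PYTHON


def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    current = (sys.version_info.major, sys.version_info.minor)
    print(f"Python {current[0]}.{current[1]} supported: {python_supported(current)}")
    print("Missing required tools:", missing_tools(REQUIRED_TOOLS) or "none")
    print("Missing optional tools:", missing_tools(OPTIONAL_TOOLS) or "none")
    print("PyTorch importable:", importlib.util.find_spec("torch") is not None)
```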
1. Clone the repository:
git clone https://github.com/rsxdalv/tts-webui.git
cd tts-webui
2. Install required packages:
pip install -r requirements.txt
3. Run the server:
python server.py --no-react
4. For React UI:
cd react-ui
npm install
npm run build
cd ..
python server.py
For detailed manual installation instructions, please refer to the Manual Installation Guide.
tts-webui can also be run inside a Docker container. Using CUDA inside Docker requires the NVIDIA Container Toolkit. To get started, pull the image from GitHub Container Registry:
docker pull ghcr.io/rsxdalv/tts-webui:main
Once the image has been pulled, it can be started with Docker Compose. The Gradio backend uses port 7770 (env: TTS_PORT) and the React front end uses port 3000 (env: UI_PORT):
docker compose up -d
The container will take some time to generate the first output while models are downloaded in the background. The status of this download can be verified by checking the container logs:
docker logs tts-webui
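While the container is starting, it can be handy to check whether the two ports mentioned above are accepting connections yet. The following is a minimal sketch, assuming the container is published on `localhost` and that `TTS_PORT` and `UI_PORT`, if set in the current shell, match the values passed to Docker Compose. The function names are hypothetical.

```python
import os
import socket

# Default ports from the Docker section above; env vars override them.
DEFAULT_PORTS = {"TTS_PORT": 7770, "UI_PORT": 3000}


def resolve_ports(env=os.environ) -> dict[str, int]:
    """Read TTS_PORT and UI_PORT from env, falling back to the defaults."""
    return {name: int(env.get(name, default)) for name, default in DEFAULT_PORTS.items()}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, port in resolve_ports().items():
        state = "open" if port_open("localhost", port) else "not reachable yet"
        print(f"{name}={port}: {state}")
```

An open port only shows that the server is listening; the first generation can still be slow while models download.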
#### Building the image yourself
If you wish to build your own Docker container, you can use the included Dockerfile:
docker build -t tts-webui .
Please note that the docker-compose file needs to be edited to use the image you just built.
| <video src="https://github.com/user-attachments/assets/16ac948a-fe98-49ad-ad87-19c41fe7e65e" width="300"></video> | <video src="https://github.com/user-attachments/assets/55bde4f7-bbcc-4ecf-8f94-b315b9d22e74" width="300"></video> | <video src="https://github.com/user-attachments/assets/fcee8906-a101-400d-8499-4e72c7603042" width="300"></video> |
|---|---|---|
Using the instructions above, you can install an OpenAI-compatible API and use it with Silly Tavern or other OpenAI-compatible clients.
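For clients other than Silly Tavern, an OpenAI-compatible speech endpoint is normally called with a POST to `/audio/speech` under the API base URL, sending `model`, `input`, and `voice` fields. The sketch below shows that request shape using only the standard library. The base URL, port, model name, and voice name are placeholders chosen for illustration, not values confirmed by this document; substitute whatever your installation reports.

```python
import json
import urllib.request

# All three values are assumptions for illustration; check your own setup.
BASE_URL = "http://localhost:7778/v1"   # hypothetical host and port
MODEL = "hypothetical-model"            # placeholder model name
VOICE = "hypothetical-voice"            # placeholder voice name


def build_speech_request(base_url, text, model, voice, api_key="not-needed"):
    """Build a POST request in the shape of the OpenAI speech API."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_speech_request(BASE_URL, "Hello from TTS WebUI", MODEL, VOICE)
    # The OpenAI API returns mp3 by default; a local server may differ.
    with urllib.request.urlopen(req, timeout=120) as resp, open("speech.mp3", "wb") as out:
        out.write(resp.read())
```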
| Text-to-speech | Audio/Music Generation | Audio Conversion/Tools |
|---|---|---|
| [Bark](https://github.com/suno-ai/bark) | [MusicGen](https://github.com/facebookresearch/audiocraft/blob/main/docs/MUSICGEN.md) | [RVC](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI) |
| [Tortoise](https://github.com/neonbjb/tortoise-tts) | [MAGNeT](https://github.com/facebookresearch/audiocraft/blob/main/docs/MAGNET.md) | [Demucs](https://github.com/facebookresearch/demucs) |
| [Maha TTS](https://github.com/dubverse-ai/MahaTTS) | [Stable Audio](https://github.com/Stability-AI/stable-audio-tools) | [Vocos](https://github.com/gemelo-ai/vocos) |
| [MMS](https://github.com/facebookresearch/fairseq/blob/main/examples/mms/README.md) | [Riffusion\*](https://github.com/riffusion/riffusion-hobby) | [Whisper](https://github.com/openai/whisper) |
| [Vall-E X](https://github.com/Plachtaa/VALL-E-X) | [AudioCraft Mac\*](https://github.com/trizko/audiocraft) | [AP BWE](https://github.com/yxlu-0102/AP-BWE) |
| [StyleTTS2](https://github.com/sidharthrajaram/StyleTTS2) | [AudioCraft Plus\*](https://github.com/GrandaddyShmax/audiocraft_plus) | [Resemble Enhance](https://github.com/resemble-ai/resemble-enhance) |
| [SeamlessM4T](https://github.com/facebookresearch/seamless_communication) | [ACE-Step\*](https://github.com/ACE-Step/ACE-Step) | [Audio Separator](https://github.com/nomadkaraoke/python-audio-separator) |
| [XTTSv2\*](https://github.com/coqui-ai/TTS) | [Song Bloom\*](https://github.com/rsxdalv/tts_webui_extension.song_bloom) | [PyRNNoise\*](https://github.com/rsxdalv/tts_webui_extension.pyrnnoise) |
| [MARS5\*](https://github.com/camb-ai/mars5-tts) | [MiMo Audio\*](https://github.com/rsxdalv/tts_webui_extension.mimo_audio) | |
| [F5-TTS\*](https://github.com/SWivid/F5-TTS) | ||
| [Parler TTS\*](https://github.com/huggingface/parler-tts) | ||
| [OpenVoice\*](https://github.com/myshell-ai/OpenVoice) | ||
| [OpenVoice V2\*](https://github.com/myshell-ai/OpenVoice) | ||
| [Kokoro TTS\*](https://github.com/hexgrad/kokoro) | ||
| [DIA\*](https://github.com/nari-labs/dia) | ||
| [CosyVoice\*](https://github.com/FunAudioLLM/CosyVoice) | ||
| [GPT-SoVITS\*](https://github.com/X-T-E-R/GPT-SoVITS-Inference) | ||
| [Piper TTS\*](https://github.com/rhasspy/piper) | ||
| [Kimi Audio 7B Instruct\*](https://github.com/Dao-AILab/Kimi-Audio) | ||
| [Chatterbox\*](https://github.com/rsxdalv/chatterbox) | ||
| [VibeVoice\*](https://github.com/rsxdalv/tts_webui_extension.vibevoice) | ||
| [Kitten TTS\*](https://github.com/rsxdalv/tts_webui_extension.kitten_tts) | ||
| [Index-TTS2\*](https://github.com/rsxdalv/tts_webui_extension.index_tts) | ||
| [VoxCPM\*](https://github.com/rsxdalv/tts_webui_extension.vox_cpm) | ||
| [FireRedTTS2\*](https://github.com/rsxdalv/tts_webui_extension.fireredtts2) | ||
| [MegaTTS3\*](https://github.com/rsxdalv/tts_webui_extension.megatts3) | ||
| [MiniMax Cloud TTS](https://www.minimaxi.com) (built-in) | ||
\* These models are not installed by default, instead they are available as extensions.
Extensions are available to install from the webui itself, or using React UI. They can also be installed using the extension manager or the External Extensions Installer (a built-in tool for adding custom extensions from JSON).
Internally, extensions are just Python packages that are installed using pip. Multiple extensions can be installed at the same time, but there might be compatibility issues between them. After installing or updating an extension, you need to restart the app to load it.
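Because extensions are ordinary pip packages, the ones present in the current environment can be listed with `importlib.metadata`. The sketch below assumes that extension packages share the `tts_webui_extension` prefix seen in the extension repository names in the model table; that naming convention for the installed distributions is an assumption, so adjust the prefix if your packages are named differently.

```python
from importlib import metadata

# Assumed prefix, inferred from repository names such as
# tts_webui_extension.vibevoice; not confirmed as the package naming rule.
EXTENSION_PREFIX = "tts_webui_extension"


def normalize(name: str) -> str:
    """Lowercase and treat '-' like '_' so both spellings match the prefix."""
    return name.lower().replace("-", "_")


def filter_extensions(names, prefix=EXTENSION_PREFIX):
    """Return the sorted names whose normalized form starts with prefix."""
    return sorted(n for n in names if normalize(n).startswith(prefix))


def installed_extensions():
    """List installed distributions that look like TTS WebUI extensions."""
    names = (dist.metadata["Name"] for dist in metadata.distributions())
    return filter_extensions(n for n in names if n)


if __name__ == "__main__":
    for name in installed_extensions():
        print(name, metadata.version(name))
```

Run it with the same Python interpreter that the app uses, otherwise it will inspect a different environment.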
For a curated list of community-created extensions, visit the TTS WebUI Extension Catalog. You can also find information on publishing your own extensions there.
Updates must be performed manually from the mini-control panel.

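AI Skill Hub is a third-party content aggregation platform. The information on this page is compiled from public data and does not constitute any legal endorsement of the tool's functionality or quality.
We recommend validating the tool thoroughly in a sandbox or test environment before deploying it to production, and carrying out any necessary security assessment.
✅ MIT License: one of the most permissive open-source licenses. It allows commercial use, modification, and distribution, requiring only that the copyright notice be retained.
AI Skill Hub review: the core functionality of TTS-WebUI is complete and of high quality. For AI enthusiasts, it is worth adding to a personal toolkit. Try it in a non-production environment first, then roll it out gradually.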
| Field | Value |
|---|---|
| Original name | TTS-WebUI |
| Original description | A single Gradio + React WebUI with extensions for ACE-Step, OmniVoice, Kimi Audio, Piper TTS, GPT-SoVITS, CosyVoice, XTTSv2, DIA, Kokoro, OpenVoice, ParlerTTS, Stable Audio, MMS, StyleTTS2, MAGNet, AudioGen, MusicGen, Tortoise, RVC, Vocos, Demucs, SeamlessM4T, and Bark! |
| Topics | ace-step, ai, audio-generation, cosyvoice, generative-ai, generator, tts |
| GitHub | https://github.com/rsxdalv/TTS-WebUI |
| License | MIT |
| Language | TypeScript |
Listed: 2026-05-22 · Updated: 2026-05-22 · License: MIT · AI Skill Hub makes no legal endorsement of the accuracy of third-party content.