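AI Skill Hub strongly recommends alexandria-audiobook (AI speech synthesis tool, Chinese documentation), a high-quality AI tool. Its overall AI score is 8.2, a solid result among comparable tools. If you are looking for a dependable AI tool, it is worth a closer look.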
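alexandria-audiobook is an open-source tool written in Python, focused on ai, audiobook, audiobook-generator, and related core functions. As an open-source GitHub project, it has active community support and ongoing releases. The code is fully transparent and auditable, and it supports local deployment to protect data privacy. Whether for personal use or integration into enterprise workflows, it provides a stable, reliable solution.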
# Option 1: install with pip (recommended)
pip install alexandria-audiobook
# Option 2: install in a virtual environment (recommended for production)
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install alexandria-audiobook
# Option 3: install from source (to get the latest features)
git clone https://github.com/Finrandojin/alexandria-audiobook
cd alexandria-audiobook
pip install -e .
# Verify the installation
python -c "import alexandria_audiobook; print('Installation successful')"
# Command-line usage
alexandria-audiobook --help
# Basic usage
alexandria-audiobook input_file -o output_file
# Calling from Python code
import alexandria_audiobook
# Example
result = alexandria_audiobook.process("input")
print(result)
# alexandria-audiobook example configuration file (config.yml)
app:
  name: "alexandria-audiobook"
  debug: false
  log_level: "INFO"
# Specify the configuration file at runtime
alexandria-audiobook --config config.yml
# Or configure through environment variables
export ALEXANDRIA_AUDIOBOOK_API_KEY="your-key"
export ALEXANDRIA_AUDIOBOOK_OUTPUT_DIR="./output"
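The environment variables above could be read from Python roughly as follows. This is a minimal sketch: the `load_settings` helper and its fallback values are hypothetical, and only the two variable names come from the example above.

```python
import os


def load_settings(env=None):
    """Collect settings from environment variables (hypothetical helper)."""
    env = os.environ if env is None else env
    return {
        # Variable names are taken from the example above.
        "api_key": env.get("ALEXANDRIA_AUDIOBOOK_API_KEY", ""),
        # "./output" mirrors the example value; treating it as a default is an assumption.
        "output_dir": env.get("ALEXANDRIA_AUDIOBOOK_OUTPUT_DIR", "./output"),
    }


print(load_settings({"ALEXANDRIA_AUDIOBOOK_API_KEY": "your-key"}))
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to exercise without touching the real environment.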
<img width="475" height="467" alt="Alexandria Logo" src="https://github.com/user-attachments/assets/fa2c36d3-a5f3-49ab-9dfe-30933359dfbd" />
curl -X POST http://127.0.0.1:4200/api/voice_design/preview \
  -H "Content-Type: application/json" \
  -d '{"description": "A warm, deep male voice", "text": "Hello world."}'

curl -X POST http://127.0.0.1:4200/api/lora/generate_dataset \
  -H "Content-Type: application/json" \
  -d '{"name": "warm_voice", "description": "A warm male voice", "texts": ["Hello.", "Goodbye."]}'

curl -X POST http://127.0.0.1:4200/api/dataset_builder/update_meta \
  -H "Content-Type: application/json" \
  -d '{"name": "my_voice_dataset", "description": "A warm male narrator", "global_seed": "42"}'
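The same JSON POST requests can be prepared from Python with only the standard library. The sketch below builds the voice design preview request without sending it; the `build_post` helper is a hypothetical name, while the path and payload are copied from the curl command above.

```python
import json
import urllib.request

BASE = "http://127.0.0.1:4200"


def build_post(path, payload, base=BASE):
    """Build (but do not send) a JSON POST request for an API path."""
    return urllib.request.Request(
        url=f"{base}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


preview = build_post(
    "/api/voice_design/preview",
    {"description": "A warm, deep male voice", "text": "Hello world."},
)
print(preview.get_method(), preview.full_url)
```

With a server listening on port 4200, the request could then be sent with `urllib.request.urlopen(preview)`. The shape of the response is not described here, so the sketch stops before reading it.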
No GPU or wrong OS? Run Alexandria on a free T4 GPU in your browser:
Requires a free ngrok account for the web UI tunnel. See the notebook for full instructions.
For integration into automated pipelines or server deployments:
git clone https://github.com/Finrandojin/alexandria-audiobook.git
cd alexandria-audiobook
docker compose up --build
Requires Docker with the NVIDIA Container Toolkit. The web UI is available at http://localhost:4200. TTS models download on first use and are cached in a Docker volume. User data (uploads, voice configs, trained LoRA adapters, audio output) persists via bind mounts to the project directory.
Configure connections to your LLM and TTS engine.
TTS Settings:
- Mode - local (built-in engine) or external (connect to Gradio server)
- Device - auto (recommended), cuda, cpu, or mps
- Language - TTS synthesis language: English (default), Chinese, French, German, Italian, Japanese, Korean, Portuguese, Russian, Spanish, or Auto (let the model detect)
- Parallel Workers - Batch size for fast batch rendering (higher = more VRAM usage)
- Batch Seed - Fixed seed for reproducible batch output (leave empty for random)
- Compile Codec - Enable torch.compile for 3-4x faster batch decoding (adds ~30-60s warmup on first generation)
- Sub-batching - Split batches by text length to reduce wasted GPU compute on padding (enabled by default)
- Min Sub-batch Size - Minimum chunks per sub-batch before allowing a split (default: 4)
- Length Ratio - Maximum longest/shortest text length ratio before forcing a sub-batch split (default: 5)
- Speaker Change Pause - Silence in milliseconds between different speakers during merge (default: 500)
- Same Speaker Pause - Silence in milliseconds when the same speaker continues during merge (default: 250)
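To make the sub-batching settings more concrete, here is one way such a split could work. This is an illustrative sketch and not Alexandria's actual implementation: the `split_sub_batches` function and its greedy, sort-by-length strategy are assumptions, and only the defaults (minimum size 4, length ratio 5) come from the list above.

```python
def split_sub_batches(texts, min_size=4, max_ratio=5.0):
    """Group text indices into sub-batches of similar length.

    Sorts by length, then starts a new sub-batch when the longest/shortest
    ratio would exceed max_ratio and the current sub-batch already holds
    at least min_size items.
    """
    order = sorted(range(len(texts)), key=lambda i: len(texts[i]))
    batches, current = [], []
    for i in order:
        if current:
            # The first item of a sorted sub-batch is its shortest text.
            shortest = max(len(texts[current[0]]), 1)  # avoid division by zero
            ratio = len(texts[i]) / shortest
            if ratio > max_ratio and len(current) >= min_size:
                batches.append(current)
                current = []
        current.append(i)
    if current:
        batches.append(current)
    return batches


short_lines = ["Hi."] * 4
long_lines = ["x" * 200] * 4
print(split_sub_batches(short_lines + long_lines))
# prints [[0, 1, 2, 3], [4, 5, 6, 7]]
```

Keeping texts of similar length together means less padding inside each batch, which is the stated motivation for the setting. The minimum size keeps the split from producing many tiny batches.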
Prompt Settings (Advanced):
- Generation Settings - Chunk size and max tokens for LLM responses
- LLM Sampling Parameters - Temperature, Top P, Top K, Min P, and Presence Penalty
- Banned Tokens - Comma-separated list of tokens to ban from LLM output (useful for disabling thinking mode on models like GLM4, DeepSeek-R1, etc.)
- Prompt Customization - System and user prompts used for script generation. Defaults are loaded from default_prompts.txt and can be customized per-session in the UI. Click "Reset to Defaults" to reload the file-based defaults (picks up edits without restarting the app)
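As a small illustration of the comma-separated Banned Tokens field, the sketch below turns such a string into a list. The `parse_banned_tokens` helper is hypothetical, trimming whitespace around each entry is an assumption, and the `<think>` tokens are only sample input.

```python
def parse_banned_tokens(raw):
    """Split a comma-separated string into a list, dropping empty entries."""
    return [tok.strip() for tok in raw.split(",") if tok.strip()]


print(parse_banned_tokens("<think>, </think>,,"))
# prints ['<think>', '</think>']
```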
Build LoRA training datasets interactively, one sample at a time.
```bash
curl http://127.0.0.1:4200/api/dataset_builder/list
```
The interface is split into a 5-step core pipeline (green tabs, numbered) and advanced tools (blue tabs, unnumbered). You only need the core pipeline to produce an audiobook.
<img src="https://github.com/user-attachments/assets/874b5e30-56d2-4292-b754-4408fc53f5d6" width="30%"></img> <img src="https://github.com/user-attachments/assets/488cde02-6b93-47fa-874b-97a618ae482c" width="30%"></img> <img src="https://github.com/user-attachments/assets/4c0805a6-bb9d-42c1-a9ff-79bb29d0613c" width="30%"></img> <img src="https://github.com/user-attachments/assets/8e58a5bf-ed8f-4864-8545-1e3d9681b0cf" width="30%"></img> <img src="https://github.com/user-attachments/assets/531830da-8668-4189-a0dc-020e6661bfb6" width="30%"></img>
curl -X POST http://127.0.0.1:4200/api/dataset_builder/update_rows \
  -H "Content-Type: application/json" \
  -d '{"name": "my_voice_dataset", "rows": [{"text": "Hello world.", "emotion": "cheerful"}]}'

curl -X POST http://127.0.0.1:4200/api/dataset_builder/generate_sample \
  -H "Content-Type: application/json" \
  -d '{"name": "my_voice_dataset", "description": "A warm male voice", "sample_index": 0, "samples": [{"text": "Hello.", "emotion": "cheerful"}]}'

curl -X POST http://127.0.0.1:4200/api/dataset_builder/generate_batch \
  -H "Content-Type: application/json" \
  -d '{"name": "my_voice_dataset", "description": "A warm male voice", "samples": [{"text": "Hello.", "emotion": "cheerful"}]}'
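When a dataset has many lines, writing the JSON body by hand gets tedious. The sketch below assembles the `generate_batch` body from a list of (text, emotion) pairs. The `make_batch_payload` helper is a hypothetical name; the field names follow the curl example above.

```python
import json


def make_batch_payload(name, description, lines):
    """Build the JSON body for /api/dataset_builder/generate_batch.

    lines is a list of (text, emotion) pairs.
    """
    samples = [{"text": text, "emotion": emotion} for text, emotion in lines]
    return {"name": name, "description": description, "samples": samples}


payload = make_batch_payload(
    "my_voice_dataset",
    "A warm male voice",
    [("Hello.", "cheerful")],
)
print(json.dumps(payload))
```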
1. Install Pinokio if you haven't already
2. Open Alexandria on Pinokio: Install via Pinokio
   - Or manually: in Pinokio, click Download and paste https://github.com/Finrandojin/alexandria-audiobook
3. Click Install to set up dependencies
4. Click Start to launch the web interface
These tabs are for power users who want more control over voice creation:
| Setting | Recommended | Notes |
|---|---|---|
| TTS Mode | local | Built-in engine, no external server |
| Compile Codec | true | 3-4x faster decoding after one-time warmup |
| Parallel Workers | 20-60 | Higher = more throughput, more VRAM |
| Render Mode | Batch (Fast) | Uses batched TTS calls |
```bash
curl http://127.0.0.1:4200/api/config

curl -X POST http://127.0.0.1:4200/api/config \
  -H "Content-Type: application/json" \
  -d '{
    "llm": {"base_url": "...", "api_key": "...", "model_name": "..."},
    "tts": {
      "mode": "local",
      "device": "auto",
      "language": "English",
      "parallel_workers": 25,
      "batch_seed": 12345,
      "compile_codec": true,
      "sub_batch_enabled": true,
      "sub_batch_min_size": 4,
      "sub_batch_ratio": 5,
      "pause_between_speakers_ms": 500,
      "pause_same_speaker_ms": 250
    }
  }'
```
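Before posting a configuration, it can help to catch typos in the TTS section locally. The sketch below builds the `tts` object with the field names from the request above and checks the mode and device against the values listed under TTS Settings. The `make_tts_config` helper is hypothetical, and omitting `batch_seed` to mean "random" is an assumption about how an empty seed is expressed.

```python
VALID_MODES = {"local", "external"}
VALID_DEVICES = {"auto", "cuda", "cpu", "mps"}


def make_tts_config(mode="local", device="auto", language="English",
                    parallel_workers=25, batch_seed=None, compile_codec=True):
    """Build the "tts" section of the /api/config payload."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown TTS mode: {mode!r}")
    if device not in VALID_DEVICES:
        raise ValueError(f"unknown device: {device!r}")
    if parallel_workers < 1:
        raise ValueError("parallel_workers must be at least 1")
    config = {
        "mode": mode,
        "device": device,
        "language": language,
        "parallel_workers": parallel_workers,
        "compile_codec": compile_codec,
        # The values below are the documented defaults.
        "sub_batch_enabled": True,
        "sub_batch_min_size": 4,
        "sub_batch_ratio": 5,
        "pause_between_speakers_ms": 500,
        "pause_same_speaker_ms": 250,
    }
    # Assumption: an unset seed is expressed by leaving the key out.
    if batch_seed is not None:
        config["batch_seed"] = batch_seed
    return config


payload = {"tts": make_tts_config(batch_seed=12345)}
print(sorted(payload["tts"]))
```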
```bash
curl http://127.0.0.1:4200/api/voices

curl -X POST http://127.0.0.1:4200/api/save_voice_config \
  -H "Content-Type: application/json" \
  -d '{"NARRATOR": {"type": "custom", "voice": "Ryan", "character_style": "calm"}}'
```
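# Assumes `requests` is imported and BASE points at the Alexandria server.
# Map each speaker label to a preset voice and a style hint.
voice_config = {
    "NARRATOR": {"type": "custom", "voice": "Ryan", "character_style": "calm narrator"},
    "HERO": {"type": "custom", "voice": "Aiden", "character_style": "brave, determined"},
}
requests.post(f"{BASE}/api/save_voice_config", json=voice_config)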
Alexandria exposes a REST API for programmatic access:
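```python
# Assumes `requests` is imported and BASE points at the Alexandria server.
# Download the Audacity export archive and write it to disk.
with open("audacity_export.zip", "wb") as f:
    f.write(requests.get(f"{BASE}/api/export_audacity").content)
```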
Step 1: Setup
Configure your LLM connection and TTS engine. At minimum you need:
- LLM Base URL: http://localhost:1234/v1 (LM Studio) or http://localhost:11434/v1 (Ollama)
- LLM API Key: Your API key (use local for local servers)
- LLM Model Name: The model to use (e.g., qwen2.5-14b)
- TTS Mode: local (built-in, recommended), which loads models directly with no external server needed
- Click Save Configuration when done
Step 2: Script
- Select your book file (.txt, .md, or .epub) using the file picker; it uploads automatically
- Click Generate Annotated Script to send the book to your LLM, which splits it into annotated chunks with speaker labels and voice directions
- (Optional) Click Review Script if the generated script has issues; this runs a second LLM pass to fix speaker misattributions or formatting problems
- You can save the script for later use with the Save feature below
Step 3: Voices
Each character detected in the script gets a voice card. For each speaker:
- Choose a voice type: Custom Voice (easiest), Clone Voice, LoRA Voice, or Voice Design
- For Custom Voice, pick from 9 presets (Ryan, Serena, Aiden, etc.) and optionally set a character style (e.g., "Heavy Scottish accent")
- Changes save automatically; see Voice Types for guidance on each type
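For scripted use, the voice-card step can be approximated by collecting the distinct speakers from the script chunks and giving each a starting preset. This is a sketch under assumptions: the `speaker` key on each chunk and both helper names are hypothetical, while the entry format (`type`, `voice`, `character_style`) follows the `save_voice_config` examples in this document.

```python
def collect_speakers(chunks):
    """Return the distinct speaker labels in first-appearance order."""
    seen = []
    for chunk in chunks:
        speaker = chunk["speaker"]  # assumed field name
        if speaker not in seen:
            seen.append(speaker)
    return seen


def default_voice_config(speakers, voice="Ryan", style=""):
    """Give every speaker the same preset voice as a starting point."""
    return {
        speaker: {"type": "custom", "voice": voice, "character_style": style}
        for speaker in speakers
    }


chunks = [
    {"speaker": "NARRATOR", "text": "It was late."},
    {"speaker": "HERO", "text": "We go at dawn."},
    {"speaker": "NARRATOR", "text": "Nobody argued."},
]
config = default_voice_config(collect_speakers(chunks))
print(list(config))
# prints ['NARRATOR', 'HERO']
```

The resulting dictionary can then be edited per speaker before it is posted to `/api/save_voice_config`.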
Step 4: Editor
- Click Render Pending to generate audio for all chunks in batch
- Listen to individual chunks or click Play Sequence to preview in order
- Edit any chunk's text, speaker, or instruct inline and regenerate it individually
- When satisfied, click Merge All to combine everything into the final audiobook
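The merge step inserts silence between chunks according to the Speaker Change Pause and Same Speaker Pause settings (defaults 500 ms and 250 ms). The sketch below shows how those pauses would shape a timeline. It is only an illustration of the arithmetic: the `merge_offsets` function is hypothetical and does no audio processing.

```python
def merge_offsets(chunks, speaker_change_ms=500, same_speaker_ms=250):
    """Compute start offsets for merged chunks.

    chunks is a list of (speaker, duration_ms) pairs.
    Returns (list of start offsets in ms, total length in ms).
    """
    starts, cursor, previous = [], 0, None
    for speaker, duration_ms in chunks:
        if previous is not None:
            # Shorter pause when the same speaker continues.
            cursor += same_speaker_ms if speaker == previous else speaker_change_ms
        starts.append(cursor)
        cursor += duration_ms
        previous = speaker
    return starts, cursor


timeline = merge_offsets([("NARRATOR", 1000), ("NARRATOR", 2000), ("HERO", 1500)])
print(timeline)
# prints ([0, 1250, 3750], 5250)
```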
Step 5: Result
- Listen to the finished audiobook in the browser
- Download as MP3, or click Export to Audacity for per-speaker WAV tracks
```python
import requests

BASE = "http://127.0.0.1:4200"
```
const BASE = "http://127.0.0.1:4200";
// Upload file
const formData = new FormData();
formData.append("file", fileInput.files[0]);
await fetch(`${BASE}/api/upload`, { method: "POST", body: formData });
// Generate script
await fetch(`${BASE}/api/generate_script`, { method: "POST" });
// Poll for completion
async function waitForTask(taskName) {
  while (true) {
    const res = await fetch(`${BASE}/api/status/${taskName}`);
    const data = await res.json();
    if (data.status === "completed" || data.status === "error") return data;
    await new Promise(r => setTimeout(r, 2000));
  }
}
await waitForTask("script_generation");
// Configure and generate
await fetch(`${BASE}/api/save_voice_config`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    NARRATOR: { type: "custom", voice: "Ryan", character_style: "calm" }
  })
});
// Fast batch render all chunks
const chunks = await (await fetch(`${BASE}/api/chunks`)).json();
const indices = chunks.map(c => c.id);
await fetch(`${BASE}/api/generate_batch_fast`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ indices })
});
// ... poll until all chunks done ...
// Merge into final audiobook
await fetch(`${BASE}/api/merge`, { method: "POST" });
// Export to Audacity
await fetch(`${BASE}/api/export_audacity`, { method: "POST" });
// ... poll /api/status/audacity_export until not running ...
// Download zip from GET /api/export_audacity
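The polling loop in the workflow above has no upper bound, so a stalled task would block a pipeline forever. A Python version with a timeout might look like the sketch below. The `wait_for_task` and `fetch_status` names are hypothetical, the `completed` and `error` status values are taken from the JavaScript example, and the status function is passed in so the loop can be exercised without a running server.

```python
import json
import time
import urllib.request

BASE = "http://127.0.0.1:4200"


def fetch_status(task_name):
    """Fetch the status document for a task from a running server."""
    with urllib.request.urlopen(f"{BASE}/api/status/{task_name}") as resp:
        return json.load(resp)


def wait_for_task(get_status, task_name, interval=2.0, timeout=600.0,
                  sleep=time.sleep):
    """Poll get_status(task_name) until the task completes, fails, or times out."""
    deadline = time.monotonic() + timeout
    while True:
        data = get_status(task_name)
        if data.get("status") in ("completed", "error"):
            return data
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{task_name} did not finish within {timeout} s")
        sleep(interval)
```

Against a live server the call would be `wait_for_task(fetch_status, "script_generation")`. The caller still needs to check whether the returned status is `error`.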
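AI Skill Hub is a third-party content aggregation platform. The information on this page is compiled from public data and does not constitute a legal endorsement of the tool's functionality or quality.
We recommend validating the tool thoroughly in a sandbox or test environment before deploying it to production, and carrying out any necessary security assessment.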
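✅ MIT License: one of the most permissive open-source licenses. It allows free commercial use, modification, and distribution, and only requires that the copyright notice be retained.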
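Overall, alexandria-audiobook is a high-quality AI tool that is reasonably competitive among comparable tools. AI Skill Hub will continue to track its updates. Consider bookmarking it and adopting it when it suits your use case.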
| Field | Value |
|---|---|
| Original name | alexandria-audiobook |
| Original description | AI-powered multi-voice audiobook generator: LLM script annotation, voice cloning, voice design, LoRA training, per-line style control, and export to MP3, chaptered M4B, or Audacity multi-track. Built on Qwen3-TTS. |
| Topics | ai, audiobook, audiobook-generator, audiobookshelf, chapter-markers, dialogue-generation, tts |
| GitHub | https://github.com/Finrandojin/alexandria-audiobook |
| License | MIT |
| Language | Python |
Indexed: 2026-05-22 · Updated: 2026-05-22 · License: MIT · AI Skill Hub does not legally endorse the accuracy of third-party content.