
alexandria-audiobook: AI Speech Synthesis Tool Documentation

Built with Python · Free and open source, deployed locally, data stays fully under your control
English name: alexandria-audiobook
⭐ 636 Stars 🍴 68 Forks 💻 Python 📄 MIT 🏷 AI score 8.2
Topics: ai, audiobook, audiobook-generator, audiobookshelf, chapter-markers, dialogue-generation, tts
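✦ Recommended by AI Skill Hub

AI Skill Hub recommends alexandria-audiobook as a high-quality AI tool. Its overall AI score is 8.2, a solid result among comparable tools. If you are looking for a dependable AI audiobook solution, it is worth a closer look.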

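📚 In-depth analysis
alexandria-audiobook is an open-source tool written in Python with 636 stars on GitHub, listed under the ai, audiobook, audiobook-generator, and audiobookshelf topics. The main advantage of an open-source tool is that the code is fully transparent: you can audit every line for security, and you can fork and customize it to fit your own needs.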

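**Why use an open-source tool instead of a commercial SaaS?**
For individual developers and privacy-conscious users, a locally deployed open-source tool means data never leaves the machine and is not subject to a third-party provider's data policy. Open-source tools also usually have no usage caps or monthly fees: install once and use indefinitely, so for high-frequency use the total cost of ownership (TCO) is well below that of subscription-based commercial tools.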

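**Installation and environment setup**
alexandria-audiobook requires a Python runtime. Managing Python versions with pyenv avoids polluting the global environment. New users should first create a virtual environment (python -m venv venv && source venv/bin/activate) and only then install dependencies; if something goes wrong, the virtual environment can be deleted and recreated without affecting the system.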

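**Community and maintenance**
GitHub Issues and Discussions are the fastest channels for help. Before asking, check the closed issues, since most common problems already have answers there. When reporting a bug, include the output of pip list, the full error traceback, and a minimal reproducible example; this noticeably speeds up maintainer responses. AI Skill Hub will keep tracking alexandria-audiobook releases and flag important feature changes.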
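📋 Tool overview

alexandria-audiobook is an open-source tool written in Python: an AI-powered multi-voice audiobook generator built on Qwen3-TTS. As a GitHub project it has community support and ongoing releases, its code is fully transparent and auditable, and it supports local deployment to protect data privacy. It can be used on its own or integrated into a larger workflow.

GitHub Stars: ⭐ 636
Language: Python
Supported platforms: Windows / macOS / Linux
Maintenance status: actively maintained, community-driven
License: MIT
AI score: 8.2
Tool type: AI tool
Forks: 68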
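📖 Documentation
The following content was compiled automatically by AI Skill Hub from project information. For the complete original documentation, see "Original source" at the bottom of the page.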

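📌 Key features
  • Free and open source, with local deployment so data stays fully under your control
  • Active open-source community on GitHub with ongoing updates
  • Detailed documentation and usage examples, friendly to newcomers
  • Custom configuration to fit different environments
  • Usable as a base component in an existing stack or as a starting point for further development
🎯 Main use cases
  • Running locally to protect data privacy and meet compliance requirements
  • Custom integration into existing systems to extend the stack
  • Commercial derivative development on top of an open-source base component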
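The installation commands below were generated automatically from the project's language and type and have not been verified; the official README takes precedence. The README excerpt further down documents installation via Google Colab and Docker and lists Pinokio as a requirement.
Installation commands
# Option 1: install with pip (recommended)
pip install alexandria-audiobook

# Option 2: install in a virtual environment (recommended for production)
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install alexandria-audiobook

# Option 3: install from source (for the latest features)
git clone https://github.com/Finrandojin/alexandria-audiobook
cd alexandria-audiobook
pip install -e .

# Verify the installation
python -c "import alexandria_audiobook; print('installation succeeded')"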
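📋 Installation steps
  1. Open the GitHub repository page
  2. Install the dependencies as described in the README
  3. Complete the initial configuration for your system
  4. Start from the official examples or documentation
  5. If you run into problems, search GitHub Issues for answers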
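The usage examples below were compiled by AI Skill Hub for the most common scenarios; the command and function names are generated and should be checked against the official README.
Common commands / code examples
# Command-line usage
alexandria-audiobook --help

# Basic usage
alexandria-audiobook input_file -o output_file

# Calling from Python code
import alexandria_audiobook

# Example
result = alexandria_audiobook.process("input")
print(result)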
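The configuration example below was generated from a typical usage scenario; adjust the parameters according to the official documentation.
Configuration example
# Example alexandria-audiobook configuration file (config.yml)
app:
  name: "alexandria-audiobook"
  debug: false
  log_level: "INFO"

# Specify the configuration file at runtime
alexandria-audiobook --config config.yml

# Or configure through environment variables
export ALEXANDRIA_AUDIOBOOK_API_KEY="your-key"
export ALEXANDRIA_AUDIOBOOK_OUTPUT_DIR="./output"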
📑 README in depth
The following content was parsed directly from the GitHub README, preserving code blocks, tables, and list structure.

Introduction

<img width="475" height="467" alt="Alexandria Logo" src="https://github.com/user-attachments/assets/fa2c36d3-a5f3-49ab-9dfe-30933359dfbd" />

Preview a voice from text description

curl -X POST http://127.0.0.1:4200/api/voice_design/preview \
  -H "Content-Type: application/json" \
  -d '{"description": "A warm, deep male voice", "text": "Hello world."}'

Generate a dataset from Voice Designer description

curl -X POST http://127.0.0.1:4200/api/lora/generate_dataset \
  -H "Content-Type: application/json" \
  -d '{"name": "warm_voice", "description": "A warm male voice", "texts": ["Hello.", "Goodbye."]}'

Update project metadata (description and global seed)

curl -X POST http://127.0.0.1:4200/api/dataset_builder/update_meta \
  -H "Content-Type: application/json" \
  -d '{"name": "my_voice_dataset", "description": "A warm male narrator", "global_seed": "42"}'
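
The curl calls above all follow the same pattern: a JSON body posted to a path under http://127.0.0.1:4200. A minimal Python sketch of that pattern follows, using only the standard library. The helper name `build_json_post` is hypothetical, and the request is only constructed, not sent, because the response format is not described in this section.

```python
import json
import urllib.request

BASE = "http://127.0.0.1:4200"  # address used by the curl examples


def build_json_post(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for an API path.

    Hypothetical helper for illustration; it is not part of Alexandria.
    """
    return urllib.request.Request(
        url=f"{BASE}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Same endpoint and body as the first curl example.
preview_request = build_json_post(
    "/api/voice_design/preview",
    {"description": "A warm, deep male voice", "text": "Hello world."},
)
print(preview_request.get_method(), preview_request.full_url)

# To send it against a running server:
#     with urllib.request.urlopen(preview_request) as response:
#         body = response.read()
```

Keeping construction separate from sending makes the payload easy to inspect before anything reaches the server.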

Features

Requirements

  • Pinokio
  • LLM server (one of the following):
      • LM Studio (local) - recommended: Qwen3 or similar
      • Ollama (local)
      • OpenAI API (cloud)
      • Any OpenAI-compatible API
  • GPU: 8 GB VRAM minimum, 16 GB+ recommended (see compatibility table below)
      • Each TTS model uses ~3.4 GB; remaining VRAM determines batch size
      • CPU mode available on all platforms but significantly slower
  • RAM: 16 GB recommended (8 GB minimum)
  • Disk: ~20 GB (8 GB venv/PyTorch, ~7 GB for model weights, working space for audio)
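
As a rough worked example of the VRAM note above, the sketch below subtracts the quoted ~3.4 GB per loaded TTS model from the card's total. It is back-of-the-envelope arithmetic only: the function name is hypothetical, and it ignores PyTorch overhead and anything else resident on the GPU.

```python
TTS_MODEL_VRAM_GB = 3.4  # approximate per-model figure quoted in the requirements


def remaining_vram_gb(total_vram_gb: float, loaded_models: int = 1) -> float:
    """Estimate the VRAM left for batching after loading TTS models.

    Hypothetical helper; a rough estimate, not a measurement.
    """
    if loaded_models < 0:
        raise ValueError("loaded_models must be non-negative")
    return max(0.0, total_vram_gb - loaded_models * TTS_MODEL_VRAM_GB)


for card_gb in (8, 16):
    left = remaining_vram_gb(card_gb)
    print(f"{card_gb} GB card, 1 model loaded: about {left:.1f} GB left for batches")
```

On the minimum 8 GB card this leaves roughly 4.6 GB, which helps explain why 16 GB or more is recommended for larger batch sizes.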

Installation

Option B: Google Colab (No Install Required)

No GPU or wrong OS? Run Alexandria on a free T4 GPU in your browser:

Open In Colab

Requires a free ngrok account for the web UI tunnel. See the notebook for full instructions.

Option C: Docker (NVIDIA GPU)

For integration into automated pipelines or server deployments:

git clone https://github.com/Finrandojin/alexandria-audiobook.git
cd alexandria-audiobook
docker compose up --build

Requires Docker with the NVIDIA Container Toolkit. The web UI is available at http://localhost:4200. TTS models download on first use and are cached in a Docker volume. User data (uploads, voice configs, trained LoRA adapters, audio output) persists via bind mounts to the project directory.

Setup Tab

Configure connections to your LLM and TTS engine.

TTS Settings:
  • Mode - local (built-in engine) or external (connect to Gradio server)
  • Device - auto (recommended), cuda, cpu, or mps
  • Language - TTS synthesis language: English (default), Chinese, French, German, Italian, Japanese, Korean, Portuguese, Russian, Spanish, or Auto (let the model detect)
  • Parallel Workers - Batch size for fast batch rendering (higher = more VRAM usage)
  • Batch Seed - Fixed seed for reproducible batch output (leave empty for random)
  • Compile Codec - Enable torch.compile for 3-4x faster batch decoding (adds ~30-60s warmup on first generation)
  • Sub-batching - Split batches by text length to reduce wasted GPU compute on padding (enabled by default)
  • Min Sub-batch Size - Minimum chunks per sub-batch before allowing a split (default: 4)
  • Length Ratio - Maximum longest/shortest text length ratio before forcing a sub-batch split (default: 5)
  • Speaker Change Pause - Silence in milliseconds between different speakers during merge (default: 500)
  • Same Speaker Pause - Silence in milliseconds when the same speaker continues during merge (default: 250)
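
The Sub-batching, Min Sub-batch Size, and Length Ratio settings describe a length-aware grouping step. The sketch below is one plausible reading of those three settings, not Alexandria's actual implementation: it sorts texts by length and starts a new sub-batch once the longest/shortest ratio would exceed the limit, provided the current sub-batch already holds the minimum number of chunks. The function name and greedy strategy are assumptions.

```python
def split_into_sub_batches(texts, min_size=4, max_ratio=5.0):
    """Group texts of similar length so short ones are not padded to the longest.

    Illustrative greedy strategy (an assumption, not the project's code).
    """
    ordered = sorted(texts, key=len)
    batches, current = [], []
    for text in ordered:
        if current:
            shortest = max(len(current[0]), 1)  # guard against empty strings
            too_uneven = len(text) / shortest > max_ratio
            # Only split once the current sub-batch has reached the minimum size.
            if too_uneven and len(current) >= min_size:
                batches.append(current)
                current = []
        current.append(text)
    if current:
        batches.append(current)
    return batches


short_lines = ["a" * 10] * 4
long_lines = ["b" * 100] * 4
print([len(batch) for batch in split_into_sub_batches(short_lines + long_lines)])
```

With four 10-character and four 100-character texts, the ratio of 10 exceeds the default limit of 5, so the texts land in two sub-batches of four.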

Prompt Settings (Advanced):
  • Generation Settings - Chunk size and max tokens for LLM responses
  • LLM Sampling Parameters - Temperature, Top P, Top K, Min P, and Presence Penalty
  • Banned Tokens - Comma-separated list of tokens to ban from LLM output (useful for disabling thinking mode on models like GLM4, DeepSeek-R1, etc.)
  • Prompt Customization - System and user prompts used for script generation. Defaults are loaded from default_prompts.txt and can be customized per-session in the UI. Click "Reset to Defaults" to reload the file-based defaults (picks up edits without restarting the app)

Dataset Builder Tab

Build LoRA training datasets interactively, one sample at a time.

  • Create a project with a voice description and optional global seed
  • Define samples — Set text and emotion/style per row
  • Preview audio — Generate and listen to individual samples or batch-generate all at once
  • Cancel batch — Stop a running batch generation without losing completed samples
  • Save as dataset — Export the project as a training-ready dataset that appears in the Training tab
  • Designed voices and Voice Designer descriptions drive the audio generation via Qwen3-TTS VoiceDesign model

Dataset Builder

```bash
# List all dataset builder projects
curl http://127.0.0.1:4200/api/dataset_builder/list
```

Example: [sample.mp3](https://github.com/user-attachments/files/25276110/sample.mp3)

Quick Start

The interface is split into a 5-step core pipeline (green tabs, numbered) and advanced tools (blue tabs, unnumbered). You only need the core pipeline to produce an audiobook.

Screenshots

<img src="https://github.com/user-attachments/assets/874b5e30-56d2-4292-b754-4408fc53f5d6" width="30%"></img> <img src="https://github.com/user-attachments/assets/488cde02-6b93-47fa-874b-97a618ae482c" width="30%"></img> <img src="https://github.com/user-attachments/assets/4c0805a6-bb9d-42c1-a9ff-79bb29d0613c" width="30%"></img> <img src="https://github.com/user-attachments/assets/8e58a5bf-ed8f-4864-8545-1e3d9681b0cf" width="30%"></img> <img src="https://github.com/user-attachments/assets/531830da-8668-4189-a0dc-020e6661bfb6" width="30%"></img>

Update sample rows

curl -X POST http://127.0.0.1:4200/api/dataset_builder/update_rows \
  -H "Content-Type: application/json" \
  -d '{"name": "my_voice_dataset", "rows": [{"text": "Hello world.", "emotion": "cheerful"}]}'

Generate a single sample preview

curl -X POST http://127.0.0.1:4200/api/dataset_builder/generate_sample \
  -H "Content-Type: application/json" \
  -d '{"name": "my_voice_dataset", "description": "A warm male voice", "sample_index": 0, "samples": [{"text": "Hello.", "emotion": "cheerful"}]}'

Batch generate all samples

curl -X POST http://127.0.0.1:4200/api/dataset_builder/generate_batch \
  -H "Content-Type: application/json" \
  -d '{"name": "my_voice_dataset", "description": "A warm male voice", "samples": [{"text": "Hello.", "emotion": "cheerful"}]}'

Export Options

  • Combined Audiobook - Single MP3 with all voices and natural pauses
  • Individual Voicelines - Separate MP3 per line for DAW editing (Audacity, etc.)
  • Audacity Export - One-click zip with per-speaker WAV tracks, LOF project file, and labels for automatic multi-track import into Audacity
  • M4B Audiobook - Chaptered M4B (AAC) with per-chunk or auto-detected chapter markers for audiobook players (Audiobookshelf, Apple Books, VLC, etc.)
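
To illustrate what chapter markers in an M4B involve, the sketch below turns a list of chapter titles and durations into an ffmpeg metadata file (the FFMETADATA1 format), which ffmpeg can mux into an M4B. This is an assumption about one common way to write chapters, not a description of how Alexandria does it; the function names are hypothetical.

```python
def escape_ffmetadata(value: str) -> str:
    """Backslash-escape the characters that are special in ffmpeg metadata files."""
    for ch in ("\\", "=", ";", "#", "\n"):  # backslash first to avoid double escaping
        value = value.replace(ch, "\\" + ch)
    return value


def build_ffmetadata(chapters):
    """Build FFMETADATA1 text from (title, duration_ms) pairs in playback order."""
    lines = [";FFMETADATA1"]
    start = 0
    for title, duration_ms in chapters:
        end = start + duration_ms
        lines += [
            "[CHAPTER]",
            "TIMEBASE=1/1000",  # START and END are expressed in milliseconds
            f"START={start}",
            f"END={end}",
            f"title={escape_ffmetadata(title)}",
        ]
        start = end  # the next chapter begins where this one ends
    return "\n".join(lines) + "\n"


print(build_ffmetadata([("Chapter 1", 60000), ("Chapter 2", 30000)]))
```

Each chapter's start is the running sum of the durations before it, so per-chunk chapters only need the rendered length of every chunk.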

Advanced Tools (Optional)

These tabs are for power users who want more control over voice creation:

  • Designer — Create new voices from text descriptions (e.g., "A warm elderly woman with a gentle raspy voice"). Save them to use as clone references in the Voices tab
  • Dataset — Build LoRA training datasets interactively, one sample at a time with audio preview
  • Training — Train LoRA adapters on voice datasets to create persistent voice identities that follow instruct directions

Configuration

```bash
# Get current config (empty prompts fall through to file defaults)
curl http://127.0.0.1:4200/api/config

# Save config
curl -X POST http://127.0.0.1:4200/api/config \
  -H "Content-Type: application/json" \
  -d '{
    "llm": {"base_url": "...", "api_key": "...", "model_name": "..."},
    "tts": {
      "mode": "local",
      "device": "auto",
      "language": "English",
      "parallel_workers": 25,
      "batch_seed": 12345,
      "compile_codec": true,
      "sub_batch_enabled": true,
      "sub_batch_min_size": 4,
      "sub_batch_ratio": 5,
      "pause_between_speakers_ms": 500,
      "pause_same_speaker_ms": 250
    }
  }'

# Get voices and config
curl http://127.0.0.1:4200/api/voices

# Save voice config
curl -X POST http://127.0.0.1:4200/api/save_voice_config \
  -H "Content-Type: application/json" \
  -d '{"NARRATOR": {"type": "custom", "voice": "Ryan", "character_style": "calm"}}'
```
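
When assembling the config payload from a script, it can help to check the values against the options listed in the Setup tab before posting. The sketch below is a hypothetical client-side helper: the field names come from the Save config example, the defaults are the ones documented under TTS Settings, and the validation is an added convenience rather than something the server is known to enforce. The LLM values are the LM Studio examples from the Core Pipeline section.

```python
import json

ALLOWED_MODES = {"local", "external"}
ALLOWED_DEVICES = {"auto", "cuda", "cpu", "mps"}

# Defaults as documented under TTS Settings.
DOCUMENTED_DEFAULTS = {
    "sub_batch_enabled": True,
    "sub_batch_min_size": 4,
    "sub_batch_ratio": 5,
    "pause_between_speakers_ms": 500,
    "pause_same_speaker_ms": 250,
}


def build_tts_config(mode="local", device="auto", language="English", **overrides):
    """Return the "tts" section of the config payload (hypothetical helper)."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {sorted(ALLOWED_MODES)}")
    if device not in ALLOWED_DEVICES:
        raise ValueError(f"device must be one of {sorted(ALLOWED_DEVICES)}")
    config = {"mode": mode, "device": device, "language": language}
    config.update(DOCUMENTED_DEFAULTS)
    config.update(overrides)  # e.g. parallel_workers, batch_seed, compile_codec
    return config


payload = {
    "llm": {
        "base_url": "http://localhost:1234/v1",  # LM Studio address from Step 1
        "api_key": "local",
        "model_name": "qwen2.5-14b",
    },
    "tts": build_tts_config(parallel_workers=8),
}
print(json.dumps(payload, indent=2))
```

The resulting dictionary can be posted to /api/config as the JSON body, in the same way as the curl example.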

Configure voices

voice_config = {
    "NARRATOR": {"type": "custom", "voice": "Ryan", "character_style": "calm narrator"},
    "HERO": {"type": "custom", "voice": "Aiden", "character_style": "brave, determined"},
}
requests.post(f"{BASE}/api/save_voice_config", json=voice_config)

Web Interface

API Reference

Alexandria exposes a REST API for programmatic access:

# ... poll /api/status/audacity_export until not running ...

with open("audacity_export.zip", "wb") as f:
    f.write(requests.get(f"{BASE}/api/export_audacity").content)

AI-Powered Pipeline

  • Local & Cloud LLM Support - Use any OpenAI-compatible API (LM Studio, Ollama, OpenAI, etc.)
  • Automatic Script Annotation - LLM parses text into JSON with speakers, dialogue, and TTS instruct directions
  • LLM Script Review - Optional second LLM pass that fixes common annotation errors: strips attribution tags from dialogue, splits misattributed narration/dialogue, merges over-split narrator entries, and validates instruct fields
  • Smart Chunking - Groups consecutive lines by speaker (up to 500 chars) for natural flow
  • Context Preservation - Passes character roster and last 3 script entries between chunks for name and style continuity
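
The Smart Chunking bullet can be read as a simple merge rule: consecutive lines from the same speaker are joined until the chunk would exceed 500 characters. The sketch below shows that rule under stated assumptions. The entry keys "speaker" and "text" and the function name are hypothetical, and the project's real chunker may treat instruct fields and edge cases differently.

```python
def group_by_speaker(entries, max_chars=500):
    """Merge consecutive same-speaker entries into chunks of at most max_chars.

    entries: list of dicts with hypothetical keys "speaker" and "text".
    """
    chunks = []
    for entry in entries:
        last = chunks[-1] if chunks else None
        fits = (
            last is not None
            and last["speaker"] == entry["speaker"]
            # +1 accounts for the joining space
            and len(last["text"]) + 1 + len(entry["text"]) <= max_chars
        )
        if fits:
            last["text"] += " " + entry["text"]
        else:
            chunks.append({"speaker": entry["speaker"], "text": entry["text"]})
    return chunks


script = [
    {"speaker": "NARRATOR", "text": "It was late."},
    {"speaker": "NARRATOR", "text": "Rain fell."},
    {"speaker": "HERO", "text": "Who is there?"},
    {"speaker": "NARRATOR", "text": "No one answered."},
]
for chunk in group_by_speaker(script):
    print(chunk["speaker"], "->", chunk["text"])
```

The two opening narrator lines merge into one chunk, while the final narrator line stays separate because a different speaker comes in between.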

Core Pipeline

Step 1: Setup
Configure your LLM connection and TTS engine. At minimum you need:
  • LLM Base URL: http://localhost:1234/v1 (LM Studio) or http://localhost:11434/v1 (Ollama)
  • LLM API Key: Your API key (use local for local servers)
  • LLM Model Name: The model to use (e.g., qwen2.5-14b)
  • TTS Mode: local (built-in, recommended), which loads models directly with no external server needed
  • Click Save Configuration when done

Step 2: Script
  • Select your book file (.txt, .md, or .epub) using the file picker; it uploads automatically
  • Click Generate Annotated Script; this sends the book to your LLM to split it into annotated chunks with speaker labels and voice directions
  • (Optional) Click Review Script if the generated script has issues; this runs a second LLM pass to fix speaker misattributions or formatting problems
  • You can save the script for later use with the Save feature below

Step 3: Voices
Each character detected in the script gets a voice card. For each speaker:
  • Choose a voice type: Custom Voice (easiest), Clone Voice, LoRA Voice, or Voice Design
  • For Custom Voice, pick from 9 presets (Ryan, Serena, Aiden, etc.) and optionally set a character style (e.g., "Heavy Scottish accent")
  • Changes save automatically; see Voice Types for guidance on each type

Step 4: Editor
  • Click Render Pending to generate audio for all chunks in batch
  • Listen to individual chunks or click Play Sequence to preview in order
  • Edit any chunk's text, speaker, or instruct inline and regenerate it individually
  • When satisfied, click Merge All to combine everything into the final audiobook

Step 5: Result
  • Listen to the finished audiobook in the browser
  • Download as MP3, or click Export to Audacity for per-speaker WAV tracks

Python Integration

```python
import requests

BASE = "http://127.0.0.1:4200"
```

JavaScript Integration

const BASE = "http://127.0.0.1:4200";

// Upload file
const formData = new FormData();
formData.append("file", fileInput.files[0]);
await fetch(`${BASE}/api/upload`, { method: "POST", body: formData });

// Generate script
await fetch(`${BASE}/api/generate_script`, { method: "POST" });

// Poll for completion
async function waitForTask(taskName) {
  while (true) {
    const res = await fetch(`${BASE}/api/status/${taskName}`);
    const data = await res.json();
    if (data.status === "completed" || data.status === "error") return data;
    await new Promise(r => setTimeout(r, 2000));
  }
}
await waitForTask("script_generation");

// Configure and generate
await fetch(`${BASE}/api/save_voice_config`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    NARRATOR: { type: "custom", voice: "Ryan", character_style: "calm" }
  })
});

// Fast batch render all chunks
const chunks = await (await fetch(`${BASE}/api/chunks`)).json();
const indices = chunks.map(c => c.id);
await fetch(`${BASE}/api/generate_batch_fast`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ indices })
});
// ... poll until all chunks done ...

// Merge into final audiobook
await fetch(`${BASE}/api/merge`, { method: "POST" });

// Export to Audacity
await fetch(`${BASE}/api/export_audacity`, { method: "POST" });
// ... poll /api/status/audacity_export until not running ...
// Download zip from GET /api/export_audacity
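
The polling loop in the JavaScript example above can be mirrored in Python. The sketch below is a minimal version using only the standard library. It reuses the /api/status/<task> path and the "completed" and "error" status values from the JavaScript code; the timeout and the injectable `fetch` and `sleep` parameters are additions for safety and testability, not part of the documented API.

```python
import json
import time
import urllib.request

BASE = "http://127.0.0.1:4200"


def fetch_status(task_name):
    """GET /api/status/<task_name> and decode the JSON body."""
    with urllib.request.urlopen(f"{BASE}/api/status/{task_name}") as response:
        return json.load(response)


def wait_for_task(task_name, fetch=fetch_status, interval_s=2.0,
                  timeout_s=600.0, sleep=time.sleep):
    """Poll until the task reports "completed" or "error", or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        data = fetch(task_name)
        if data.get("status") in ("completed", "error"):
            return data
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{task_name} did not finish within {timeout_s} s")
        sleep(interval_s)


# Usage against a running server:
#     result = wait_for_task("script_generation")
#     if result["status"] == "error":
#         raise RuntimeError(result)
```

Passing `fetch` and `sleep` as parameters lets the loop be exercised with canned responses, without a server and without real delays.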

Troubleshooting

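📚 Practical guide (long-tail questions)
Who it is for
  • Developers building voice-based AI products
Best practices
  • For production deployments, prefer Docker Compose to isolate dependencies, and mount volumes so data persists
  • For the local LLM server, prefer GGUF-quantized models to save VRAM while keeping responses fast
Common mistakes
  • Committing API keys to the git repository (use a .env file and add it to .gitignore)
  • Containers cannot reach the host's localhost; use host.docker.internal instead
  • Running out of VRAM (OOM): first lower Parallel Workers or the LLM context size, or switch to a smaller quantized model
  • Python dependency conflicts: isolate the environment with venv or uv
Deployment options
  • Docker: the README documents a Docker Compose setup for NVIDIA GPUs; docker compose up --build builds and starts it
  • CLI: install with pip and call from the command line (unverified; follow the official README)
  • Local deployment: 8 GB RAM minimum (16 GB recommended); GPU with 8 GB VRAM minimum, 16 GB+ recommended
  • Cloud hosting: the README documents Google Colab (free T4 GPU); PaaS platforms such as Vercel / Railway / Fly.io are an option only if they meet the hardware requirements above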
⚡ Core features
👥 Intended audience
AI enthusiasts, researchers and students, developers and engineers, technical founders
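⚖️ Pros and cons
✅ Pros
  • MIT license, free for commercial use
  • Fully open source and free, with no licensing fees
  • Local deployment keeps data fully under your control
  • Developer community support: problems can be searched for and asked about
⚠️ Cons
  • Installation and initial configuration may require some technical background
  • Feature completeness usually lags behind mature commercial products
  • Technical support depends mainly on the open-source community, so response times vary
⚠️ Usage notes

AI Skill Hub is a third-party content aggregation platform. The information on this page is compiled from public data and is not a legal endorsement of the tool's functionality or quality.

Validate the tool thoroughly in a sandbox or test environment before deploying it to production, and carry out the necessary security assessment.

📄 License

✅ MIT license: one of the most permissive open-source licenses. It allows free commercial use, modification, and distribution, requiring only that the copyright and license notice be retained.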

❓ FAQ
alexandria-audiobook is an AI tool written in Python. It is an AI-powered multi-voice audiobook generator: LLM script annotation, voice cloning, voice design, LoRA training, per-line style control, and export to MP3, chaptered M4B, or Audacity multi-track. Built on Qwen3-TTS.
💡 AI Skill Hub review

Overall, alexandria-audiobook is a high-quality AI tool that is competitive among comparable tools. AI Skill Hub will keep tracking its updates; bookmark it and adopt it when it fits your own use case.

🌐 Original information
Original name: alexandria-audiobook
Original description: AI-powered multi-voice audiobook generator: LLM script annotation, voice cloning, voice design, LoRA training, per-line style control, and export to MP3, chaptered M4B, or Audacity multi-track. Built on Qwen3-TTS.
Topics: ai, audiobook, audiobook-generator, audiobookshelf, chapter-markers, dialogue-generation, tts
GitHub: https://github.com/Finrandojin/alexandria-audiobook
License: MIT
Language: Python
🔗 Original source
🐙 GitHub repository: https://github.com/Finrandojin/alexandria-audiobook

Indexed: 2026-05-22 · Updated: 2026-05-22 · License: MIT · AI Skill Hub makes no legal endorsement of the accuracy of third-party content.