
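alexandria-audiobook: AI Speech Synthesis Tool Documentation

Q: How do I install and start using alexandria-audiobook?

Visit the alexandria-audiobook GitHub repository and follow the setup steps in its README. The project is Python-based. The README describes three installation routes: Pinokio (recommended), Google Colab (no local install), and Docker with an NVIDIA GPU.

Q: Is alexandria-audiobook free? What is its license?

Yes. alexandria-audiobook is free and open source under the MIT License, so anyone may use, modify, and distribute it.

Q: Who is alexandria-audiobook for?

It is aimed mainly at users with some technical background, such as developers, data analysts, and AI engineers. It needs an LLM server and, for practical speeds, a GPU.

Q: How active is the community, and is the project maintained?

alexandria-audiobook has 636 stars on GitHub and is under active development, with a growing community.

Python-based · Free and open source, runs locally, full control over your data

English name: alexandria-audiobook

⭐ 636 Stars 🍴 68 Forks 💻 Python 📄 MIT 🏷 AI · rated 8.2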


Topics: ai, audiobook, audiobook-generator, audiobookshelf, chapter-markers, dialogue-generation, tts

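✦ AI Skill Hub Recommendation

AI Skill Hub strongly recommends alexandria-audiobook, an AI speech synthesis tool. With an overall AI score of 8.2, it performs solidly among comparable tools. It is worth a closer look if you need a reliable AI tool.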

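📚 In-Depth Analysis

alexandria-audiobook is an open-source Python tool with 636 stars on GitHub. It is a quality project in the ai, audiobook, audiobook-generator, and audiobookshelf topics. The biggest advantage of open-source tools is full code transparency: you can audit every line for security and adapt or extend the code to your own needs.

**Why use an open-source tool instead of a commercial SaaS?**
For individual developers and privacy-conscious users, a locally deployed open-source tool keeps data on your own machine. This holds as long as you pair it with a local LLM server rather than a cloud API, and it keeps your data free of third-party providers' data policies. Open-source tools also usually have no usage caps or monthly fees: you install once and use the tool long term. For heavy use, the total cost of ownership (TCO) is therefore far lower than that of subscription-based commercial tools.

**Installation and environment setup**
alexandria-audiobook requires a Python environment. Use pyenv to manage Python versions so you avoid polluting the global environment. Beginners should first create a virtual environment (python -m venv venv && source venv/bin/activate) and then install dependencies. If something goes wrong, you can delete the virtual environment and start over without affecting the system.

**Community and maintenance**
GitHub Issues and Discussions are the fastest ways to get help. Before asking, check the closed issues, since most common questions already have answers. When you hit a bug, include the output of pip list, the full error traceback, and a minimal reproducible example. This significantly speeds up the developers' response. AI Skill Hub will keep tracking alexandria-audiobook releases and report important feature changes.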

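📋 Tool Overview

alexandria-audiobook is an open-source tool written in Python, focused on ai, audiobook, audiobook-generator, and related topics. As a GitHub open-source project, it has community support and ongoing releases. Its code is fully transparent and auditable, and it can be deployed locally to protect data privacy. It suits personal use as well as integration into business workflows.

GitHub Stars	⭐ 636
Language	Python
Platforms	Windows / macOS / Linux
Maintenance status	Maintained, community-driven
License	MIT
AI overall score	8.2
Tool type	AI Tools
Forks	68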

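📖 Documentation

AI Skill Hub compiled the following content automatically from the project information. For the complete original documentation, see the GitHub repository linked at the bottom of this page.

📌 Key Features

Free and open source, deployable locally, full control over your data
Active open-source community on GitHub with continuous updates
Detailed documentation and usage examples, beginner-friendly
Configurable to suit different environments
Usable as a building block in an existing tech stack or as a base for further development

🎯 Main Use Cases

Running locally to protect data privacy and meet compliance requirements
Custom integration into existing systems to extend your tech stack
Commercial derivative development on top of an open-source base component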

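The installation commands below were generated automatically from the project's language and type. The official README takes precedence; it recommends Pinokio, Google Colab, or Docker instead.

Installation commands

# Option 1: pip install (unverified: the README does not mention a PyPI package)
pip install alexandria-audiobook

# Option 2: install into a virtual environment (recommended for production)
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install alexandria-audiobook

# Option 3: install from source (latest features)
git clone https://github.com/Finrandojin/alexandria-audiobook
cd alexandria-audiobook
pip install -e .  # assumes the repo is pip-installable; check the README

# Verify the installation (the module name is unverified)
python -c "import alexandria_audiobook; print('Installed successfully')"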

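📋 Installation Steps

Visit the GitHub repository page
Install the dependencies as described in the README
Complete the initial configuration for your system
Start with the official examples or documentation
If you run into problems, search GitHub Issues for answers

AI Skill Hub compiled the usage examples below, and they have not been verified. The README documents a web interface and a REST API on port 4200, not a command-line tool or an importable Python module.

Common commands / code examples

# Command-line usage (unverified)
alexandria-audiobook --help

# Basic usage (unverified)
alexandria-audiobook input_file -o output_file

# Calling from Python code (unverified module name)
import alexandria_audiobook

# Example (process() is not documented in the README)
result = alexandria_audiobook.process("input")
print(result)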

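The configuration example below was generated for a typical use case and does not come from the README. The README configures settings through the Setup tab or the /api/config endpoint. Adjust parameters according to the official documentation.

Configuration example

# Example alexandria-audiobook configuration file (config.yml); hypothetical, not documented in the README
app:
  name: "alexandria-audiobook"
  debug: false
  log_level: "INFO"

# Specify the configuration file at runtime (unverified flag)
alexandria-audiobook --config config.yml

# Or configure via environment variables (unverified variable names)
export ALEXANDRIA_AUDIOBOOK_API_KEY="your-key"
export ALEXANDRIA_AUDIOBOOK_OUTPUT_DIR="./output"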

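📑 README In-Depth Analysis (documentation completeness 95/100)

The following content was parsed directly from the GitHub README. It keeps the README's code blocks, tables, and list structure.

Introduction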

Preview a voice from text description

curl -X POST http://127.0.0.1:4200/api/voice_design/preview \
  -H "Content-Type: application/json" \
  -d '{"description": "A warm, deep male voice", "text": "Hello world."}'

Generate a dataset from Voice Designer description

curl -X POST http://127.0.0.1:4200/api/lora/generate_dataset \
  -H "Content-Type: application/json" \
  -d '{"name": "warm_voice", "description": "A warm male voice", "texts": ["Hello.", "Goodbye."]}'

Update project metadata (description and global seed)

curl -X POST http://127.0.0.1:4200/api/dataset_builder/update_meta \
  -H "Content-Type: application/json" \
  -d '{"name": "my_voice_dataset", "description": "A warm male narrator", "global_seed": "42"}'

Features

Requirements

Pinokio
LLM server (one of the following):
    LM Studio (local); recommended model: Qwen3 or similar
    Ollama (local)
    OpenAI API (cloud)
    Any OpenAI-compatible API
GPU: 8 GB VRAM minimum, 16 GB+ recommended (see the compatibility table in the README)
    Each TTS model uses ~3.4 GB; remaining VRAM determines batch size
    CPU mode is available on all platforms but is significantly slower
RAM: 16 GB recommended (8 GB minimum)
Disk: ~20 GB (8 GB venv/PyTorch, ~7 GB for model weights, working space for audio)
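Here is a rough worked example of the VRAM guidance above. With one TTS model loaded (about 3.4 GB), an 8 GB card leaves roughly 4.6 GB for batching, and a 16 GB card leaves roughly 12.6 GB. The sketch below turns that into a worker estimate. The per-worker cost `per_worker_gb` is a hypothetical input you would have to measure yourself, because the README does not publish it. The helper names are also illustrative.

```python
"""Rough VRAM budgeting for batch rendering (illustrative only)."""

MODEL_GB = 3.4  # approximate VRAM per loaded TTS model, per the README


def remaining_vram(total_gb: float, models_loaded: int = 1) -> float:
    """VRAM (GB) left after loading the given number of TTS models."""
    return round(total_gb - MODEL_GB * models_loaded, 2)


def estimate_workers(total_gb: float, per_worker_gb: float, models_loaded: int = 1) -> int:
    """Estimate how many parallel workers fit in the leftover VRAM.

    per_worker_gb is a hypothetical, user-measured cost per batched item;
    the README does not state this number.
    """
    if per_worker_gb <= 0:
        raise ValueError("per_worker_gb must be positive")
    free = remaining_vram(total_gb, models_loaded)
    # Floor division; a negative budget (model does not fit) clamps to zero.
    return max(0, int(free // per_worker_gb))
```

Treat the result as a starting point for the Parallel Workers setting, then adjust it after watching actual VRAM usage.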

Installation

Option B: Google Colab (No Install Required)

No GPU or wrong OS? Run Alexandria on a free T4 GPU in your browser:

Requires a free ngrok account for the web UI tunnel. See the notebook for full instructions.

Option C: Docker (NVIDIA GPU)

For integration into automated pipelines or server deployments:

git clone https://github.com/Finrandojin/alexandria-audiobook.git
cd alexandria-audiobook
docker compose up --build

Requires Docker with the NVIDIA Container Toolkit. The web UI is available at http://localhost:4200. TTS models download on first use and are cached in a Docker volume. User data (uploads, voice configs, trained LoRA adapters, audio output) persists via bind mounts to the project directory.
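In an automated pipeline, it helps to wait until the container is actually serving before you send API calls. The first start builds the image, and models download on first use. The helper below is a hypothetical sketch that uses only the Python standard library. It assumes the web UI at http://localhost:4200 answers a plain GET with a 2xx status once it is ready.

```python
"""Wait until the Alexandria web UI answers on its port (sketch)."""
import time
import urllib.error
import urllib.request
from typing import Callable

DEFAULT_URL = "http://localhost:4200"  # port from the Docker instructions


def wait_for_ui(url: str = DEFAULT_URL, timeout_s: float = 300.0,
                interval_s: float = 2.0,
                opener: Callable = urllib.request.urlopen,
                sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `url` until it returns HTTP 2xx or `timeout_s` elapses.

    `opener` and `sleep` are injectable so the loop can be tested offline.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with opener(url, timeout=5) as resp:
                if 200 <= resp.status < 300:
                    return True
        except OSError:
            # URLError, refused connections, and timeouts all derive from
            # OSError: the server is not up yet, so try again.
            pass
        sleep(interval_s)
    return False
```

A CI job could call `wait_for_ui()` right after `docker compose up --build -d` and fail fast if it returns False.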

Setup Tab

Configure connections to your LLM and TTS engine.

TTS Settings:
- Mode: local (built-in engine) or external (connect to a Gradio server)
- Device: auto (recommended), cuda, cpu, or mps
- Language: synthesis language, one of English (default), Chinese, French, German, Italian, Japanese, Korean, Portuguese, Russian, Spanish, or Auto (let the model detect it)
- Parallel Workers: batch size for fast batch rendering (higher = more VRAM usage)
- Batch Seed: fixed seed for reproducible batch output (leave empty for random)
- Compile Codec: enable torch.compile for 3-4x faster batch decoding (adds ~30-60 s of warmup on the first generation)
- Sub-batching: split batches by text length to reduce GPU compute wasted on padding (enabled by default)
- Min Sub-batch Size: minimum chunks per sub-batch before a split is allowed (default: 4)
- Length Ratio: maximum longest/shortest text length ratio before a sub-batch split is forced (default: 5)
- Speaker Change Pause: silence in milliseconds between different speakers during merge (default: 500)
- Same Speaker Pause: silence in milliseconds when the same speaker continues during merge (default: 250)
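To make the Sub-batching, Min Sub-batch Size, and Length Ratio settings concrete, here is one way such a split could work. The texts are sorted by length, and a new sub-batch opens only when two conditions hold: the longest/shortest ratio would exceed the limit, and the current sub-batch already holds the minimum number of chunks. This is an interpretation of the setting descriptions, not Alexandria's actual implementation, and `sub_batches` is a hypothetical name.

```python
"""Illustrative length-based sub-batching (not Alexandria's actual code)."""
from typing import List


def sub_batches(texts: List[str], min_size: int = 4, max_ratio: float = 5.0) -> List[List[str]]:
    """Group texts so each group's longest/shortest length ratio stays bounded.

    Texts are sorted by length and scanned in order. A new group starts only
    when adding the next text would push longest/shortest above `max_ratio`
    AND the current group already holds at least `min_size` texts.
    """
    ordered = sorted(texts, key=len)
    groups: List[List[str]] = []
    current: List[str] = []
    for text in ordered:
        if current:
            shortest = max(len(current[0]), 1)  # current[0] is shortest; avoid /0
            too_spread = len(text) / shortest > max_ratio
            if too_spread and len(current) >= min_size:
                groups.append(current)
                current = []
        current.append(text)
    if current:
        groups.append(current)
    return groups
```

Sorting by length reduces padding, because items of similar length end up in the same batch. A real renderer would also need to map the results back to the original chunk order.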

Prompt Settings (Advanced):
- Generation Settings: chunk size and max tokens for LLM responses
- LLM Sampling Parameters: Temperature, Top P, Top K, Min P, and Presence Penalty
- Banned Tokens: comma-separated list of tokens to ban from LLM output (useful for disabling thinking mode on models like GLM4, DeepSeek-R1, etc.)
- Prompt Customization: system and user prompts used for script generation. Defaults are loaded from default_prompts.txt and can be customized per session in the UI. Click "Reset to Defaults" to reload the file-based defaults (this picks up edits without restarting the app)

Dataset Builder Tab

Build LoRA training datasets interactively, one sample at a time.

Create a project with a voice description and optional global seed
Define samples — Set text and emotion/style per row
Preview audio — Generate and listen to individual samples or batch-generate all at once
Cancel batch — Stop a running batch generation without losing completed samples
Save as dataset — Export the project as a training-ready dataset that appears in the Training tab
Designed voices and Voice Designer descriptions drive audio generation via the Qwen3-TTS VoiceDesign model

Dataset Builder


List all dataset builder projects

curl http://127.0.0.1:4200/api/dataset_builder/list

Example: [sample.mp3](https://github.com/user-attachments/files/25276110/sample.mp3)

Quick Start

The interface is split into a 5-step core pipeline (green tabs, numbered) and advanced tools (blue tabs, unnumbered). You only need the core pipeline to produce an audiobook.


Update sample rows

curl -X POST http://127.0.0.1:4200/api/dataset_builder/update_rows \
  -H "Content-Type: application/json" \
  -d '{"name": "my_voice_dataset", "rows": [{"text": "Hello world.", "emotion": "cheerful"}]}'

Generate a single sample preview

curl -X POST http://127.0.0.1:4200/api/dataset_builder/generate_sample \
  -H "Content-Type: application/json" \
  -d '{"name": "my_voice_dataset", "description": "A warm male voice", "sample_index": 0, "samples": [{"text": "Hello.", "emotion": "cheerful"}]}'

Batch generate all samples

curl -X POST http://127.0.0.1:4200/api/dataset_builder/generate_batch \
  -H "Content-Type: application/json" \
  -d '{"name": "my_voice_dataset", "description": "A warm male voice", "samples": [{"text": "Hello.", "emotion": "cheerful"}]}'
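You can also issue the Dataset Builder curl calls from Python. This sketch uses only the standard library (`urllib`) and mirrors the JSON bodies shown in the curl examples. The helper names (`rows_payload`, `batch_payload`, `post_json`) are hypothetical, and the sketch assumes the server is running on the default port 4200.

```python
"""Drive the Dataset Builder endpoints from Python (hedged sketch)."""
import json
import urllib.request
from typing import Dict, List

BASE = "http://127.0.0.1:4200"


def rows_payload(name: str, rows: List[Dict[str, str]]) -> Dict:
    """Body for /api/dataset_builder/update_rows, mirroring the curl example."""
    for row in rows:
        if "text" not in row:
            raise ValueError("each row needs a 'text' field")
    return {"name": name, "rows": rows}


def batch_payload(name: str, description: str, samples: List[Dict[str, str]]) -> Dict:
    """Body for /api/dataset_builder/generate_batch."""
    return {"name": name, "description": description, "samples": samples}


def post_json(path: str, payload: Dict, base: str = BASE) -> bytes:
    """POST `payload` as JSON to the running server and return the raw body."""
    req = urllib.request.Request(
        f"{base}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()
```

For example, `post_json("/api/dataset_builder/update_rows", rows_payload("my_voice_dataset", rows))` should send the same request as the update_rows curl command, where `rows` is a list such as `[{"text": "Hello world.", "emotion": "cheerful"}]`.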

Export Options

Combined Audiobook - Single MP3 with all voices and natural pauses
Individual Voicelines - Separate MP3 per line for DAW editing (Audacity, etc.)
Audacity Export - One-click zip with per-speaker WAV tracks, LOF project file, and labels for automatic multi-track import into Audacity
M4B Audiobook - Chaptered M4B (AAC) with per-chunk or auto-detected chapter markers for audiobook players (Audiobookshelf, Apple Books, VLC, etc.)
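Chapter markers in an M4B file are ordinary container metadata. As an illustration, the sketch below writes an FFmpeg FFMETADATA1 file with one chapter per (title, duration) pair. It escapes the characters that FFmpeg treats as special in metadata values. This shows one common way to produce chaptered M4B files. It is an assumption, not a description of how Alexandria builds its export, and `ffmetadata_chapters` is a hypothetical helper.

```python
"""Build an FFmpeg metadata file with chapter markers (illustrative)."""
from typing import List, Tuple


def ffmetadata_chapters(chapters: List[Tuple[str, int]]) -> str:
    """Return FFMETADATA1 text for consecutive chapters.

    Each item is (title, duration_ms); chapters are laid end to end
    using a millisecond timebase.
    """
    lines = [";FFMETADATA1"]
    start = 0
    for title, duration_ms in chapters:
        end = start + duration_ms
        # FFmpeg requires '=', ';', '#', '\' and newline to be backslash-escaped.
        safe = "".join("\\" + c if c in "=;#\\\n" else c for c in title)
        lines += ["[CHAPTER]", "TIMEBASE=1/1000",
                  f"START={start}", f"END={end}", f"title={safe}"]
        start = end
    return "\n".join(lines) + "\n"
```

You could save the output as `chapters.txt`. Such a file can typically be applied with a command like `ffmpeg -i audiobook.m4a -i chapters.txt -map_metadata 1 -map_chapters 1 -c copy audiobook.m4b`.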

Option A: Pinokio (Recommended)

1. Install Pinokio if you haven't already
2. Open Alexandria on Pinokio: Install via Pinokio
   - Or manually: in Pinokio, click Download and paste https://github.com/Finrandojin/alexandria-audiobook
3. Click Install to set up dependencies
4. Click Start to launch the web interface

Advanced Tools (Optional)

These tabs are for power users who want more control over voice creation:

Designer — Create new voices from text descriptions (e.g., "A warm elderly woman with a gentle raspy voice"). Save them to use as clone references in the Voices tab
Dataset — Build LoRA training datasets interactively, one sample at a time with audio preview
Training — Train LoRA adapters on voice datasets to create persistent voice identities that follow instruct directions

Recommended Settings for Batch Generation

Setting	Recommended	Notes
TTS Mode	`local`	Built-in engine, no external server
Compile Codec	`true`	3-4x faster decoding after one-time warmup
Parallel Workers	20-60	Higher = more throughput, more VRAM
Render Mode	Batch (Fast)	Uses batched TTS calls

Configuration


Get current config (empty prompts fall through to file defaults)

curl http://127.0.0.1:4200/api/config

Save config

curl -X POST http://127.0.0.1:4200/api/config \
  -H "Content-Type: application/json" \
  -d '{
    "llm": {"base_url": "...", "api_key": "...", "model_name": "..."},
    "tts": {
      "mode": "local",
      "device": "auto",
      "language": "English",
      "parallel_workers": 25,
      "batch_seed": 12345,
      "compile_codec": true,
      "sub_batch_enabled": true,
      "sub_batch_min_size": 4,
      "sub_batch_ratio": 5,
      "pause_between_speakers_ms": 500,
      "pause_same_speaker_ms": 250
    }
  }'
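The POST body above carries the whole `llm` and `tts` objects. A script that wants to change a single setting (for example `parallel_workers`) can therefore read the current config, patch it, and post it back. This sketch assumes GET /api/config returns JSON in the same shape as the POST body, which the README does not state explicitly. `deep_merge` and `update_config` are hypothetical helpers built on the standard library.

```python
"""Patch one Alexandria setting without resending everything by hand (sketch)."""
import copy
import json
import urllib.request
from typing import Any, Dict

BASE = "http://127.0.0.1:4200"


def deep_merge(base: Dict[str, Any], patch: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `base` with `patch` merged in, recursing into dicts."""
    merged = copy.deepcopy(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def update_config(patch: Dict[str, Any], base: str = BASE) -> Dict[str, Any]:
    """Fetch the current config, apply `patch`, and POST the result back."""
    with urllib.request.urlopen(f"{base}/api/config", timeout=10) as resp:
        current = json.load(resp)
    new_config = deep_merge(current, patch)
    req = urllib.request.Request(
        f"{base}/api/config",
        data=json.dumps(new_config).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10):
        pass  # response body is not needed here
    return new_config
```

With the server running, `update_config({"tts": {"parallel_workers": 40}})` would raise the batch size and leave the other fields as they were returned by GET.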

Get voices and config

curl http://127.0.0.1:4200/api/voices

Save voice config

curl -X POST http://127.0.0.1:4200/api/save_voice_config \
  -H "Content-Type: application/json" \
  -d '{"NARRATOR": {"type": "custom", "voice": "Ryan", "character_style": "calm"}}'

Configure voices

import requests

BASE = "http://127.0.0.1:4200"

voice_config = {
    "NARRATOR": {"type": "custom", "voice": "Ryan", "character_style": "calm narrator"},
    "HERO": {"type": "custom", "voice": "Aiden", "character_style": "brave, determined"},
}
requests.post(f"{BASE}/api/save_voice_config", json=voice_config)

Web Interface

API Reference

Alexandria exposes a REST API for programmatic access:

# ... poll /api/status/audacity_export until not running ...

with open("audacity_export.zip", "wb") as f:
    f.write(requests.get(f"{BASE}/api/export_audacity").content)

AI-Powered Pipeline

Local & Cloud LLM Support - Use any OpenAI-compatible API (LM Studio, Ollama, OpenAI, etc.)
Automatic Script Annotation - LLM parses text into JSON with speakers, dialogue, and TTS instruct directions
LLM Script Review - Optional second LLM pass that fixes common annotation errors: strips attribution tags from dialogue, splits misattributed narration/dialogue, merges over-split narrator entries, and validates instruct fields
Smart Chunking - Groups consecutive lines by speaker (up to 500 chars) for natural flow
Context Preservation - Passes character roster and last 3 script entries between chunks for name and style continuity
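The Smart Chunking rule above can be sketched as follows. Consecutive lines from the same speaker are joined until the next line would push the chunk past 500 characters, and a speaker change always starts a new chunk. This is an illustrative interpretation of the one-line feature description, not Alexandria's source code. The `(speaker, text)` tuple format and the function name are assumptions.

```python
"""Speaker-aware chunking sketch (an interpretation of "Smart Chunking")."""
from typing import List, Tuple

Line = Tuple[str, str]  # (speaker, text)


def chunk_by_speaker(lines: List[Line], max_chars: int = 500) -> List[Line]:
    """Merge consecutive lines from the same speaker into chunks.

    A chunk ends when the speaker changes or when appending the next line
    (plus a joining space) would exceed `max_chars`. A single line longer
    than `max_chars` is kept whole rather than split mid-sentence.
    """
    chunks: List[Line] = []
    for speaker, text in lines:
        if chunks and chunks[-1][0] == speaker:
            candidate = f"{chunks[-1][1]} {text}"
            if len(candidate) <= max_chars:
                chunks[-1] = (speaker, candidate)
                continue
        chunks.append((speaker, text))
    return chunks
```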

Core Pipeline

Step 1: Setup
Configure your LLM connection and TTS engine. At minimum you need:
- LLM Base URL: http://localhost:1234/v1 (LM Studio) or http://localhost:11434/v1 (Ollama)
- LLM API Key: your API key (use local for local servers)
- LLM Model Name: the model to use (e.g., qwen2.5-14b)
- TTS Mode: local (built-in, recommended); loads models directly, no external server needed
- Click Save Configuration when done
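Before you save the configuration, it can help to confirm that the LLM server is reachable and that the model name you plan to enter exists. OpenAI-compatible servers typically list their models at `GET {base_url}/models`. The sketch below assumes LM Studio or Ollama exposes that endpoint under the `/v1` base URLs above, and it uses only the standard library. The helper names are hypothetical, and the check is independent of Alexandria itself.

```python
"""Check which models an OpenAI-compatible LLM server reports (sketch)."""
import json
import urllib.request
from typing import List


def models_url(base_url: str) -> str:
    """Join an OpenAI-compatible base URL (e.g. .../v1) with /models."""
    return base_url.rstrip("/") + "/models"


def list_models(base_url: str, api_key: str = "local") -> List[str]:
    """Return the model ids the server lists under the OpenAI-style 'data' key."""
    req = urllib.request.Request(
        models_url(base_url),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        payload = json.load(resp)
    return [m["id"] for m in payload.get("data", [])]
```

For example, `"qwen2.5-14b" in list_models("http://localhost:1234/v1")` should be True if the server reports a model with exactly that id.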

Step 2: Script
- Select your book file (.txt, .md, or .epub) using the file picker; it uploads automatically
- Click Generate Annotated Script; this sends the book to your LLM to split it into annotated chunks with speaker labels and voice directions
- (Optional) Click Review Script if the generated script has issues; this runs a second LLM pass to fix speaker misattributions or formatting problems
- You can save the script for later use with the Save feature below

Step 3: Voices
Each character detected in the script gets a voice card. For each speaker:
- Choose a voice type: Custom Voice (easiest), Clone Voice, LoRA Voice, or Voice Design
- For Custom Voice, pick from 9 presets (Ryan, Serena, Aiden, etc.) and optionally set a character style (e.g., "Heavy Scottish accent")
- Changes save automatically; see Voice Types for guidance on each type

Step 4: Editor
- Click Render Pending to generate audio for all chunks in batch
- Listen to individual chunks or click Play Sequence to preview them in order
- Edit any chunk's text, speaker, or instruct inline and regenerate it individually
- When satisfied, click Merge All to combine everything into the final audiobook
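The Speaker Change Pause (default 500 ms) and Same Speaker Pause (default 250 ms) settings from the Setup tab determine how much silence Merge All places between chunks. The sketch below computes where each chunk would start in the merged file under that rule. It models only the timing, not the audio, and `merge_timeline` is a hypothetical helper rather than Alexandria's merge code.

```python
"""Timing model for merging chunks with speaker-dependent pauses (sketch)."""
from typing import List, Optional, Tuple


def merge_timeline(chunks: List[Tuple[str, int]],
                   speaker_change_ms: int = 500,
                   same_speaker_ms: int = 250) -> Tuple[List[int], int]:
    """Return (start offset of each chunk in ms, total length in ms).

    Each chunk is (speaker, duration_ms). Between consecutive chunks the
    silence is `same_speaker_ms` if the speaker continues, otherwise
    `speaker_change_ms`. No silence is added before the first chunk.
    """
    starts: List[int] = []
    cursor = 0
    prev_speaker: Optional[str] = None
    for speaker, duration_ms in chunks:
        if prev_speaker is not None:
            cursor += same_speaker_ms if speaker == prev_speaker else speaker_change_ms
        starts.append(cursor)
        cursor += duration_ms
        prev_speaker = speaker
    return starts, cursor
```

Take a 1 s narrator line, a 2 s narrator line, and a 1.5 s line from another character. The chunks would start at 0, 1250, and 3750 ms, for a 5.25 s merge.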

Step 5: Result
- Listen to the finished audiobook in the browser
- Download it as MP3, or click Export to Audacity for per-speaker WAV tracks

Python Integration

import requests

BASE = "http://127.0.0.1:4200"
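The JavaScript example below polls `/api/status/<task>` with a `waitForTask` helper, and a Python counterpart could look like this. It uses `urllib` instead of `requests`, so it runs on the standard library alone. Like the JavaScript version, it assumes a finished task reports a `status` of `"completed"` or `"error"`. The injectable `fetch` and `sleep` parameters are a design choice for testing, not part of the API.

```python
"""Python counterpart to the JavaScript waitForTask helper (sketch)."""
import json
import time
import urllib.request
from typing import Callable, Dict

BASE = "http://127.0.0.1:4200"


def fetch_status(task_name: str, base: str = BASE) -> Dict:
    """GET /api/status/<task_name> and decode the JSON response."""
    with urllib.request.urlopen(f"{base}/api/status/{task_name}", timeout=10) as resp:
        return json.load(resp)


def wait_for_task(task_name: str,
                  fetch: Callable[[str], Dict] = fetch_status,
                  interval_s: float = 2.0,
                  sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Poll a task every `interval_s` seconds until it completes or errors."""
    while True:
        data = fetch(task_name)
        if data.get("status") in ("completed", "error"):
            return data
        sleep(interval_s)
```

`wait_for_task("script_generation")` would block until script generation finishes. The README describes the Audacity export as done once it is no longer running, so that task may need a different stop condition.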

JavaScript Integration

const BASE = "http://127.0.0.1:4200";

// Upload file
const formData = new FormData();
formData.append("file", fileInput.files[0]);
await fetch(`${BASE}/api/upload`, { method: "POST", body: formData });

// Generate script
await fetch(`${BASE}/api/generate_script`, { method: "POST" });

// Poll for completion
async function waitForTask(taskName) {
  while (true) {
    const res = await fetch(`${BASE}/api/status/${taskName}`);
    const data = await res.json();
    if (data.status === "completed" || data.status === "error") return data;
    await new Promise(r => setTimeout(r, 2000));
  }
}
await waitForTask("script_generation");

// Configure and generate
await fetch(`${BASE}/api/save_voice_config`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    NARRATOR: { type: "custom", voice: "Ryan", character_style: "calm" }
  })
});

// Fast batch render all chunks
const chunks = await (await fetch(`${BASE}/api/chunks`)).json();
const indices = chunks.map(c => c.id);
await fetch(`${BASE}/api/generate_batch_fast`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ indices })
});
// ... poll until all chunks done ...

// Merge into final audiobook
await fetch(`${BASE}/api/merge`, { method: "POST" });

// Export to Audacity
await fetch(`${BASE}/api/export_audacity`, { method: "POST" });
// ... poll /api/status/audacity_export until not running ...
// Download zip from GET /api/export_audacity

Troubleshooting

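📚 Practical Guide (Long-Tail Questions)

Who it's for

Developers building voice-based AI products

Best practices

For production deployments, prefer Docker Compose to isolate dependencies, and mount volumes to persist data
For the local LLM server (LM Studio, Ollama), prefer GGUF quantized models to save VRAM while keeping responses fast

Common mistakes

Committing API keys directly to a git repository (keep them in a .env file and add it to .gitignore)
A container cannot reach the host's localhost; use host.docker.internal instead
Running out of VRAM causes an immediate OOM; first reduce the context length, switch to a smaller quantized model, or lower Parallel Workers
Python dependency conflicts: isolate environments with venv or uv

Deployment options

Docker: the repository includes a Docker Compose setup; docker compose up --build builds and starts it (requires an NVIDIA GPU and the NVIDIA Container Toolkit)
CLI: the README does not document a command-line interface or an npm/pip package; use the web UI or the REST API instead
Local deployment: 8 GB RAM minimum (16 GB recommended); GPU with 8 GB VRAM minimum, 16 GB+ recommended
Cloud hosting: the README documents Google Colab with a free T4 GPU. General PaaS platforms such as Vercel, Railway, or Fly.io are not documented targets, and they would need GPU access to avoid the much slower CPU mode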

Original name	`alexandria-audiobook`
原始描述	AI-powered multi-voice audiobook generator — LLM script annotation, voice cloning, voice design, LoRA training, per-line style control, and export to MP3, chaptered M4B, or Audacity multi-track. Built on Qwen3-TTS.
Topics	`ai`, `audiobook`, `audiobook-generator`, `audiobookshelf`, `chapter-markers`, `dialogue-generation`, `tts`
GitHub	https://github.com/Finrandojin/alexandria-audiobook
License	MIT
Language	Python