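openai-edge-tts (Chinese documentation for an AI speech synthesis tool) is one of the AI tools featured in this issue of AI Skill Hub. It has 1.9k GitHub stars and an overall score of 8.6, which indicates high overall quality. We strongly recommend adding it to your AI toolkit to improve your productivity.
openai-edge-tts is an open-source tool written in Python, tagged with topics such as ai, azure, and chatgpt. As an open-source GitHub project, it has active community support and ongoing releases, its code is fully transparent and auditable, and it supports local deployment to help protect data privacy. Whether for personal use or integration into an enterprise workflow, it provides a stable and reliable solution.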
# Option 1: install with pip (recommended)
pip install openai-edge-tts
# Option 2: install in a virtual environment (recommended for production)
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install openai-edge-tts
# Option 3: install from source (for the latest features)
git clone https://github.com/travisvn/openai-edge-tts
cd openai-edge-tts
pip install -e .
# Verify the installation
python -c "import openai_edge_tts; print('Installation successful')"
# Command-line usage
openai-edge-tts --help
# Basic usage
openai-edge-tts input_file -o output_file
# Calling from Python code
import openai_edge_tts
# Example
result = openai_edge_tts.process("input")
print(result)
# Example openai-edge-tts configuration file (config.yml)
app:
  name: "openai-edge-tts"
  debug: false
  log_level: "INFO"
# Specify the configuration file at run time
openai-edge-tts --config config.yml
# Or configure through environment variables
export OPENAI_EDGE_TTS_API_KEY="your-key"
export OPENAI_EDGE_TTS_OUTPUT_DIR="./output"
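The service exposes /v1/audio/speech with similar request structure and behavior to the OpenAI endpoint.
Server-Sent Events streaming is used when stream_format: "sse" is specified.
OpenAI voice names are mapped to edge-tts equivalents.
Dependencies are listed in requirements.txt.
Use pip to install the required packages listed in requirements.txt: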
pip install -r requirements.txt
git clone https://github.com/travisvn/openai-edge-tts.git
cd openai-edge-tts
Create a .env file in the root directory with the following variables:
API_KEY=your_api_key_here
PORT=5050
DEFAULT_VOICE=en-US-AvaNeural
DEFAULT_RESPONSE_FORMAT=mp3
DEFAULT_SPEED=1.0
DEFAULT_LANGUAGE=en-US
REQUIRE_API_KEY=True
REMOVE_FILTER=False
EXPAND_API=True
DETAILED_ERROR_LOGGING=True
Or, copy the default .env.example with the following:
cp .env.example .env
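As an illustration of how the variables above might be read, here is a minimal Python sketch that parses `.env`-style text into typed values. The `parse_env` helper is hypothetical and written for this example; it is not taken from the project, and the server may load its configuration differently.

```python
def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines into a dict, converting booleans and numbers."""
    config = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything without an "=".
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        if value.lower() in ("true", "false"):
            config[key] = value.lower() == "true"
            continue
        # Try int first, then float, and fall back to the raw string.
        for convert in (int, float):
            try:
                config[key] = convert(value)
                break
            except ValueError:
                continue
        else:
            config[key] = value
    return config


sample = """
API_KEY=your_api_key_here
PORT=5050
DEFAULT_VOICE=en-US-AvaNeural
DEFAULT_SPEED=1.0
REQUIRE_API_KEY=True
REMOVE_FILTER=False
"""

settings = parse_env(sample)
print(settings["PORT"], settings["DEFAULT_SPEED"], settings["REQUIRE_API_KEY"])
```

Converting the values up front keeps later code from comparing strings such as "True" against booleans.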
docker compose up --build
Run with -d to run docker compose in "detached mode", meaning it will run in the background and free up your terminal.
docker compose up -d
<details> <summary>
</summary>
By default, docker compose up --build creates a minimal image without ffmpeg. If you're building locally (after cloning this repository) and need ffmpeg for audio format conversions (beyond MP3), you can include it in the build.
This is controlled by the INSTALL_FFMPEG_ARG build argument. Set this environment variable to true in one of these ways:
1. Prefixing the command:
INSTALL_FFMPEG_ARG=true docker compose up --build
2. Adding to your .env file: add this line to the .env file in the project root:
INSTALL_FFMPEG_ARG=true
Then, run docker compose up --build.
3. Exporting in your shell environment: add export INSTALL_FFMPEG_ARG=true to your shell configuration (e.g., ~/.zshrc, ~/.bashrc) and reload your shell. Then docker compose up --build will use it.
This applies to local builds. For pre-built Docker Hub images, use the latest-ffmpeg tag instead:
docker run -d -p 5050:5050 -e API_KEY=your_api_key_here -e PORT=5050 travisvn/openai-edge-tts:latest-ffmpeg
---
</details>
Alternatively, run directly with Docker:
docker build -t openai-edge-tts .
docker run -p 5050:5050 --env-file .env openai-edge-tts
To run the container in the background, add -d after the docker run command:
docker run -d -p 5050:5050 --env-file .env openai-edge-tts
The server is then accessible at http://localhost:5050.
<details> <summary>
The simplest way to get started without having to configure anything is to run the command below
docker run -d -p 5050:5050 travisvn/openai-edge-tts:latest
This will run the service at port 5050 with all the default configs
(Docker required, obviously)
</summary>
Endpoint: /v1/audio/speech
Generates audio from the input text. Available parameters:
Required Parameter:
input: the text to convert to audio.
Optional Parameters:
model: the model name (default: "tts-1").
voice: an OpenAI voice name (such as alloy, echo, or shimmer) or any edge-tts voice (default: "en-US-AvaNeural").
response_format: one of mp3, opus, aac, flac, wav, pcm (default: mp3).
speed: playback speed (default: 1.0).
stream_format: "audio" (raw audio data, default) or "sse" (Server-Sent Events streaming with JSON events).
Note: The API is fully compatible with OpenAI's TTS API specification. The instructions parameter (for fine-tuning voice characteristics) is not currently supported, but all other parameters work identically to OpenAI's implementation.
Example request with curl and saving the output to an mp3 file:
curl -X POST http://localhost:5050/v1/audio/speech \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your_api_key_here" \
-d '{
"input": "Hello, I am your AI assistant! Just let me know how I can help bring your ideas to life.",
"voice": "echo",
"response_format": "mp3",
"speed": 1.1
}' \
--output speech.mp3
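The request body above can also be assembled and checked in code before it is sent. The sketch below is a hypothetical `build_speech_payload` helper, not part of the project. It takes the parameter names from the curl examples and the defaults and allowed values from the parameter list, and makes no assumption about how the server validates them.

```python
import json

# Allowed values as listed in the parameter description.
ALLOWED_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
ALLOWED_STREAM_FORMATS = {"audio", "sse"}


def build_speech_payload(text, model="tts-1", voice="en-US-AvaNeural",
                         response_format="mp3", speed=1.0,
                         stream_format="audio"):
    """Return the JSON body for a /v1/audio/speech request (hypothetical helper)."""
    if not text:
        raise ValueError("input text is required")
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    if stream_format not in ALLOWED_STREAM_FORMATS:
        raise ValueError(f"unsupported stream_format: {stream_format}")
    return {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
        "stream_format": stream_format,
    }


payload = build_speech_payload("Hello!", voice="echo", speed=1.1)
print(json.dumps(payload, indent=2))
```

Rejecting unknown formats on the client side surfaces typos before a request is made.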
You can pipe the audio directly to ffplay for immediate playback, just like OpenAI's API:
curl -X POST http://localhost:5050/v1/audio/speech \
-H "Authorization: Bearer your_api_key_here" \
-H "Content-Type: application/json" \
-d '{
"model": "tts-1",
"input": "Today is a wonderful day to build something people love!",
"voice": "alloy",
"response_format": "mp3"
}' | ffplay -i -
Or for immediate playback without saving to file:
curl -X POST http://localhost:5050/v1/audio/speech \
-H "Authorization: Bearer your_api_key_here" \
-H "Content-Type: application/json" \
-d '{
"input": "This will play immediately without saving to disk!",
"voice": "shimmer"
}' | ffplay -autoexit -nodisp -i -
Or, to be in line with the OpenAI API endpoint parameters:
curl -X POST http://localhost:5050/v1/audio/speech \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your_api_key_here" \
-d '{
"model": "tts-1",
"input": "Hello, I am your AI assistant! Just let me know how I can help bring your ideas to life.",
"voice": "alloy"
}' \
--output speech.mp3
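For callers who prefer Python to curl, the same request can be sketched with the standard library alone. The helper names `build_speech_request` and `save_speech` are made up for this example. The URL, headers, and body fields mirror the curl command above, and the actual network call is left commented out because it needs a running server.

```python
import json
import urllib.request


def build_speech_request(text, api_key, base_url="http://localhost:5050",
                         voice="alloy", model="tts-1"):
    """Build (but do not send) a POST request for /v1/audio/speech."""
    body = json.dumps({"model": model, "input": text, "voice": voice})
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def save_speech(request, path="speech.mp3"):
    """Send the request and write the returned audio bytes to a file."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())


request = build_speech_request("Hello, I am your AI assistant!",
                               "your_api_key_here")
print(request.get_method(), request.full_url)
# save_speech(request)  # uncomment once the server is running
```

Separating request construction from sending makes the request easy to inspect without network access.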
For applications that need structured streaming events (like web applications), use SSE format:
curl -X POST http://localhost:5050/v1/audio/speech \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your_api_key_here" \
-d '{
"model": "tts-1",
"input": "This will stream as Server-Sent Events with JSON data containing base64-encoded audio chunks.",
"voice": "alloy",
"stream_format": "sse"
}'
SSE Response Format:
data: {"type": "speech.audio.delta", "audio": "base64-encoded-audio-chunk"}
data: {"type": "speech.audio.delta", "audio": "base64-encoded-audio-chunk"}
data: {"type": "speech.audio.done", "usage": {"input_tokens": 12, "output_tokens": 0, "total_tokens": 12}}
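The event stream above can be consumed line by line. The following Python sketch, with a hypothetical `collect_sse_audio` helper, decodes each `speech.audio.delta` chunk from base64 and stops at `speech.audio.done`. The sample events are made up for illustration and carry short placeholder bytes rather than real audio.

```python
import base64
import json


def collect_sse_audio(lines):
    """Return (audio_bytes, usage) from an iterable of SSE text lines."""
    audio = bytearray()
    usage = None
    for line in lines:
        if not line.startswith("data: "):
            continue  # ignore blank separators and other SSE fields
        event = json.loads(line[len("data: "):])
        if event["type"] == "speech.audio.delta":
            audio.extend(base64.b64decode(event["audio"]))
        elif event["type"] == "speech.audio.done":
            usage = event.get("usage")
            break
    return bytes(audio), usage


def make_delta(chunk: bytes) -> str:
    """Format placeholder bytes as a delta event line (for this demo only)."""
    encoded = base64.b64encode(chunk).decode("ascii")
    return "data: " + json.dumps({"type": "speech.audio.delta", "audio": encoded})


sample = [
    make_delta(b"abc"),
    "",
    make_delta(b"def"),
    "",
    "data: " + json.dumps({
        "type": "speech.audio.done",
        "usage": {"input_tokens": 12, "output_tokens": 0, "total_tokens": 12},
    }),
]

audio_bytes, usage = collect_sse_audio(sample)
print(len(audio_bytes), usage)
```

In a real client the lines would come from the HTTP response body, and the collected bytes would be written to a file or handed to an audio player.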
Example using fetch API for SSE streaming:
async function streamTTSWithSSE(text) {
  const response = await fetch('http://localhost:5050/v1/audio/speech', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: 'Bearer your_api_key_here',
    },
    body: JSON.stringify({
      input: text,
      voice: 'alloy',
      stream_format: 'sse',
    }),
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  const audioChunks = [];
  let buffer = ''; // holds a partial line until the rest of it arrives

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // stream: true keeps multi-byte characters intact across reads
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop(); // the last element may be an incomplete line

    for (const line of lines) {
      if (!line.startsWith('data: ')) continue;
      const data = JSON.parse(line.slice(6));

      if (data.type === 'speech.audio.delta') {
        // Decode base64 audio chunk
        const audioData = atob(data.audio);
        const audioArray = new Uint8Array(audioData.length);
        for (let i = 0; i < audioData.length; i++) {
          audioArray[i] = audioData.charCodeAt(i);
        }
        audioChunks.push(audioArray);
      } else if (data.type === 'speech.audio.done') {
        console.log('Speech synthesis complete:', data.usage);

        // Combine all chunks and play
        const totalLength = audioChunks.reduce(
          (sum, chunk) => sum + chunk.length,
          0
        );
        const combinedArray = new Uint8Array(totalLength);
        let offset = 0;
        for (const chunk of audioChunks) {
          combinedArray.set(chunk, offset);
          offset += chunk.length;
        }

        const audioBlob = new Blob([combinedArray], { type: 'audio/mpeg' });
        const audioUrl = URL.createObjectURL(audioBlob);
        const audio = new Audio(audioUrl);
        audio.play();
        return;
      }
    }
  }
}

// Usage
streamTTSWithSSE('Hello from SSE streaming!');
And an example of a language other than English:
curl -X POST http://localhost:5050/v1/audio/speech \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your_api_key_here" \
-d '{
"model": "tts-1",
"input": "じゃあ、行く。電車の時間、調べておくよ。",
"voice": "ja-JP-KeitaNeural"
}' \
--output speech.mp3
Additional endpoints list the edge-tts voices for a given language / locale, and all edge-tts voices with language support information.
</details>
[!TIP] Swap localhost for your local IP (e.g., 192.168.0.1) if you have issues.
When you access this endpoint from a different server or computer, or when the call is made from another source (like Open WebUI), you may need to change the URL from localhost to your local IP (something like 192.168.0.1).
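One way to make the host easy to swap is to build the URL from a single setting. This is a small sketch with a hypothetical `speech_url` helper. The port and path come from the examples in this document, and 192.168.0.1 is only the sample address used in the tip.

```python
def speech_url(host="localhost", port=5050):
    """Build the speech endpoint URL for a given host (hypothetical helper)."""
    return f"http://{host}:{port}/v1/audio/speech"


local_url = speech_url()
lan_url = speech_url("192.168.0.1")  # replace with your machine's local IP
print(local_url)
print(lan_url)
```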
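Create and activate a virtual environment to isolate dependencies, for example with python -m venv .venv followed by source .venv/bin/activate (on Windows: .venv\Scripts\activate).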
Create a .env file in the root directory and set the following variables:
API_KEY=your_api_key_here
PORT=5050
DEFAULT_VOICE=en-US-AvaNeural
DEFAULT_RESPONSE_FORMAT=mp3
DEFAULT_SPEED=1.0
DEFAULT_LANGUAGE=en-US
REQUIRE_API_KEY=True
REMOVE_FILTER=False
EXPAND_API=True
DETAILED_ERROR_LOGGING=True
This project provides a local, OpenAI-compatible text-to-speech (TTS) API using edge-tts. It emulates the OpenAI TTS endpoint (/v1/audio/speech), enabling users to generate speech from text with various voice options and playback speeds, just like the OpenAI API.
edge-tts uses Microsoft Edge's online text-to-speech service, so it is completely free.
You can now interact with the API at http://localhost:5050/v1/audio/speech and other available endpoints. See the Usage section for request examples.
</details>
<details> <summary>
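AI Skill Hub is a third-party content aggregation platform. The information on this page is compiled from public data and does not constitute a legal endorsement of the tool's functionality or quality.
We recommend validating the tool thoroughly in a sandbox or test environment before deploying it to production, and carrying out any necessary security assessment.
⚠️ GPL 3.0: strong copyleft. Distributed derivative works must be released as open source under the same license. The license includes patent protection terms and does not permit closed-source redistribution.
Overall, openai-edge-tts is a solid, high-quality entry among AI tools. If you already have a clear use case, you can start using it right away. If you are still evaluating, compare it with similar tools before deciding.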
| Field | Value |
| --- | --- |
| Original name | openai-edge-tts |
| Original description | Free, high-quality text-to-speech API endpoint to replace OpenAI, Azure, or ElevenLabs |
| Topics | ai, azure, chatgpt, claude, edge-tts, elevenlabs, tts |
| GitHub | https://github.com/travisvn/openai-edge-tts |
| License | GPL-3.0 |
| Language | Python |
Listed: 2026-05-22 · Updated: 2026-05-22 · License: GPL-3.0 · AI Skill Hub does not legally endorse the accuracy of third-party content.