Capability tags

🤖 Agent 🔄 Workflow 🐳 Docker 🔗 REST API 🔊 TTS 🎙 STT 🧠 Claude

VoiceBlender Voice Control Platform

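Q: How do I install voiceblender and get started?

Visit the voiceblender GitHub repository or official website and follow the README to build and run it. Because the service is written in Go, building from source requires a Go toolchain; the README builds it with go build -o voiceblender ./cmd/voiceblender.

Q: Is voiceblender free? What is the license?

voiceblender is free and open source under the MIT license, which permits anyone to use, modify, and distribute it.

Q: Who is voiceblender for?

voiceblender mainly targets users with a technical background, such as developers, data analysts, AI engineers, and similar professionals.

Q: How active is the voiceblender community, and how well is the project maintained?

voiceblender has 68 stars on GitHub and is under active development, with a growing community.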

Built with Go · Real-time voice infrastructure for AI Agent workflows

English name: voiceblender

⭐ 68 Stars 🍴 8 Forks 💻 Go 📄 MIT 🏷 AI score 8.2

Topics: Voice AI · WebRTC · Real-time communication

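✦ AI Skill Hub recommendation

AI Skill Hub recommends VoiceBlender Voice Control Platform as a solid Agent workflow tool. Its AI composite score is 8.2, a steady result among comparable tools. If you are looking for a dependable Agent workflow solution, it is worth a closer look.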

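📚 In-depth analysis

VoiceBlender Voice Control Platform provides voice infrastructure for AI Agent automation workflows. As AI capabilities improve, Agent-based automation is becoming a central way to raise individual and team productivity. Unlike traditional RPA, which replays mouse and keyboard actions, an AI Agent workflow interprets the intent of a task and plans its execution path dynamically, so it can handle more complex, unstructured work.

The workflow design follows a "minimal configuration, maximum reuse" principle: the core logic is already packaged, and users only supply their own API keys and business parameters to get started. Error handling and retry mechanisms are built in (the README documents retries for webhook delivery and for outbound SIP registration), so the service keeps running through network fluctuations or API rate limits and can serve as automation infrastructure in production.

Before deploying, run the workflow 3 to 5 times in a test environment and verify that each stage produces the expected output, then move it to production. AI Skill Hub scores it 8.2 and lists it among its selected Agent workflow recommendations.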

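📋 Tool overview

A programmable open-source voice platform that supports SIP and WebRTC call control and multi-party audio mixing. It integrates ASR and audio processing and aims to give AI Agents real-time voice interaction infrastructure, which suits developers who need to build complex voice workflows.

Call flows are composed from REST API calls and webhook events, so a multi-step task can be broken into clear automated steps that run unattended. Per the README, it integrates with external TTS, STT, and agent providers (ElevenLabs, Google Cloud, AWS Polly, VAPI, Pipecat, Deepgram), which makes it a fit for voice automation and AI-assisted systems.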

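Field	Value
GitHub Stars	⭐ 68
Forks	8
Language	Go
Supported platforms	Windows / macOS / Linux (cross-platform)
Maintenance status	Lightweight project, updated as needed
License	MIT
AI composite score	8.2
Tool type	Agent workflow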

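📖 Documentation

The content below was compiled automatically by AI Skill Hub from project information; for the complete original documentation, see "Original source" at the bottom.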

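📌 Core features

Agent workflow orchestration through a REST API and webhooks
Multi-step automated task chains that run unattended end to end
Integration with external APIs and third-party services
Built-in error handling and automatic retries for stable operation
Reusable automation templates for fast rollout in similar scenarios

🎯 Main use cases

Automate routine repetitive work and free up time for creative tasks
Build complete collection → processing → output pipelines
Move data and coordinate processes across platforms and systems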

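The install commands below were generated from the project's language and type; the official README takes precedence.

Install commands

# Option 1: go install (assumes the main package lives at cmd/voiceblender, as the README build command suggests)
go install github.com/VoiceBlender/voiceblender/cmd/voiceblender@latest

# Option 2: build from source
git clone https://github.com/VoiceBlender/voiceblender
cd voiceblender
go build -o voiceblender ./cmd/voiceblender

# Option 3: download a prebuilt binary
# Check the Releases page for a binary matching your platform
# https://github.com/VoiceBlender/voiceblender/releases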

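📋 Installation steps

Clone the GitHub repository
Build the binary with go build -o voiceblender ./cmd/voiceblender
Set the environment variables you need (for example HTTP_ADDR, SIP_BIND_IP, WEBHOOK_URL) and any provider API keys
Start the service with ./voiceblender
Place a test call and confirm the expected events arrive before going live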

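The usage examples below were compiled by AI Skill Hub and cover the most common case.

Common commands / code examples

# Start the service (configuration is read from environment variables)
HTTP_ADDR=:8080 SIP_BIND_IP=127.0.0.1 ./voiceblender

# Check that the HTTP listener is up by fetching the Prometheus metrics
curl http://localhost:8080/metrics

# See the repository for full usage documentation
# https://github.com/VoiceBlender/voiceblender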

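The configuration example below is based on a typical setup; adjust the values against the official documentation.

Configuration example

# VoiceBlender is configured entirely through environment variables
export HTTP_ADDR=":8080"
export SIP_BIND_IP="127.0.0.1"
export SIP_PORT="5060"
export LOG_LEVEL="info"
export RECORDING_DIR="/tmp/recordings"

# Optional: default webhook for inbound calls, with HMAC-SHA256 signing
export WEBHOOK_URL="https://example.com/voiceblender/webhook"
export WEBHOOK_SECRET="<your-signing-secret>"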

📑 README deep dive · documentation completeness 90/100

The content below was parsed directly from the GitHub README, preserving code blocks, tables, and list structure.

VoiceBlender

A Go service that bridges SIP and WebRTC voice calls with multi-party audio mixing, a REST API, and real-time webhooks.

Features

SIP inbound & outbound -- receive and originate SIP calls with codec negotiation (PCMU, PCMA, G.722, Opus, AMR-WB, AMR-NB), digest auth, session timers (RFC 4028)
SIP over TLS -- optional TLS transport on a second port alongside UDP, reusable by classic SIP trunks and required by WhatsApp
Early media -- SIP 183 Session Progress with SDP for pre-answer audio (custom ringback, IVR)
Hold/unhold -- SIP re-INVITE with sendonly/sendrecv direction
WebRTC -- browser-based voice via SDP offer/answer with trickle ICE
WhatsApp Business Calling -- inbound and outbound calls over SIP-TLS + ICE/DTLS-SRTP + Opus
WebSocket legs -- inbound (HTTP upgrade) and outbound (dial) PCM-over-WebSocket legs with binary or json_base64 framing, configurable sample rate (8/16/24/48 kHz), bidirectional text, and caller-supplied X-/P- headers — designed to also back a future generic Agent API
MoQ legs (experimental, PoC) -- inbound Media-over-QUIC legs over WebTransport/HTTP/3 with Opus framed one frame per MoQ Object (LOC-style). Tracks mengelbart/moqtransport (IETF draft-11); browser interop with draft-16 clients (moqtail, moq.dev) is not expected to work out of the box. Disabled by default; enable with MOQ_ENABLED=true + MOQ_TLS_CERT_FILE / MOQ_TLS_KEY_FILE
Multi-party rooms -- mix N participants with mixed-minus-self audio at a configurable sample rate (8 kHz, 16 kHz, or 48 kHz per room; default 16 kHz)
Room bridging -- join two rooms' mixers (same sample rate) with live-configurable direction (bidirectional, one-way each way, or parked); echo-free via mixed-minus-self
Audio routing matrix -- per-room role-based routing for asymmetric audio (barge-in / whisper / supervisor monitor). Tag legs with a free-form role and declare a matrix of who-hears-whom by role. Applied atomically at leg-join time so a supervisor cannot momentarily bleed into the customer's audio. See API.md.
WebSocket room access -- join rooms from any client over a WebSocket with base64 PCM frames
DTMF -- send and receive RFC 4733 telephone-events
Real-Time Text (RTT) -- ITU-T T.140 over RTP per RFC 4103 with RFC 2198 redundancy
Recording -- stereo WAV recording per-leg or per-room, multi-channel per-participant tracks, pause/resume (writes silence to preserve timeline while sensitive data is exchanged), optional S3 upload
Playback -- stream WAV/MP3 audio or built-in telephone tones into legs or rooms
TTS -- text-to-speech into legs or rooms (ElevenLabs, Google Cloud, AWS Polly)
STT -- real-time speech-to-text with partial transcripts (ElevenLabs)
AI Agent -- attach a conversational AI agent to a leg or room (ElevenLabs, VAPI, Pipecat, Deepgram) with mid-session context injection
Answering Machine Detection (AMD) -- per-call analysis of outbound call audio to classify the answerer as human, machine, no-speech, or not-sure; optional voicemail beep detection via Goertzel frequency analysis
Webhooks -- real-time event delivery with HMAC-SHA256 signing and retry; typed event data with CDR-style leg.disconnected (disposition, timing, quality)
WebSocket event stream (VSI) -- GET /v1/vsi streams all events and accepts commands (mute, hold, DTMF, room management) over a single persistent WebSocket; filter by app_id regex for multi-tenant isolation
Prometheus metrics -- operational metrics exposed at GET /metrics (active legs/rooms, call durations, disconnect reasons, Go runtime). See API.md for the full metric reference. Profiling via go tool pprof is available at /debug/pprof/ when built with -tags pprof.
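
The HMAC-SHA256 webhook signing listed above can be checked on the receiving side with the Python standard library. This is a minimal sketch under stated assumptions, not VoiceBlender's documented scheme: it assumes the signature is the hex-encoded HMAC-SHA256 of the raw request body, keyed with the shared secret. The actual header name and encoding are defined in API.md and are not reproduced here; `sign_body` and `verify_signature` are hypothetical helper names.

```python
import hashlib
import hmac


def sign_body(secret: str, body: bytes) -> str:
    """Return the hex HMAC-SHA256 of body (assumed encoding; check API.md)."""
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def verify_signature(secret: str, body: bytes, received_hex: str) -> bool:
    """Compare the received signature with the expected one in constant time."""
    expected = sign_body(secret, body)
    return hmac.compare_digest(expected, received_hex)
```

Verify against the raw bytes of the request body before any JSON parsing, since re-serializing the payload can change the bytes and invalidate the signature.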

Capabilities

Inbound — Meta-originated INVITEs are auto-routed to a WhatsApp handler when the From URI host ends in meta.vc. The leg comes up in ringing, fires leg.ringing (leg_type: "whatsapp_in"), and waits for POST /v1/legs/{id}/answer. The 200 OK then carries the pre-gathered ICE/DTLS-SRTP answer.
Outbound — POST /v1/legs {"type":"whatsapp", ...} returns 201 immediately with the leg in ringing. ICE gathering, the digest 401/407 round-trip, and the SDP-answer apply happen asynchronously; outcome is signalled via leg.connected or leg.disconnected.
Audio — full-duplex Opus at 48 kHz with mixed-minus-self room participation, recording, TTS, STT, agent attachment, speaking detection, playback. The mixer auto-resamples between WhatsApp's 48 kHz and your room's configured rate.
DTMF — inbound RFC 4733 telephone-events are decoded and emitted as dtmf.received plus the standard cross-leg broadcast.
Webhooks + WebSocket events — leg.ringing / leg.connected / leg.disconnected / dtmf.received / speaking.started / speaking.stopped all carry leg_type set to whatsapp_in or whatsapp_out so multi-tenant filtering works as it does for SIP and WebRTC legs.
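
"Mixed-minus-self", used in the audio capability above, means each participant receives the sum of everyone else's audio but not their own, which avoids echo. The sketch below illustrates the idea on one frame of 16-bit PCM samples. It is an illustration only, not VoiceBlender's mixer (which is written in Go and also resamples between rates); `mixed_minus_self` is a hypothetical name.

```python
from typing import Dict, List

INT16_MIN, INT16_MAX = -32768, 32767


def mixed_minus_self(frames: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """For each participant, sum every other participant's samples.

    frames maps a leg id to one frame of 16-bit PCM samples; all frames
    must have the same length and sample rate.
    """
    lengths = {len(f) for f in frames.values()}
    if len(lengths) > 1:
        raise ValueError("all frames must have the same length")
    n = lengths.pop() if lengths else 0
    # Sum all inputs once, then subtract each leg's own contribution.
    total = [sum(f[i] for f in frames.values()) for i in range(n)]
    out = {}
    for leg_id, own in frames.items():
        # Clip to the int16 range so loud mixes saturate instead of wrapping.
        out[leg_id] = [
            max(INT16_MIN, min(INT16_MAX, total[i] - own[i])) for i in range(n)
        ]
    return out
```

Summing once and subtracting each leg's own samples keeps the cost linear in the number of participants, rather than re-summing for every listener.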

Integration tests (requires two SIP instances)

go test -tags integration -v -timeout 60s ./tests/integration/

Dependencies

Library	Description	Notes
[sipgo](https://github.com/emiago/sipgo)	SIP stack	Excellent SIP stack in go
[pion/webrtc](https://github.com/pion/webrtc)	WebRTC	Nothing is better than Pion
[go-chi](https://github.com/go-chi/chi)	HTTP router
[zaf/g711](https://github.com/zaf/g711)	G.711 codec
[gobwas/ws](https://github.com/gobwas/ws)	WebSocket
[go-audio/wav](https://github.com/go-audio/wav)	WAV encoding
[gopus](https://github.com/thesyncim/gopus)	Opus codec	Thanks Marcelo! (Claude and Codex too!)
[go-mp3](https://github.com/hajimehoshi/go-mp3)	MP3 decoder	Pure Go
[go-audio/audio](https://github.com/go-audio/audio)	Audio buffer types
[google/uuid](https://github.com/google/uuid)	UUID generation
[prometheus/client_golang](https://github.com/prometheus/client_golang)	Prometheus metrics
[aws-sdk-go-v2](https://github.com/aws/aws-sdk-go-v2)	AWS SDK (S3, Polly)
[cloud.google.com/go/texttospeech](https://cloud.google.com/go/docs/reference/cloud.google.com/go/texttospeech/latest)	Google Cloud TTS
[protobuf](https://github.com/protocolbuffers/protobuf-go)	Protocol Buffers	Pipecat agent
[x/sync](https://pkg.go.dev/golang.org/x/sync)	Concurrency utilities

Build and run

go build -o voiceblender ./cmd/voiceblender
./voiceblender

Quick Start

Examples

Example	Description
[`examples/call_handler.py`](examples/call_handler.py)	Python webhook listener for inbound SIP calls with room conferencing
[`examples/webrtc-client/`](examples/webrtc-client/)	Browser-based WebRTC voice client with room management and DTMF
[`examples/gen_test_wav.py`](examples/gen_test_wav.py)	Generate test WAV files for playback testing

Configuration

All configuration is via environment variables:

Variable	Default	Description
`INSTANCE_ID`	(auto-generated UUID)	Instance identifier, included in API responses and webhooks
`HTTP_ADDR`	`:8080`	REST API listen address
`ALLOWED_IPS`	_(empty = allow all)_	Comma-separated allowlist of IPs and CIDR ranges (IPv4 and IPv6, in any mix) gating every HTTP endpoint, including the `/v1/vsi` event WebSocket, `/v1/legs/websocket`, the `/v1/legs/moq` WebTransport endpoint, `/metrics`, and pprof. Empty or unset disables the check. Whitespace around entries is trimmed; bare addresses are treated as host routes (`/32` for v4, `/128` for v6); malformed entries fail server startup. Only `X-Forwarded-For` is consulted as a proxy header (see `TRUST_PROXY_HEADERS`); `X-Real-IP` and RFC 7239 `Forwarded` are ignored. Examples: `127.0.0.1`, `10.0.0.0/8,192.168.0.0/16`, `2001:db8::/32,::1`.
`TRUST_PROXY_HEADERS`	`false`	When `true`, the client IP used for the `ALLOWED_IPS` check is taken from the leftmost entry in `X-Forwarded-For` (falling back to the socket peer when the header is absent). When `false` (default), `X-Forwarded-For` is ignored and only the socket peer (`r.RemoteAddr`) is consulted. Enable only when VoiceBlender sits behind a trusted reverse proxy that unconditionally overwrites `X-Forwarded-For` — otherwise the header is client-spoofable.
`SIP_BIND_IP`	`127.0.0.1`	IPv4 address advertised in SDP/Contact/Via headers (and used as the listen address when `SIP_LISTEN_IP` is empty). Set to `0.0.0.0` for v4 wildcard, `::` for dual-stack on Linux when `bindv6only=0`.
`SIP_LISTEN_IP`	(same as SIP_BIND_IP)	UDP socket bind IP. Accepts `127.0.0.1`, `0.0.0.0`, `::`, or any literal v4/v6 address.
`SIP_BIND_IPV6`	(empty = v4-only)	IPv6 address advertised in SDP/Contact/Via for IPv6 calls. Set this for IPv6-only or dual-stack deployments.
`SIP_LISTEN_IPV6`	(same as SIP_BIND_IPV6)	Optional separate IPv6 socket bind address (e.g. when running with both `0.0.0.0` and a specific v6 literal).
`SIP_PORT`	`5060`	SIP listen port (UDP)
`SIP_TLS_PORT`	(disabled)	SIP-over-TLS listen port (typically `5061`). When set, `SIP_TLS_CERT` and `SIP_TLS_KEY` must also be provided. Required for WhatsApp Business Calling integration.
`SIP_TLS_CERT`		Path to PEM-encoded TLS certificate (e.g. `fullchain.pem`). Meta rejects self-signed certs — use a CA-signed cert matching a public FQDN.
`SIP_TLS_KEY`		Path to PEM-encoded TLS private key (e.g. `privkey.pem`).
`SIP_DEBUG`	`false`	When `true`, log the full RFC 3261 wire form of every inbound and outbound SIP request and response. Very verbose — use only for troubleshooting.
`SIP_DOMAIN`	(falls back to advertised IP)	FQDN advertised in From, Contact and Via on all outbound SIP signalling (classic trunks and WhatsApp). Should match the SAN on `SIP_TLS_CERT` and any allowlist your carrier or Meta keeps.
`SIP_HOST`	`voiceblender`	SIP User-Agent name
`ICE_SERVERS`	`stun:stun.l.google.com:19302`	STUN/TURN URLs (comma-separated)
`WEBRTC_EXTERNAL_IPS`	(empty)	Comma-separated public IPs advertised as host ICE candidates (pion `SetNAT1To1IPs`). Set this when VoiceBlender runs behind NAT/Docker and the gathered host interface IPs aren't routable from the remote peer — otherwise WebRTC peers behind firewalls won't be able to reach VB. Supports IPv4 and IPv6 literals. The literal value `auto` performs STUN-based public-IP discovery at startup against the first reachable `ICE_SERVERS` entry; discovery failure is non-fatal and logs a warning.
`RECORDING_DIR`	`/tmp/recordings`	Local recording output directory
`LOG_LEVEL`	`info`	Log level (`debug`, `info`, `warn`, `error`)
`WEBHOOK_URL`		Default webhook URL for inbound calls
`WEBHOOK_SECRET`		HMAC-SHA256 signing secret for the global webhook. Applied to events delivered to `WEBHOOK_URL`; per-leg/per-room webhooks set via the API can supply their own secret.
`ELEVENLABS_API_KEY`		API key for ElevenLabs TTS, STT, and Agent
`VAPI_API_KEY`		API key for VAPI Agent provider
`DEEPGRAM_API_KEY`		API key for Deepgram STT and TTS
`AZURE_SPEECH_KEY`		Subscription key for Azure Cognitive Speech Services (TTS and STT)
`AZURE_SPEECH_REGION`	`eastus`	Azure region for Speech Services (e.g. `eastus`, `westeurope`)
`S3_BUCKET`		S3 bucket for recording uploads
`S3_REGION`	`us-east-1`	AWS region
`S3_ENDPOINT`		Custom S3 endpoint (MinIO, etc.)
`S3_PREFIX`		Key prefix for S3 objects
`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN`, `AWS_PROFILE`		AWS credentials for S3 uploads and AWS Polly TTS. Resolved by the AWS SDK's default credential chain (env vars → `~/.aws/credentials` → EC2/ECS/EKS instance role), not by VoiceBlender directly. `AWS_REGION` is honored only when `S3_REGION` is empty.
`GOOGLE_APPLICATION_CREDENTIALS`		Path to a Google Cloud service-account JSON file used by Google Cloud TTS when no per-request `api_key` is supplied. Resolved by Google's Application Default Credentials chain (env var → `~/.config/gcloud/application_default_credentials.json` → GCE/Cloud Run/GKE metadata), not by VoiceBlender directly.
`TTS_CACHE_ENABLED`	`false`	Enable disk-backed TTS audio cache. Cached audio persists across restarts.
`TTS_CACHE_DIR`	`/tmp/tts_cache`	Directory for cached TTS audio files (used when `TTS_CACHE_ENABLED=true`)
`TTS_CACHE_INCLUDE_API_KEY`	`false`	Include API key in TTS cache key (set `true` if different keys map to different voice clones)
`RTP_PORT_MIN`	`10000`	Minimum UDP port for RTP/RTCP media
`RTP_PORT_MAX`	`20000`	Maximum UDP port for RTP/RTCP media
`SIP_JITTER_BUFFER_MS`	`0`	SIP ingress jitter buffer target delay in ms (0 = disabled passthrough). Applies to every SIP leg.
`SIP_JITTER_BUFFER_MAX_MS`	`300`	Max depth of the SIP ingress jitter buffer (ms); frames beyond this are dropped oldest-first.
`SIP_EXTERNAL_IP`	(empty)	Public IPv4 address for NAT/Docker deployments. When set, used in SIP Contact headers and SDP media (c=) lines instead of the auto-detected or bind IP. IPv6 has no equivalent: set `SIP_BIND_IPV6` directly to the address you want advertised.
`DEFAULT_SAMPLE_RATE`	`16000`	Default mixer sample rate (Hz) for new rooms when `sample_rate` is not specified. Allowed: `8000`, `16000`, `48000`.
`SIP_CODECS`	`PCMU,PCMA`	Comma-separated, preference-ordered list of codecs the SIP engine offers on outbound INVITEs and accepts on inbound INVITEs (a codec absent from this list cannot be negotiated in either direction). Recognized names (case-insensitive): `PCMU`, `PCMA`, `G722`, `opus`, `AMR-WB`, `AMR-NB` (the bare token `AMR` also resolves to AMR-NB per RFC 4867 §8.1). Unknown names and duplicates are dropped silently; if the parsed list ends up empty the default is used. Example: `SIP_CODECS=opus,G722,PCMU,PCMA,AMR-WB,AMR-NB` enables every supported codec, ranked Opus-first.
`SIP_REFER_AUTO_DIAL`	`false`	Accept incoming SIP REFER requests and auto-dial the transferred call. Default-deny (toll-fraud risk). Outbound transfers via the REST API are unaffected.
`SIP_AUTO_RINGING`	`false`	Behavior change vs prior releases: previously the server always sent `180 Ringing` after `100 Trying`. The new default sends only `100 Trying`; the API caller drives ringing explicitly via `POST /v1/legs/{id}/ring`, `/early-media`, or `/answer`. Set to `true` to restore the legacy auto-180 behavior.
`SIP_USE_SOURCE_SOCKET`	`false`	When `true`, route SIP responses and in-dialog requests (BYE, re-INVITE, UPDATE, INFO, NOTIFY, REFER) back to the request's source UDP socket instead of the peer's `Contact` URI / Via sent-by. Enable when peers advertise unroutable addresses (e.g. private IPs in `Contact` from behind NAT, or Via sent-by hosts that don't resolve). Equivalent to sipgo's `DialogUA.RewriteContact` plus per-response `SetDestination(req.Source())`.
`SIP_REGISTRATION_DEFAULT_EXPIRES_SECONDS`	`3600`	Expiry used when an inbound `REGISTER` carries no `Expires` value.
`SIP_REGISTRATION_MAX_EXPIRES_SECONDS`	`7200`	Upper clamp on the granted expiry. Requests above this value are honored at this maximum.
`SIP_REGISTRATION_SWEEP_INTERVAL_MS`	`1000`	Sweeper period for evicting expired AOR bindings.
`SIP_REGISTRATION_ALLOW_MULTIPLE_CONTACTS`	`true`	When `true`, the same AOR may be bound from multiple Contacts simultaneously (and `POST /v1/legs` parallel-forks to every bound contact). When `false`, each `REGISTER` replaces any prior Contacts for the AOR.
`SIP_OUTBOUND_REGISTRATION_DEFAULT_EXPIRES_SECONDS`	`3600`	Default `Expires` value sent on outbound REGISTER (sip_register trunks) when the create-trunk request does not specify one.
`SIP_OUTBOUND_REGISTRATION_MIN_EXPIRES_SECONDS`	`60`	Lower clamp on the requested outbound REGISTER expiry.
`SIP_OUTBOUND_REGISTRATION_MAX_EXPIRES_SECONDS`	`7200`	Upper clamp on the requested outbound REGISTER expiry.
`SIP_OUTBOUND_REGISTRATION_REFRESH_RATIO`	`0.5`	Fraction of the granted expiry at which the trunk refreshes (e.g. `0.5` of a 600 s grant → refresh every 300 s). Must be `(0, 1)`; out-of-range values fall back to `0.5`.
`SIP_OUTBOUND_REGISTRATION_FAILURE_BACKOFF_MAX_MS`	`300000`	Upper cap on the exponential backoff between failed outbound REGISTER attempts. Failures emit `sip.outbound_registration_failed`; the trunk stays in the manager and keeps retrying.
`SPEECH_DETECTION_ENABLED`	`false`	Emit `speaking.started` / `speaking.stopped` events for every connected leg by default. Per-call `speech_detection` on `POST /v1/legs` or `POST /v1/legs/{id}/answer` overrides this.
`AMRWB_MODE`	`2`	AMR-WB (G.722.2) encoder speech-mode ceiling `0..8`: `0`=6.60, `1`=8.85, `2`=12.65, `3`=14.25, `4`=15.85, `5`=18.25, `6`=19.85, `7`=23.05, `8`=23.85 kbit/s. The actual transmit mode is this ceiling clamped to the peer's negotiated `mode-set` (so e.g. `8` yields HD 23.85 only when the peer allows it, falling back automatically). Default `2` (12.65) matches the GSMA IR.92 / VoLTE common rate. Out-of-range values clamp to `0..8`.
`AMRWB_OCTET_ALIGNED`	`true`	Offer octet-aligned AMR-WB framing (RFC 4867) in outbound SDP. When `false`, offers bandwidth-efficient framing. On answers, VoiceBlender always echoes the framing the peer negotiated.
`AMRNB_MODE`	`7`	AMR-NB (RFC 4867) encoder speech-mode ceiling `0..7`: `0`=4.75, `1`=5.15, `2`=5.90, `3`=6.70, `4`=7.40, `5`=7.95, `6`=10.2, `7`=12.2 kbit/s. The actual transmit mode is this ceiling clamped to the peer's negotiated `mode-set`. Default `7` is the GSM-EFR-equivalent 12.2 kbit/s, the highest AMR-NB quality and the rate most enterprise PBXes and mobile networks default to. Out-of-range values clamp to `0..7`.
`AMRNB_OCTET_ALIGNED`	`true`	Offer octet-aligned AMR-NB framing (RFC 4867) in outbound SDP. When `false`, offers bandwidth-efficient framing. On answers, VoiceBlender always echoes the framing the peer negotiated.
`VSI_EVENT_BUFFER_SIZE`	`256`	Per-client buffer (in events) on the `/v1/vsi` WebSocket. When the client consumes events slower than they're produced, the buffer fills and new events are dropped (with a warn log on the leading edge of each drop burst and at every 10× threshold; the next delivered event also includes an `events_dropped` notification to the client). Clamped to `[16, 1_000_000]`. Tuning: larger values absorb longer back-pressure spikes at the cost of higher peak memory per client (roughly the average JSON event size × buffer size, e.g. ~1 KB × 256 ≈ 256 KB per connection at the default) and longer end-to-end latency for buffered events when the client recovers. Increase only if you observe drops on legitimate slow-consumer scenarios you can't fix at the client.
`MOQ_ENABLED`	`false`	Enable the experimental MoQ (Media over QUIC) inbound leg endpoint at `CONNECT /v1/legs/moq` over WebTransport/HTTP/3. PoC quality: tracks IETF draft-11 via `mengelbart/moqtransport`, single MoQ session per leg, Opus framed one frame per MoQ Object (LOC-style). When enabled, both `MOQ_TLS_CERT_FILE` and `MOQ_TLS_KEY_FILE` must be set.
`MOQ_LISTEN_ADDR`	`:8443`	UDP address for the HTTP/3 listener that backs the MoQ leg. Independent of `HTTP_ADDR` — TCP/`:8080` and UDP/`:8443` can run side-by-side.
`MOQ_TLS_CERT_FILE`	_(none)_	Path to the TLS certificate used by the HTTP/3 listener. Required when `MOQ_ENABLED=true`.
`MOQ_TLS_KEY_FILE`	_(none)_	Path to the TLS private key used by the HTTP/3 listener. Required when `MOQ_ENABLED=true`.
`MOQ_OPUS_BITRATE`	`24000`	Target bitrate (bps) for the Opus encoder feeding the MoQ leg's `mix` track. Must be in `6000..510000`.
`LIVEKIT_ENABLED`	`false`	Enable the `livekit_room` leg type at `POST /v1/legs` (`type=livekit_room`). Lets VoiceBlender join a LiveKit room as a participant and bridge audio between SIP and LiveKit. No LiveKit SDK is used — the signaling protocol is spoken directly via `github.com/livekit/protocol` protobufs over the existing pion stack.
`LIVEKIT_URL`	_(none)_	Default LiveKit server endpoint (`wss://...`). Required when `LIVEKIT_ENABLED=true` unless every request supplies `livekit.url`. Overridable per-request.
`LIVEKIT_OPUS_BITRATE`	`24000`	Target bitrate (bps) for the Opus encoder publishing audio into LiveKit. Must be in `6000..510000`. Overridable per-request via `livekit.opus_bitrate`.
`LIVEKIT_TOKEN_SIGNING_ENABLED`	`false`	Opt-in: when `true`, callers may omit `livekit.token` and instead pass `{room,identity,permissions}`; VoiceBlender mints the JWT itself. Security caveat: enabling this stores the LiveKit API secret (a high-privilege credential that can mint tokens for any room/identity on the LiveKit deployment) in VoiceBlender. Keep off in multi-tenant deployments.
`LIVEKIT_API_KEY`	_(none)_	LiveKit API key used to sign minted JWTs. Required only when `LIVEKIT_TOKEN_SIGNING_ENABLED=true`.
`LIVEKIT_API_SECRET`	_(none)_	LiveKit API secret used to sign minted JWTs. Required only when `LIVEKIT_TOKEN_SIGNING_ENABLED=true`. Treat as a high-value secret; redact in logs.
`LIVEKIT_DEFAULT_TOKEN_TTL`	`6h`	Default TTL applied to minted JWTs when the request omits `livekit.token_ttl`. Go duration string. LiveKit recommends ≤ 6 hours.
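
The `ALLOWED_IPS` and `TRUST_PROXY_HEADERS` rules in the table can be modelled with Python's `ipaddress` module. This is a sketch of the described behavior, not the server's Go code. Assumptions that are not in the README: empty entries between commas are skipped, CIDR entries with host bits set are accepted, and `remote_addr` is already a bare IP with the port stripped. All function names are hypothetical.

```python
import ipaddress
from typing import List, Optional, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def parse_allowed_ips(value: str) -> List[Network]:
    """Parse a comma-separated allowlist; an empty list means allow all."""
    networks = []
    for entry in value.split(","):
        entry = entry.strip()  # whitespace around entries is trimmed
        if not entry:
            continue  # assumption: empty entries are ignored
        # ip_network turns a bare address into a /32 or /128 host route and
        # raises ValueError on malformed input (startup failure in the server).
        networks.append(ipaddress.ip_network(entry, strict=False))
    return networks


def client_ip(remote_addr: str, forwarded_for: Optional[str], trust_proxy: bool) -> str:
    """Pick the IP to check: leftmost X-Forwarded-For entry only when trusted."""
    if trust_proxy and forwarded_for:
        return forwarded_for.split(",")[0].strip()
    return remote_addr


def is_allowed(ip: str, networks: List[Network]) -> bool:
    """Return True when the allowlist is empty or any network contains ip."""
    if not networks:
        return True
    addr = ipaddress.ip_address(ip)
    return any(addr.version == net.version and addr in net for net in networks)
```

For example, `is_allowed(client_ip(peer, xff, trust_proxy=False), nets)` ignores `X-Forwarded-For` entirely, which matches the default described in the table.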

VoiceBlender configuration

Set these env vars before starting voiceblender:

Variable	Value
`SIP_TLS_PORT`	`5061`
`SIP_TLS_CERT`	path to `fullchain.pem` for your FQDN
`SIP_TLS_KEY`	path to `privkey.pem`
`SIP_DOMAIN`	the FQDN you registered with Meta (must match the cert SAN)

Make a test outbound call:

curl -X POST http://localhost:8080/v1/legs \
  -H 'Content-Type: application/json' \
  -d '{
    "type": "whatsapp",
    "to": "+447900000000",
    "from": "+441300000000",
    "auth": { "password": "<meta-issued-digest-password>" },
    "room_id": "wa-test"
  }'

The HTTP response returns immediately with the leg in ringing; subscribe to the webhook or /v1/vsi event stream to see leg.connected (or leg.disconnected with a reason if Meta rejects the INVITE).
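
The same request can be prepared from Python with only the standard library. The sketch below builds the `POST /v1/legs` request shown in the curl example without sending it; the payload fields are copied from that example, while `build_whatsapp_leg_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_whatsapp_leg_request(base_url, to, from_, password, room_id):
    """Build (but do not send) the POST /v1/legs request for a WhatsApp leg."""
    payload = {
        "type": "whatsapp",
        "to": to,
        "from": from_,
        "auth": {"password": password},
        "room_id": room_id,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/legs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen(req, timeout=5)` sends it to a running instance. Since the leg comes back in ringing, the final outcome still has to be read from the webhook or the /v1/vsi event stream.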

API Overview

Full reference: API.md

Typical Workflow

1. Register a webhook        POST /v1/webhooks
2. Receive inbound call      --> webhook: leg.ringing {leg_id, from, to}
3. Answer                    POST /v1/legs/{id}/answer
4. Create a room             POST /v1/rooms
5. Add legs to room          POST /v1/rooms/{id}/legs
6. Attach AI agent           POST /v1/legs/{id}/agent
7. Start recording           POST /v1/legs/{id}/record
8. Hang up                   DELETE /v1/legs/{id}
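
The workflow above can be expressed as a small mapping from events to the next REST call. This sketch only computes method and path pairs and sends nothing. It assumes the event name arrives under a `type` key and the leg under `leg_id`; the real event envelope is defined in API.md, and both function names are hypothetical.

```python
from typing import List, Optional, Tuple


def next_action(event: dict) -> Optional[Tuple[str, str]]:
    """Return the (method, path) to call in response to an event, if any.

    Assumes the event name is under "type" and the leg id under "leg_id".
    """
    if event.get("type") == "leg.ringing" and event.get("leg_id"):
        return ("POST", f"/v1/legs/{event['leg_id']}/answer")
    return None


def workflow_calls(leg_id: str, room_id: str) -> List[Tuple[str, str]]:
    """List steps 5 to 8 of the workflow as (method, path) pairs, in order."""
    return [
        ("POST", f"/v1/rooms/{room_id}/legs"),
        ("POST", f"/v1/legs/{leg_id}/agent"),
        ("POST", f"/v1/legs/{leg_id}/record"),
        ("DELETE", f"/v1/legs/{leg_id}"),
    ]
```

Keeping the routing logic separate from the HTTP client makes the call sequence easy to unit-test before pointing it at a live instance.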

Troubleshooting

403 SIP server X.X.X.X from INVITE does not match any SIP server configured for phone number ... — SIP_DOMAIN doesn't match what's registered with Meta. Set it to the FQDN, not the IP, and confirm via the GET /settings query above.
404 Not Found on outbound — usually means the recipient phone number isn't a valid WhatsApp user, or the destination URI is malformed. Confirm the digits in to are the actual user's E.164 number.
Call connects but Meta sends BYE after 20 s with Reason: ... not receiving any media for a long time — your audio path (RTP/UDP egress) is being dropped before reaching Meta. Check firewall rules for outbound UDP from the RTP_PORT_MIN–RTP_PORT_MAX range and that ICE-srflx candidates are correct.
DTLS handshake stalls — Meta's offer is setup:actpass + ice-lite, and they don't initiate DTLS. VoiceBlender forces setup:active automatically; if you see pcmedia: DTLS state state=connecting for >5 s, run with LOG_LEVEL=debug and inspect pion's DTLS scope for the actual error.
Set SIP_DEBUG=true to log the full RFC 3261 wire form of every SIP message, including the auth-bearing retry after the 401/407 challenge — that's the most useful diagnostic for any signalling-layer issue.
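
Since the 403 case above comes down to `SIP_DOMAIN` holding an IP instead of an FQDN, a quick pre-flight check can catch it before a call is placed. The sketch below is a hypothetical helper, not part of VoiceBlender; it only tests the shape of the value and cannot confirm that it matches what is registered with Meta.

```python
import ipaddress


def looks_like_fqdn(value: str) -> bool:
    """Return True when value is a dotted hostname rather than an IP literal."""
    host = value.strip().rstrip(".")
    try:
        ipaddress.ip_address(host.strip("[]"))
        return False  # an IPv4/IPv6 literal is what triggers the 403 above
    except ValueError:
        pass
    labels = host.split(".")
    # Require at least two labels, each made of letters, digits, and inner hyphens.
    return len(labels) >= 2 and all(
        label
        and len(label) <= 63
        and label.replace("-", "").isalnum()
        and not label.startswith("-")
        and not label.endswith("-")
        for label in labels
    )
```

Running it against the configured value, for example `looks_like_fqdn(os.environ.get("SIP_DOMAIN", ""))` after importing `os`, flags an IP literal or an empty setting early.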

README summary (AI-translated, 2026-05-27)

The README sections are summarized below for quick reading. The full original text is in the README section above.

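📌 Introduction

VoiceBlender is a voice service written in Go that bridges SIP and WebRTC voice calls. It supports multi-party audio mixing and provides a REST API and real-time webhooks, so developers can build complex real-time voice applications.

⚡ Features

The project supports inbound and outbound SIP calls with the PCMU, PCMA, G.722, Opus, AMR-WB, and AMR-NB codecs, digest authentication, and RFC 4028 session timers. It also supports SIP over TLS, which WhatsApp requires, and early media, which allows custom ringback or IVR audio before the call is answered. Inbound and outbound WhatsApp Business Calling legs integrate with Meta over SIP-TLS.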

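📋 Dependencies

The integration tests require two SIP instances. The project builds on Go libraries, including sipgo for the SIP stack and pion/webrtc for WebRTC.

🛠 Installation (from source)

Build and run the project with a Go toolchain. In the project root, run `go build -o voiceblender ./cmd/voiceblender` to produce the binary, then start the service with `./voiceblender`. For production, a prebuilt binary avoids needing a build toolchain on the server, if the Releases page offers one for your platform.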

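🚀 Usage

The project ships example code to help you get started. `examples/call_handler.py` is a Python webhook listener that handles inbound SIP calls and room conferencing; `examples/webrtc-client/` is a browser-based WebRTC voice client with room management and DTMF; and `examples/gen_test_wav.py` generates test WAV files for playback testing.

⚙️ Configuration (environment variables)

All VoiceBlender configuration is done through environment variables. Before starting, set parameters such as `INSTANCE_ID`, `HTTP_ADDR`, and `SIP_BIND_IP` as needed. To enable SIP over TLS, set `SIP_TLS_PORT`, `SIP_TLS_CERT`, and `SIP_TLS_KEY`; for WhatsApp, also set `SIP_DOMAIN` to the FQDN registered with Meta.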

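🔌 API

The project provides a REST API for controlling the call lifecycle, managing rooms, and attaching AI Agents. For endpoint definitions, request parameters, and response formats, see `API.md` in the repository.

🔄 Workflow

A typical workflow: register a webhook with `POST /v1/webhooks`; when a `leg.ringing` event arrives for an inbound call, answer it with `POST /v1/legs/{id}/answer`; create a room with `POST /v1/rooms` and add the leg with `POST /v1/rooms/{id}/legs`; attach an AI Agent with `POST /v1/legs/{id}/agent`; start recording with `POST /v1/legs/{id}/record` if needed; and hang up with `DELETE /v1/legs/{id}`.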

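❓ FAQ summary

A SIP 403 error usually means `SIP_DOMAIN` does not match the FQDN registered with Meta; set it to the domain name, not the IP. A 404 Not Found on an outbound call usually means the recipient number is not a valid WhatsApp user or the destination URI is malformed. The README mentions a `GET /settings` query for confirming the registered value; that query is not reproduced on this page.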

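🎯 aiskill88 AI review · Grade A · 2026-05-26

aiskill88 review: solid low-level capabilities. By combining traditional telephony protocols with modern AI voice streams, it is a strong foundation for building high-performance voice Agents.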

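📚 Practical guide

Who it is for

Agent developers building multi-agent collaboration systems
Developers building voice AI products

Best practices

For production, prefer Docker Compose to isolate dependencies, and mount a volume for persistent data such as recordings
Dry-run Agent tasks to validate the tool call chain before enabling autonomous execution

Common mistakes

Committing API keys to the git repository (use a .env file and add it to .gitignore)
Containers cannot reach the host's localhost; use host.docker.internal instead

Deployment options

Docker: check the repository for a Dockerfile or Compose file; behind NAT or Docker, set SIP_EXTERNAL_IP and WEBRTC_EXTERNAL_IPS and publish the RTP UDP port range
Cloud hosting: choose a VM or container platform that exposes UDP ports for SIP and RTP; HTTP-only platforms are not sufficient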

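👥 Target audience

Automation engineers and operations staff
Project managers and business analysts
Professionals who want to reduce repetitive work
Digital transformation teams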

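⚖️ Pros and cons

✅ Pros

+MIT license, free for commercial use
+Greatly reduces repetitive manual work
+Call flows are explicit REST calls and webhook events, which keeps them easy to follow
+Highly extensible, supports complex scenarios

⚠️ Cons

−Initial configuration and debugging take some time
−Depends heavily on the stability of external services
−Complex scenarios require a technical background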

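⚠️ Usage notice

AI Skill Hub is a third-party content aggregator. The information on this page is compiled from public data and is not a legal endorsement of the tool's functionality or quality.

Validate thoroughly in a sandbox or test environment before deploying to production, and carry out any necessary security assessment.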

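❓ FAQ

What is voiceblender?

voiceblender is an open-source voice platform written in Go. Its repository description reads: "A programmable voice platform: SIP and WebRTC call control, multi-party mixing, …" (truncated in the source). It has 68 GitHub stars. Typical use cases include AI voice assistant integration, automated telephone customer service systems, and multi-party real-time voice conferencing.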

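💡 AI Skill Hub review

Overall, VoiceBlender Voice Control Platform is a well-made Agent workflow tool that is competitive among comparable tools. AI Skill Hub will keep tracking its updates; consider bookmarking it and adopting it when it fits your own scenario.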

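🌐 Original information

Original name	`voiceblender`
Original description	Open-source AI workflow: A programmable voice platform: SIP and WebRTC call control, multi-party mixing, (truncated in the source). ⭐68 · Go
Topics	`Voice AI`, `WebRTC`, `Real-time communication`
GitHub	https://github.com/VoiceBlender/voiceblender
License	MIT
Language	Go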

🔗 Original source

🐙 GitHub repository: https://github.com/VoiceBlender/voiceblender
🌐 Official website: https://voiceblender.org

Listed: 2026-05-26 · Updated: 2026-05-30 · License: MIT · AI Skill Hub makes no legal endorsement of the accuracy of third-party content.
