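AI Skill Hub strongly recommends FunASR (AI speech recognition tool, Chinese documentation). It has more than 16.2k stars on GitHub and an AI composite score of 9.4. If you are looking for a dependable speech recognition tool, it is worth a closer look.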
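FunASR is an open-source tool written in Python, covering topics such as audio-visual-speech-recognition, conformer, and dfsmn. As an open-source GitHub project, it has an active community and regular releases, its code is open to audit, and it can be deployed locally to keep data private. It suits both personal use and integration into enterprise workflows.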
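# Option 1: install with pip (recommended)
pip install funasr
# Option 2: install in a virtual environment (recommended for production)
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install funasr
# Option 3: install from source (to get the latest features)
git clone https://github.com/modelscope/FunASR
cd FunASR
pip install -e .
# Verify the installation
python -c "import funasr; print('installation OK')"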
# Command-line usage
funasr --help
# Basic usage: options are passed as ++key=value pairs
funasr ++model=paraformer-zh ++input=asr_example_zh.wav
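# Call from Python code
from funasr import AutoModel
# Example: load a model and transcribe one audio file
model = AutoModel(model="paraformer-zh")
result = model.generate(input="asr_example_zh.wav")
print(result)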
# Example funasr configuration file (config.yml)
app:
  name: "funasr"
  debug: false
  log_level: "INFO"
# Specify the configuration file at run time
funasr --config config.yml
# Or configure through environment variables
export FUNASR_API_KEY="your-key"
export FUNASR_OUTPUT_DIR="./output"
<strong>FunASR</strong> aims to bridge academic research and industrial applications in speech recognition. By supporting training and fine-tuning of industrial-grade speech recognition models, it lets researchers and developers carry out research and production more conveniently, and it promotes the growth of the speech recognition ecosystem. ASR for Fun!
vad_model + spk_model + punc_model to get per-sentence speaker labels. See Fun-ASR-Nano demo, SenseVoice demo.
<details><summary>Full Changelog</summary>

- 2024/07/01: Offline File Transcription Service GPU 1.1 released, fixing BladeDISC model compatibility issues; see (docs).
- 2024/06/27: Offline File Transcription Service GPU 1.0 released, supporting dynamic batch processing and multi-threaded concurrency. On the long-audio test set, the single-thread RTF is 0.0076 and the multi-thread speedup is 1200+ (compared to 330+ on CPU); see (docs).
- 2024/05/15: Emotion recognition models are now supported: emotion2vec+large, emotion2vec+base, emotion2vec+seed. The following categories are currently supported: 0: angry, 1: happy, 2: neutral, 3: sad, 4: unknown.
- 2024/05/15: Offline File Transcription Service 4.5, Offline File Transcription Service of English 1.6, and Real-time Transcription Service 1.10 released, adapted to the FunASR 1.0 model structure; see (docs).
- 2024/03/05: Added the Qwen-Audio and Qwen-Audio-Chat large-scale audio-text multimodal models, which have topped multiple audio-domain leaderboards. These models support speech dialogue; see usage.
- 2024/03/05: Added support for the Whisper-large-v3 model, a multitask model that can perform multilingual speech recognition, speech translation, and language identification. It can be downloaded from modelscope and openai.
- 2024/03/05: Offline File Transcription Service 4.4, Offline File Transcription Service of English 1.5, and Real-time Transcription Service 1.9 released; the docker image supports the ARM64 platform and modelscope is updated; see (docs).
- 2024/01/30: funasr-1.0 has been released (docs).
- 2024/01/30: Emotion recognition models are now supported: model link, modified from repo.
- 2024/01/25: Offline File Transcription Service 4.2 and Offline File Transcription Service of English 1.3 released, with an optimized VAD (Voice Activity Detection) data processing method that significantly reduces peak memory usage, plus memory leak fixes; Real-time Transcription Service 1.7 released, with an optimized client side; see (docs).
- 2024/01/09: The FunASR SDK for Windows version 2.0 has been released, with support for the offline file transcription service (CPU) of Mandarin 4.1, the offline file transcription service (CPU) of English 1.2, and the real-time transcription service (CPU) of Mandarin 1.6. For more details, refer to the official documentation or release notes (FunASR-Runtime-Windows).
- 2024/01/03: File Transcription Service 4.0 released: added support for 8k models, fixed timestamp mismatch issues and added sentence-level timestamps, improved the effectiveness of English word FST hotwords, added automatic configuration of thread parameters, and fixed known crashes and memory leaks; see (docs).
- 2024/01/03: Real-time Transcription Service 1.6 released: the 2pass-offline mode supports Ngram language model decoding and WFST hotwords, and known crashes and memory leaks are fixed; see (docs).
- 2024/01/03: Fixed known crashes and memory leaks; see (docs).
- 2023/12/04: The FunASR SDK for Windows version 1.0 has been released, with support for the offline file transcription service (CPU) of Mandarin, the offline file transcription service (CPU) of English, and the real-time transcription service (CPU) of Mandarin. For more details, refer to the official documentation or release notes (FunASR-Runtime-Windows).
- 2023/11/08: The offline file transcription service 3.0 (CPU) of Mandarin has been released, adding a large punctuation model, an Ngram language model, and WFST hotwords. For details, refer to docs.
- 2023/10/17: The offline file transcription service (CPU) of English has been released. For more details, refer to (docs).
- 2023/10/13: SlideSpeech: a large-scale multi-modal audio-visual corpus with a significant amount of real-time synchronized slides.
- 2023/10/10: The combined ASR and speaker diarization pipeline Paraformer-VAD-SPK is now released. Try the model to get recognition results with speaker information.
- 2023/10/07: FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec.
- 2023/09/01: The offline file transcription service 2.0 (CPU) of Mandarin has been released, with added support for ffmpeg, timestamps, and hotword models. For more details, refer to (docs).
- 2023/08/07: The real-time transcription service (CPU) of Mandarin has been released. For more details, refer to (docs).
- 2023/07/17: BAT, a low-latency and low-memory-consumption RNN-T model, is released. For more details, refer to (BAT).
- 2023/06/26: The ASRU2023 Multi-Channel Multi-Party Meeting Transcription Challenge 2.0 has concluded and the results have been announced. For more details, refer to (M2MeT2.0).

</details>
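The 2024/06/27 changelog entry quotes a single-thread real-time factor (RTF) of 0.0076 and a multi-thread speedup of 1200+. As a reading aid, the sketch below turns those figures into wall-clock time per hour of audio. It assumes the common definitions RTF = processing time / audio duration and speedup = audio duration / processing time, which the changelog does not spell out; the function names are illustrative.

```python
def processing_seconds(audio_seconds: float, rtf: float) -> float:
    """Wall-clock time to transcribe audio, assuming RTF = processing time / audio duration."""
    return audio_seconds * rtf


def speedup_from_rtf(rtf: float) -> float:
    """Speedup over real time, assumed here to be the reciprocal of the RTF."""
    return 1.0 / rtf


one_hour = 3600.0

# Single-thread figure quoted in the changelog.
single_thread = processing_seconds(one_hour, 0.0076)

# Time at exactly 1200x; "1200+" means the real figure is at most this.
multi_thread = one_hour / 1200

print(f"single thread: {single_thread:.2f} s per hour of audio")
print(f"multi-thread:  {multi_thread:.2f} s per hour of audio (at 1200x)")
print(f"single-thread speedup: {speedup_from_rtf(0.0076):.1f}x")
```

Under these assumptions, one hour of audio takes roughly 27 seconds on a single thread and about 3 seconds or less with multi-threading.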
python>=3.8
torch>=1.13
torchaudio
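The requirements above can be checked before installing. The sketch below is one way to do so with only the standard library. The helper names are illustrative, and the version parsing is deliberately simple: it keeps the leading dot-separated integers and ignores local suffixes such as `+cu117` and pre-release tags.

```python
import re
import sys
from importlib import metadata


def version_tuple(version: str) -> tuple:
    """Turn '1.13.1+cu117' into (1, 13, 1); only leading dot-separated integers are kept."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        return ()
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """True when the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


def installed_version(package: str):
    """Return the installed version string, or None when the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(f"python>=3.8: {sys.version_info >= (3, 8)}")
    torch_version = installed_version("torch")
    if torch_version is None:
        print("torch: not installed")
    else:
        print(f"torch>=1.13: {meets_minimum(torch_version, '1.13')} ({torch_version})")
    print(f"torchaudio installed: {installed_version('torchaudio') is not None}")
```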
pip3 install -U funasr
git clone https://github.com/alibaba/FunASR.git && cd FunASR
pip3 install -e ./
pip3 install -U modelscope huggingface_hub
from pathlib import Path
from runtime.python.onnxruntime.funasr_onnx.paraformer_bin import Paraformer

home_dir = Path.home()

# Load the quantized Paraformer model
model_dir = "damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch"
model = Paraformer(model_dir, batch_size=1, quantize=True)

# Example audio shipped with the model in the local modelscope cache
wav_path = [f"{home_dir}/.cache/modelscope/hub/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/example/asr_example.wav"]

result = model(wav_path)
print(result)
For more examples, refer to the demo.
FunASR supports deploying pre-trained or further fine-tuned models for service. Currently, it supports the following types of service deployment:
For more detailed information, please refer to the service deployment documentation.
Below is a quick start tutorial. Test audio files (Mandarin, English).
funasr ++model=paraformer-zh ++vad_model="fsmn-vad" ++punc_model="ct-punc" ++input=asr_example_zh.wav
Note: recognition of a single audio file is supported, as is a file list in Kaldi-style wav.scp format: wav_id wav_pat
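A Kaldi-style `wav.scp` list is plain text with one utterance per line: an ID, whitespace, then the audio location. The sketch below writes and reads such a file. The function names and the example IDs are illustrative, and it assumes IDs contain no whitespace and every line has both fields.

```python
import tempfile
from pathlib import Path
from typing import Dict


def write_wav_scp(entries: Dict[str, str], scp_path: Path) -> None:
    """Write one 'wav_id audio_path' pair per line."""
    lines = [f"{wav_id} {audio}" for wav_id, audio in entries.items()]
    scp_path.write_text("\n".join(lines) + "\n", encoding="utf-8")


def read_wav_scp(scp_path: Path) -> Dict[str, str]:
    """Parse a wav.scp file; the first whitespace-delimited token is the ID."""
    entries = {}
    for line in scp_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        wav_id, audio = line.split(maxsplit=1)
        entries[wav_id] = audio
    return entries


# Round trip through a temporary file; the IDs and paths are made up for illustration.
with tempfile.TemporaryDirectory() as tmp:
    scp = Path(tmp) / "wav.scp"
    write_wav_scp({"utt1": "asr_example_zh.wav", "utt2": "audio/second file.wav"}, scp)
    roundtrip = read_wav_scp(scp)

print(roundtrip)
```

Splitting only on the first run of whitespace keeps paths that contain spaces intact.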
funasr-export ++model=paraformer ++quantize=false ++device=cpu
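Both commands above pass options as `++key=value` pairs. When calling them from a script, it can help to build the argument list programmatically. The helper below is a hypothetical convenience wrapper, not part of FunASR; it only formats arguments and leaves execution to `subprocess.run`.

```python
import subprocess
from typing import List


def build_command(program: str, **options) -> List[str]:
    """Format keyword options as '++key=value' arguments after the program name."""
    args = [program]
    for key, value in options.items():
        if isinstance(value, bool):
            value = str(value).lower()  # booleans written lowercase, as in ++quantize=false
        args.append(f"++{key}={value}")
    return args


transcribe = build_command(
    "funasr",
    model="paraformer-zh",
    vad_model="fsmn-vad",
    punc_model="ct-punc",
    input="asr_example_zh.wav",
)
export = build_command("funasr-export", model="paraformer", quantize=False, device="cpu")

print(" ".join(transcribe))
print(" ".join(export))

# To actually run a command (requires funasr to be installed):
# subprocess.run(transcribe, check=True)
```

Passing a list rather than a single string to `subprocess.run` avoids shell quoting problems when file names contain spaces.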
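AI Skill Hub is a third-party content aggregation platform. The information on this page is compiled from public data and does not constitute any legal endorsement of the tool's functionality or quality.
It is advisable to validate the tool thoroughly in a sandbox or test environment before deploying it to production, and to carry out the necessary security assessment.
✅ MIT license: one of the most permissive open-source licenses. Commercial use, modification, and distribution are allowed as long as the copyright notice is retained.
Overall, FunASR is a high-quality AI tool that is competitive within its category. AI Skill Hub will continue to track its updates; consider bookmarking it and adopting it when it fits your use case.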
| Field | Value |
| --- | --- |
| Original name | FunASR |
| Original description | A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc. |
| Topics | audio-visual-speech-recognition, conformer, dfsmn, paraformer, pretrained-model, punctuation, stt |
| GitHub | https://github.com/modelscope/FunASR |
| License | MIT |
| Language | Python |
Listed: 2026-05-22 · Updated: 2026-05-22 · License: MIT · AI Skill Hub makes no legal endorsement of the accuracy of third-party content.