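Offline speech recognition: silero_vad.onnx
Files to download
- silero_vad.onnx - voice activity detection (VAD) model
- sherpa-onnx-sense-voice - speech recognition model
Download steps
Method 1: Manual download (recommended)
Open the links below in a browser, or use the wget command: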
# Change to your working directory
cd ~/TMSpeech/external_recognizer
# 1. Download silero_vad.onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
# 2. Download the SenseVoice model
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-sense-voice-zh-en-ja-ko-yue-int8-2025-09-09.tar.bz2
# 3. Extract the model archive
tar -xjf sherpa-onnx-sense-voice-zh-en-ja-ko-yue-int8-2025-09-09.tar.bz2
# 4. Verify that the files exist
ls -la silero_vad.onnx
ls -la sherpa-onnx-sense-voice-zh-en-ja-ko-yue-int8-2025-09-09/
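If the tar command is not available (for example on a Windows machine without extra tools), the extraction in step 3 can probably be done with Python's standard tarfile module instead. The following is a minimal sketch, not part of the original instructions; the function name extract_model_archive is made up for illustration.

```python
import tarfile
from pathlib import Path, PurePosixPath


def extract_model_archive(archive_path, dest_dir="."):
    """Extract a .tar.bz2 archive into dest_dir and return its top-level entry names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    top_level = set()
    with tarfile.open(archive_path, "r:bz2") as tar:
        for member in tar.getmembers():
            parts = PurePosixPath(member.name).parts
            if parts:
                top_level.add(parts[0])
        # Newer Python versions ship extraction filters that reject unsafe
        # members (absolute paths, "..", device files); use them when present.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return sorted(top_level)
```

Calling extract_model_archive("sherpa-onnx-sense-voice-zh-en-ja-ko-yue-int8-2025-09-09.tar.bz2") from the working directory should leave the same directory that tar -xjf would create.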
Method 2: Use a download script
Create a download script named download_models.sh:
#!/bin/bash
# download_models.sh
echo "Downloading speech recognition model files..."
# Stop if the working directory does not exist
cd ~/TMSpeech/external_recognizer || exit 1
# Download silero_vad.onnx unless it is already present
if [ ! -f "silero_vad.onnx" ]; then
    echo "Downloading silero_vad.onnx..."
    wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
else
    echo "silero_vad.onnx already exists"
fi
# Download and extract the SenseVoice model unless the directory is already present
if [ ! -d "sherpa-onnx-sense-voice-zh-en-ja-ko-yue-int8-2025-09-09" ]; then
    if [ ! -f "sherpa-onnx-sense-voice-zh-en-ja-ko-yue-int8-2025-09-09.tar.bz2" ]; then
        echo "Downloading the SenseVoice model..."
        wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-sense-voice-zh-en-ja-ko-yue-int8-2025-09-09.tar.bz2
    fi
    echo "Extracting the model archive..."
    tar -xjf sherpa-onnx-sense-voice-zh-en-ja-ko-yue-int8-2025-09-09.tar.bz2
    # Optional: delete the archive to save space
    # rm sherpa-onnx-sense-voice-zh-en-ja-ko-yue-int8-2025-09-09.tar.bz2
else
    echo "SenseVoice model directory already exists"
fi
# Verify the files
echo ""
echo "Checking files:"
ls -la silero_vad.onnx
echo ""
ls -la sherpa-onnx-sense-voice-zh-en-ja-ko-yue-int8-2025-09-09/
echo ""
echo "Download complete!"
Run the download script:
chmod +x download_models.sh
./download_models.sh
Method 3: Alternatives if the download is slow
If the download is too slow, try one of the following:
# Use a proxy (if you have one)
export http_proxy=http://your-proxy-address:port
export https_proxy=http://your-proxy-address:port
# Or use curl
curl -L -o silero_vad.onnx https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
curl -L -o sherpa-onnx-sense-voice-zh-en-ja-ko-yue-int8-2025-09-09.tar.bz2 https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-sense-voice-zh-en-ja-ko-yue-int8-2025-09-09.tar.bz2
# Or download over multiple connections with aria2c
aria2c -x16 -s16 https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-sense-voice-zh-en-ja-ko-yue-int8-2025-09-09.tar.bz2
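If neither wget, curl, nor aria2c is installed, a plain Python download may be enough. The sketch below uses only the standard library; the helper name download_if_missing and the .part suffix are illustrative assumptions. It writes to a temporary file and renames it only after the transfer finishes, so an interrupted download is not mistaken for a complete model file.

```python
import os
import shutil
import urllib.request
from pathlib import Path


def download_if_missing(url, dest, chunk_size=1 << 20):
    """Download url to dest unless dest already exists; return True if downloaded."""
    dest = Path(dest)
    if dest.exists():
        return False
    part = dest.with_name(dest.name + ".part")
    # urlopen follows HTTP redirects (like curl -L) and, by default, picks up
    # proxy settings from the http_proxy / https_proxy environment variables.
    with urllib.request.urlopen(url) as response, open(part, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    # Rename only after the whole file has been written.
    os.replace(part, dest)
    return True
```

For example, download_if_missing(url, "silero_vad.onnx") with the silero_vad.onnx URL above would fetch the VAD model once and skip it on later runs. This does a single-connection download, so it will not be faster than aria2c.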
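Run the program after the download completes
# Check the file layout
cd ~/TMSpeech/external_recognizer
ls -la
# You should see something like this:
# silero_vad.onnx
# sherpa-onnx-sense-voice-zh-en-ja-ko-yue-int8-2025-09-09/
# ├── tokens.txt
# ├── model.int8.onnx
# └── ...
# Run the program, specifying the model paths (if they are not in the default location)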
python simulate-streaming-sense-voice.py \
--silero-vad-model ./silero_vad.onnx \
--tokens ./sherpa-onnx-sense-voice-zh-en-ja-ko-yue-int8-2025-09-09/tokens.txt \
--sense-voice ./sherpa-onnx-sense-voice-zh-en-ja-ko-yue-int8-2025-09-09/model.int8.onnx
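Before launching the recognizer, it can help to confirm that the three files passed on the command line are really there. The following is a small pre-flight sketch; the function name missing_model_files is hypothetical, and treating an empty file as missing is an assumption meant to catch interrupted downloads.

```python
from pathlib import Path

MODEL_DIR_NAME = "sherpa-onnx-sense-voice-zh-en-ja-ko-yue-int8-2025-09-09"


def missing_model_files(base_dir="."):
    """Return the expected model files that are absent or empty under base_dir."""
    base = Path(base_dir)
    expected = [
        base / "silero_vad.onnx",
        base / MODEL_DIR_NAME / "tokens.txt",
        base / MODEL_DIR_NAME / "model.int8.onnx",
    ]
    # A zero-byte file usually means a failed download, so report it as missing.
    return [str(p) for p in expected if not p.is_file() or p.stat().st_size == 0]
```

An empty list from missing_model_files() means the paths used in the command above should resolve; otherwise the returned paths show which download or extraction step to repeat.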
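Quick test program
While the models are downloading, you can test audio recording first. Create a simple test program and save it as test_microphone.py: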
#!/usr/bin/env python3
import os
import sys
import time

import pyaudio

# Make the project-local helper module importable
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
from common_audio_utils import get_audio_devices

SAMPLE_RATE = 16000
FRAMES_PER_BUFFER = 1600  # 100 ms at 16 kHz


def test_microphone():
    """Check whether the microphone works."""
    print("=== Microphone test ===")
    p = pyaudio.PyAudio()
    try:
        try:
            # Find the default input device
            default_input = p.get_default_input_device_info()
            print(f"Default input device: [{default_input['index']}] {default_input['name']}")
            # Try to open the microphone
            stream = p.open(
                format=pyaudio.paInt16,
                channels=1,
                rate=SAMPLE_RATE,
                input=True,
                input_device_index=default_input['index'],
                frames_per_buffer=FRAMES_PER_BUFFER,
            )
            print("Listening to the microphone for 5 seconds... please speak (Ctrl+C stops early)")
            # Listen for 5 seconds
            start_time = time.time()
            max_amplitude = 0
            while time.time() - start_time < 5.0:
                try:
                    # Read one 100 ms block of 16-bit little-endian samples
                    data = stream.read(FRAMES_PER_BUFFER, exception_on_overflow=False)
                    samples = list(int.from_bytes(data[i:i+2], 'little', signed=True)
                                   for i in range(0, len(data), 2))
                    # Peak amplitude of the current block
                    current_max = max(abs(s) for s in samples)
                    if current_max > max_amplitude:
                        max_amplitude = current_max
                    # Draw a volume bar
                    bar_length = 30
                    volume = min(1.0, current_max / 32768.0)
                    filled_length = int(bar_length * volume)
                    bar = '█' * filled_length + '░' * (bar_length - filled_length)
                    print(f"Volume: |{bar}| {volume*100:.1f}%", end='\r')
                except Exception as e:
                    print(f"Error while reading audio: {e}")
                    break
            print(f"\n\nTest finished! Peak amplitude: {max_amplitude}")
            if max_amplitude > 1000:
                print("✓ Microphone is working!")
            elif max_amplitude > 100:
                print("⚠ Microphone level is low; please speak closer to the microphone")
            else:
                print("✗ No sound detected; please check the device")
            stream.stop_stream()
            stream.close()
        except Exception as e:
            print(f"Cannot access the default input device: {e}")
            # List all devices
            devices = get_audio_devices(p)
            print("\nAll available devices:")
            for idx, device in devices:
                if device['maxInputChannels'] > 0:
                    print(f"  [{idx}] {device['name']} (input channels: {device['maxInputChannels']})")
    finally:
        p.terminate()


if __name__ == "__main__":
    try:
        test_microphone()
    except KeyboardInterrupt:
        print("\n\nTest interrupted by user")
    except Exception as e:
        print(f"\nAn error occurred: {e}")
        import traceback
        traceback.print_exc()
Run this test program to make sure the microphone works:
python test_microphone.py
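The amplitude and volume-bar logic in the test program can be tried without a microphone. The sketch below reimplements that part as two standalone helpers (the names peak_amplitude and volume_bar are illustrative and do not appear in the original script) and decodes samples with the standard array module rather than byte-by-byte conversion. It assumes 16-bit signed little-endian PCM, which is what paInt16 typically yields on common hardware.

```python
import array
import sys


def peak_amplitude(pcm_bytes):
    """Return the largest absolute sample value in 16-bit little-endian PCM data."""
    samples = array.array("h")
    # Ignore a trailing odd byte so the length is a whole number of samples.
    samples.frombytes(pcm_bytes[: len(pcm_bytes) // 2 * 2])
    if sys.byteorder == "big":
        samples.byteswap()  # data is little-endian; swap on big-endian hosts
    return max((abs(s) for s in samples), default=0)


def volume_bar(peak, width=30):
    """Render a text bar for a peak value in the range 0..32768."""
    volume = min(1.0, peak / 32768.0)
    filled = int(width * volume)
    return "█" * filled + "░" * (width - filled)
```

Feeding a block returned by stream.read() into peak_amplitude and passing the result to volume_bar should reproduce the bar printed by the test program.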
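Next steps
- Download the model files first
- Test that the microphone works
- Run the main program to perform speech recognition
Let me know whether the download succeeded, or if you run into any other problems!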
