Recreating the Xiao Ai speaker's features on a PC
1. Offline wake-word detection (implemented with Picovoice)
Related links:
Picovoice GitHub (mainly useful for seeing how to call it from Python)
Python implementation example:
import pvporcupine
import pyaudio
import struct


def main():
    # Initialize Porcupine
    porcupine = pvporcupine.create(
        access_key="xxxxxxxx",  # replace with your Picovoice access key
        keyword_paths=["./hey-amy_en_windows_v3_0_0.ppn"]  # replace with the path to your keyword file
    )

    # Initialize PyAudio
    audio = pyaudio.PyAudio()

    # Open the default input device
    stream = audio.open(
        rate=porcupine.sample_rate,
        channels=1,
        format=pyaudio.paInt16,
        input=True,
        frames_per_buffer=porcupine.frame_length
    )

    print("Listening for wake word...")

    try:
        while True:
            # Read one frame of audio from the microphone
            pcm = stream.read(porcupine.frame_length)
            pcm = struct.unpack_from("h" * porcupine.frame_length, pcm)

            # Process the frame and check whether it contains the wake word
            keyword_index = porcupine.process(pcm)
            if keyword_index >= 0:
                print("Wake word detected")
    except KeyboardInterrupt:
        print("Stopping...")
    finally:
        # Clean up resources
        stream.close()
        audio.terminate()
        porcupine.delete()


if __name__ == "__main__":
    main()
2. Speech recognition (implemented with Vosk)
Example code for real-time recognition of audio coming from the microphone:
import os
import pyaudio
from vosk import Model, KaldiRecognizer

# Path to the model
model_path = "./vosk-model-small-cn-0.22"
# model_path = "./vosk-model-cn-0.22"
if not os.path.exists(model_path):
    print("Please download and unzip the model into the project directory")
    exit(1)

# Initialize the model
model = Model(model_path)
recognizer = KaldiRecognizer(model, 16000)

# Initialize PyAudio
p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16,
                channels=1,
                rate=16000,
                input=True,
                frames_per_buffer=8000)
stream.start_stream()

print("Recognition started...")

# Real-time recognition loop
try:
    while True:
        data = stream.read(4000)
        if recognizer.AcceptWaveform(data):
            print(recognizer.Result())
        else:
            print(recognizer.PartialResult())
except KeyboardInterrupt:
    pass
finally:
    # Stop the stream and clean up
    stream.stop_stream()
    stream.close()
    p.terminate()
3. Natural language processing (running a large language model locally with Ollama)
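Ollama serves an HTTP API on localhost (port 11434 by default), so the text returned by Vosk can be sent straight to the local model. Below is a minimal sketch of that call; the model name "qwen2.5" and the sample prompt are placeholders, so substitute whichever model you have pulled.

import requests

# Minimal sketch: send recognized text to a locally running Ollama instance.
# Assumes Ollama is listening on its default port 11434 and that a model
# (here "qwen2.5", a placeholder) has already been pulled.
def ask_llm(prompt):
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "qwen2.5",  # placeholder model name
            "prompt": prompt,
            "stream": False,     # return the whole answer in one response
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]


if __name__ == "__main__":
    print(ask_llm("Introduce yourself in one sentence."))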
4. Speech synthesis (with edge-tts)
Command to list the available voices: edge-tts --list-voices
Python edge-tts example code:
import edge_tts
import asyncio
from pydub import AudioSegment
import simpleaudio as sa
import os

# zh-CN-XiaoxiaoNeural: Chinese (Mainland) - female
# zh-CN-XiaoyiNeural: Chinese (Mainland) - female
# zh-CN-liaoning-XiaobeiNeural: Chinese (Liaoning) - female
# zh-HK-HiuGaaiNeural: Chinese (Hong Kong) - female
# zh-HK-HiuMaanNeural: Chinese (Hong Kong) - female
# zh-TW-HsiaoChenNeural: Chinese (Taiwan) - female
# zh-TW-HsiaoYuNeural: Chinese (Taiwan) - female


async def text_to_speech(text):
    communicate = edge_tts.Communicate(text, voice='zh-TW-HsiaoChenNeural')
    await communicate.save('output.mp3')

    # Convert to WAV format
    audio = AudioSegment.from_mp3('output.mp3')
    audio.export("output.wav", format="wav")

    # Play the WAV file
    wave_obj = sa.WaveObject.from_wave_file("output.wav")
    play_obj = wave_obj.play()

    # Wait for playback to finish
    play_obj.wait_done()

    os.remove("output.mp3")
    os.remove("output.wav")


# Chinese sample text, matching the Chinese voice selected above
text = "你好,欢迎使用文字转语音功能!"
asyncio.run(text_to_speech(text))
5. Knowledge-base extension (feeding the model a knowledge base with AnythingLLM)
API documentation provided by AnythingLLM (accessible once it is running): http://localhost:3001/api/docs/
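Once documents have been uploaded to a workspace, the assistant can query that workspace over AnythingLLM's local developer API. The sketch below is written from memory of the workspace chat endpoint; the endpoint path, the workspace slug "assistant", the API key, and the "textResponse" field are all assumptions, so verify them against http://localhost:3001/api/docs/ on your own installation.

import requests

# Rough sketch of querying an AnythingLLM workspace via its local developer API.
# The endpoint path, workspace slug and response field are assumptions --
# check http://localhost:3001/api/docs/ for the exact contract.
API_KEY = "YOUR_ANYTHINGLLM_API_KEY"  # generated in AnythingLLM's settings
WORKSPACE = "assistant"               # hypothetical workspace slug


def ask_knowledge_base(question):
    resp = requests.post(
        f"http://localhost:3001/api/v1/workspace/{WORKSPACE}/chat",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"message": question, "mode": "chat"},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json().get("textResponse")


if __name__ == "__main__":
    print(ask_knowledge_base("What documents are in the knowledge base?"))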
6. Function calling (weather queries, etc.)
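One simple way to get function calling without depending on a model's native tool-calling support is to instruct the model to answer with a small JSON object naming the function to call, parse that JSON, and dispatch to a local Python function. The sketch below follows that idea on top of the same local Ollama endpoint used in step 3; get_weather and its canned return value are hypothetical placeholders for a real weather API, and the model name is again an assumption.

import json
import requests

# Hypothetical local function the assistant can call; a real implementation
# would query an actual weather API here.
def get_weather(city):
    return f"Weather in {city}: sunny, 25°C"


FUNCTIONS = {"get_weather": get_weather}

SYSTEM_PROMPT = (
    "You are a voice assistant. If the user asks about the weather, reply ONLY with JSON "
    'like {"function": "get_weather", "arguments": {"city": "<city name>"}}. '
    "Otherwise reply with plain text."
)


def ask_llm(prompt):
    # Same local Ollama endpoint as in step 3; the model name is a placeholder.
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "qwen2.5", "prompt": prompt,
              "system": SYSTEM_PROMPT, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]


def handle(user_text):
    reply = ask_llm(user_text)
    try:
        # If the model emitted a function call, run the matching local function.
        call = json.loads(reply)
        func = FUNCTIONS[call["function"]]
        return func(**call["arguments"])
    except (json.JSONDecodeError, KeyError, TypeError):
        # Not a function call; just return the model's text as the answer.
        return reply


if __name__ == "__main__":
    print(handle("What's the weather like in Hangzhou today?"))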
