Recreating Xiao Ai Speaker Features on a PC

1. Offline wake-word detection, implemented with Picovoice

Related links:

Picovoice website

Picovoice GitHub (mainly for checking how to call it from Python)

Python example

import pvporcupine
import pyaudio
import struct

def main():
    # Initialize Porcupine
    porcupine = pvporcupine.create(
        access_key="xxxxxxxx",  # replace with your Picovoice access key
        keyword_paths=["./hey-amy_en_windows_v3_0_0.ppn"]  # replace with the path to your keyword file
    )

    # Initialize PyAudio
    audio = pyaudio.PyAudio()

    # Open the default input device in the format Porcupine expects
    stream = audio.open(
        rate=porcupine.sample_rate,
        channels=1,
        format=pyaudio.paInt16,
        input=True,
        frames_per_buffer=porcupine.frame_length
    )

    print("Listening for wake word...")

    try:
        while True:
            # Read one frame of audio from the microphone;
            # ignore input overflows instead of raising an exception
            pcm = stream.read(porcupine.frame_length, exception_on_overflow=False)
            pcm = struct.unpack_from("h" * porcupine.frame_length, pcm)

            # Process the frame; the result is -1 when no wake word is found
            keyword_index = porcupine.process(pcm)

            if keyword_index >= 0:
                print("Wake word detected")
    except KeyboardInterrupt:
        print("Stopping...")
    finally:
        # Release resources
        stream.close()
        audio.terminate()
        porcupine.delete()

if __name__ == "__main__":
    main()
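
The `struct.unpack_from` step in the loop turns the raw bytes read from PyAudio into the 16-bit samples that Porcupine processes. Below is a minimal standalone sketch of that conversion using synthetic data instead of a microphone. The helper name `bytes_to_pcm` is hypothetical, the frame length of 512 is only a demo value (real code should use `porcupine.frame_length`), and the `<` format prefix assumes a little-endian machine, which covers typical PCs.

```python
import struct


def bytes_to_pcm(frame_bytes, frame_length):
    """Decode 16-bit mono audio bytes into a tuple of signed integers."""
    expected = frame_length * 2  # each int16 sample occupies 2 bytes
    if len(frame_bytes) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(frame_bytes)}")
    # "<" selects little-endian byte order; "h" is a signed 16-bit integer.
    return struct.unpack(f"<{frame_length}h", frame_bytes)


# Demo with one frame of digital silence (all zero bytes).
silence = bytes(512 * 2)
samples = bytes_to_pcm(silence, 512)
print(len(samples), samples[:3])
```

Checking the byte count first makes a short read from the audio device fail with a clear message rather than a low-level `struct.error`.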

 

2. Speech recognition (implemented with Vosk)

Vosk website

Example code for real-time recognition of speech from the microphone:

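import os
import sys
import pyaudio
from vosk import Model, KaldiRecognizer

# Model path
model_path = "./vosk-model-small-cn-0.22"
# model_path = "./vosk-model-cn-0.22"
if not os.path.exists(model_path):
    print("Please download the model and unpack it into the project directory")
    sys.exit(1)

# Initialize the model; the recognizer expects 16 kHz audio
model = Model(model_path)
recognizer = KaldiRecognizer(model, 16000)

# Initialize PyAudio
p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=8000)
stream.start_stream()

print("Starting recognition...")

# Real-time recognition loop; press Ctrl+C to stop
try:
    while True:
        data = stream.read(4000, exception_on_overflow=False)
        if recognizer.AcceptWaveform(data):
            # A complete utterance has been recognized
            print(recognizer.Result())
        else:
            # The utterance is still in progress
            print(recognizer.PartialResult())
except KeyboardInterrupt:
    print("Stopping...")
finally:
    # Stop the stream and release resources
    stream.stop_stream()
    stream.close()
    p.terminate()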
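
`Result()` and `PartialResult()` return JSON strings, with the recognized words under the `text` and `partial` keys respectively. To pass plain text on to the next stage, the string has to be parsed first. The following is a small sketch; the helper name `extract_text` is hypothetical, and removing spaces assumes a Chinese model, whose output typically separates tokens with spaces (do not do this for English).

```python
import json


def extract_text(result_json, key="text"):
    """Return the recognized text from a Vosk JSON result string."""
    payload = json.loads(result_json)
    # Assumption: Chinese output, so spaces between tokens are dropped.
    return payload.get(key, "").replace(" ", "")


# Hand-written samples in the shape of Vosk results.
final = extract_text('{"text": "今天 天气 怎么样"}')
partial = extract_text('{"partial": "今天"}', key="partial")
print(final, partial)
```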

 

3. Natural language processing (running a large model locally with Ollama)

Ollama website

Ollama GitHub
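
Once Ollama is running, it serves a local HTTP API, so the recognized text can be sent to the model from Python. The sketch below uses only the standard library and the `/api/generate` endpoint on Ollama's default port 11434. The model name `qwen2` is an assumption; substitute whichever model you have pulled with `ollama pull`. The function names `build_request` and `ask` are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local endpoint


def build_request(prompt, model="qwen2"):
    """Build (but do not send) a non-streaming generate request."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, model="qwen2"):
    """Send the prompt to a running Ollama server and return the reply text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=120) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]


# Building the request needs no server; calling ask() does.
request = build_request("Hello")
print(request.full_url, request.get_method())
```

Setting `stream` to `False` returns the whole answer in one JSON object, which is simpler to hand to the speech synthesis step than a token stream.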

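4. Speech synthesis (with edge-tts)

VC++ 14 download link

ffmpeg download link (pydub needs ffmpeg to decode MP3)

Command to list the available voices: edge-tts --list-voices

Python example using edge-tts: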

import edge_tts
import asyncio
from pydub import AudioSegment
import simpleaudio as sa
import os

# zh-CN-XiaoxiaoNeural: Chinese (Mainland China), female
# zh-CN-XiaoyiNeural: Chinese (Mainland China), female
# zh-CN-liaoning-XiaobeiNeural: Chinese (Liaoning), female
# zh-HK-HiuGaaiNeural: Chinese (Hong Kong), female
# zh-HK-HiuMaanNeural: Chinese (Hong Kong), female
# zh-TW-HsiaoChenNeural: Chinese (Taiwan), female
# zh-TW-HsiaoYuNeural: Chinese (Taiwan), female
async def text_to_speech(text):
    # Synthesize the text and save it as MP3
    communicate = edge_tts.Communicate(text, voice='zh-TW-HsiaoChenNeural')
    await communicate.save('output.mp3')

    # Convert to WAV, which simpleaudio can play
    audio = AudioSegment.from_mp3('output.mp3')
    audio.export("output.wav", format="wav")

    # Play the WAV file
    wave_obj = sa.WaveObject.from_wave_file("output.wav")
    play_obj = wave_obj.play()

    # Wait for playback to finish
    play_obj.wait_done()

    # Delete the temporary files
    os.remove("output.mp3")
    os.remove("output.wav")

# Chinese sample text ("Hello, welcome to the text-to-speech feature!") to match the Chinese voice
text = "你好,欢迎使用文字转语音功能!"
asyncio.run(text_to_speech(text))
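
The voice list can also be fetched from Python with `edge_tts.list_voices()`, which returns one dictionary per voice with fields such as `ShortName`, `Locale` and `Gender`. Here is a small sketch of narrowing that list down to candidate Chinese voices. The helper names `filter_voices` and `chinese_voices` are hypothetical, and the sample entries are written by hand so the filter can be tried without network access.

```python
def filter_voices(voices, locale_prefix="zh-", gender=None):
    """Return the ShortName of each voice matching a locale prefix and gender."""
    return [
        v["ShortName"]
        for v in voices
        if v["Locale"].startswith(locale_prefix)
        and (gender is None or v["Gender"] == gender)
    ]


async def chinese_voices():
    """Fetch the live voice list (needs network access) and filter it."""
    import edge_tts  # imported lazily so filter_voices works without it

    return filter_voices(await edge_tts.list_voices())


# Hand-written sample in the same shape as the entries from list_voices().
sample = [
    {"ShortName": "zh-CN-XiaoxiaoNeural", "Locale": "zh-CN", "Gender": "Female"},
    {"ShortName": "zh-CN-YunxiNeural", "Locale": "zh-CN", "Gender": "Male"},
    {"ShortName": "en-US-AriaNeural", "Locale": "en-US", "Gender": "Female"},
]
print(filter_voices(sample, gender="Female"))
```

To query the real list, run `asyncio.run(chinese_voices())` with `asyncio` imported.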

 

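5. Knowledge base extension (feeding a knowledge base to the model with AnythingLLM)

AnythingLLM website

API documentation provided by AnythingLLM (accessible once it is running): http://localhost:3001/api/docs/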
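
As a rough sketch of calling that API from Python, the code below builds a chat request against a workspace. Treat every detail as an assumption to check against the `/api/docs/` page of your own instance: the `/v1/workspace/{slug}/chat` path, the `message` and `mode` fields, the bearer-token header and the `textResponse` field reflect my understanding of the API, not something stated in this post. The workspace slug, the API key and the function names are hypothetical placeholders.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3001/api"


def build_chat_request(message, workspace_slug, api_key, mode="query"):
    """Build (but do not send) a chat request for one workspace."""
    # Assumed endpoint layout; verify it in the instance's /api/docs/ page.
    url = f"{BASE_URL}/v1/workspace/{workspace_slug}/chat"
    body = json.dumps({"message": message, "mode": mode}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def chat(message, workspace_slug, api_key):
    """Send the message to a running AnythingLLM instance and return the answer."""
    req = build_chat_request(message, workspace_slug, api_key)
    with urllib.request.urlopen(req, timeout=120) as resp:
        # Assumed response field; falls back to an empty string if absent.
        return json.loads(resp.read().decode("utf-8")).get("textResponse", "")


# Placeholder values; building the request needs no server.
req = build_chat_request("What is in the manual?", "my-workspace", "YOUR_API_KEY")
print(req.full_url)
```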

6. Function calling (weather lookup, etc.)
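
One common way to approach this step is to have the model answer with a small JSON object naming a function and its arguments, then run the matching Python function locally. The sketch below shows only the dispatch side. The JSON shape, the `dispatch` helper and the `get_weather` function are all hypothetical, and `get_weather` returns canned data where a real version would query a weather service.

```python
import json


def get_weather(city):
    """Hypothetical tool: returns canned data instead of calling a real service."""
    canned = {"Beijing": "sunny, 25°C"}
    return canned.get(city, "unknown")


# Registry of the functions the model is allowed to call.
TOOLS = {"get_weather": get_weather}


def dispatch(tool_call_json):
    """Run the tool named in a JSON tool call and return its result."""
    call = json.loads(tool_call_json)
    func = TOOLS.get(call.get("name"))
    if func is None:
        raise KeyError(f"unknown tool: {call.get('name')}")
    return func(**call.get("arguments", {}))


result = dispatch('{"name": "get_weather", "arguments": {"city": "Beijing"}}')
print(result)
```

Looking functions up in an explicit registry, rather than evaluating the model's output, keeps the model from triggering anything that was not deliberately exposed.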

posted @ 2024-09-23 23:38  右仆射卧龙