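Frontend + AI Advanced, Day 11: Voice Input/Output

Frontend + AI Advanced Learning Path | Weeks 9-10: Conversational Interface Design

Day 11: Voice Input/Output

Study date: January 4, 2026 (Sunday)
Keywords: voice input, voice output, Web Speech API, SpeechRecognition, SpeechSynthesis, accessible interaction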

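📁 Project file structure

day11-voice-chat/
├── src/
│   ├── components/
│   │   ├── VoiceControlBar.jsx      # Voice control bar (microphone button + status)
│   │   ├── ChatWindow.jsx           # Chat window (reuses the Day 10 rich messages)
│   │   └── SessionSidebar.jsx       # Reuses the Day 9 session management
│   ├── hooks/
│   │   ├── useSessionManager.js     # Reused from Day 9
│   │   └── useVoiceRecognition.js   # Speech recognition hook
│   ├── lib/
│   │   └── aiClient.js              # Reused from Day 4
│   └── App.jsx                      # Main app integration
└── public/

✅ Today's focus: a fully voice-driven experience where you ask by speaking and listen to the AI's answer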

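🎯 Today's learning goals

Use the Web Speech API to implement voice input (speech to text)
Use the SpeechSynthesis API to implement voice output (AI answers to speech)
Build a voice control bar (start/stop recognition, status hints)
Integrate cleanly with the multi-session and rich interactive message system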

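💡 Why voice interaction?

Voice is a more natural way to interact, and it is especially useful for:

Accessibility: users with visual impairments or limited mobility
Multitasking: hands-free operation while driving or cooking
Efficiency: speaking is faster than typing (roughly 150 vs. 40 words per minute on average)

✅ Combining voice and text makes an AI application more inclusive and more efficient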

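📚 Core Web APIs

Feature	API	Browser support
Speech recognition	`SpeechRecognition` (Web Speech API)	Chrome, Edge, Safari (partial)
Speech synthesis	`SpeechSynthesis` (Web Speech API)	All major browsers
Permissions	Requires HTTPS or localhost	localhost counts as secure during development

⚠️ Notes:

Speech recognition should be started by an explicit user action such as a button click (do not start it automatically)

On first use, the browser asks the user to grant microphone permission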
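
Before showing any voice UI, it can help to check these requirements in one place. The snippet below is a minimal sketch, not part of the original project: `getVoiceCapabilities` and `canUseVoiceInput` are hypothetical helper names, and the `win` parameter exists only so that a fake object can be passed in tests.

```javascript
// Hypothetical helpers (not in the original project): report what the
// current environment supports before rendering the voice controls.
// `win` defaults to the global object, so in a browser it can be called
// with no arguments.
function getVoiceCapabilities(win = globalThis) {
  // Chrome and Safari expose the recognition constructor under a webkit prefix.
  const Recognition = win.SpeechRecognition || win.webkitSpeechRecognition;
  return {
    // The table above lists HTTPS or localhost as a requirement;
    // isSecureContext is true in both cases.
    secureContext: win.isSecureContext === true,
    recognition: typeof Recognition === 'function',
    synthesis:
      typeof win.speechSynthesis === 'object' &&
      win.speechSynthesis !== null &&
      typeof win.SpeechSynthesisUtterance === 'function',
  };
}

// Voice input needs both a secure context and a recognition implementation.
function canUseVoiceInput(win = globalThis) {
  const caps = getVoiceCapabilities(win);
  return caps.secureContext && caps.recognition;
}
```

A component could call `canUseVoiceInput()` once on mount and hide or disable the microphone button when it returns false, instead of discovering the problem only after the user clicks.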

🔧 Hands-on: building a fully voice-enabled AI chat

Step 1: Create the project and reuse earlier components

npx create-react-app day11-voice-chat
cd day11-voice-chat
# Copy hooks/ and components/ from Days 9-10
# Install dependencies (same as Day 9)
npm install react-virtual react-markdown remark-gfm rehype-highlight rehype-katex katex @microsoft/fetch-event-source

Step 2: Create the speech recognition hook

// src/hooks/useVoiceRecognition.js
import { useState, useEffect, useRef } from 'react';

// Resolve the constructor (Chrome and Safari expose it under a webkit prefix)
const getSpeechRecognition = () =>
  typeof window === 'undefined'
    ? undefined
    : window.SpeechRecognition || window.webkitSpeechRecognition;

export const useVoiceRecognition = ({ onResult, onError } = {}) => {
  const [isListening, setIsListening] = useState(false);
  const [isSupported, setIsSupported] = useState(false);
  const recognitionRef = useRef(null);

  // Keep the latest callbacks in a ref: the recognizer is created only once,
  // so its handlers would otherwise keep calling the callbacks from the
  // first render (stale closures).
  const callbacksRef = useRef({ onResult, onError });
  useEffect(() => {
    callbacksRef.current = { onResult, onError };
  });

  useEffect(() => {
    const SpeechRecognition = getSpeechRecognition();
    setIsSupported(Boolean(SpeechRecognition));

    if (!SpeechRecognition) return;

    const recognition = new SpeechRecognition();

    recognition.continuous = false; // single utterance (stops automatically when the speaker finishes)
    recognition.interimResults = true; // deliver interim results in real time
    recognition.lang = 'zh-CN'; // defaults to Mandarin Chinese; can be switched dynamically

    recognition.onstart = () => {
      setIsListening(true);
    };

    recognition.onend = () => {
      setIsListening(false);
    };

    recognition.onerror = (event) => {
      console.error('Speech recognition error', event.error);
      setIsListening(false);
      callbacksRef.current.onError?.(event.error);
    };

    recognition.onresult = (event) => {
      let finalTranscript = '';
      let interimTranscript = '';

      // Only walk the results that changed since the last event
      for (let i = event.resultIndex; i < event.results.length; i++) {
        const transcript = event.results[i][0].transcript;
        if (event.results[i].isFinal) {
          finalTranscript += transcript;
        } else {
          interimTranscript += transcript;
        }
      }

      // Prefer the final result
      if (finalTranscript) {
        callbacksRef.current.onResult?.(finalTranscript.trim(), true); // isFinal = true
        recognition.stop(); // stop automatically
      } else if (interimTranscript) {
        callbacksRef.current.onResult?.(interimTranscript.trim(), false); // isFinal = false
      }
    };

    recognitionRef.current = recognition;

    return () => {
      // On unmount, abort() discards any pending result instead of delivering it
      recognition.abort();
      recognitionRef.current = null;
    };
  }, []);

  const startListening = () => {
    if (!isSupported || !recognitionRef.current) {
      alert('This browser does not support speech recognition. Please use Chrome or Edge.');
      return;
    }
    try {
      // start() throws if recognition is already running
      recognitionRef.current.start();
    } catch (e) {
      console.error('Failed to start recognition', e);
      onError?.(e.message);
    }
  };

  const stopListening = () => {
    if (recognitionRef.current) {
      recognitionRef.current.stop();
    }
  };

  return {
    isSupported,
    isListening,
    startListening,
    stopListening,
  };
};
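
The hook passes the raw `event.error` code to `onError`, which is fine for logging but not very helpful to users. As a rough sketch, the codes defined by the Web Speech API can be mapped to readable messages. `describeRecognitionError` and the message wording are illustrative assumptions, not part of the original project.

```javascript
// Hypothetical helper: turn a SpeechRecognition error code into a message
// that can be shown in the UI. The keys are error codes defined by the
// Web Speech API; the wording of each message is only a suggestion.
const RECOGNITION_ERROR_MESSAGES = {
  'no-speech': 'No speech was detected. Please try again.',
  'audio-capture': 'No microphone was found or it could not be accessed.',
  'not-allowed': 'Microphone permission was denied.',
  'service-not-allowed': 'Speech recognition is not allowed in this browser or context.',
  'network': 'A network error interrupted speech recognition.',
  'language-not-supported': 'The selected language is not supported.',
  // 'aborted' usually means the app or the user cancelled on purpose,
  // so an empty string signals "nothing to show".
  'aborted': '',
};

function describeRecognitionError(code) {
  if (Object.prototype.hasOwnProperty.call(RECOGNITION_ERROR_MESSAGES, code)) {
    return RECOGNITION_ERROR_MESSAGES[code];
  }
  // Unknown or future codes still produce something readable.
  return `Speech recognition failed (${code}).`;
}
```

In an `onError` callback this could be used as `const message = describeRecognitionError(error);` followed by showing `message` only when it is non-empty.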

Step 3: Create the voice control bar component

// src/components/VoiceControlBar.jsx
import { useState, useEffect } from 'react';
import { useVoiceRecognition } from '../hooks/useVoiceRecognition';

const VoiceControlBar = ({ 
  onVoiceInput, 
  onToggleAutoSpeak,
  isAutoSpeakEnabled 
}) => {
  const [interimText, setInterimText] = useState('');
  // isListening comes from the hook, which follows the recognizer's real
  // start/end events; a second local copy could drift out of sync.
  const { startListening, stopListening, isSupported, isListening } = useVoiceRecognition({
    onResult: (text, isFinal) => {
      if (isFinal) {
        onVoiceInput?.(text);
        setInterimText('');
      } else {
        setInterimText(text);
      }
    },
    onError: (error) => {
      console.error('Speech recognition error:', error);
      setInterimText('');
    }
  });

  const toggleListening = () => {
    if (isListening) {
      stopListening();
    } else {
      setInterimText('');
      startListening();
    }
  };

  const speakText = (text) => {
    if (!window.speechSynthesis) return;
    
    // Cancel any speech that is currently playing
    window.speechSynthesis.cancel();
    
    const utterance = new SpeechSynthesisUtterance(text);
    utterance.lang = 'zh-CN';
    utterance.rate = 1.0; // speaking rate
    utterance.pitch = 1.0; // pitch
    
    window.speechSynthesis.speak(utterance);
  };

  // Expose a global speak function (so other components can call it)
  useEffect(() => {
    window.speakAIResponse = speakText;
    return () => {
      delete window.speakAIResponse;
    };
  }, []);
  return (
    <div style={{ 
      padding: '12px 16px', 
      borderTop: '1px solid #e8e8e8',
      backgroundColor: '#fff',
      display: 'flex',
      alignItems: 'center',
      gap: '12px'
    }}>
      {/* Microphone button */}
      <button
        onClick={toggleListening}
        disabled={!isSupported}
        title={!isSupported ? 'Not supported in this browser' : isListening ? 'Click to stop' : 'Click to speak'}
        style={{
          width: '40px',
          height: '40px',
          borderRadius: '50%',
          border: 'none',
          backgroundColor: isListening ? '#ff4d4f' : '#f0f0f0',
          display: 'flex',
          justifyContent: 'center',
          alignItems: 'center',
          cursor: isSupported ? 'pointer' : 'not-allowed',
          color: isListening ? 'white' : '#333',
          fontSize: '18px',
        }}
      >
        {isListening ? '⏹' : '🎤'}
      </button>

      {/* Live (interim) recognition text */}
      {interimText && (
        <div style={{ 
          color: '#888', 
          fontSize: '14px',
          fontStyle: 'italic',
          flex: 1,
          minWidth: 0,
          whiteSpace: 'nowrap',
          overflow: 'hidden',
          textOverflow: 'ellipsis'
        }}>
          {interimText}
        </div>
      )}

      {/* Auto read-aloud toggle */}
      <label style={{ display: 'flex', alignItems: 'center', gap: '6px', cursor: 'pointer' }}>
        <input
          type="checkbox"
          checked={isAutoSpeakEnabled}
          onChange={(e) => onToggleAutoSpeak(e.target.checked)}
          style={{ cursor: 'pointer' }}
        />
        <span style={{ fontSize: '14px', color: '#666' }}>Read AI replies aloud automatically</span>
      </label>
    </div>
  );
};

export default VoiceControlBar;
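
`speakText` sets `utterance.lang` and leaves the choice of voice to the browser. If you want more control, the list returned by `speechSynthesis.getVoices()` can be filtered by language. The sketch below is one possible approach. `pickVoice` is a hypothetical helper, and normalizing `zh_CN` to `zh-CN` is an assumption about how some platforms report language tags.

```javascript
// Hypothetical helper: choose a voice for a language tag such as 'zh-CN'.
// `voices` is an array of objects shaped like SpeechSynthesisVoice
// (only the `lang` and `localService` properties are used here).
function pickVoice(voices, lang) {
  const wanted = lang.toLowerCase();
  const base = wanted.split('-')[0];
  // Assumption: some platforms report 'zh_CN' rather than 'zh-CN'.
  const normalize = (voice) => voice.lang.replace('_', '-').toLowerCase();

  const exact = voices.filter((v) => normalize(v) === wanted);
  const sameLanguage = voices.filter((v) => normalize(v).split('-')[0] === base);

  // Prefer an exact region match, then any voice in the same language.
  const candidates = exact.length > 0 ? exact : sameLanguage;
  if (candidates.length === 0) return null;

  // Among the candidates, prefer a voice that runs on the device.
  return candidates.find((v) => v.localService) || candidates[0];
}
```

Inside `speakText`, this could be applied as `utterance.voice = pickVoice(window.speechSynthesis.getVoices(), 'zh-CN');`. Note that `getVoices()` may return an empty array until the browser fires the `voiceschanged` event, in which case the result is `null` and the browser falls back to its default voice.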

Step 4: Integrate voice into the chat window

// src/components/ChatWindow.jsx (key updates)
import { useState, useEffect, useRef } from 'react';
import VirtualChatList from './VirtualChatList';
import MessageRenderer from './MessageRenderer';
import VoiceControlBar from './VoiceControlBar';
import { streamAIResponse } from '../lib/aiClient';

const ChatWindow = ({ session, onUpdateMessages }) => {
  const [inputValue, setInputValue] = useState('');
  const [streamingMessage, setStreamingMessage] = useState('');
  const [isStreaming, setIsStreaming] = useState(false);
  const [isAutoSpeakEnabled, setIsAutoSpeakEnabled] = useState(false);
  const chatContainerRef = useRef(null);
  // Note: renderMessages and handleAction (used in the JSX below) are not shown
  // in this excerpt; they are presumably carried over from the Day 10 code.

  // Handle the complete AI response (used for speech)
  const handleAIResponseComplete = (fullResponse) => {
    // ... (message handling logic, same as Day 10)

    // Read aloud automatically
    if (isAutoSpeakEnabled && window.speakAIResponse) {
      // Wait 300 ms so the UI can render first
      setTimeout(() => {
        window.speakAIResponse(fullResponse);
      }, 300);
    }
  };

  // Handle voice input
  const handleVoiceInput = (text) => {
    if (!text.trim()) return;
    setInputValue(text);
    // Send automatically (optional)
    setTimeout(() => {
      handleSend(text);
    }, 100);
  };

  // Send a message (typed or spoken)
  const handleSend = async (text = inputValue) => {
    if (!text.trim()) return;
    
    // Add the user message
    const newUserMsg = { id: Date.now(), role: 'user', type: 'text', content: text };
    const updatedMessages = [...session.messages, newUserMsg];
    onUpdateMessages(updatedMessages);
    setInputValue('');
    
    // Start the AI response (omitted; reuses the Day 4 logic)
    // Call handleAIResponseComplete from onComplete
  };

  return (
    <div style={{ display: 'flex', flexDirection: 'column', height: '100%' }}>
      <div
        ref={chatContainerRef}
        style={{
          flex: 1,
          overflow: 'auto',
          padding: '16px',
        }}
      >
        <VirtualChatList 
          messages={renderMessages} 
          onAction={handleAction}
          isStreaming={isStreaming}
        />
      </div>
      
      {/* Input area (omitted) */}
      
      {/* Voice control bar */}
      <VoiceControlBar
        onVoiceInput={handleVoiceInput}
        onToggleAutoSpeak={setIsAutoSpeakEnabled}
        isAutoSpeakEnabled={isAutoSpeakEnabled}
      />
    </div>
  );
};
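
One practical detail: `handleAIResponseComplete` passes `fullResponse` straight to the speech engine, and since the chat renders AI replies as Markdown, the engine may end up reading symbols such as `#`, `*`, or whole code blocks aloud. A rough, regex-based sketch of cleaning the text first is shown below. `stripMarkdownForSpeech` is a hypothetical helper and deliberately simple. It is not a full Markdown parser.

```javascript
// Hypothetical helper: reduce Markdown to plain text that is pleasant to
// listen to. Regex-based and intentionally rough; unusual Markdown will
// pass through unchanged.
function stripMarkdownForSpeech(markdown) {
  return markdown
    // Fenced code blocks are rarely useful when read aloud.
    .replace(/`{3}[\s\S]*?`{3}/g, ' (code block omitted) ')
    // Inline code: keep the content, drop the backticks.
    .replace(/`([^`]+)`/g, '$1')
    // Images: keep the alt text only.
    .replace(/!\[([^\]]*)\]\([^)]*\)/g, '$1')
    // Links: keep the link text only.
    .replace(/\[([^\]]+)\]\([^)]*\)/g, '$1')
    // Heading markers at the start of a line.
    .replace(/^[ \t]{0,3}#{1,6}[ \t]+/gm, '')
    // Blockquote markers.
    .replace(/^[ \t]{0,3}>[ \t]?/gm, '')
    // Bullet list markers.
    .replace(/^[ \t]*[-*+][ \t]+/gm, '')
    // Bold, then italic. Only asterisk forms are handled, so identifiers
    // such as snake_case names are left alone.
    .replace(/\*\*(\S(?:.*?\S)?)\*\*/g, '$1')
    .replace(/\*(\S(?:.*?\S)?)\*/g, '$1')
    // Collapse whitespace so pauses stay natural.
    .replace(/\s+/g, ' ')
    .trim();
}
```

With this in place, the auto read-aloud call could become `window.speakAIResponse(stripMarkdownForSpeech(fullResponse))`, while the chat window keeps rendering the original Markdown.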

Step 5: Integrate in App.jsx

// src/App.jsx (same basic structure as Day 10; just make sure ChatWindow is used)
import 'katex/dist/katex.min.css';
import { useSessionManager } from './hooks/useSessionManager';
import SessionSidebar from './components/SessionSidebar';
import ChatWindow from './components/ChatWindow';

function App() {
  const {
    sessions,
    currentSession,
    createSession,
    switchSession,
    updateSessionMessages,
    updateSessionTitle,
    deleteSession,
  } = useSessionManager();

  if (!currentSession) return <div style={{ padding: '20px' }}>Loading...</div>;

  return (
    <div style={{ display: 'flex', height: '100vh', fontFamily: 'Inter, -apple-system, sans-serif' }}>
      <SessionSidebar {...{ sessions, currentSessionId: currentSession.id, createSession, switchSession, updateSessionTitle, deleteSession }} />
      <div style={{ flex: 1, display: 'flex', flexDirection: 'column' }}>
        <div style={{ padding: '16px 24px', borderBottom: '1px solid #e8e8e8', fontSize: '16px', fontWeight: '600' }}>
          {currentSession.title}
        </div>
        <ChatWindow 
          session={currentSession} 
          onUpdateMessages={updateSessionMessages} 
        />
      </div>
    </div>
  );
}

export default App;

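✅ Verifying the result

✅ Click the microphone icon → the microphone permission prompt appears → start speaking
✅ While you speak, the live recognition text is shown (gray italics)
✅ Recognition stops automatically when you finish; the text is placed in the input box and sent
✅ Turn on "Read AI replies aloud automatically" → the AI's answer is spoken once it completes
✅ Click the microphone again → recognition stops
✅ Works in Chrome and Edge; in Safari, speech synthesis works but speech recognition support is only partial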

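🤔 Questions and extensions

Multi-language support: how can the recognition/synthesis language be switched dynamically?
→ Add a language parameter to useVoiceRecognition
Offline voice: how can voice work without a network connection?
→ Speech synthesis can work offline when the system provides local voices; speech recognition in Chrome generally relies on a server-side service and so usually needs a network connection (behavior varies by browser and system)
Performance: how can frequent interruptions of speech output be avoided?
→ Add debouncing (for example, do not start another read-aloud within 1 second)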
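
The last idea above can be sketched as a small wrapper around any speak function. Strictly speaking this is a leading-edge throttle rather than a debounce: the first call goes through and later calls inside the interval are dropped. `createSpeakThrottle` is a hypothetical name, and the injectable `now` parameter is there only to make the behavior easy to test.

```javascript
// Hypothetical helper: wrap a speak function so that it runs at most once
// per `minIntervalMs`. Calls that arrive too soon are ignored.
function createSpeakThrottle(speak, minIntervalMs = 1000, now = Date.now) {
  let lastSpokenAt = -Infinity;

  return function throttledSpeak(text) {
    const currentTime = now();
    // Too soon after the previous read-aloud: skip this one.
    if (currentTime - lastSpokenAt < minIntervalMs) return false;
    lastSpokenAt = currentTime;
    speak(text);
    return true;
  };
}
```

For example, `const speakOnce = createSpeakThrottle(window.speakAIResponse);` could replace direct calls to `window.speakAIResponse`, and the boolean return value tells the caller whether the text was actually spoken.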

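💡 Accessibility boost: voice input and output can support accessibility goals such as WCAG conformance, which makes this feature a good fit for government and enterprise applications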

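📅 Coming up tomorrow

Day 12: Visual Workflow Editor

Build a drag-and-drop workflow canvas with React Flow
Node connections + configuration panel
Kick off front-end development for intelligent workflows

✍️ Summary

Today we took AI chat from "keyboard and mouse" to natural spoken language! With voice input and output, users can interact with the AI entirely hands-free, which improves both efficiency and inclusiveness. Voice is the last piece of the puzzle for AI-native applications.

💬 Practical tip: speech recognition works best in a quiet environment; in noisy surroundings, a Bluetooth headset microphone is recommended. Feel free to share your own voice-interaction experience!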

posted @ 2025-12-30 15:50 XiaoZhengTou
