1. Common Search Algorithms

In speech recognition, several search algorithms are commonly used to efficiently match spoken input against a large vocabulary or a set of predefined patterns. These algorithms are designed to handle the complexity and variability of human speech. The most commonly used search algorithms in speech recognition include:

  1. Dynamic Time Warping (DTW):

    • Description: DTW is an algorithm for measuring the similarity between two temporal sequences that may vary in speed. It aligns sequences of feature vectors extracted from speech; a minimal sketch appears after this list.
    • Use Case: It is often used in template-based speech recognition systems and in isolated word recognition tasks.
  2. Viterbi Algorithm:

    • Description: The Viterbi algorithm is a dynamic programming algorithm used to find the most likely sequence of hidden states (e.g., phonemes) that results in a sequence of observed events (e.g., acoustic features).
    • Use Case: It is widely used in Hidden Markov Model (HMM)-based speech recognition systems to decode the optimal path through the HMM states.
  3. A* Search Algorithm:

    • Description: A* search is a heuristic-based search algorithm that finds the shortest path to a goal node. It uses an evaluation function to rank nodes during the search process.
    • Use Case: A* search is used in large vocabulary continuous speech recognition (LVCSR) systems to efficiently search through the hypothesis space.
  4. Beam Search:

    • Description: Beam search is an optimization of the breadth-first search algorithm. It explores a graph by expanding the most promising nodes, using a beam width to limit the number of nodes kept at each level.
    • Use Case: Beam search is commonly used in both HMM-based and neural network-based speech recognition systems for decoding. It balances search accuracy against computational efficiency.
  5. Token Passing Algorithm:

    • Description: This algorithm passes tokens through a network of states (e.g., phonemes, words) while keeping track of the best path and score associated with each token.
    • Use Case: It is used in HMM-based speech recognition to implement efficient decoding.
  6. Depth-First Search (DFS) and Breadth-First Search (BFS):

    • Description: DFS explores paths in the graph to their maximum depth before backtracking, while BFS explores all nodes at the current depth level before moving on to nodes at the next depth level.
    • Use Case: These basic search algorithms can be used in smaller or simpler speech recognition tasks.
  7. Prefix Search Algorithm:

    • Description: Prefix search handles prefix matching efficiently, often in combination with tries or finite state transducers (FSTs).
    • Use Case: It is used in real-time and interactive speech recognition applications for prefix-based recognition and auto-completion.

These algorithms are often used in combination with models such as HMMs, neural networks, and language models to build robust speech recognition systems capable of handling diverse and noisy speech inputs.
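
To make the first of these concrete, here is a minimal DTW sketch, assuming two sequences of feature vectors and a Euclidean local distance; the function name and the toy inputs are illustrative, not taken from any particular library.

```python
import numpy as np

def dtw_distance(x, y):
    """Minimal DTW: minimal accumulated cost of aligning sequences x and y."""
    n, m = len(x), len(y)
    # cost[i, j] = best cost of aligning the first i frames of x
    # with the first j frames of y.
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(np.asarray(x[i - 1], dtype=float)
                               - np.asarray(y[j - 1], dtype=float))
            # Extend the cheapest of the three allowed alignment moves.
            cost[i, j] = d + min(cost[i - 1, j],       # x advances
                                 cost[i, j - 1],       # y advances
                                 cost[i - 1, j - 1])   # both advance
    return cost[n, m]

# Toy usage: the same "utterance" spoken at two different speeds
# still yields a small distance despite the length mismatch.
print(dtw_distance([1, 2, 3], [1, 1, 2, 2, 3]))  # -> 0.0
```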

2. The Beam Search Algorithm

Beam Search is a heuristic search algorithm that is widely used in speech recognition and in other areas such as machine translation and natural language processing. It explores a search space by expanding the most promising nodes (or hypotheses) at each level, while limiting the number of nodes kept at each level to a fixed number known as the beam width. This approach strikes a balance between breadth-first search (which explores all possible nodes) and depth-first search (which explores one path fully before backtracking).

Principle of Beam Search

  1. Initialization:

    • Start with an initial state, which could be the beginning of a sentence or, in the case of speech recognition, the start of a hypothesis for the spoken words.
  2. Expansion:

    • Expand all possible next states (or nodes) from the current states.
  3. Scoring:

    • Assign a score to each expanded state based on a heuristic or probability. In speech recognition, this score often combines acoustic model scores, language model scores, and possibly other factors.
  4. Pruning:

    • Retain only the top \(k\) states with the highest scores, where \(k\) is the beam width. Discard the rest.
  5. Iteration:

    • Repeat the expansion, scoring, and pruning steps until a termination condition is met, such as reaching the end of the input or a predefined number of iterations.
  6. Termination and Selection:

    • Once the search terminates, the best-scoring state is chosen as the final result; a compact Python sketch of this loop follows.
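
The loop above can be written down in a few lines. The following is a minimal sketch, assuming the caller supplies an expand function (successors of a hypothesis) and a score function (higher is better); the function and parameter names are illustrative.

```python
import heapq

def beam_search(initial, expand, score, beam_width, max_steps):
    """Generic beam search over hypotheses.

    initial    -- starting hypothesis (e.g., an empty transcript)
    expand     -- function mapping a hypothesis to its successor hypotheses
    score      -- function mapping a hypothesis to a score (higher is better)
    beam_width -- number of hypotheses kept after each pruning step
    max_steps  -- simple termination condition: a fixed number of iterations
    """
    beam = [initial]
    for _ in range(max_steps):
        # Expansion: generate every successor of every kept hypothesis.
        candidates = [succ for hyp in beam for succ in expand(hyp)]
        if not candidates:
            break
        # Scoring + pruning: keep only the beam_width best candidates.
        beam = heapq.nlargest(beam_width, candidates, key=score)
    # Termination and selection: return the best-scoring hypothesis.
    return max(beam, key=score)
```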

Example of Beam Search in Speech Recognition

Consider a simplified example of recognizing a sequence of phonemes from an acoustic input. Suppose the beam width is set to 3, meaning we keep the top 3 hypotheses at each step.

Step-by-Step Illustration:

  1. Initialization:

    • Start with an empty hypothesis: ""
    • Current hypotheses: [""]
  2. First Expansion:

    • Possible phonemes for the first segment of the input: [a, b, c]
    • Expanded hypotheses: ["a", "b", "c"]
    • Scores for these hypotheses (arbitrary values chosen for illustration): {"a": 0.7, "b": 0.6, "c": 0.5}
  3. First Pruning:

    • Keep the top 3 hypotheses: ["a", "b", "c"]
    • Current hypotheses: ["a", "b", "c"]
  4. Second Expansion:

    • For each of the current hypotheses, expand to the next possible phonemes:
      • "a" expands to: ["aa", "ab", "ac"]
      • "b" expands to: ["ba", "bb", "bc"]
      • "c" expands to: ["ca", "cb", "cc"]
    • Hypotheses and their scores (assume combined scores for the expanded sequences):
      • {"aa": 0.5, "ab": 0.4, "ac": 0.3, "ba": 0.6, "bb": 0.55, "bc": 0.45, "ca": 0.4, "cb": 0.35, "cc": 0.25}
  5. Second Pruning:

    • Keep the top 3 hypotheses: ["ba", "bb", "aa"]
    • Current hypotheses: ["ba", "bb", "aa"]
  6. Third Expansion:

    • Continue expanding the top 3 hypotheses:
      • "ba" expands to: ["baa", "bab", "bac"]
      • "bb" expands to: ["bba", "bbb", "bbc"]
      • "aa" expands to: ["aaa", "aab", "aac"]
    • Hypotheses and their scores:
      • {"baa": 0.4, "bab": 0.35, "bac": 0.3, "bba": 0.45, "bbb": 0.4, "bbc": 0.35, "aaa": 0.3, "aab": 0.25, "aac": 0.2}
  7. Third Pruning:

    • Keep the top 3 hypotheses: ["bba", "baa", "bbb"]
    • Current hypotheses: ["bba", "baa", "bbb"]
  8. Termination:

    • Assume we stop here, or continue until the entire input is processed.
    • Choose the hypothesis with the highest score: "bba". The short script after this list reproduces the walk-through.
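
Using the beam_search sketch from above, the walk-through can be reproduced directly; the score table simply hard-codes the illustrative values from the example.

```python
# Hard-coded combined scores from the walk-through (illustrative values).
SCORES = {
    "a": 0.7, "b": 0.6, "c": 0.5,
    "aa": 0.5, "ab": 0.4, "ac": 0.3,
    "ba": 0.6, "bb": 0.55, "bc": 0.45,
    "ca": 0.4, "cb": 0.35, "cc": 0.25,
    "baa": 0.4, "bab": 0.35, "bac": 0.3,
    "bba": 0.45, "bbb": 0.4, "bbc": 0.35,
    "aaa": 0.3, "aab": 0.25, "aac": 0.2,
}

def expand(hyp):
    # Each hypothesis can be extended by any of the three phonemes.
    return [hyp + phoneme for phoneme in "abc"]

def score(hyp):
    return SCORES.get(hyp, 0.0)

best = beam_search("", expand, score, beam_width=3, max_steps=3)
print(best)  # -> "bba", matching the walk-through above
```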

Key Points of Beam Search:

  • Efficiency: By limiting the number of hypotheses (the beam width), Beam Search significantly reduces computational complexity compared with exhaustive search methods.
  • Balance: It balances exploring many possible paths (as in breadth-first search) against diving deep into promising paths (as in depth-first search).
  • Heuristics: The quality of the results depends heavily on the scoring heuristic used to evaluate the hypotheses.

Beam Search is effective in speech recognition because it manages the trade-off between accuracy and computational cost, making it suitable for real-time applications.

3. The Viterbi Algorithm

The Viterbi algorithm is a dynamic programming algorithm used to find the most likely sequence of hidden states (or path) that results in a sequence of observed events. It is commonly used with Hidden Markov Models (HMMs) in speech recognition, bioinformatics, and other applications where modeling temporal sequences is essential.

Principle of the Viterbi Algorithm

The Viterbi algorithm finds the maximum-probability path through a series of states, given a sequence of observations. It does this by breaking the problem down into smaller subproblems and solving them iteratively.

Components of the Viterbi Algorithm

  1. States: The possible hidden states in the model.
  2. Observations: The observed sequence of events.
  3. Initial Probabilities (\(\pi\)): The probabilities of starting in each state.
  4. Transition Probabilities (A): The probabilities of transitioning from one state to another.
  5. Emission Probabilities (B): The probabilities of observing a particular observation from a state.

Steps of the Viterbi Algorithm

  1. Initialization:

    • Set up the initial probabilities for each state based on the first observation.
    • \(\delta_1(i) = \pi_i \cdot B_i(O_1)\)
    • \(\psi_1(i) = 0\)
  2. Recursion:

    • For each subsequent observation, update the probabilities and keep track of the most likely previous state.
    • \(\delta_t(j) = \max_{i} [\delta_{t-1}(i) \cdot A_{ij}] \cdot B_j(O_t)\)
    • \(\psi_t(j) = \arg\max_{i} [\delta_{t-1}(i) \cdot A_{ij}]\)
  3. Termination:

    • Identify the final state with the highest probability.
    • \(P^* = \max_{i} \delta_T(i)\)
    • \(q_T^* = \arg\max_{i} \delta_T(i)\)
  4. Path Backtracking:

    • Backtrack through the \(\psi\) matrix to find the most likely state sequence; see the Python sketch after these steps.
    • For \(t = T-1, T-2, \ldots, 1\):
      • \(q_t^* = \psi_{t+1}(q_{t+1}^*)\)
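
The four steps translate almost line for line into code. Below is a minimal sketch for a discrete HMM, assuming NumPy arrays for \(\pi\), A, and B; the function name and argument layout are illustrative choices.

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Viterbi decoding for a discrete HMM.

    obs -- observation indices, length T
    pi  -- initial state probabilities, shape (N,)
    A   -- transition probabilities A[i, j] = P(j | i), shape (N, N)
    B   -- emission probabilities B[i, o] = P(o | i), shape (N, M)
    Returns (best_state_sequence, best_path_probability).
    """
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))            # delta[t, j]: best score ending in j at t
    psi = np.zeros((T, N), dtype=int)   # psi[t, j]: best predecessor of j at t

    # Initialization: delta_1(i) = pi_i * B_i(O_1)
    delta[0] = pi * B[:, obs[0]]

    # Recursion: delta_t(j) = max_i [delta_{t-1}(i) * A_ij] * B_j(O_t)
    for t in range(1, T):
        for j in range(N):
            trans = delta[t - 1] * A[:, j]
            psi[t, j] = int(np.argmax(trans))
            delta[t, j] = trans[psi[t, j]] * B[j, obs[t]]

    # Termination: pick the best final state, then backtrack through psi.
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    path.reverse()
    return path, float(delta[-1].max())
```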

Example of the Viterbi Algorithm

Let's consider a simple example to illustrate the Viterbi algorithm. Suppose we have a weather prediction model with two states, "Sunny" (S) and "Rainy" (R), and we observe the weather over three days.

Model Parameters:

  • States: S, R
  • Observations: "Dry" (D), "Wet" (W)
  • Initial Probabilities (\(\pi\)):
    • \(\pi(S) = 0.6\)
    • \(\pi(R) = 0.4\)
  • Transition Probabilities (A):
    • \(A(S \rightarrow S) = 0.7\)
    • \(A(S \rightarrow R) = 0.3\)
    • \(A(R \rightarrow S) = 0.4\)
    • \(A(R \rightarrow R) = 0.6\)
  • Emission Probabilities (B):
    • \(B(S \rightarrow D) = 0.8\)
    • \(B(S \rightarrow W) = 0.2\)
    • \(B(R \rightarrow D) = 0.1\)
    • \(B(R \rightarrow W) = 0.9\)

Observed Sequence:

  • Day 1: Dry (D)
  • Day 2: Wet (W)
  • Day 3: Dry (D)

Viterbi Algorithm Steps

Initialization:

  • \(\delta_1(S) = \pi(S) \cdot B(S \rightarrow D) = 0.6 \cdot 0.8 = 0.48\)
  • \(\delta_1(R) = \pi(R) \cdot B(R \rightarrow D) = 0.4 \cdot 0.1 = 0.04\)
  • \(\psi_1(S) = 0\)
  • \(\psi_1(R) = 0\)

Recursion:

  • Day 2 (Wet):

    • \(\delta_2(S) = \max[\delta_1(S) \cdot A(S \rightarrow S), \delta_1(R) \cdot A(R \rightarrow S)] \cdot B(S \rightarrow W)\)
    • \(\delta_2(S) = \max[0.48 \cdot 0.7, 0.04 \cdot 0.4] \cdot 0.2 = \max[0.336, 0.016] \cdot 0.2 = 0.336 \cdot 0.2 = 0.0672\)
    • \(\delta_2(R) = \max[\delta_1(S) \cdot A(S \rightarrow R), \delta_1(R) \cdot A(R \rightarrow R)] \cdot B(R \rightarrow W)\)
    • \(\delta_2(R) = \max[0.48 \cdot 0.3, 0.04 \cdot 0.6] \cdot 0.9 = \max[0.144, 0.024] \cdot 0.9 = 0.144 \cdot 0.9 = 0.1296\)
    • \(\psi_2(S) = \arg\max[\delta_1(S) \cdot A(S \rightarrow S), \delta_1(R) \cdot A(R \rightarrow S)]\)
    • \(\psi_2(S) = \arg\max[0.336, 0.016] = S\)
    • \(\psi_2(R) = \arg\max[\delta_1(S) \cdot A(S \rightarrow R), \delta_1(R) \cdot A(R \rightarrow R)]\)
    • \(\psi_2(R) = \arg\max[0.144, 0.024] = S\)
  • Day 3 (Dry):

    • \(\delta_3(S) = \max[\delta_2(S) \cdot A(S \rightarrow S), \delta_2(R) \cdot A(R \rightarrow S)] \cdot B(S \rightarrow D)\)
    • \(\delta_3(S) = \max[0.0672 \cdot 0.7, 0.1296 \cdot 0.4] \cdot 0.8 = \max[0.04704, 0.05184] \cdot 0.8 = 0.05184 \cdot 0.8 = 0.041472\)
    • \(\delta_3(R) = \max[\delta_2(S) \cdot A(S \rightarrow R), \delta_2(R) \cdot A(R \rightarrow R)] \cdot B(R \rightarrow D)\)
    • \(\delta_3(R) = \max[0.0672 \cdot 0.3, 0.1296 \cdot 0.6] \cdot 0.1 = \max[0.02016, 0.07776] \cdot 0.1 = 0.07776 \cdot 0.1 = 0.007776\)
    • \(\psi_3(S) = \arg\max[\delta_2(S) \cdot A(S \rightarrow S), \delta_2(R) \cdot A(R \rightarrow S)]\)
    • \(\psi_3(S) = \arg\max[0.04704, 0.05184] = R\)
    • \(\psi_3(R) = \arg\max[\delta_2(S) \cdot A(S \rightarrow R), \delta_2(R) \cdot A(R \rightarrow R)]\)
    • \(\psi_3(R) = \arg\max[0.02016, 0.07776] = R\)

Termination:

  • \(P^* = \max[\delta_3(S), \delta_3(R)] = \max[0.041472, 0.007776] = 0.041472\)
  • \(q_3^* = \arg\max[\delta_3(S), \delta_3(R)] = S\)

Path Backtracking:

  • \(q_3^* = S\)
  • \(q_2^* = \psi_3(S) = R\)
  • \(q_1^* = \psi_2(R) = S\)

Result:

  • The most likely sequence of states is: S -> R -> S
  • This means the weather was most likely "Sunny" on Day 1, "Rainy" on Day 2, and "Sunny" on Day 3. The short script below confirms this using the viterbi sketch from above.
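
For completeness, here is the weather example run through the viterbi sketch given earlier; the state and observation encodings are illustrative choices, not fixed conventions.

```python
states = ["S", "R"]             # Sunny, Rainy
symbols = {"D": 0, "W": 1}      # Dry, Wet

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],       # from S: to S, to R
              [0.4, 0.6]])      # from R: to S, to R
B = np.array([[0.8, 0.2],       # S emits: D, W
              [0.1, 0.9]])      # R emits: D, W

obs = [symbols[o] for o in ["D", "W", "D"]]
path, prob = viterbi(obs, pi, A, B)
print([states[i] for i in path], prob)  # -> ['S', 'R', 'S'], approx. 0.041472
```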

Summary

Given an observation sequence, the Viterbi algorithm efficiently computes the most likely sequence of hidden states in an HMM. It does this by maintaining and updating a probability for each state at each time step, while also keeping track of the path through the state space that leads to those probabilities. This ensures that the final state sequence is the most probable one given the observations.
