Tacotron2 Inference Tutorial

https://www.dandelioncloud.cn/article/details/1601780566695559170

Directory Structure

This tutorial was run on Google Colab. The file directory structure is as follows:

  ALL
  └── tacotron2
      ├── audio_processing.py
      ├── checkpoint_269000
      ├── data_utils.py
      ├── demo.wav
      ├── distributed.py
      ├── Dockerfile
      ├── filelists
      │   ├── ljs_audio_text_test_filelist.txt
      │   ├── ljs_audio_text_train_filelist.txt
      │   └── ljs_audio_text_val_filelist.txt
      ├── hparams.py
      ├── inference.ipynb
      ├── layers.py
      ├── LICENSE
      ├── logger.py
      ├── loss_function.py
      ├── loss_scaler.py
      ├── model.py
      ├── multiproc.py
      ├── plotting_utils.py
      ├── __pycache__
      │   ├── audio_processing.cpython-36.pyc
      │   ├── data_utils.cpython-36.pyc
      │   ├── distributed.cpython-36.pyc
      │   ├── hparams.cpython-36.pyc
      │   ├── layers.cpython-36.pyc
      │   ├── logger.cpython-36.pyc
      │   ├── loss_function.cpython-36.pyc
      │   ├── model.cpython-36.pyc
      │   ├── plotting_utils.cpython-36.pyc
      │   ├── stft.cpython-36.pyc
      │   ├── train.cpython-36.pyc
      │   └── utils.cpython-36.pyc
      ├── README.md
      ├── requirements.txt
      ├── stft.py
      ├── tensorboard.png
      ├── text
      │   ├── cleaners.py
      │   ├── cmudict.py
      │   ├── __init__.py
      │   ├── LICENSE
      │   ├── numbers.py
      │   ├── __pycache__
      │   │   ├── cleaners.cpython-36.pyc
      │   │   ├── cmudict.cpython-36.pyc
      │   │   ├── __init__.cpython-36.pyc
      │   │   ├── numbers.cpython-36.pyc
      │   │   └── symbols.cpython-36.pyc
      │   └── symbols.py
      ├── train.py
      ├── utils.py
      └── waveglow
          ├── config.json
          ├── convert_model.py
          ├── denoiser.py
          ├── distributed.py
          ├── glow_old.py
          ├── glow.py
          ├── inference.py
          ├── LICENSE
          ├── mel2samp.py
          ├── __pycache__
          │   ├── denoiser.cpython-36.pyc
          │   └── glow.cpython-36.pyc
          ├── README.md
          ├── requirements.txt
          ├── tacotron2
          ├── train.py
          ├── waveglow_256channels_universal_v5.pt
          └── waveglow_logo.png

File Preparation

First, create an empty folder named ALL, then run git clone https://github.com/NVIDIA/tacotron2.git to download the complete tacotron2 codebase. The ALL folder will now contain a folder named tacotron2, and inside it an inference.ipynb file, which is the inference code we will use shortly.
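This whole step can be done directly in a Colab cell; a minimal sketch, assuming a fresh runtime whose working directory is /content:

  # Create the ALL folder and clone the Tacotron2 repository into it
  !mkdir ALL
  %cd ALL
  !git clone https://github.com/NVIDIA/tacotron2.git
  %cd ..  # return to /content so the later %cd ALL/tacotron2 works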

Next, place the pretrained WaveGlow model (named waveglow_256channels_universal_v5.pt) into the waveglow folder.

Finally, you need the most important file: the model checkpoint saved while training Tacotron2. During training it is automatically named checkpoint_xxxx; put it in the tacotron2 folder. If you have not trained Tacotron2 yourself, NVIDIA also provides an officially trained model file.
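If the checkpoint and the WaveGlow model live in your Google Drive, one way to get them into place is to mount Drive and copy them over. The cell below is only a sketch, and the Drive paths are hypothetical placeholders for wherever you actually stored the files:

  # Sketch: copy the two model files from Google Drive into the cloned repo
  from google.colab import drive
  drive.mount('/content/drive')
  # Hypothetical Drive paths -- adjust to where your files actually are
  !cp /content/drive/MyDrive/checkpoint_269000 /content/ALL/tacotron2/
  !cp /content/drive/MyDrive/waveglow_256channels_universal_v5.pt /content/ALL/tacotron2/waveglow/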

Modifying the Inference Code

To reiterate: my experiment environment is Colab. Everything below follows the same pattern, with the textual explanation on top and the corresponding code beneath it.

First, make sure the TensorFlow version is 1.x, otherwise errors will occur:

  %tensorflow_version 1.x
  import tensorflow as tf
  tf.__version__

Then change into the ALL/tacotron2 directory:

  %cd ALL/tacotron2

Before running the code, make sure unidecode is installed:

  !pip install unidecode

Import the libraries and define the helper functions:

  import matplotlib
  %matplotlib inline
  import matplotlib.pylab as plt
  import IPython.display as ipd

  import sys
  sys.path.append('waveglow/')
  import numpy as np
  import torch

  from hparams import create_hparams
  from model import Tacotron2
  from layers import TacotronSTFT, STFT
  from audio_processing import griffin_lim
  from train import load_model
  from text import text_to_sequence
  from denoiser import Denoiser

  def plot_data(data, figsize=(16, 4)):
      fig, axes = plt.subplots(1, len(data), figsize=figsize)
      for i in range(len(data)):
          axes[i].imshow(data[i], aspect='auto', origin='bottom',
                         interpolation='none')

  hparams = create_hparams()
  hparams.sampling_rate = 21050  # affects the playback speed of the generated speech: larger values sound faster

  checkpoint_path = "checkpoint_269000"
  model = load_model(hparams)
  model.load_state_dict(torch.load(checkpoint_path)['state_dict'])
  _ = model.cuda().eval().half()
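Optionally, you can sanity-check what the checkpoint contains and confirm the model really ended up on the GPU in half precision. This check is my addition, not part of the original notebook:

  # Sketch: inspect the checkpoint dict and the model placement
  ckpt = torch.load(checkpoint_path, map_location='cpu')
  print(ckpt.keys())  # must include 'state_dict', as loaded above
  print(next(model.parameters()).device, next(model.parameters()).dtype)  # expect cuda:0 torch.float16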

Next, change into the waveglow directory and load the WaveGlow model:

  %cd waveglow  # we are already inside ALL/tacotron2, so a relative path suffices
  waveglow_path = 'waveglow_256channels_universal_v5.pt'
  waveglow = torch.load(waveglow_path)['model']
  waveglow.cuda().eval().half()
  for k in waveglow.convinv:
      k.float()  # keep the invertible 1x1 convolutions in full precision
  denoiser = Denoiser(waveglow)

Input the text:

  text = "WaveGlow is really awesome!"
  sequence = np.array(text_to_sequence(text, ['english_cleaners']))[None, :]
  sequence = torch.autograd.Variable(
      torch.from_numpy(sequence)).cuda().long()
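If you are curious what text_to_sequence produces: it maps the cleaned text to a sequence of integer symbol IDs, and the [None, :] indexing adds a batch dimension. A quick illustration (my addition):

  # Illustration only: one integer ID per character/symbol, then a batch axis
  print(text_to_sequence(text, ['english_cleaners']))
  print(sequence.shape)  # torch.Size([1, sequence_length])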

Generate the mel-spectrogram outputs and plot the attention alignment:

  mel_outputs, mel_outputs_postnet, _, alignments = model.inference(sequence)
  plot_data((mel_outputs.float().data.cpu().numpy()[0],
             mel_outputs_postnet.float().data.cpu().numpy()[0],
             alignments.float().data.cpu().numpy()[0].T))
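For orientation (my addition, not in the original notebook): with the default hparams the model generates 80 mel channels, so both mel tensors should have shape (1, 80, n_frames), while alignments holds one attention row per decoder step:

  # Illustration only: inspect the output shapes
  print(mel_outputs.shape, mel_outputs_postnet.shape)  # (1, 80, n_frames) with default hparams
  print(alignments.shape)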

Use WaveGlow to synthesize speech from the mel-spectrogram:

  with torch.no_grad():
      audio = waveglow.infer(mel_outputs_postnet, sigma=0.666)
  ipd.Audio(audio[0].data.cpu().numpy(), rate=hparams.sampling_rate)
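To keep the result rather than just play it inline, one option (my addition; the filename is an arbitrary placeholder) is to write the waveform to a WAV file with scipy, which is available on Colab:

  # Sketch: save the synthesized audio as a 32-bit float WAV
  from scipy.io.wavfile import write
  write('synthesized.wav', hparams.sampling_rate,
        audio[0].data.cpu().numpy().astype(np.float32))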

(Optional) Remove WaveGlow's bias:

  audio_denoised = denoiser(audio, strength=0.01)[:, 0]
  ipd.Audio(audio_denoised.cpu().numpy(), rate=hparams.sampling_rate)