Local Deployment
3.1 Environment Setup
First, set up the F5-TTS environment:
git clone https://github.com/SWivid/F5-TTS.git
cd F5-TTS
conda create -n f5tts python=3.10 -y
conda activate f5tts
pip install -r requirements.txt
Next, download the model weights to a local directory so they are easy to load:
export HF_ENDPOINT=https://hf-mirror.com
huggingface-cli download SWivid/F5-TTS --local-dir ckpts/
huggingface-cli download charactr/vocos-mel-24khz --local-dir ckpts/vocos
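The same download can also be scripted from Python. The following is a minimal sketch, assuming the `huggingface_hub` package is installed (it is the package that provides `huggingface-cli`). The `REPOS` table and the `download_all` helper are illustrative names, not part of F5-TTS. The mirror endpoint is set before `huggingface_hub` is imported, because the library reads `HF_ENDPOINT` at import time.

```python
import os

# Set the mirror before huggingface_hub is imported; the library reads
# HF_ENDPOINT when it is first imported. An existing value is left untouched.
os.environ.setdefault("HF_ENDPOINT", "https://hf-mirror.com")

# (repo_id, local_dir) pairs taken from the two CLI commands above.
REPOS = [
    ("SWivid/F5-TTS", "ckpts/"),
    ("charactr/vocos-mel-24khz", "ckpts/vocos"),
]


def download_all(repos=REPOS):
    """Download each repository snapshot into its local directory."""
    # Imported lazily so the endpoint set above is already in place.
    from huggingface_hub import snapshot_download

    for repo_id, local_dir in repos:
        path = snapshot_download(repo_id=repo_id, local_dir=local_dir)
        print(f"{repo_id} -> {path}")
```

Calling `download_all()` from the F5-TTS checkout should leave the same `ckpts/` layout as the two commands above.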
3.2 Inference Test
Start by testing single-voice cloning:
python inference-cli.py \
--model "F5-TTS" \
--ref_audio "tests/ref_audio/test_en_1_ref_short.wav" \
--ref_text "Some call me nature, others call me mother nature." \
--gen_text "I don't really care what you call me. I've been a silent spectator, watching species evolve, empires rise and fall. But always remember, I am mighty and enduring. Respect me and I'll nurture you; ignore me and you shall face the consequences." \
--load_vocoder_from_local
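Parameter notes:
--model: the model to use, here F5-TTS;
--ref_audio: the reference audio whose voice will be cloned;
--ref_text: the transcript of the reference audio; if it is omitted, openai/whisper-large-v3-turbo is downloaded by default and used to transcribe the audio;
--gen_text: the text to synthesize.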
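When the command is driven from a script, building the arguments as a list avoids the shell-quoting problems that long `--gen_text` strings tend to cause. A minimal sketch follows: `build_command` is a hypothetical helper written for this example, and it uses only the flags shown above.

```python
import shlex


def build_command(ref_audio, ref_text, gen_text, model="F5-TTS", local_vocoder=True):
    """Assemble the argument list for inference-cli.py."""
    cmd = [
        "python", "inference-cli.py",
        "--model", model,
        "--ref_audio", ref_audio,
        "--ref_text", ref_text,
        "--gen_text", gen_text,
    ]
    if local_vocoder:
        # Use the vocoder downloaded to ckpts/vocos instead of fetching it.
        cmd.append("--load_vocoder_from_local")
    return cmd


cmd = build_command(
    ref_audio="tests/ref_audio/test_en_1_ref_short.wav",
    ref_text="Some call me nature, others call me mother nature.",
    gen_text="I don't really care what you call me.",
)
# Print a shell-safe version of the command for inspection.
print(shlex.join(cmd))
```

To actually run it from the F5-TTS checkout, pass the list to `subprocess.run(cmd, check=True)`.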
Multi-voice cloning:
python inference-cli.py -c samples/story.toml
The project parses its configuration with tomli. In the config file, entries such as voices.town specify the different voices.
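To illustrate how a `voices.town` table looks once parsed, here is a small sketch. The config text is an assumed layout written for this example, not a copy of `samples/story.toml`: the `voices.country` table and all key names and paths inside the tables are assumptions. On Python 3.11+ the standard-library `tomllib` offers the same API as `tomli`.

```python
try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # same API, for Python 3.10 and earlier

# Hypothetical config: key names and file paths are assumptions for illustration.
CONFIG = """
model = "F5-TTS"
ref_audio = "samples/main.flac"
ref_text = ""

[voices.town]
ref_audio = "samples/town.flac"
ref_text = ""

[voices.country]
ref_audio = "samples/country.flac"
ref_text = ""
"""

config = tomllib.loads(CONFIG)

# Each [voices.<name>] table becomes an entry in the nested "voices" dict.
voices = config.get("voices", {})
for name, voice in voices.items():
    print(name, "->", voice["ref_audio"])
```

Each `[voices.<name>]` header becomes a key in the nested `voices` dictionary, which is what lets a single config file carry several reference voices.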