Study progress, March 14

Local deployment
3.1 Environment setup
First, set up the F5-TTS environment:

git clone https://github.com/SWivid/F5-TTS.git
cd F5-TTS
conda create -n f5tts python=3.10 -y
conda activate f5tts
pip install -r requirements.txt
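Before moving on, it can help to confirm that the new conda environment actually sees the installed packages. The snippet below is a minimal sketch of such a check, not part of F5-TTS itself; the helper name missing_modules and the module list are assumptions for illustration, so adjust the list to whatever requirements.txt contains in your checkout.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The module list is an assumption for illustration; adjust it to match
    # the contents of requirements.txt in your checkout.
    wanted = ["torch", "torchaudio", "tomli"]
    print("Python:", sys.version.split()[0])
    print("Missing modules:", missing_modules(wanted) or "none")
```

Run it inside the activated f5tts environment; an empty result suggests the install step went through.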
Next, download the models to local disk so they are easy to load:

export HF_ENDPOINT=https://hf-mirror.com
huggingface-cli download SWivid/F5-TTS --local-dir ckpts/
huggingface-cli download charactr/vocos-mel-24khz --local-dir ckpts/vocos
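Downloads through a mirror can be interrupted, so a quick look at what landed in ckpts/ is worthwhile. Here is a minimal sketch, assuming the weights use one of the common suffixes (.safetensors, .pt, .bin); the helper list_checkpoints and the suffix list are my own assumptions, not something the repository provides.

```python
from pathlib import Path


def list_checkpoints(root, suffixes=(".safetensors", ".pt", ".bin")):
    """Return (relative path, size in bytes) for weight-like files under root."""
    root = Path(root)
    found = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            found.append((path.relative_to(root).as_posix(), path.stat().st_size))
    return found


if __name__ == "__main__":
    # Assumes the script is run from the F5-TTS checkout, next to ckpts/.
    for name, size in list_checkpoints("ckpts"):
        print(f"{size / 1e6:10.1f} MB  {name}")
```

A file that is missing or only a few kilobytes in size usually means that download should be repeated.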
3.2 Inference test
Let's first test single-voice cloning:

python inference-cli.py \
--model "F5-TTS" \
--ref_audio "tests/ref_audio/test_en_1_ref_short.wav" \
--ref_text "Some call me nature, others call me mother nature." \
--gen_text "I don't really care what you call me. I've been a silent spectator, watching species evolve, empires rise and fall. But always remember, I am mighty and enduring. Respect me and I'll nurture you; ignore me and you shall face the consequences." \
--load_vocoder_from_local
Parameter notes:

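--model: the model to use; here it is set to F5-TTS;
--ref_audio: the reference audio to clone;
--ref_text: the transcript of the reference audio; if it is not provided, openai/whisper-large-v3-turbo is downloaded by default to transcribe the audio;
--gen_text: the text to synthesize.
Multi-voice cloning: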
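When the text to synthesize contains quotes or apostrophes, shell quoting gets fiddly. One option is to assemble the same command from Python and pass it as an argument list. This is only a sketch under that assumption: build_inference_command is a hypothetical helper, and it uses just the flags shown in the command above.

```python
import shlex
import subprocess


def build_inference_command(ref_audio, gen_text, ref_text=None,
                            model="F5-TTS", local_vocoder=True):
    """Assemble the argument list for inference-cli.py."""
    cmd = ["python", "inference-cli.py",
           "--model", model,
           "--ref_audio", ref_audio,
           "--gen_text", gen_text]
    if ref_text:
        # Omitting --ref_text leaves transcription to the Whisper model.
        cmd += ["--ref_text", ref_text]
    if local_vocoder:
        cmd.append("--load_vocoder_from_local")
    return cmd


if __name__ == "__main__":
    cmd = build_inference_command(
        ref_audio="tests/ref_audio/test_en_1_ref_short.wav",
        ref_text="Some call me nature, others call me mother nature.",
        gen_text="I don't really care what you call me.",
    )
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(cmd))
    # Uncomment to actually run it from the F5-TTS checkout:
    # subprocess.run(cmd, check=True)
```

Because the arguments are passed as a list, no shell parsing is involved and the apostrophes in the text need no escaping.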

python inference-cli.py -c samples/story.toml
The project uses tomli to read its TOML configuration. Let's look at the config file; here voices.town is used to specify a different voice.
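To see how a table like voices.town turns into Python data, here is a minimal sketch of parsing such a config. The TOML text is a made-up example and not the contents of samples/story.toml: apart from voices.town, every key and path in it (model, ref_audio, ref_text, voices.country, the .flac files) is an assumption for illustration.

```python
try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # same API, for Python 3.10

# Hypothetical config text for illustration only; it is not the contents
# of samples/story.toml.
EXAMPLE_TOML = """
model = "F5-TTS"
ref_audio = "samples/main.flac"
ref_text = ""

[voices.town]
ref_audio = "samples/town.flac"
ref_text = ""

[voices.country]
ref_audio = "samples/country.flac"
ref_text = ""
"""

config = tomllib.loads(EXAMPLE_TOML)

# Dotted table headers such as [voices.town] become nested dictionaries,
# so each voice ends up as one entry under config["voices"].
for name, voice in config["voices"].items():
    print(name, "->", voice["ref_audio"])
```

Each [voices.NAME] table carries its own reference audio and transcript, which is what lets a single config describe several voices.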

posted on 2025-03-14 16:32 leapss