Abogen: A Powerful Text-to-Speech Tool for Turning E-books into High-Quality Audiobooks

Project Overview

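Abogen is a text-to-speech tool built on the Kokoro-82M model. It converts EPUB, PDF, or plain-text files into high-quality audio and generates synchronized subtitles. It is designed for audiobook production, video voiceovers, and similar tasks, and its core strengths are: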

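Multiple input formats: EPUB, PDF, TXT
High-quality audio with synchronized subtitles
Multilingual support (including English, Chinese, Japanese, and more)
Customizable voice style and parameters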

Features

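Multi-format input: processes EPUB, PDF, and plain-text files
Smart chapter detection: automatically recognizes the chapter structure of EPUB files
High-quality audio output: supports WAV, MP3, OPUS, M4B, FLAC, and other formats
Synchronized subtitles: generates subtitle files in SRT, ASS, and other formats
Multilingual support: includes English, Chinese, Japanese, Spanish, and more
Voice mixing: blend the characteristics of different voices in custom proportions
GPU acceleration: supports CUDA-accelerated processing
Cross-platform: runs on Windows, Linux, and macOS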
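The post does not show how Abogen writes its subtitle files. As a rough illustration of what SRT output involves, the sketch below renders timed cues in the standard SRT layout (index, `start --> end` line, text, blank line). The `Cue` class and helper names are hypothetical, and in a real pipeline the timings would have to come from the speech-synthesis step.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Cue:
    """One subtitle cue (hypothetical): start/end in seconds plus the spoken text."""
    start: float
    end: float
    text: str


def format_srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def cues_to_srt(cues: list[Cue]) -> str:
    """Render cues as SRT blocks separated by blank lines."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        time_range = f"{format_srt_timestamp(cue.start)} --> {format_srt_timestamp(cue.end)}"
        blocks.append(f"{index}\n{time_range}\n{cue.text}\n")
    return "\n".join(blocks)


cues = [Cue(0.0, 2.5, "Chapter One."), Cue(2.5, 6.04, "The story begins here.")]
print(cues_to_srt(cues))
```

SRT uses a comma before the milliseconds, which is why the timestamp is built by hand rather than with a generic time formatter.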

Installation

Installing via pip

pip install abogen

System Requirements

Python 3.7 or later
FFmpeg (for audio format conversion)
An NVIDIA GPU with CUDA support is recommended for best performance
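FFmpeg's job here is format conversion. The sketch below, which is not Abogen's code, shows what that step can look like: it maps each output format to a standard FFmpeg audio encoder and builds the command line. The mapping and the `build_ffmpeg_command` name are illustrative assumptions, and `libmp3lame` and `libopus` are only available if your FFmpeg build includes them.

```python
from __future__ import annotations

import shlex

# Hypothetical format -> FFmpeg audio encoder mapping (Abogen's settings may differ)
AUDIO_CODECS = {
    "wav": "pcm_s16le",
    "mp3": "libmp3lame",
    "opus": "libopus",
    "m4b": "aac",
    "flac": "flac",
}


def build_ffmpeg_command(input_wav: str, output_path: str) -> list[str]:
    """Return an FFmpeg argument list that re-encodes input_wav into the
    format implied by output_path's extension."""
    extension = output_path.rsplit(".", 1)[-1].lower()
    if extension not in AUDIO_CODECS:
        raise ValueError(f"Unsupported output format: {extension}")
    return [
        "ffmpeg",
        "-y",                             # overwrite the output file if it exists
        "-i", input_wav,                  # input audio
        "-c:a", AUDIO_CODECS[extension],  # audio encoder for the target format
        output_path,
    ]


cmd = build_ffmpeg_command("chapter01.wav", "chapter01.m4b")
print(shlex.join(cmd))  # ffmpeg -y -i chapter01.wav -c:a aac chapter01.m4b
# To actually run it: subprocess.run(cmd, check=True)
```

Returning an argument list instead of a shell string avoids quoting problems with file names that contain spaces.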

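Optional Dependencies

For GPU acceleration, install a CUDA-enabled build of PyTorch. The command below targets CUDA 11.7 (the cu117 index); choose the index that matches your CUDA version: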

pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu117
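Before starting a long conversion it can help to confirm that the requirements are actually met. This is a generic check script, not part of Abogen; it relies only on `sys.version`, `shutil.which`, and, if PyTorch is installed, `torch.cuda.is_available()`.

```python
import shutil
import sys


def check_environment() -> dict:
    """Report whether the prerequisites for GPU-accelerated conversion appear available."""
    report = {
        "python": sys.version.split()[0],
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        "torch_installed": False,
        "cuda_available": False,
    }
    try:
        import torch  # optional dependency
    except ImportError:
        return report
    report["torch_installed"] = True
    report["cuda_available"] = bool(torch.cuda.is_available())
    return report


for key, value in check_environment().items():
    print(f"{key}: {value}")
```

If `cuda_available` is False even though a GPU is present, the installed PyTorch is usually a CPU-only build or targets a CUDA version the driver does not support.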

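Usage

Basic Usage

Launch the Abogen GUI
Select an input file (EPUB/PDF/TXT)
Set the voice parameters (language, voice style, etc.)
Choose the output format and subtitle options
Start the conversion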

Code Example

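# Illustrative call as given in this post; verify the function name and
# parameters against the Abogen version you have installed
from abogen import convert_file

# Convert an EPUB file into an audiobook
convert_file(
    input_path="book.epub",
    output_format="mp3",
    lang_code="z",  # Mandarin Chinese (Kokoro language code)
    voice="zf_xiaoxiao",  # Kokoro's "zf_xiaoxiao" female Mandarin voice
    subtitle_mode="srt"  # generate SRT subtitles
)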

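Advanced Features

Abogen supports voice mixing: a formula blends several voices, each scaled by a weight:

# Mix two voices: 70% am_echo and 30% am_onyx
voice_formula = "am_echo*0.7 + am_onyx*0.3"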
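To make the weighting concrete, the pure-Python sketch below parses the same formula syntax, normalizes the weights so they sum to 1, and blends vectors element-wise. The three-element lists are toy stand-ins, not real Kokoro voice data; in Abogen the blend is applied to voice tensors. `parse_formula` and `mix_voices` are hypothetical names.

```python
from __future__ import annotations


def parse_formula(formula: str) -> dict[str, float]:
    """Parse "voice*weight + voice*weight" into weights normalized to sum to 1."""
    weights: dict[str, float] = {}
    for term in formula.split("+"):
        name, weight = term.strip().split("*")
        name = name.strip()
        weights[name] = weights.get(name, 0.0) + float(weight)
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("Voice weights must sum to a positive number")
    return {name: w / total for name, w in weights.items()}


def mix_voices(voices: dict[str, list[float]], formula: str) -> list[float]:
    """Blend equal-length voice vectors element-wise by normalized weight."""
    weights = parse_formula(formula)
    length = len(next(iter(voices.values())))
    mixed = [0.0] * length
    for name, weight in weights.items():
        for i, value in enumerate(voices[name]):
            mixed[i] += weight * value
    return mixed


# Toy 3-dimensional stand-ins for real voice embeddings
toy_voices = {"am_echo": [1.0, 0.0, 2.0], "am_onyx": [0.0, 1.0, 2.0]}
print(parse_formula("am_echo*0.7 + am_onyx*0.3"))
print(mix_voices(toy_voices, "am_echo*0.7 + am_onyx*0.3"))
```

Because of the normalization, "a*3 + b*1" and "a*0.75 + b*0.25" describe the same blend.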

Core Code Walkthrough

File Handling Module

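import os

from PyQt6.QtWidgets import QDialog  # assumes PyQt6; on a PyQt5 install import from PyQt5.QtWidgets


class HandlerDialog(QDialog):
    """Dialog for handling EPUB/PDF files (selecting chapters or pages)"""

    def __init__(self, book_path, file_type=None, checked_chapters=None, parent=None):
        super().__init__(parent)
        # Fall back to inferring the type from the file extension
        self.file_type = file_type or (
            "pdf" if book_path.lower().endswith(".pdf") else "epub"
        )
        self.book_path = book_path
        # Window title shows the file name without its extension
        book_name = os.path.splitext(os.path.basename(book_path))[0]
        self.setWindowTitle(
            f'Select {"Chapters" if self.file_type == "epub" else "Pages"} - {book_name}'
        )
        # Initialize the UI and chapter-handling logic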
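The chapter-handling logic is elided in the excerpt above. For EPUB input, one standard-library way to recover reading order, shown here as a sketch rather than Abogen's implementation, is to follow `META-INF/container.xml` to the OPF package document and read its spine. `list_spine_documents` is a hypothetical name, and real books often have extra spine entries (cover, table of contents) that a chapter picker would still need to filter out.

```python
from __future__ import annotations

import posixpath
import xml.etree.ElementTree as ET
import zipfile

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def list_spine_documents(epub_path: str) -> list[str]:
    """Return the EPUB's content documents in reading (spine) order."""
    with zipfile.ZipFile(epub_path) as epub:
        # META-INF/container.xml points at the OPF package document
        container = ET.fromstring(epub.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        opf_path = rootfile.get("full-path")
        opf = ET.fromstring(epub.read(opf_path))

    # Manifest hrefs are relative to the directory containing the OPF file
    base = posixpath.dirname(opf_path)
    manifest = {
        item.get("id"): posixpath.join(base, item.get("href"))
        for item in opf.findall("opf:manifest/opf:item", OPF_NS)
    }
    # The spine lists manifest ids in reading order
    return [
        manifest[ref.get("idref")]
        for ref in opf.findall("opf:spine/opf:itemref", OPF_NS)
    ]
```

Usage: `for doc in list_spine_documents("book.epub"): print(doc)` prints archive paths such as `OEBPS/ch1.xhtml`, which can then be read with `zipfile` and stripped of markup before synthesis.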

Voice Processing Engine

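def parse_voice_formula(pipeline, formula):
    """Parse a voice-mixing formula such as "am_echo*0.7 + am_onyx*0.3"
    and return the weighted average of the voice tensors."""
    if not formula.strip():
        raise ValueError("Empty voice formula")

    # Parse every "voice*weight" term first so the weights can be normalized
    terms = []
    for term in formula.split("+"):
        voice_name, weight = term.strip().split("*")
        terms.append((voice_name.strip(), float(weight.strip())))

    total_weight = sum(weight for _, weight in terms)
    if total_weight == 0:
        raise ValueError("Voice weights must not sum to zero")

    weighted_sum = None
    for voice_name, weight in terms:
        voice_tensor = pipeline.load_single_voice(voice_name)
        # Divide by the total so the normalized weights sum to 1
        contribution = (weight / total_weight) * voice_tensor
        weighted_sum = contribution if weighted_sum is None else weighted_sum + contribution

    return weighted_sum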

Queue Management

from dataclasses import dataclass


@dataclass
class QueuedItem:
    """Structure describing one item waiting to be processed"""
    file_name: str
    lang_code: str
    speed: float
    voice: str
    save_option: str
    output_folder: str
    subtitle_mode: str
    output_format: str
    total_char_count: int
    replace_single_newlines: bool = False
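The post shows only the item structure, not how items are consumed. A minimal sketch, assuming simple first-in, first-out processing (the post does not say how Abogen schedules its queue), is below. `Job`, `ConversionQueue`, and the `convert` callback are hypothetical names, and `Job` keeps only three of `QueuedItem`'s fields for brevity.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    """Trimmed, hypothetical stand-in for QueuedItem."""
    file_name: str
    voice: str
    total_char_count: int


class ConversionQueue:
    """First-in, first-out queue of conversion jobs."""

    def __init__(self) -> None:
        self._jobs: deque[Job] = deque()

    def add(self, job: Job) -> None:
        self._jobs.append(job)

    def remaining_chars(self) -> int:
        # Total characters still queued, e.g. for an overall progress bar
        return sum(job.total_char_count for job in self._jobs)

    def run(self, convert) -> list[str]:
        """Pop jobs in order, pass each to convert(job), return finished file names."""
        finished = []
        while self._jobs:
            job = self._jobs.popleft()
            convert(job)
            finished.append(job.file_name)
        return finished


queue = ConversionQueue()
queue.add(Job("book.epub", "zf_xiaoxiao", 120_000))
queue.add(Job("notes.txt", "am_echo", 5_000))
print(queue.remaining_chars())
print(queue.run(lambda job: print(f"converting {job.file_name} with {job.voice}")))
```

Storing `total_char_count` per item, as `QueuedItem` does, is what makes a queue-wide progress estimate cheap to compute.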

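Through its modular design, Abogen handles file processing, speech synthesis, and task management efficiently, giving users a simple, easy-to-use text-to-speech solution.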

posted @ 2025-08-13 18:01 qife

