Converting PDF to DOCX from a web page with Python

1. Install the Python libraries

pip3 install flask PyPDF2 python-docx
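
Before moving on, it can help to confirm that the three packages are importable. The following is a minimal sketch using only the standard library; `missing_packages` is a hypothetical helper name introduced for this example. Note that the `python-docx` distribution is imported as `docx`.

```python
import importlib.util


def missing_packages(import_names):
    """Return the names from import_names that cannot be found (hypothetical helper)."""
    return [name for name in import_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names differ from pip names: python-docx is imported as "docx".
    missing = missing_packages(["flask", "PyPDF2", "docx"])
    print("missing:", missing if missing else "none")
```

An empty result means all three imports used by the application below should succeed.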

2. Create a Flask application and write the code that handles the file upload and conversion

vim pdf_to_docx.py

import os
from flask import Flask, render_template, request, send_file
from PyPDF2 import PdfReader
from io import BytesIO
from docx import Document

app = Flask(__name__)

# Serve the HTML page with the upload form
@app.route('/')
def index():
    return render_template('index.html')

# Handle the file upload and conversion
@app.route('/convert', methods=['POST'])
def convert():
    if 'file' not in request.files:
        return "No file part", 400

    file = request.files['file']
    if file.filename == '':
        return "No selected file", 400

    # Copy the text of every PDF page into a new Word document
    pdf = PdfReader(file)
    doc = Document()
    for page in pdf.pages:
        # extract_text() may yield no text for image-only pages
        doc.add_paragraph(page.extract_text() or '')

    # Save the docx file to an in-memory buffer
    doc_buffer = BytesIO()
    doc.save(doc_buffer)
    doc_buffer.seek(0)
    download_basename = os.path.splitext(file.filename)[0]
    download_name = download_basename + '.docx'
    return send_file(doc_buffer, as_attachment=True, download_name=download_name)


if __name__ == '__main__':
    app.run(debug=True)
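
One case the route above does not guard against: text extracted from a PDF can contain control characters that are not legal in XML, and python-docx may then raise a `ValueError` when the paragraph is added. The following is a minimal sketch of a cleanup step, assuming it is applied to each page's text before `add_paragraph`; `clean_page_text` is a hypothetical helper, not part of PyPDF2 or python-docx.

```python
import re

# C0 control characters that XML 1.0 forbids (tab, newline and carriage return are allowed).
_XML_ILLEGAL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def clean_page_text(text):
    """Return text with XML-illegal control characters removed (hypothetical helper)."""
    if not text:
        # Covers both None and the empty string.
        return ""
    return _XML_ILLEGAL.sub("", text)
```

In the page loop this would read `doc.add_paragraph(clean_page_text(page.extract_text()))`.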

3. Add the file upload form to the HTML page (submitting it downloads the converted file)

mkdir templates && cd templates
vim index.html

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>PDF to DOCX Converter</title>
</head>
<body>
    <h1>PDF to DOCX Converter</h1>
    <form action="/convert" method="post" enctype="multipart/form-data">
        <input type="file" name="file" accept=".pdf">
        <button type="submit">Convert</button>
    </form>
</body>
</html>
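
The `accept=".pdf"` attribute only filters the browser's file picker; a client can still post any file. A minimal server-side check might look like the sketch below, assuming the first bytes of the upload are available. `looks_like_pdf` is a hypothetical helper, and it relies on PDF files normally beginning with the `%PDF-` marker.

```python
import os

PDF_MAGIC = b"%PDF-"


def looks_like_pdf(filename, head):
    """Cheap sanity check (hypothetical helper): .pdf extension plus PDF header bytes."""
    extension = os.path.splitext(filename)[1].lower()
    return extension == ".pdf" and head.startswith(PDF_MAGIC)
```

In the `convert` view this could be called as `looks_like_pdf(file.filename, file.stream.read(5))`, followed by `file.stream.seek(0)` so that `PdfReader` still sees the file from its beginning.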

The complete directory tree is:

.
├── pdf_to_docx.py
└── templates
    └── index.html

4. Run the code

python3 pdf_to_docx.py
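
`app.run(debug=True)` starts Flask's development server with the interactive debugger enabled, which is convenient locally but should not be exposed on a network. The following is a sketch of reading the settings from environment variables instead; the variable names `PDF2DOCX_HOST`, `PDF2DOCX_PORT`, and `PDF2DOCX_DEBUG` are assumptions made for this example, as is the helper name `run_options`.

```python
import os


def run_options(env=None):
    """Build keyword arguments for app.run() from environment variables (names are assumed)."""
    env = os.environ if env is None else env
    return {
        "host": env.get("PDF2DOCX_HOST", "127.0.0.1"),
        "port": int(env.get("PDF2DOCX_PORT", "5000")),
        # Debug stays off unless it is explicitly enabled.
        "debug": env.get("PDF2DOCX_DEBUG", "0") == "1",
    }
```

The last line of pdf_to_docx.py would then become `app.run(**run_options())`.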

Open http://127.0.0.1:5000 in a local browser; the upload page appears, showing a file picker and a Convert button.

Select a local PDF file and click Convert; the browser then downloads the converted .docx file.
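
The download name is derived from the uploaded file name, which is controlled by the client. The sketch below shows one way to normalize it using only the standard library; `docx_download_name` is a hypothetical helper, and in a Flask app `werkzeug.utils.secure_filename` is the more common choice for sanitizing upload names.

```python
import os
import re


def docx_download_name(upload_name, fallback="converted"):
    """Map an uploaded file name to a safe .docx download name (hypothetical helper)."""
    # Drop any directory part, treating both / and \ as separators.
    base = re.split(r"[\\/]", upload_name)[-1]
    stem = os.path.splitext(base)[0]
    # Keep word characters, dots and dashes; replace every other run with "_".
    stem = re.sub(r"[^\w.-]+", "_", stem).strip("._")
    return (stem or fallback) + ".docx"
```

In the `convert` view, `download_name = docx_download_name(file.filename)` could stand in for the two lines that build the name with `os.path.splitext`.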

posted @ 2024-04-24 15:45  十方央丶