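Solution

Cause of the problem

Reading an Excel (.xlsx) file in Python with openpyxl 3.1.5 raises an exception.

When reading an Excel file supplied by someone else, this error can appear for a number of reasons. The direct cause is that some fields in the custom properties (custom-properties) of the file's metadata contain None values; in other words, the Excel file in question is damaged.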
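To check whether a given file is affected before changing anything, the custom-properties part can be inspected directly. The following is a minimal sketch, assuming the failure comes from property elements that lack a name attribute; the helper name find_unnamed_properties is made up for illustration and is not part of openpyxl.

```python
import zipfile
import xml.etree.ElementTree as ET

CUSTOM_PART = "docProps/custom.xml"
CP_NS = "http://schemas.openxmlformats.org/officeDocument/2006/custom-properties"


def find_unnamed_properties(xlsx_source):
    """Return the pid values of custom properties that lack a name attribute.

    xlsx_source may be a file path or a binary file-like object.
    (Hypothetical helper, for illustration only.)
    """
    with zipfile.ZipFile(xlsx_source) as archive:
        if CUSTOM_PART not in archive.namelist():
            return []  # the file has no custom-properties part at all
        root = ET.fromstring(archive.read(CUSTOM_PART))
    # Parsed tags carry the namespace in braces: "{namespace}property"
    return [
        prop.get("pid")
        for prop in root.findall(f"{{{CP_NS}}}property")
        if prop.get("name") is None
    ]
```

An empty list suggests the custom properties are not the source of the error and the cause should be looked for elsewhere.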

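Simple temporary workarounds

Repair with Excel's Save As

Opening the file in Excel and saving it under a new name with Save As repairs it temporarily. However, after openpyxl.load_workbook has been called on it several times, the saved copy raises the same error and has to be saved again.

This does not fix the root cause.

Downgrade openpyxl to 2.5.12

This is a workaround found while searching online. After the downgrade the exception no longer occurs and the file can be used normally, but the API differs somewhat from 3.x, so downgrading is not an option if you still need features introduced in openpyxl 3.x.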

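Complete solution: repair the Excel file

When the exception is caught, repair the Excel file, then reload it once the repair succeeds.

The code below creates a temporary directory, repairs the Excel file inside it, loads the repaired file into memory, and then deletes the temporary directory.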

import openpyxl
import zipfile
import xml.etree.ElementTree as ET
import tempfile
import shutil
import os


def _repair_and_load_excel(file_path, data_only=True):
    """
    Repair damaged custom-property metadata in an Excel file, then load it.
    """
    # Create a temporary working directory
    temp_dir = tempfile.mkdtemp(prefix="excel_repair_")

    try:
        # 1. Unpack the Excel file (an .xlsx file is a zip archive)
        with zipfile.ZipFile(file_path, 'r') as zip_ref:
            zip_ref.extractall(temp_dir)

        # 2. Repair custom.xml (the custom-properties part)
        custom_props_path = os.path.join(temp_dir, 'docProps', 'custom.xml')
        if os.path.exists(custom_props_path):
            try:
                tree = ET.parse(custom_props_path)
                root = tree.getroot()

                # Namespaces; the '' key sets the default namespace for
                # unprefixed names in the search path (Python 3.8+)
                namespaces = {
                    '': 'http://schemas.openxmlformats.org/officeDocument/2006/custom-properties',
                    'vt': 'http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes'
                }

                # Remove properties that have no name attribute
                for prop in root.findall('.//property', namespaces):
                    name_elem = prop.get('name')
                    if name_elem is None:
                        root.remove(prop)

                # Save the repaired file
                tree.write(custom_props_path, encoding='utf-8', xml_declaration=True)
            except Exception:
                # If the repair fails, delete custom.xml outright
                os.remove(custom_props_path)

        # 3. Re-compress into an Excel file
        repaired_path = os.path.join(temp_dir, 'repaired.xlsx')
        with zipfile.ZipFile(repaired_path, 'w', zipfile.ZIP_DEFLATED) as zipf:
            for root, dirs, files in os.walk(temp_dir):
                for file in files:
                    # Skip the output archive itself
                    if file == 'repaired.xlsx':
                        continue
                    full_path = os.path.join(root, file)
                    arcname = os.path.relpath(full_path, temp_dir)
                    zipf.write(full_path, arcname)

        # 4. Load the repaired file
        return openpyxl.load_workbook(repaired_path, data_only=data_only)
    finally:
        # 5. Delete the temporary directory once the workbook is in memory
        shutil.rmtree(temp_dir, ignore_errors=True)
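The "catch the exception, repair, reload" flow described above could be wired up roughly as follows. This is a sketch rather than the original author's code: load_with_repair is a made-up name, and the two callables are passed in as parameters so the fallback logic stays independent of openpyxl.

```python
def load_with_repair(file_path, loader, repairer):
    """Try the normal loader first; fall back to the repairing loader.

    loader and repairer are callables that take a file path, for example
    openpyxl.load_workbook and _repair_and_load_excel.
    (Hypothetical helper, for illustration only.)
    """
    try:
        return loader(file_path)
    except TypeError:
        # The custom-properties failure surfaces as a TypeError.
        # Assumption: any TypeError from the loader is worth one repair attempt.
        return repairer(file_path)
```

A call might look like load_with_repair("report.xlsx", openpyxl.load_workbook, _repair_and_load_excel), where report.xlsx is a placeholder file name. Catching TypeError this broadly can hide unrelated bugs, so checking the exception message before falling back may be preferable.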

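How it works

An Excel file (.xlsx) is actually a zip archive containing a set of XML files, so the zipfile library can be used to unpack it, repair it, and compress it back.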
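The zip structure is easy to confirm without unpacking anything to disk. A small sketch (list_xlsx_parts is a name chosen here for illustration):

```python
import zipfile


def list_xlsx_parts(xlsx_source):
    """Return the sorted names of the parts stored inside an .xlsx archive.

    xlsx_source may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(xlsx_source) as archive:
        return sorted(archive.namelist())
```

If docProps/custom.xml is not among the returned names, the file has no custom properties and the repair step has nothing to do.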

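Code walkthrough

Create a temporary directory and unpack the archive into it

    temp_dir = tempfile.mkdtemp(prefix="excel_repair_")

    with zipfile.ZipFile(file_path, 'r') as zip_ref:
        zip_ref.extractall(temp_dir)

After unpacking, the internal structure looks roughly like this: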

temp_dir/
├── [Content_Types].xml
├── _rels/
├── docProps/       # document properties
│   ├── app.xml
│   ├── core.xml
│   └── custom.xml  # custom properties (the repair target)
└── xl/             # core content
    ├── worksheets/
    ├── sharedStrings.xml
    └── workbook.xml

Repairing custom.xml (the core part)

Locate the custom.xml file

    custom_props_path = os.path.join(temp_dir, 'docProps', 'custom.xml')
    if os.path.exists(custom_props_path):

Parse the XML file

    tree = ET.parse(custom_props_path)
    root = tree.getroot()

Set up the namespaces

    namespaces = {
        '': 'http://schemas.openxmlformats.org/officeDocument/2006/custom-properties',
        'vt': 'http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes'
    }

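In custom.xml, a property element in the custom-properties namespace represents a custom property; the same tag name could mean something different elsewhere. The namespace mapping makes sure the search matches only custom properties and avoids name collisions. It tells the parser: "this is the property defined by the custom-properties specification."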
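The effect of the namespace mapping can be seen on a small hand-written sample. The XML below is made up for illustration, and the cp prefix is an arbitrary choice; as far as I know, the '' key used in the article's mapping is only honored on Python 3.8 and later, while an explicit prefix also works on older versions.

```python
import xml.etree.ElementTree as ET

CP_NS = "http://schemas.openxmlformats.org/officeDocument/2006/custom-properties"
VT_NS = "http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes"

# Hand-written sample in the shape of docProps/custom.xml (illustrative only)
SAMPLE = (
    f'<Properties xmlns="{CP_NS}" xmlns:vt="{VT_NS}">'
    '<property fmtid="{D5CDD505-2E9C-101B-9397-08002B2CF9AE}" pid="2" name="Owner">'
    "<vt:lpwstr>demo</vt:lpwstr>"
    "</property>"
    "</Properties>"
)

root = ET.fromstring(SAMPLE)

# Without a mapping, the bare name matches nothing, because the parsed
# tag is really "{<custom-properties namespace>}property".
unqualified = root.findall(".//property")

# With a prefix mapped to the namespace, the same search finds the element.
qualified = root.findall(".//cp:property", {"cp": CP_NS})

print(len(unqualified), len(qualified))
```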

The error message, TypeError: <class 'openpyxl.packaging.custom.StringProperty'>.name should be <class 'str'> but value is <class 'NoneType'>, indicates that a name attribute is None, so properties without a name are removed:

    for prop in root.findall('.//property', namespaces):
        name_elem = prop.get('name')
        if name_elem is None:
            root.remove(prop)
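The removal step can also be written as a standalone function that works on the raw bytes of custom.xml, which makes it easy to try on a sample. This is a sketch with a made-up name (strip_unnamed_properties). It additionally registers the two namespaces so the output keeps the usual default and vt prefixes instead of generated ns0/ns1 ones; that is an optional nicety, not something the original code does.

```python
import xml.etree.ElementTree as ET

CP_NS = "http://schemas.openxmlformats.org/officeDocument/2006/custom-properties"
VT_NS = "http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes"


def strip_unnamed_properties(xml_bytes):
    """Return custom.xml content with every name-less <property> removed.

    (Hypothetical helper, for illustration only.)
    """
    # Keep readable prefixes on output. Note that register_namespace
    # changes serializer state for the whole process.
    ET.register_namespace("", CP_NS)
    ET.register_namespace("vt", VT_NS)

    root = ET.fromstring(xml_bytes)
    # findall returns a list, so removing while looping over it is safe
    for prop in root.findall(f"{{{CP_NS}}}property"):
        if prop.get("name") is None:
            root.remove(prop)
    # xml_declaration is accepted by ET.tostring on Python 3.8+
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)
```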

Save the repaired file

    tree.write(custom_props_path, encoding='utf-8', xml_declaration=True)

Re-compress into an .xlsx file

    repaired_path = os.path.join(temp_dir, 'repaired.xlsx')
    with zipfile.ZipFile(repaired_path, 'w', zipfile.ZIP_DEFLATED) as zipf:
        for root, dirs, files in os.walk(temp_dir):
            for file in files:
                if file == 'repaired.xlsx':
                    continue
                full_path = os.path.join(root, file)
                arcname = os.path.relpath(full_path, temp_dir)
                zipf.write(full_path, arcname)
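As an alternative to extracting to a temporary directory and walking it, the archive can be copied entry by entry in memory, with only custom.xml passed through a fixer. This is a different technique from the one the article uses, sketched under assumptions: rewrite_xlsx and the fix_custom_xml hook are names invented here.

```python
import io
import zipfile

CUSTOM_PART = "docProps/custom.xml"


def rewrite_xlsx(xlsx_source, fix_custom_xml):
    """Copy an .xlsx archive into memory, passing custom.xml through a fixer.

    fix_custom_xml is a callable that takes and returns bytes
    (hypothetical hook). Returns a BytesIO positioned at the start
    of the rewritten archive.
    """
    output = io.BytesIO()
    with zipfile.ZipFile(xlsx_source) as src, \
            zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == CUSTOM_PART:
                data = fix_custom_xml(data)
            # Writing by name keeps the original archive path and entry order
            dst.writestr(info.filename, data)
    output.seek(0)
    return output
```

openpyxl.load_workbook is documented as accepting a file-like object as well as a path, so the returned buffer could in principle be handed to it directly, with no temporary files to clean up.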

Return the reloaded workbook

    return openpyxl.load_workbook(repaired_path, data_only=data_only)

posted @ 2026-01-08 10:35 风陵南
