Comparing two files with Python and generating an HTML report

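Purpose

This script downloads two revisions of each file from SVN, one old and one new, compares them, and writes the comparison out as HTML. It is a common task in software development.

Two comparison methods are provided: one uses Python's difflib library, the other calls the Beyond Compare executable directly to run a Beyond Compare script.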
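
As a minimal sketch of the first method, the snippet below passes two in-memory lists of lines to difflib.HtmlDiff and gets back a complete HTML page. The sample lines and the names old.c and new.c are made up for illustration.

```python
import difflib

# Two hypothetical versions of the same file, as lists of lines.
old_lines = ["int a = 1;\n", "int b = 2;\n"]
new_lines = ["int a = 1;\n", "int b = 3;\n", "int c = 4;\n"]

# make_file() returns a full HTML document containing a side-by-side table.
html = difflib.HtmlDiff().make_file(
    old_lines, new_lines, fromdesc="old.c", todesc="new.c"
)

print(len(html) > 0)
```

Writing the returned string to a .html file gives a report that opens in any browser.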

Directory structure

$ tree
.
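├── config					# configuration files
│   ├── bcomp.script		# Beyond Compare script; see the Beyond Compare help for the syntax
│   ├── compare_files.in	# files to download from SVN, i.e. the files to compare
│   └── config.ini			# SVN server settings and the path to the Beyond Compare executable
├── main_diff.py
├── output	# output directory for the comparison reports
├── r39260	# created automatically; holds the files downloaded from SVN
├── r39940	# created automatically; holds the files downloaded from SVN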
└── ReadMe.md

Source files

config\bcomp.script:

text-report layout:side-by-side &
  options:ignore-unimportant,display-context,line-numbers &
  output-to:"%3" output-options:html-color "%1" "%2"

config\config.ini:

[SvnServer]
url = svn server ip
old_version = old version, such as r39260
new_version = new version, such as r39940
username = svn user name
password = svn user password

[BeyondCompare]
path = C:\Program Files\Beyond Compare 4\bcomp.exe
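
To show how these values reach the script, here is a small sketch that parses an INI string with configparser. The URL is a placeholder, and the sample text only mirrors the layout above. Note that configparser does not interpret backslash escapes, so a single backslash per path separator is enough in the file.

```python
import configparser

# Hypothetical contents mirroring the config.ini layout (placeholder URL).
SAMPLE_INI = r"""
[SvnServer]
url = http://svn.example.com/repo
old_version = r39260
new_version = r39940

[BeyondCompare]
path = C:\Program Files\Beyond Compare 4\bcomp.exe
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE_INI)

# Values come back as plain strings, backslashes untouched.
bcomp_path = config.get("BeyondCompare", "path")
old_version = config.get("SvnServer", "old_version")
print(bcomp_path)
print(old_version)
```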

main_diff.py:

# -*- encoding:utf-8 -*-
import os
import difflib
import logging
import configparser
import subprocess

logging.basicConfig(level=logging.NOTSET, format='[%(filename)s:%(lineno)d]-%(levelname)s %(message)s')

def delete_all_files_in_directory(dir_path):
    for filename in os.listdir(dir_path):
        file_path = os.path.join(dir_path, filename)
        try:
            if os.path.isfile(file_path) or os.path.islink(file_path):
                os.unlink(file_path)
            elif os.path.isdir(file_path):
                print(f"Skipping directory: {file_path}")
        except Exception as e:
            print(f"Failed to delete: {file_path}, error: {e}")


class Gen_diff_html():
    def __init__(self):
        filter_files = ['temp.h', 'temp.c']	# files to exclude from the comparison
        config = configparser.ConfigParser()
        config.read('config/config.ini')
        # Beyond Compare settings
        self.bcomp_path = config.get('BeyondCompare', 'path')
        # SVN settings
        self.svn_server = config.get('SvnServer', 'url')
        self.old_version = config.get('SvnServer', 'old_version')
        self.new_version = config.get('SvnServer', 'new_version')
        self.username = config.get('SvnServer', 'username')
        self.password = config.get('SvnServer', 'password')
        self.output_file = 'output'  # directory that receives the HTML reports
        if not os.path.exists(self.output_file):
            os.makedirs(self.output_file)
        else:
            delete_all_files_in_directory(self.output_file)
        if not os.path.exists(self.old_version):
            os.makedirs(self.old_version)
        else:
            delete_all_files_in_directory(self.old_version)
        if not os.path.exists(self.new_version):
            os.makedirs(self.new_version)
        else:
            delete_all_files_in_directory(self.new_version)
        self.check_files = []
        with open('config/compare_files.in', 'r') as f:
            lines = f.readlines()
            for line in lines:
                file_path = line.strip()  # strip() already drops the trailing \r and \n
                file_name = file_path.split('/')[-1].strip()
                if file_path == '' or file_name in filter_files:
                    continue
                self.check_files.append(file_path)


    def svn_checkout(self):
        for version in [self.old_version, self.new_version]:
            for file_path in self.check_files:
                file_name = file_path.split('/')[-1].strip()
                command_args = ["svn", "export", f"-{version}", self.svn_server + '/' + file_path, f"{version}/{file_name}"]
                command_args.extend(["--username", self.username, "--password", self.password])
                try:
                    result = subprocess.run(
                        command_args,
                        stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE,
                        encoding="utf-8",
                        check=True
                    )
                    # logging.debug(result.stdout)
                    logging.debug(f'svn export {file_path} done')
                except subprocess.CalledProcessError as e:
                    logging.error(e.stdout)
                    logging.error(e.stderr)
                    return e.stdout, e.stderr


    def generate_html_by_difflib(self):
        for file_path in self.check_files:
            file_name = file_path.split('/')[-1].strip()
            file1 = f'{self.old_version}/{file_name}'
            file2 = f'{self.new_version}/{file_name}'
            with open(file1, 'r', encoding='utf-8') as f1, \
                 open(file2, 'r', encoding='utf-8') as f2:
                lines1 = f1.readlines()
                lines2 = f2.readlines()
            html_diff = difflib.HtmlDiff().make_file(lines1, lines2, fromdesc=file1, todesc=file2)
            html_diff = html_diff.replace('<head>', '<head>\n<meta charset="UTF-8">')
            with open(f'{self.output_file}/{file_name}.html', 'w', encoding='utf-8') as f:
                f.write(html_diff)
            logging.debug(f'Compare {file_path} done')


    def generate_html_by_bcomp(self):
        for file_path in self.check_files:
            file_name = file_path.split('/')[-1].strip()
            file1 = f'{self.old_version}/{file_name}'
            file2 = f'{self.new_version}/{file_name}'
            args = [self.bcomp_path, "@config/bcomp.script", "/silent", file1, file2, f'{self.output_file}/{file_name}.html']
            try:
                subprocess.run(args, check=True)
                logging.debug(f'Compare {file_path} done')
            except subprocess.CalledProcessError as e:
                logging.error(f"Script execution failed: {e}")


if __name__ == '__main__':
    gen_diff = Gen_diff_html()
    gen_diff.svn_checkout()
    gen_diff.generate_html_by_difflib()	# compare with Python's difflib
    # gen_diff.generate_html_by_bcomp() # compare with Beyond Compare; use either this or difflib, not both
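
The command assembled in svn_checkout can also be factored into a small pure function, which makes it easy to inspect without contacting the server. This is only a sketch: build_svn_export_command is a hypothetical helper, and the URL, file path, and credentials are placeholders.

```python
def build_svn_export_command(server_url, revision, file_path, username, password):
    """Return the argument list for exporting one file at one revision.

    `revision` is expected in the form used in config.ini, e.g. "r39260",
    so that "-" + revision becomes the svn option "-r39260".
    """
    file_name = file_path.split("/")[-1].strip()
    return [
        "svn", "export", f"-{revision}",
        f"{server_url}/{file_path}",      # source URL on the server
        f"{revision}/{file_name}",        # local destination
        "--username", username,
        "--password", password,
    ]

cmd = build_svn_export_command(
    "http://svn.example.com/repo", "r39260", "src/prbs.c", "user", "secret"
)
print(cmd)
```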

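More on Beyond Compare scripting

Scripts

Running a script

Go to the Beyond Compare installation folder, usually C:\Program Files\Beyond Compare 4, which contains the executable BCompare.exe.
Open a command prompt and run BCompare.exe @"C:\My Folder\My Script.txt"
The command refers to a script file, My Script.txt, which you have to write yourself.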

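Note: in PowerShell, @" is the opening delimiter of a here-string (used to define multi-line strings). If you type @"My Script.txt" directly on the command line, PowerShell treats it as the start of a here-string and reports a syntax error. Escaping the @ sign makes PowerShell pass the argument through to Beyond Compare unchanged.

Put a backtick (`) in front of the @ sign to tell PowerShell that it is not the start of a here-string: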

& "G:\Program Files\Beyond Compare 4\BCompare.exe" `@"My Script.txt" /silent "My File.txt" "Your File.txt" "My Report.txt"
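
When Beyond Compare is started from Python rather than from PowerShell, passing the arguments as a list to subprocess.run bypasses the shell, so the @ prefix needs no escaping. The sketch below only builds and prints the argument list. build_bcompare_args is a hypothetical helper, the install path and file names are assumptions, and the actual call is left commented out so the snippet runs on a machine without Beyond Compare.

```python
import subprocess

def build_bcompare_args(exe_path, script_path, *script_args, silent=True):
    """Assemble the Beyond Compare command line as a list of arguments."""
    args = [exe_path, f"@{script_path}"]
    if silent:
        args.append("/silent")
    args.extend(script_args)
    return args

args = build_bcompare_args(
    r"C:\Program Files\Beyond Compare 4\BCompare.exe",  # assumed install path
    "config/bcomp.script",
    "r39260/prbs.c", "r39940/prbs.c", "output/prbs.c.html",
)
print(args)
# subprocess.run(args, check=True)  # uncomment where Beyond Compare is installed
```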

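Creating a script

Scripts are processed line by line, with one command per line.
Scripts are case-insensitive; blank lines and comments (everything after a "#") are ignored.
Arguments are separated by spaces. To include a space in an argument, enclose the argument in double quotes (").
Line continuation: to continue a long command on a second (or third) line, end every line except the last with an ampersand (&).
Arguments passed on the command line can be referenced in the script with a percent sign (%) followed by a single digit from 1 to 9. The script name and command-line switches that start with / are not counted among 1-9. For example, given the command line: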
```
BCompare.exe @"My Script.txt" /silent "My Session"
```
the argument "My Session" can be loaded in the script with:
```
load "%1"
```
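To insert an environment variable, enclose its name in percent signs. The variable name must be written with the correct case. For example, the system's temporary folder can be loaded by adding this line to a script: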
```
load "%TMP%"
```
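
The numbering rule for %1 to %9 can be illustrated with a few lines of Python. This is a toy model of the rule as described above, not Beyond Compare's own implementation; script_parameters and substitute are hypothetical names.

```python
import re

def script_parameters(command_line_args):
    """Return the arguments visible as %1..%9.

    The script name (the argument starting with '@') and switches
    (arguments starting with '/') are skipped.
    """
    visible = [a for a in command_line_args
               if not a.startswith("@") and not a.startswith("/")]
    return visible[:9]

def substitute(line, params):
    """Replace %1..%9 in one script line with the matching parameter."""
    def repl(match):
        index = int(match.group(1)) - 1
        # Leave the placeholder alone if no such argument was passed.
        return params[index] if index < len(params) else match.group(0)
    return re.sub(r"%([1-9])", repl, line)

params = script_parameters(["@My Script.txt", "/silent", "My Session"])
print(substitute('load "%1"', params))
```

With the command line from the example, "My Session" is the only counted argument, so it becomes %1.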

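Command-line switches for scripts

The command-line switches that affect scripts are: /closescript, /leftreadonly, /readonly, /rightreadonly, /silent

By default, running a script adds an entry to the taskbar and shows a script status window that reports progress and any errors.

The /silent switch suppresses the taskbar entry and the status window, so Beyond Compare processes the script invisibly. It is the switch used most often.

Simple scripts

The examples below show how to write a script.

Comparing two files by name

Goal: compare two files by name.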

file-report layout:side-by-side &
  options:display-mismatches &
  output-to:"%3" "%1" "%2"

Run the script from the command line:

BCompare.exe @"My Script.txt" "My File.txt" "Your File.txt" "My Report.txt"

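Notes:

An & at the end of a line means the command continues on the next line.
%1, %2, and %3 refer to the first, second, and third command-line arguments.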
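
To make the continuation rule concrete, the following sketch joins physical lines that end in & into one logical command, dropping comments and blank lines along the way. It is a deliberately simplified model, not how Beyond Compare parses scripts (for instance, a # inside a quoted argument would be cut off here), and logical_lines is a hypothetical name.

```python
def logical_lines(script_text):
    """Yield one command per logical line of a script."""
    pending = ""
    for raw in script_text.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop comments
        if not line:
            continue                              # skip blank lines
        if line.endswith("&"):
            pending += line[:-1].strip() + " "    # command continues
        else:
            yield pending + line
            pending = ""

script = '''file-report layout:side-by-side &
  options:display-mismatches &
  output-to:"%3" "%1" "%2"
'''
commands = list(logical_lines(script))
print(commands)
```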

To run it from PowerShell:

& "G:\Program Files\Beyond Compare 4\BCompare.exe" `@"My Script.txt" /silent "r39260/prbs.c" "r39940/prbs.c" "My Report.txt"

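Generating a report

This script compares two files by name and generates an HTML report that shows the differences together with their surrounding context: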

text-report layout:side-by-side &
  options:ignore-unimportant,display-context,line-numbers &
  output-to:"%3" output-options:html-color "%1" "%2"

To run it from PowerShell:

& "G:\Program Files\Beyond Compare 4\BCompare.exe" `@"MyScript.txt" /silent "r39260/prbs.c" "r39940/prbs.c" "My Report.html"

Comparing two folders and generating a difference report

This script generates a report that contains only the differences:

# Set up basic comparison features.
criteria timestamp:2sec
# Filter out log files.
filter "-*.log"
# Load first comparison.
load "C:\My Folder" "C:\Your Folder"
# Compare files with timestamp differences.
select newer.files older.files
# Generate a report of the differences.
folder-report layout:summary options:display-mismatches output-to:"My Report.txt"
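
For readers who stay on the Python side, the standard library's filecmp.dircmp offers a rough analogue of a differences-only folder summary. It is a different technique from the script above: it ignores the timestamp criteria and the *.log filter, and this sketch looks at the top level only. The folders are temporary ones created just for the demonstration, and summarize_differences is a hypothetical helper.

```python
import filecmp
import os
import tempfile

def summarize_differences(left_dir, right_dir):
    """Return names that are left-only, right-only, or differ (top level only)."""
    comparison = filecmp.dircmp(left_dir, right_dir)
    return {
        "left_only": sorted(comparison.left_only),
        "right_only": sorted(comparison.right_only),
        "different": sorted(comparison.diff_files),
    }

# Throwaway folders so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as left, tempfile.TemporaryDirectory() as right:
    for folder, name, text in [
        (left, "a.txt", "one"),
        (right, "a.txt", "two!"),
        (left, "only_left.txt", "x"),
    ]:
        with open(os.path.join(folder, name), "w") as f:
            f.write(text)
    summary = summarize_differences(left, right)

print(summary)
```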

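Backing up a folder

Beyond Compare's script processor can carry out tasks such as folder synchronization without any interaction. For example, to back up "C:\My Folder" automatically, write the script as follows: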

load "C:\My Folder" "C:\My Backups"
expand all
select left.newer.files left.orphan.files
copy left->right

Note: before running the script, make sure that both folders, "C:\My Folder" and "C:\My Backups", exist.
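
If the script is launched from Python, the folders can be created beforehand. A minimal sketch, run under a temporary root so that it works anywhere; ensure_folders is a hypothetical helper, and on a real machine the two paths would be the ones named in the note.

```python
import os
import tempfile

def ensure_folders(*folders):
    """Create each folder if it is missing; existing folders are left alone."""
    for folder in folders:
        os.makedirs(folder, exist_ok=True)

# Stand-ins for the real source and backup folders.
root = tempfile.mkdtemp()
source = os.path.join(root, "My Folder")
backup = os.path.join(root, "My Backups")

ensure_folders(source, backup)
ensure_folders(source, backup)  # calling it again is harmless
print(os.path.isdir(source), os.path.isdir(backup))
```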

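Script syntax

A script is a plain text file containing a sequence of commands that control the program and automate file operations or report generation.

Symbol	Meaning
|	Separates alternative choices
()	Encloses a required expression
[]	Encloses an optional expression
< >	Encloses a description of user-supplied text
[...]	Follows an expression that can be repeated

In any command you can write lt instead of left and rt instead of right.

The script commands, in alphabetical order, are: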

ATTRIB BEEP COLLAPSE COMPARE COPY COPYTO CRITERIA DATA-REPORT DELETE EXPAND FILE-REPORT FILTER FOLDER-REPORT HEX-REPORT LOAD LOG MOVE MOVETO MP3-REPORT OPTION PICTURE-REPORT REGISTRY-REPORT RENAME SELECT SNAPSHOT SYNC TEXT-REPORT TOUCH VERSION-REPORT

text-report

Usage:

text-report layout:<layout> [options:<options>] [title:<report title>] output-to:(printer|clipboard|<filename>) [output-options:<options>] [<comparison>]

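Purpose: generates a text comparison report for the currently selected files.

Arguments:

layout: controls how the report is presented. Supported values: side-by-side, summary, interleaved, patch, statistics, xml
options: optional; accepts the following values, each of which applies to the layouts noted:
- ignore-unimportant: treats differences in unimportant text as matches; available in all layouts
- display-all, display-mismatches, display-context, display-matches: control which lines of the comparison are included; available in all layouts except summary, patch, and statistics. The default is display-all.
- line-numbers: shows line numbers in the side-by-side layout
- strikeout-left-diffs: strikes out the differing lines from the left side in the interleaved layout
- strikeout-right-diffs: strikes out the differing lines from the right side in the interleaved layout
- patch-normal, patch-context, patch-unified: the three formats of the patch layout; the default is patch-normal
title, output-to, output-options, <comparison>: see the section "Common report arguments" below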

Examples:

text-report layout:interleaved options:display-context &
    output-to:printer output-options:print-color,wrap-word

text-report layout:patch options:patch-unified &
    output-to:"My Report.txt"
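
On the Python side, the closest counterpart to a unified patch is difflib.unified_diff. The sketch below assumes that patch-unified refers to the familiar unified diff format; the sample lines and file names are invented.

```python
import difflib

old_lines = ["alpha\n", "beta\n", "gamma\n"]
new_lines = ["alpha\n", "BETA\n", "gamma\n"]

# unified_diff() yields the header lines, hunk markers, and changed lines.
patch = list(difflib.unified_diff(
    old_lines, new_lines, fromfile="old.txt", tofile="new.txt"
))
print("".join(patch), end="")
```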

Common report arguments

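Purpose: every report command (data-report, file-report, folder-report, hex-report, mp3-report, picture-report, registry-report, text-report, and version-report) must say where the report is sent, and may also carry additional information about the output format.

title: sets the title shown at the top of the report
output-to: sets the output target: printer, clipboard, or a file name
output-options: optional; the available values depend on the output target:
- print-color or print-mono: color scheme for printer output. The default is print-mono.
- print-portrait or print-landscape: page orientation for printer output. The default is portrait.
- wrap-none, wrap-character or wrap-word: controls line wrapping. Printer output accepts all three; HTML output accepts wrap-none and wrap-word. The default is wrap-none.
- html-color, html-mono or html-custom: produce HTML instead of plain text. Available for clipboard and file output. html-custom requires the file name or URL of an external style sheet.
<comparison>: either a session name or a pair of file names. A file report uses the specified comparison instead of the files selected in the script. When a saved session is used, the comparison type must match the report type (for example, a Table Compare session must be used with data-report or file-report).

Examples: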

data-report layout:interleaved output-to:printer &
    output-options:print-color,print-landscape

file-report layout:summary output-to:clipboard &
    output-options:wrap-word,html-color

text-report layout:patch options:patch-unified &
    output-to:"My Report.txt"
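
When scripts are generated from Python, a report command can be put together from its parts. The helper below is a hypothetical sketch that covers only file targets (it always quotes the output-to value) and performs no validation of layout or option names.

```python
def text_report_command(layout, output_to, options=(), output_options=()):
    """Build a single-line text-report command that writes to a file."""
    parts = ["text-report", f"layout:{layout}"]
    if options:
        parts.append("options:" + ",".join(options))
    parts.append(f'output-to:"{output_to}"')   # quoted, so spaces are allowed
    if output_options:
        parts.append("output-options:" + ",".join(output_options))
    return " ".join(parts)

command = text_report_command(
    "side-by-side", "%3",
    options=["ignore-unimportant", "display-context", "line-numbers"],
    output_options=["html-color"],
)
# Append the two files to compare, passed as script parameters.
print(command + ' "%1" "%2"')
```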

posted @ 2025-03-05 16:56 Mrlayfolk
