Python difflib: Comparing Sequence Differences

Source: difflib, Helpers for computing deltas (Python 3.14.0 documentation)

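This module provides classes and functions for comparing sequences. It can be used to compare files, and it can produce file differences in several formats, including HTML, context diffs, and unified diffs. To compare directories and files, see also the filecmp module.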

Classes

SequenceMatcher

Compares pairs of sequences of any type, as long as the sequence elements are hashable.
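As a minimal sketch of how SequenceMatcher can be used (the two strings are made-up sample data), the snippet below computes a similarity ratio and lists the edit operations that turn the first sequence into the second:

```python
from difflib import SequenceMatcher

a = "abcd"
b = "bcde"

# The first argument is an optional "junk" predicate; None disables it.
matcher = SequenceMatcher(None, a, b)

# ratio() returns 2*M/T, where M is the number of matched elements
# and T is the combined length of both sequences.
similarity = matcher.ratio()
print(similarity)  # 0.75

# get_opcodes() describes how to turn a into b, step by step.
opcodes = matcher.get_opcodes()
for tag, i1, i2, j1, j2 in opcodes:
    print(tag, repr(a[i1:i2]), repr(b[j1:j2]))
```

Here the shared run "bcd" is matched, so the opcodes are a delete of "a", an equal block, and an insert of "e".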

Differ

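A class for comparing sequences of lines of text and producing human-readable differences or deltas. It uses SequenceMatcher internally to do the comparison.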

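Each line of a Differ delta begins with a two-letter code:

Code	Meaning
`- `	line unique to sequence 1
`+ `	line unique to sequence 2
`  ` (two spaces)	line common to both sequences
`? `	line not present in either input sequence (a guide line holding markers such as `^`, `-`, and `+` that point at the characters that differ within a line)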
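To make the codes above concrete, here is a small sketch that runs Differ on two made-up lists of lines and prints the delta. Whether a `? ` guide line appears depends on how similar the changed lines are, so the example only relies on the other three codes:

```python
from difflib import Differ

old = ["apple\n", "banana\n", "cherry\n"]
new = ["apple\n", "banana!\n", "cherry\n"]

# compare() returns a generator; list() materializes it so it can be reused.
delta = list(Differ().compare(old, new))

# Each entry still ends with its newline, so join without a separator.
print("".join(delta), end="")

# The first two characters of every delta line are the code.
codes = [line[:2] for line in delta]
```

Unchanged lines come back prefixed with two spaces, while the old and new versions of the edited line carry `- ` and `+ `.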

HtmlDiff

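Creates an HTML table (or a complete HTML file containing the table) that shows a side-by-side, line-by-line comparison of text, with inter-line and intra-line changes highlighted. The table can be generated in either full or contextual difference mode.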

make_file(fromlines, tolines, fromdesc='', todesc='', context=False, numlines=5, *, charset='utf-8')

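Compares fromlines and tolines (lists of strings) and returns a string that is a complete HTML file containing a table that shows line-by-line differences, with inter-line and intra-line changes highlighted. fromdesc and todesc are optional keyword arguments that specify the from/to file column header strings (both default to an empty string). context and numlines are both optional keyword arguments. Set context to True to show contextual differences; it defaults to False, which shows the full files. numlines defaults to 5. When context is True, numlines controls the number of context lines surrounding each highlighted difference. When context is False, numlines controls the number of lines shown before a highlighted difference when the "next" hyperlinks are used (setting it to zero makes the "next" hyperlinks place the next highlighted difference at the top of the browser, without any leading context).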
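A minimal sketch of calling make_file follows. The input lines and the "old.txt"/"new.txt" column titles are placeholders chosen for illustration, not names from the original post:

```python
from difflib import HtmlDiff

fromlines = ["alpha", "beta", "gamma"]
tolines = ["alpha", "beta2", "gamma"]

# context=True limits the table to changed lines plus numlines of context.
html = HtmlDiff().make_file(
    fromlines,
    tolines,
    fromdesc="old.txt",
    todesc="new.txt",
    context=True,
    numlines=1,
)

# The result is one string holding a complete HTML document.
print(len(html), "characters of HTML")
```

The returned string can be written to a file opened with the same encoding as the charset argument (UTF-8 by default) and viewed in a browser.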

[Image: sample HTML output produced by make_file]

Functions

context_diff

context_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\n')

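Compares a and b (lists of strings) and returns a delta (a generator that yields the delta lines) in context diff format. Sample output in context diff format: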

*** .\a.txt     2025-12-04T09:22:47.152087+08:00
--- .\b.txt     2025-12-04T09:23:00.877708+08:00
***************
*** 1,4 **** (starts with *, so this block is the content of a.txt)
! * ndiff:    lists every line and highlights interline changes.
! * context:  highlights clusters of changes in a before/after format.
! * unified:  highlights clusters of changes in an inline format.
  * html:     generates side by side comparison with change highlights.
--- 1,3 ---- (starts with -, so this block is the content of b.txt)
! * ngfdgdiff:    lisfga every lfdsfdsfdsfdsainterline changes.
  * html:     generates side by side comparison with change highlights.
+ * context:  higsadters of changes fdsfore/after format.
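For a self-contained illustration of context_diff, the sketch below compares two small in-memory lists instead of files. The file names passed as fromfile and tofile are only labels for the header and are assumptions made for this example:

```python
from difflib import context_diff

a = ["one\n", "two\n", "three\n"]
b = ["one\n", "2\n", "three\n"]

# Input lines keep their trailing newlines, matching the default lineterm='\n'.
delta = list(context_diff(a, b, fromfile="a.txt", tofile="b.txt"))

print("".join(delta), end="")
```

Changed lines are marked with `!` in both the "from" block and the "to" block, and unchanged context lines are indented by two spaces.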

unified_diff

unified_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\n')

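Compares a and b (lists of strings) and returns a delta (a generator that yields the delta lines) in unified diff format. Sample output in unified diff format: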

--- .\a.txt     2025-12-04T09:22:47.152087+08:00
+++ .\b.txt     2025-12-04T09:23:00.877708+08:00
@@ -1,4 +1,3 @@
-* ndiff:    lists every line and highlights interline changes.
-* context:  highlights clusters of changes in a before/after format.
-* unified:  highlights clusters of changes in an inline format.
+* ngfdgdiff:    lisfga every lfdsfdsfdsfdsainterline changes.
 * html:     generates side by side comparison with change highlights.
+* context:  higsadters of changes fdsfore/after format.
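The sketch below shows unified_diff on inputs that have no trailing newlines. In that case lineterm is set to an empty string so the header lines come out newline-free as well. The sample data and file labels are made up for illustration:

```python
from difflib import unified_diff

# splitlines() without keepends drops the trailing newlines.
a = "one\ntwo\nthree".splitlines()
b = "one\n2\nthree".splitlines()

delta = list(
    unified_diff(a, b, fromfile="a.txt", tofile="b.txt", n=1, lineterm="")
)

print("\n".join(delta))
# --- a.txt
# +++ b.txt
# @@ -1,3 +1,3 @@
#  one
# -two
# +2
#  three
```

The n parameter sets how many unchanged lines of context surround each change; with n=1 the single change here still covers the whole three-line input.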

ndiff

ndiff(a, b, linejunk=None, charjunk=IS_CHARACTER_JUNK)

Compares a and b (lists of strings) and returns a Differ-style delta (a generator that yields the delta lines). Sample output (clearly in Differ style):

+ * ngfdgdiff:    lisfga every lfdsfdsfdsfdsainterline changes.
- * ndiff:    lists every line and highlights interline changes.
- * context:  highlights clusters of changes in a before/after format.
- * unified:  highlights clusters of changes in an inline format.
  * html:     generates side by side comparison with change highlights.
+ * context:  higsadters of changes fdsfore/after format.
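Because every ndiff line starts with a two-character code, the delta is easy to post-process. As a rough sketch with made-up data, the snippet below splits a delta into removed and added lines by looking at the prefix:

```python
from difflib import ndiff

a = ["red\n", "green\n", "blue\n"]
b = ["red\n", "blue\n", "black\n"]

delta = list(ndiff(a, b))

# Strip the two-character code to recover the original line text.
removed = [line[2:] for line in delta if line.startswith("- ")]
added = [line[2:] for line in delta if line.startswith("+ ")]

print("removed:", removed)  # removed: ['green\n']
print("added:", added)      # added: ['black\n']
```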

diff_bytes

diff_bytes(dfunc, a, b, fromfile=b'', tofile=b'', fromfiledate=b'', tofiledate=b'', n=3, lineterm=b'\n')

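Compares a and b (lists of bytes objects) using dfunc, and yields a sequence of delta lines (also bytes) in the format returned by dfunc. dfunc must be a callable, typically either unified_diff() or context_diff().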

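This lets you compare data with unknown or inconsistent encoding. All inputs except n must be bytes objects, not str. It works by losslessly converting all inputs (except n) to str and calling dfunc(a, b, fromfile, tofile, fromfiledate, tofiledate, n, lineterm). The output of dfunc is then converted back to bytes, so the delta lines you receive have the same unknown or inconsistent encodings as a and b.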
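A small sketch of diff_bytes follows, using made-up input that contains a Latin-1 byte (0xE9) which is not valid UTF-8. The point it illustrates is that the byte passes through the diff untouched:

```python
from difflib import diff_bytes, unified_diff

a = [b"caf\xe9\n", b"tea\n"]   # 0xE9 is Latin-1 "e acute", invalid as UTF-8
b = [b"caf\xe9\n", b"milk\n"]

# File names must be bytes too; only n stays an int.
delta = list(
    diff_bytes(unified_diff, a, b, fromfile=b"a.txt", tofile=b"b.txt")
)

for line in delta:
    print(line)
```

Printing the bytes objects directly avoids having to guess an encoding. A command-line tool would more likely write them to sys.stdout.buffer.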

restore

restore(delta, which)

Given a delta produced by Differ.compare() or ndiff(), extracts the lines that originate from sequence 1 or 2 (parameter which), stripping off the Differ line prefixes. Example:

>>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True),
...              'ore\ntree\nemu\n'.splitlines(keepends=True))
>>> diff = list(diff)
>>> print(''.join(restore(diff, 1)), end="")
one
two
three
>>> print(''.join(restore(diff, 2)), end="")
ore
tree
emu

get_close_matches

get_close_matches(word, possibilities, n=3, cutoff=0.6)

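Returns a list of the best "good enough" matches. word is a sequence for which close matches are desired (typically a string), and possibilities is a list of sequences against which to match word (typically a list of strings).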
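The sketch below shows how the two optional parameters shape the result: n caps the number of matches returned, and cutoff (a float between 0 and 1) discards candidates whose similarity score falls below it. The word list is sample data:

```python
from difflib import get_close_matches

candidates = ["ape", "apple", "peach", "puppy"]

# Defaults: up to 3 matches scoring at least 0.6, best match first.
matches = get_close_matches("appel", candidates)
print(matches)  # ['apple', 'ape']

# n=1 keeps only the single best match.
best = get_close_matches("appel", candidates, n=1)
print(best)  # ['apple']

# A stricter cutoff can leave nothing that is "good enough".
strict = get_close_matches("appel", candidates, cutoff=0.9)
print(strict)  # []
```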

Example

A diff tool:

""" Command line interface to difflib.py providing diffs in four formats:

* ndiff:    lists every line and highlights interline changes.
* context:  highlights clusters of changes in a before/after format.
* unified:  highlights clusters of changes in an inline format.
* html:     generates side by side comparison with change highlights.

"""

import sys, os, difflib, argparse
from datetime import datetime, timezone

def file_mtime(path):
    t = datetime.fromtimestamp(os.stat(path).st_mtime,
                               timezone.utc)
    return t.astimezone().isoformat()

def main():

    parser = argparse.ArgumentParser()
    parser.add_argument('-c', action='store_true', default=False,
                        help='Produce a context format diff (default)')
    parser.add_argument('-u', action='store_true', default=False,
                        help='Produce a unified format diff')
    parser.add_argument('-m', action='store_true', default=False,
                        help='Produce HTML side by side diff '
                             '(can use -c and -l in conjunction)')
    parser.add_argument('-n', action='store_true', default=False,
                        help='Produce a ndiff format diff')
    parser.add_argument('-d', action='store_true', default=False,
                        help='diff bytes (can use -c and -u in conjunction)')
    parser.add_argument('-l', '--lines', type=int, default=3,
                        help='Set number of context lines (default 3)')
    parser.add_argument('fromfile')
    parser.add_argument('tofile')
    options = parser.parse_args()

    n = options.lines
    fromfile = options.fromfile
    tofile = options.tofile

    fromdate = file_mtime(fromfile)
    todate = file_mtime(tofile)

    diff = None

    if options.d:
        with open(fromfile, 'rb') as ff:
            fromlines = ff.readlines()
        with open(tofile, 'rb') as tf:
            tolines = tf.readlines()
        fromfile_b = fromfile.encode()
        tofile_b = tofile.encode()
        fromdate_b = fromdate.encode()
        todate_b = todate.encode()

        dfunc = None
        if options.c:
            dfunc = difflib.context_diff
        elif options.u:
            dfunc = difflib.unified_diff
        else:
            print("Please specify either -c or -u with -d")
            sys.exit(1)
        diff = difflib.diff_bytes(dfunc, fromlines, tolines, fromfile_b, tofile_b, fromdate_b, todate_b, n=n)
        sys.stdout.buffer.writelines(diff)
    elif options.u:
        with open(fromfile) as ff:
            fromlines = ff.readlines()
        with open(tofile) as tf:
            tolines = tf.readlines()
        diff = difflib.unified_diff(fromlines, tolines, fromfile, tofile, fromdate, todate, n=n)
        sys.stdout.writelines(diff)
    elif options.n:
        with open(fromfile) as ff:
            fromlines = ff.readlines()
        with open(tofile) as tf:
            tolines = tf.readlines()
        diff = difflib.ndiff(fromlines, tolines)
        sys.stdout.writelines(diff)
    elif options.m:
        with open(fromfile) as ff:
            fromlines = ff.readlines()
        with open(tofile) as tf:
            tolines = tf.readlines()
        diff = difflib.HtmlDiff().make_file(fromlines, tolines, fromfile, tofile, context=options.c, numlines=n)
        sys.stdout.write(diff)
    else:  # -c or no format flag: context format is the default, as the -c help states
        with open(fromfile) as ff:
            fromlines = ff.readlines()
        with open(tofile) as tf:
            tolines = tf.readlines()
        diff = difflib.context_diff(fromlines, tolines, fromfile, tofile, fromdate, todate, n=n)
        sys.stdout.writelines(diff)

    if diff is None:
        print("Error: No valid diff generated.")
        sys.exit(1)

if __name__ == '__main__':
    main()

posted @ 2025-12-04 10:10 3的4次方
