Python difflib: comparing sequences and computing differences

Source: difflib, Helpers for computing deltas (Python 3.14.0 documentation)

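This module provides classes and functions for comparing sequences. It can be used to compare files, and it can produce difference information in several formats, including HTML, context diffs, and unified diffs. For comparing directories and files, see also the filecmp module.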

SequenceMatcher

Compares pairs of sequences of any type, as long as the sequence elements are hashable.
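As a minimal sketch of how SequenceMatcher can be used (the two input strings are illustrative and not from the original post), the snippet below computes a similarity ratio and lists the edit operations that turn the first sequence into the second:

```python
from difflib import SequenceMatcher

a = "abcd"
b = "bcde"

# The first argument is an optional "junk" predicate; None disables it.
matcher = SequenceMatcher(None, a, b)

# ratio() is 2*M/T: M = number of matching elements, T = total elements in both.
ratio = matcher.ratio()
print(ratio)

# get_opcodes() describes how to turn a into b as (tag, i1, i2, j1, j2) tuples.
opcodes = matcher.get_opcodes()
for tag, i1, i2, j1, j2 in opcodes:
    print(tag, repr(a[i1:i2]), repr(b[j1:j2]))
```

Here the common run "bcd" has length 3 and the two strings have 8 characters in total, so the ratio is 2*3/8 = 0.75.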

Differ

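A class for comparing sequences of lines of text and producing human-readable differences or deltas. It uses SequenceMatcher internally.

Each line of a Differ delta begins with a two-character code:

Code    Meaning
'- '    line unique to sequence 1
'+ '    line unique to sequence 2
'  '    line common to both sequences (two spaces)
'? '    line not present in either input sequence

Lines starting with '? ' are hint lines: they carry markers such as ^, + and - placed under the characters that differ, to draw the eye to intraline changes.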
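To make the prefix codes concrete, here is a small sketch that runs Differ.compare() on two short line lists (the inputs are chosen only for illustration) and prints the resulting delta:

```python
from difflib import Differ

# Each element is one line, including its trailing newline.
a = ["one\n", "two\n", "three\n"]
b = ["ore\n", "tree\n", "emu\n"]

# compare() returns a generator of delta lines; materialize it for reuse.
delta = list(Differ().compare(a, b))

# Every delta line starts with one of the two-character codes listed above.
print("".join(delta), end="")
```

Lines that are similar but not identical (such as "one" and "ore") come out as a '- ' / '+ ' pair accompanied by '? ' hint lines.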

HtmlDiff

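Creates an HTML table (or a complete HTML file containing the table) that shows a side-by-side, line-by-line comparison of text, with inter-line and intra-line changes highlighted. The table can be generated in either full or contextual difference mode.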

make_file(fromlines, tolines, fromdesc='', todesc='', context=False, numlines=5, *, charset='utf-8')

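Compares fromlines and tolines (lists of strings) and returns a string containing a complete HTML file with a table that shows the differences line by line, with inter-line and intra-line changes highlighted. fromdesc and todesc are optional keyword arguments that specify the column header strings for the from/to files (both default to an empty string). context and numlines are also optional keyword arguments. Set context to True to show contextual differences; the default, False, shows the full files. numlines defaults to 5. When context is True, numlines controls the number of context lines surrounding each highlighted difference. When context is False, numlines controls the number of lines shown before a highlighted difference when following the "next" hyperlinks (setting it to zero makes the "next" hyperlinks place the next highlighted difference at the top of the browser, with no leading context).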

Sample output:

[image: side-by-side HTML diff table produced by HtmlDiff]
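A page like this can be generated with a few lines. The following is a sketch only: the input lines and the column titles "a.txt" and "b.txt" are made up for illustration.

```python
from difflib import HtmlDiff

fromlines = ["alpha\n", "beta\n", "gamma\n"]
tolines = ["alpha\n", "beta!\n", "gamma\n", "delta\n"]

# context=True limits the table to changed lines plus numlines of context.
html = HtmlDiff().make_file(
    fromlines,
    tolines,
    fromdesc="a.txt",   # column title for the "from" side (illustrative)
    todesc="b.txt",     # column title for the "to" side (illustrative)
    context=True,
    numlines=1,
)

# The result is one string holding a complete HTML document.
print(len(html), "characters of HTML")
```

Writing the string to a .html file and opening it in a browser shows the side-by-side table.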

Functions

context_diff

context_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\n')

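Compares a and b (lists of strings) and returns a generator that yields the delta lines in context diff format. Sample output in context diff format: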

*** .\a.txt     2025-12-04T09:22:47.152087+08:00
--- .\b.txt     2025-12-04T09:23:00.877708+08:00
***************
*** 1,4 **** (starts with *, so the following lines come from a.txt)
! * ndiff:    lists every line and highlights interline changes.
! * context:  highlights clusters of changes in a before/after format.
! * unified:  highlights clusters of changes in an inline format.
  * html:     generates side by side comparison with change highlights.
--- 1,3 ---- (starts with -, so the following lines come from b.txt)
! * ngfdgdiff:    lisfga every lfdsfdsfdsfdsainterline changes.
  * html:     generates side by side comparison with change highlights.
+ * context:  higsadters of changes fdsfore/after format.
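The lineterm parameter matters when the input lines carry no trailing newline. The sketch below (inputs and file names are illustrative) passes lineterm="" so that the control lines generated by context_diff() also come without a newline and the whole delta can be joined uniformly:

```python
from difflib import context_diff

# Lines WITHOUT trailing newlines, e.g. from str.splitlines().
a = ["apple", "banana", "cherry"]
b = ["apple", "blueberry", "cherry"]

# lineterm="" keeps the header and hunk-marker lines newline-free as well.
diff = list(context_diff(a, b, fromfile="a.txt", tofile="b.txt", lineterm=""))

print("\n".join(diff))
```

Changed lines are marked with "! " in both the from-block and the to-block, while unchanged context lines are prefixed with two spaces.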

unified_diff

unified_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\n')

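Compares a and b (lists of strings) and returns a generator that yields the delta lines in unified diff format. Sample output in unified diff format: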

--- .\a.txt     2025-12-04T09:22:47.152087+08:00
+++ .\b.txt     2025-12-04T09:23:00.877708+08:00
@@ -1,4 +1,3 @@
-* ndiff:    lists every line and highlights interline changes.
-* context:  highlights clusters of changes in a before/after format.
-* unified:  highlights clusters of changes in an inline format.
+* ngfdgdiff:    lisfga every lfdsfdsfdsfdsainterline changes.
 * html:     generates side by side comparison with change highlights.
+* context:  higsadters of changes fdsfore/after format.
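Output of this shape can be produced directly from in-memory line lists. The following is a minimal sketch with made-up inputs and file names; the file dates are omitted, so the header lines contain only the names:

```python
from difflib import unified_diff

# Lines keep their trailing newlines, matching the default lineterm="\n".
a = ["apple\n", "banana\n", "cherry\n"]
b = ["apple\n", "blueberry\n", "cherry\n"]

diff = list(unified_diff(a, b, fromfile="a.txt", tofile="b.txt"))

print("".join(diff), end="")
```

The "@@ -1,3 +1,3 @@" hunk header reads as "3 lines starting at line 1 of a.txt, 3 lines starting at line 1 of b.txt".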

ndiff

ndiff(a, b, linejunk=None, charjunk=IS_CHARACTER_JUNK)

Compares a and b (lists of strings) and returns a generator that yields a Differ-style delta. Sample output (in Differ style, as expected):

+ * ngfdgdiff:    lisfga every lfdsfdsfdsfdsainterline changes.
- * ndiff:    lists every line and highlights interline changes.
- * context:  highlights clusters of changes in a before/after format.
- * unified:  highlights clusters of changes in an inline format.
  * html:     generates side by side comparison with change highlights.
+ * context:  higsadters of changes fdsfore/after format.

diff_bytes

diff_bytes(dfunc, a, b, fromfile=b'', tofile=b'', fromfiledate=b'', tofiledate=b'', n=3, lineterm=b'\n')

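Compares a and b (lists of bytes objects) using dfunc and yields a sequence of delta lines (also bytes) in the format returned by dfunc. dfunc must be a callable, typically unified_diff() or context_diff().

This lets you compare data with unknown or inconsistent encoding. All inputs except n must be bytes objects, not str. It works by losslessly converting all inputs (except n) to str and calling dfunc(a, b, fromfile, tofile, fromfiledate, tofiledate, n, lineterm). The output of dfunc is then converted back to bytes, so the delta lines you receive have the same unknown or inconsistent encoding as a and b.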
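As an illustration of the mixed-encoding case, the sketch below compares one line stored as Latin-1 bytes against the same word stored as UTF-8 bytes. The byte strings and file names are invented for this example.

```python
from difflib import diff_bytes, unified_diff

a = [b"caf\xe9\n", b"tea\n"]       # "café" encoded as Latin-1
b = [b"caf\xc3\xa9\n", b"tea\n"]   # "café" encoded as UTF-8

# File names must be bytes too; n (not passed here) would stay an int.
diff = list(diff_bytes(unified_diff, a, b, fromfile=b"a.txt", tofile=b"b.txt"))

for line in diff:
    print(line)
```

The original bytes come back untouched in the delta, so the output can be written to a binary stream such as sys.stdout.buffer without any decoding step.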

restore

restore(delta, which)

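Given a delta produced by Differ.compare() or ndiff(), extracts the lines that originate from one of the two original sequences (selected by the parameter which, 1 or 2) and strips the Differ line prefixes. Example: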

>>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True),
...              'ore\ntree\nemu\n'.splitlines(keepends=True))
>>> diff = list(diff)
>>> print(''.join(restore(diff, 1)), end="")
one
two
three
>>> print(''.join(restore(diff, 2)), end="")
ore
tree
emu

get_close_matches

get_close_matches(word, possibilities, n=3, cutoff=0.6)

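Returns a list of the best "good enough" matches. word is the sequence for which close matches are wanted (typically a string), and possibilities is a list of sequences to match word against (typically a list of strings). n is the maximum number of matches to return, and cutoff (a float between 0 and 1) is the minimum similarity score a candidate must reach to be included.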
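A typical use is suggesting a correction for a misspelled word. This short sketch uses an illustrative word list:

```python
from difflib import get_close_matches

words = ["ape", "apple", "peach", "puppy"]

# Default n=3 and cutoff=0.6: results are sorted by similarity, best first.
matches = get_close_matches("appel", words)
print(matches)   # ['apple', 'ape']

# n=1 keeps only the single best candidate.
best = get_close_matches("appel", words, n=1)
print(best)      # ['apple']
```

Raising cutoff makes the matching stricter, and an empty list is returned when no candidate scores high enough.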

Example

A diff tool:

""" Command line interface to difflib.py providing diffs in four formats:

* ndiff:    lists every line and highlights interline changes.
* context:  highlights clusters of changes in a before/after format.
* unified:  highlights clusters of changes in an inline format.
* html:     generates side by side comparison with change highlights.

"""

import sys, os, difflib, argparse
from datetime import datetime, timezone

def file_mtime(path):
    t = datetime.fromtimestamp(os.stat(path).st_mtime,
                               timezone.utc)
    return t.astimezone().isoformat()

def main():

    parser = argparse.ArgumentParser()
    parser.add_argument('-c', action='store_true', default=False,
                        help='Produce a context format diff')
    parser.add_argument('-u', action='store_true', default=False,
                        help='Produce a unified format diff')
    parser.add_argument('-m', action='store_true', default=False,
                        help='Produce HTML side by side diff '
                             '(can use -c and -l in conjunction)')
    parser.add_argument('-n', action='store_true', default=False,
                        help='Produce a ndiff format diff')
    parser.add_argument('-d', action='store_true', default=False,
                        help='diff bytes (can use -c and -u in conjunction)')
    parser.add_argument('-l', '--lines', type=int, default=3,
                        help='Set number of context lines (default 3)')
    parser.add_argument('fromfile')
    parser.add_argument('tofile')
    options = parser.parse_args()

    n = options.lines
    fromfile = options.fromfile
    tofile = options.tofile

    fromdate = file_mtime(fromfile)
    todate = file_mtime(tofile)

    diff = None

    if options.d:
        with open(fromfile, 'rb') as ff:
            fromlines = ff.readlines()
        with open(tofile, 'rb') as tf:
            tolines = tf.readlines()
        fromfile_b = fromfile.encode()
        tofile_b = tofile.encode()
        fromdate_b = fromdate.encode()
        todate_b = todate.encode()

        dfunc = None
        if options.c:
            dfunc = difflib.context_diff
        elif options.u:
            dfunc = difflib.unified_diff
        else:
            print("Please specify either -c or -u with -d")
            sys.exit(1)
        diff = difflib.diff_bytes(dfunc, fromlines, tolines, fromfile_b, tofile_b, fromdate_b, todate_b, n=n)
        sys.stdout.buffer.writelines(diff)
    elif options.u:
        with open(fromfile) as ff:
            fromlines = ff.readlines()
        with open(tofile) as tf:
            tolines = tf.readlines()
        diff = difflib.unified_diff(fromlines, tolines, fromfile, tofile, fromdate, todate, n=n)
        sys.stdout.writelines(diff)
    elif options.n:
        with open(fromfile) as ff:
            fromlines = ff.readlines()
        with open(tofile) as tf:
            tolines = tf.readlines()
        diff = difflib.ndiff(fromlines, tolines)
        sys.stdout.writelines(diff)
    elif options.m:
        with open(fromfile) as ff:
            fromlines = ff.readlines()
        with open(tofile) as tf:
            tolines = tf.readlines()
        diff = difflib.HtmlDiff().make_file(fromlines, tolines, fromfile, tofile, context=options.c, numlines=n)
        sys.stdout.write(diff)
    elif options.c:
        with open(fromfile) as ff:
            fromlines = ff.readlines()
        with open(tofile) as tf:
            tolines = tf.readlines()
        diff = difflib.context_diff(fromlines, tolines, fromfile, tofile, fromdate, todate, n=n)
        sys.stdout.writelines(diff)

    if diff is None:
        print("Error: No valid diff generated.")
        sys.exit(1)

if __name__ == '__main__':
    main()
posted @ 2025-12-04 10:10 by 3的4次方