Python difflib: comparing sequence differences
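This module provides classes and functions for comparing sequences. It can be used to compare files, and it can produce difference information in several formats, including HTML, context diffs, and unified diffs. For comparing directories and files, see also the filecmp module.
Classes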
SequenceMatcher
Compares pairs of sequences of any type, as long as the sequence elements are hashable.
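As a rough illustration of what SequenceMatcher reports, the sketch below compares two short strings. The sample strings and variable names are made up for this example; `ratio()` and `get_opcodes()` are the standard difflib methods.

```python
from difflib import SequenceMatcher

a = "abcd"
b = "bcde"

# The first argument is an optional junk filter; None disables it.
matcher = SequenceMatcher(None, a, b)

# ratio() returns 2*M/T, where M is the number of matched elements
# and T is the combined length of both sequences.
similarity = matcher.ratio()
print(similarity)  # 0.75 (M = 3 for "bcd", T = 8)

# get_opcodes() lists the edits that turn a into b.
opcodes = matcher.get_opcodes()
for tag, i1, i2, j1, j2 in opcodes:
    print(f"{tag:7} a[{i1}:{i2}]={a[i1:i2]!r} b[{j1}:{j2}]={b[j1:j2]!r}")
```

The opcode tags (`equal`, `delete`, `insert`, `replace`) are what the higher-level diff functions build their output from.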
Differ
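A class for comparing sequences of lines of text and producing human-readable differences or deltas. It uses SequenceMatcher internally for the comparison.
Each line of a Differ delta begins with a two-letter code: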
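| Code | Meaning |
|---|---|
| '- ' | line unique to sequence 1 |
| '+ ' | line unique to sequence 2 |
| '  ' (two spaces) | line common to both sequences |
| '? ' | line not present in either input sequence (a hint line whose markers, such as ^, - and +, point at where the neighboring lines differ) |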
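The codes in the table can be seen by running Differ on two small lists. This is only a sketch: the fruit names are invented sample data, and because the exact `? ` hint lines depend on how similar the lines are, the code collects the prefixes rather than assuming a fixed output.

```python
from difflib import Differ

# Each element is one line of text, including its newline.
old = ["apple\n", "banana\n", "grape\n"]
new = ["apple\n", "cherry\n", "grapes\n"]

delta = list(Differ().compare(old, new))

# The first two characters of every delta line are the code.
codes = [line[:2] for line in delta]

for line in delta:
    print(line, end="")
```

Here `apple` is printed with the two-space code, while `banana` and `cherry` appear with `- ` and `+ ` respectively. Similar lines such as `grape` and `grapes` may additionally be followed by a `? ` hint line.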
HtmlDiff
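Creates an HTML table (or a complete HTML file containing the table) that shows a side-by-side, line-by-line comparison of text, with inter-line and intra-line changes highlighted. The table can be generated in either full or contextual difference mode.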
make_file(fromlines, tolines, fromdesc='', todesc='', context=False, numlines=5, *, charset='utf-8')
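Compares fromlines and tolines (lists of strings) and returns a string containing a complete HTML file with a table that shows line-by-line differences, with inter-line and intra-line changes highlighted. fromdesc and todesc are optional keyword arguments that specify the from/to file column header strings (both default to an empty string). context and numlines are both optional keyword arguments. Set context to True to show contextual differences; the default, False, shows the full files. numlines defaults to 5. When context is True, numlines controls the number of context lines surrounding each highlighted difference. When context is False, numlines controls the number of lines shown before a highlighted difference when the "next" hyperlinks are used (setting it to zero makes the "next" hyperlinks place the next highlighted difference at the top of the browser with no leading context). charset is a keyword-only argument giving the character set of the HTML document; it defaults to 'utf-8'.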
[Figure: sample HTML output produced by make_file]
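A minimal sketch of calling make_file in context mode follows. The sample lines and the `a.txt` / `b.txt` column titles are placeholders chosen for illustration, not files from this article.

```python
from difflib import HtmlDiff

fromlines = ["alpha\n", "beta\n", "gamma\n"]
tolines = ["alpha\n", "BETA\n", "gamma\n", "delta\n"]

# context=True keeps only the changed regions plus numlines lines
# of surrounding context; context=False would show the full files.
html = HtmlDiff().make_file(
    fromlines,
    tolines,
    fromdesc="a.txt",  # column title for the "from" side
    todesc="b.txt",    # column title for the "to" side
    context=True,
    numlines=1,
)

# html is a complete HTML document held in a str.
print(len(html))
```

Writing `html` to a file and opening it in a browser shows the side-by-side table.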

Functions
context_diff
context_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\n')
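Compares a and b (lists of strings) and returns a delta, as a generator producing the delta lines, in context diff format. Sample output in context diff format: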
*** .\a.txt 2025-12-04T09:22:47.152087+08:00
--- .\b.txt 2025-12-04T09:23:00.877708+08:00
***************
*** 1,4 **** (starts with *: the lines below come from a.txt)
! * ndiff: lists every line and highlights interline changes.
! * context: highlights clusters of changes in a before/after format.
! * unified: highlights clusters of changes in an inline format.
  * html: generates side by side comparison with change highlights.
--- 1,3 ---- (starts with -: the lines below come from b.txt)
! * ngfdgdiff: lisfga every lfdsfdsfdsfdsainterline changes.
  * html: generates side by side comparison with change highlights.
+ * context: higsadters of changes fdsfore/after format.
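To reproduce this kind of output without files on disk, a small sketch with in-memory lists may help. The word lists and the `before.py` / `after.py` labels are sample values; no file dates are passed, so the header lines carry only the names.

```python
import sys
from difflib import context_diff

a = ["bacon\n", "eggs\n", "ham\n", "guido\n"]
b = ["python\n", "eggy\n", "hamster\n", "guido\n"]

# context_diff returns a generator; list() materializes the lines.
lines = list(context_diff(a, b, fromfile="before.py", tofile="after.py"))

# The input lines already end in "\n", so writelines prints them as-is.
sys.stdout.writelines(lines)
```

Changed lines are marked with `!`, unchanged context lines with two spaces, and the `*** 1,4 ****` and `--- 1,4 ----` headers give the line ranges of the "from" and "to" blocks.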
unified_diff
unified_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\n')
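Compares a and b (lists of strings) and returns a delta, as a generator producing the delta lines, in unified diff format. Sample output in unified diff format: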
--- .\a.txt 2025-12-04T09:22:47.152087+08:00
+++ .\b.txt 2025-12-04T09:23:00.877708+08:00
@@ -1,4 +1,3 @@
-* ndiff: lists every line and highlights interline changes.
-* context: highlights clusters of changes in a before/after format.
-* unified: highlights clusters of changes in an inline format.
+* ngfdgdiff: lisfga every lfdsfdsfdsfdsainterline changes.
 * html: generates side by side comparison with change highlights.
+* context: higsadters of changes fdsfore/after format.
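The n and lineterm parameters are easiest to see on input that has no trailing newlines. The following sketch uses invented sample lines and file labels; it asks for one line of context and an empty line terminator.

```python
from difflib import unified_diff

# Lines without trailing newlines, so lineterm="" keeps the
# header lines consistent with the content lines.
a = ["one", "two", "three", "four"]
b = ["one", "two", "3", "four"]

lines = list(unified_diff(a, b, fromfile="a.txt", tofile="b.txt",
                          n=1, lineterm=""))

print("\n".join(lines))
# --- a.txt
# +++ b.txt
# @@ -2,3 +2,3 @@
#  two
# -three
# +3
#  four
```

With n=1 the hunk starts at line 2 rather than line 1, because only one unchanged line is kept on each side of the change.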
ndiff
ndiff(a, b, linejunk=None, charjunk=IS_CHARACTER_JUNK)
Compares a and b (lists of strings) and returns a Differ-style delta, as a generator producing the delta lines. Sample output (in Differ style):
+ * ngfdgdiff: lisfga every lfdsfdsfdsfdsainterline changes.
- * ndiff: lists every line and highlights interline changes.
- * context: highlights clusters of changes in a before/after format.
- * unified: highlights clusters of changes in an inline format.
  * html: generates side by side comparison with change highlights.
+ * context: higsadters of changes fdsfore/after format.
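When two lines are similar, ndiff may also emit `? ` hint lines that mark the changed characters. The sketch below, using made-up sample sentences, shows one way to separate those hints from the lines that belong to the two inputs.

```python
from difflib import ndiff

a = ["hello world\n", "goodbye\n"]
b = ["hello there world\n", "goodbye\n"]

delta = list(ndiff(a, b))

# Hint lines start with "? " and exist in neither input.
hints = [line for line in delta if line.startswith("? ")]
without_hints = [line for line in delta if not line.startswith("? ")]

print("".join(delta), end="")
```

Dropping the hint lines leaves one `- ` line, one `+ ` line, and the shared `goodbye` line with the two-space code.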
diff_bytes
diff_bytes(dfunc, a, b, fromfile=b'', tofile=b'', fromfiledate=b'', tofiledate=b'', n=3, lineterm=b'\n')
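Compares a and b (lists of bytes objects) using dfunc and yields a sequence of delta lines (also bytes) in the format returned by dfunc. dfunc must be a callable, typically unified_diff() or context_diff().
This lets you compare data whose encoding is unknown or inconsistent. All inputs except n must be bytes objects, not str. It works by losslessly converting all inputs (except n) to str and calling dfunc(a, b, fromfile, tofile, fromfiledate, tofiledate, n, lineterm). The output of dfunc is then converted back to bytes, so the delta lines you receive have the same unknown or inconsistent encoding as a and b.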
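A small sketch of the mixed-encoding case: the same word is stored once as Latin-1 bytes and once as UTF-8 bytes, which could not both be decoded with a single codec. The byte strings and file labels are invented for this example.

```python
from difflib import diff_bytes, unified_diff

a = [b"caf\xe9\n", b"tea\n"]      # "café" encoded as Latin-1
b = [b"caf\xc3\xa9\n", b"tea\n"]  # "café" encoded as UTF-8

# Every argument except n must be bytes, including the file names.
lines = list(diff_bytes(unified_diff, a, b,
                        fromfile=b"old.txt", tofile=b"new.txt"))

for line in lines:
    print(line)
```

The delta lines come back as bytes with the original byte sequences intact, so they can be written straight to a binary stream such as `sys.stdout.buffer`.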
restore
restore(delta, which)
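Given a delta produced by Differ.compare() or ndiff(), extracts the lines that came from one of the two original sequences (selected by the parameter which, 1 or 2) and strips the Differ line prefixes. Example: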
>>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True),
...              'ore\ntree\nemu\n'.splitlines(keepends=True))
>>> diff = list(diff)
>>> print(''.join(restore(diff, 1)), end="")
one
two
three
>>> print(''.join(restore(diff, 2)), end="")
ore
tree
emu
get_close_matches
get_close_matches(word, possibilities, n=3, cutoff=0.6)
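Returns a list of the best "good enough" matches. word is a sequence for which close matches are desired (typically a string), and possibilities is a list of sequences against which to match word (typically a list of strings). n (default 3) is the maximum number of close matches to return, and cutoff (default 0.6) is a float in the range [0, 1]; possibilities that score below it are ignored.
Example
A diff tool: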
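A short sketch of how n and cutoff affect the result, using a misspelled word and a small candidate list as sample data:

```python
from difflib import get_close_matches

candidates = ["ape", "apple", "peach", "puppy"]

# Default n=3, cutoff=0.6: best matches first.
matches = get_close_matches("appel", candidates)
print(matches)  # ['apple', 'ape']

# n=1 keeps only the single best match.
best = get_close_matches("appel", candidates, n=1)
print(best)  # ['apple']

# A stricter cutoff can reject every candidate.
strict = get_close_matches("appel", candidates, cutoff=0.9)
print(strict)  # []
```

The similarity score is the SequenceMatcher ratio, so "appel" against "apple" scores 0.8, which passes the default cutoff but not 0.9.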
""" Command line interface to difflib.py providing diffs in four formats:
* ndiff: lists every line and highlights interline changes.
* context: highlights clusters of changes in a before/after format.
* unified: highlights clusters of changes in an inline format.
* html: generates side by side comparison with change highlights.
"""
import sys, os, difflib, argparse
from datetime import datetime, timezone


def file_mtime(path):
    # Last-modified time of the file as an ISO 8601 string in local time.
    t = datetime.fromtimestamp(os.stat(path).st_mtime,
                               timezone.utc)
    return t.astimezone().isoformat()


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('-c', action='store_true', default=False,
                        help='Produce a context format diff')
    parser.add_argument('-u', action='store_true', default=False,
                        help='Produce a unified format diff')
    parser.add_argument('-m', action='store_true', default=False,
                        help='Produce HTML side by side diff '
                             '(can use -c and -l in conjunction)')
    parser.add_argument('-n', action='store_true', default=False,
                        help='Produce a ndiff format diff')
    parser.add_argument('-d', action='store_true', default=False,
                        help='diff bytes (can use -c and -u in conjunction)')
    parser.add_argument('-l', '--lines', type=int, default=3,
                        help='Set number of context lines (default 3)')
    parser.add_argument('fromfile')
    parser.add_argument('tofile')
    options = parser.parse_args()

    n = options.lines
    fromfile = options.fromfile
    tofile = options.tofile
    fromdate = file_mtime(fromfile)
    todate = file_mtime(tofile)

    diff = None
    if options.d:
        # Byte mode: read the files as bytes and pass bytes labels as well.
        with open(fromfile, 'rb') as ff:
            fromlines = ff.readlines()
        with open(tofile, 'rb') as tf:
            tolines = tf.readlines()
        fromfile_b = fromfile.encode()
        tofile_b = tofile.encode()
        fromdate_b = fromdate.encode()
        todate_b = todate.encode()
        dfunc = None
        if options.c:
            dfunc = difflib.context_diff
        elif options.u:
            dfunc = difflib.unified_diff
        else:
            print("Please specify either -c or -u with -d")
            sys.exit(1)
        diff = difflib.diff_bytes(dfunc, fromlines, tolines,
                                  fromfile_b, tofile_b,
                                  fromdate_b, todate_b, n=n)
        sys.stdout.buffer.writelines(diff)
    elif options.u:
        with open(fromfile) as ff:
            fromlines = ff.readlines()
        with open(tofile) as tf:
            tolines = tf.readlines()
        diff = difflib.unified_diff(fromlines, tolines, fromfile, tofile,
                                    fromdate, todate, n=n)
        sys.stdout.writelines(diff)
    elif options.n:
        with open(fromfile) as ff:
            fromlines = ff.readlines()
        with open(tofile) as tf:
            tolines = tf.readlines()
        diff = difflib.ndiff(fromlines, tolines)
        sys.stdout.writelines(diff)
    elif options.m:
        # -m is checked before -c, so "-m -c" yields a contextual HTML diff.
        with open(fromfile) as ff:
            fromlines = ff.readlines()
        with open(tofile) as tf:
            tolines = tf.readlines()
        diff = difflib.HtmlDiff().make_file(fromlines, tolines, fromfile,
                                            tofile, context=options.c,
                                            numlines=n)
        sys.stdout.write(diff)
    elif options.c:
        with open(fromfile) as ff:
            fromlines = ff.readlines()
        with open(tofile) as tf:
            tolines = tf.readlines()
        diff = difflib.context_diff(fromlines, tolines, fromfile, tofile,
                                    fromdate, todate, n=n)
        sys.stdout.writelines(diff)

    # No format flag was given, so nothing was produced.
    if diff is None:
        print("Error: No valid diff generated.")
        sys.exit(1)


if __name__ == '__main__':
    main()
