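Comparing content differences with Python's difflib
I had been looking for a library that compares two pieces of content for a while, and it turns out the Python standard library already ships with one: difflib.
This is especially handy for data collection (web scraping): comparing the parameters of two requests shows exactly which values changed, which makes it much easier to locate the difference.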
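For a quick look in the terminal, the same idea can be tried with `difflib.unified_diff`. This is only a minimal sketch: the helper name `text_diff`, the labels `request_1` / `request_2`, and the two shortened header samples are assumptions made for illustration.

```python
import difflib


def text_diff(old_text, new_text):
    """Return unified-diff lines describing how new_text differs from old_text."""
    old_lines = old_text.strip().splitlines()
    new_lines = new_text.strip().splitlines()
    # lineterm="" because the input lines carry no trailing newlines
    return list(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile="request_1",  # hypothetical label
            tofile="request_2",    # hypothetical label
            lineterm="",
        )
    )


# Shortened, illustrative request headers
first = """Host: www.chem365.net
Referer: http://www.chem365.net/"""
second = """Host: www.chem365.net
Referer: http://www.chem365.net/web/index/information/classid/142.html"""

result = text_diff(first, second)
for line in result:
    print(line)
```

Lines starting with `-` exist only in the first request, lines starting with `+` only in the second, and lines starting with a space are unchanged.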
import difflib


def diff_headers():
    # Headers captured from the first request, split into lines (newlines kept)
    text1 = '''Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Encoding: gzip, deflate
Accept-Language: q=0.9,en;q=0.8,en-US;q=0.7;zh-CN,zh;
Cache-Control: no-cache
Connection: keep-alive
Cookie: UM_distinctid=17c5f7e8e37f8b-030342123ea219-513c1743-15f900-17c2f7e8e38463; CNZZDATA1586682=cnzz_eid%3D1569740215-1636510718-null%26ntime%3D1642568049; PHPSESSID=l5otho4quql6jpf7majg5795fs; _stat_uid=05967439303530977045856681345587735
Host: www.chem365.net
Pragma: no-cache
Referer: http://www.chem365.net/
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.80 Safari/537.36 Edg/98.0.1108.50'''.splitlines(keepends=True)
    # Headers captured from the second request
    text2 = ''' Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Encoding: gzip, deflate
Accept-Language: zh-CN,zh;q=0.9,en;q=0.8,en-US;q=0.7
Cache-Control: no-cache
Connection: keep-alive
Cookie: UM_distinctid=17c2f7e8e37f8b-030342123ea219-513c1743-15f900-17c2f7e8e38463; CNZZDATA1586682=cnzz_eid%3D1569740215-1636510718-null%26ntime%3D1642568049; PHPSESSID=l5otho4quql6jpf7majg5795fs; _stat_uid=05967439303530977045856681345587735
Host: www.chem365.net
Pragma: no-cache
Referer: http://www.chem365.net/web/index/information/classid/142.html
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.80 Safari/537.36 Edg/98.0.1108.50
'''.splitlines(keepends=True)
    # Build a complete HTML page containing a side-by-side comparison table
    d = difflib.HtmlDiff()
    htmlContent = d.make_file(text1, text2)
    # print(htmlContent)
    # make_file declares charset utf-8 by default, so write the file as UTF-8
    with open('diff_header.html', 'w', encoding='utf-8') as f:
        f.write(htmlContent)


if __name__ == '__main__':
    # diff_html()
    diff_headers()
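When the header lists get long, `HtmlDiff` can also label the two columns, wrap long lines, and show only the changed rows with a little context. The sketch below is one possible variation, not part of the original script: the function name `headers_diff_html`, the column labels, the wrap width of 80, and the shortened sample headers are all assumptions.

```python
import difflib


def headers_diff_html(old_text, new_text, wrap=80):
    """Return an HTML page that shows only the changed header lines."""
    # Strip each line so that stray leading/trailing whitespace
    # is not reported as a change
    old_lines = [line.strip() for line in old_text.strip().splitlines()]
    new_lines = [line.strip() for line in new_text.strip().splitlines()]
    differ = difflib.HtmlDiff(wrapcolumn=wrap)  # wrap long Cookie/User-Agent lines
    return differ.make_file(
        old_lines,
        new_lines,
        fromdesc="request_1",  # hypothetical column label
        todesc="request_2",    # hypothetical column label
        context=True,          # show only the rows around differences
        numlines=1,            # one unchanged line of context on each side
    )


# Shortened, illustrative request headers
OLD = """Host: www.chem365.net
Pragma: no-cache
Referer: http://www.chem365.net/
Upgrade-Insecure-Requests: 1"""
NEW = """Host: www.chem365.net
Pragma: no-cache
Referer: http://www.chem365.net/web/index/information/classid/142.html
Upgrade-Insecure-Requests: 1"""

html = headers_diff_html(OLD, NEW)
# To view the result, write `html` to a file opened with encoding="utf-8"
# and open that file in a browser.
```

With `context=True`, unchanged headers far from any difference are left out, so the table stays short even when the request carries dozens of headers.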

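As the figure shows, the generated HTML makes the changes easy to see: different colors stand for different actions (added, changed, deleted), so the differences stand out at a glance.
For more detail, see: https://blog.csdn.net/weixin_45775963/article/details/104122753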
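The same kind of classification can also be read programmatically instead of from colors. A minimal sketch, assuming a line-level comparison with `difflib.SequenceMatcher`; the helper name `changed_lines` and the sample headers are made up for illustration, and the opcode tags (`replace`, `delete`, `insert`) are difflib's own names rather than the labels used in the HTML legend.

```python
import difflib


def changed_lines(old_lines, new_lines):
    """Return (tag, old_chunk, new_chunk) for every non-equal region."""
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep only 'replace', 'delete' and 'insert'
            changes.append((tag, old_lines[i1:i2], new_lines[j1:j2]))
    return changes


# Shortened, illustrative request headers
old = [
    "Host: www.chem365.net",
    "Pragma: no-cache",
    "Connection: keep-alive",
    "Referer: http://www.chem365.net/",
]
new = [
    "Host: www.chem365.net",
    "Connection: keep-alive",
    "Referer: http://www.chem365.net/web/index/information/classid/142.html",
    "Upgrade-Insecure-Requests: 1",
]

changes = changed_lines(old, new)
for tag, before, after in changes:
    print(tag, before, after)
```

Here the `Pragma` line is reported as a `delete`, while the `Referer` line together with the extra `Upgrade-Insecure-Requests` line is reported as a single `replace`, because the comparison works on whole lines.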
