Counting domains and ranking them
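import re

# Match the scheme and host of URLs whose host ends in .com or .cn.
# Compiled once, outside the loop; [^/]* keeps the match from running into the path.
regex = re.compile(r'^http://[^/]*\.(?:com|cn)')
domain = {}
with open("list1") as file:
    for row in file:
        match = regex.match(row)
        if match is None:
            continue  # skip blank lines and anything that is not a matching URL
        result = match.group()
        domain[result] = domain.get(result, 0) + 1
# Sorted by count in ascending order; pass reverse=True to list the most frequent first.
for item in sorted(domain.items(), key=lambda x: x[1]):
    print(item[0], item[1])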
Result:

http://a.domain.com/1.html
http://a.domain.com/2.html
http://b.domain.com/1.html
http://b.domain.com/2.html
http://b.domain.com/3.html
http://c.domain.com/4.html
http://b.domain.com/5.html
http://c.domain.com/5.html
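
The same count can be sketched with the standard library's `urllib.parse.urlsplit` and `collections.Counter` instead of a regular expression. This is only an illustration: it assumes the URLs listed above are the contents of `list1`, and `rank_domains` is a hypothetical helper name, not something from the original script.

```python
from collections import Counter
from urllib.parse import urlsplit

# Sample URLs copied from the list above (assumed to be the contents of list1).
urls = [
    "http://a.domain.com/1.html",
    "http://a.domain.com/2.html",
    "http://b.domain.com/1.html",
    "http://b.domain.com/2.html",
    "http://b.domain.com/3.html",
    "http://c.domain.com/4.html",
    "http://b.domain.com/5.html",
    "http://c.domain.com/5.html",
]


def rank_domains(lines):
    """Return (host, count) pairs, most frequent first (hypothetical helper)."""
    # urlsplit(...).netloc is the host part; it is empty for blank lines.
    hosts = (urlsplit(line.strip()).netloc for line in lines)
    counts = Counter(host for host in hosts if host)
    # most_common() sorts by count, descending; ties keep first-seen order.
    return counts.most_common()


for host, count in rank_domains(urls):
    print(host, count)
# b.domain.com 4
# a.domain.com 2
# c.domain.com 2
```

Unlike the regex version, this keys on the bare host (no `http://` prefix) and is not limited to `.com` and `.cn` hosts.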
