Count domains and rank them
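import re

# Capture the scheme and host of http:// URLs whose host ends in .com or .cn.
# [^/]* keeps the match inside the host part; the pattern is compiled once.
regex = re.compile(r'^http://[^/]*\.(com|cn)')
domain = {}

with open("list1") as file:
    for row in file:
        result = regex.match(row)
        if result is None:
            continue  # skip lines that are not matching URLs
        key = result.group()
        domain[key] = domain.get(key, 0) + 1

# Print each domain with its count, in ascending order of count.
for item in sorted(domain.items(), key=lambda x: x[1]):
    print(item[0], item[1])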
Result:
http://a.domain.com/1.html
http://a.domain.com/2.html
http://b.domain.com/1.html
http://b.domain.com/2.html
http://b.domain.com/3.html
http://c.domain.com/4.html
http://b.domain.com/5.html
http://c.domain.com/5.html
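
The same tally can also be sketched without a regular expression, using urllib.parse.urlsplit to extract the host and collections.Counter to do the counting. This is only an illustrative variant under a few assumptions: the function name rank_domains is hypothetical, the sample list simply repeats the URLs listed above instead of reading the list1 file, and the output is sorted from most to least frequent, which differs from the script's ascending order.

```python
from collections import Counter
from urllib.parse import urlsplit


def rank_domains(urls):
    """Count URLs per host; return (host, count) pairs, most frequent first.

    rank_domains is a hypothetical helper name used only for this sketch.
    """
    counts = Counter()
    for url in urls:
        host = urlsplit(url.strip()).netloc
        if host:  # skip blank lines and lines without a host part
            counts[host] += 1
    # most_common() sorts by count, highest first
    return counts.most_common()


# Sample input: the URLs shown in the listing above.
sample = [
    "http://a.domain.com/1.html",
    "http://a.domain.com/2.html",
    "http://b.domain.com/1.html",
    "http://b.domain.com/2.html",
    "http://b.domain.com/3.html",
    "http://c.domain.com/4.html",
    "http://b.domain.com/5.html",
    "http://c.domain.com/5.html",
]

for host, count in rank_domains(sample):
    print(host, count)
```

Unlike the regex version, this keys on the bare host (for example b.domain.com) rather than on the http:// prefix plus host, and it is not limited to .com and .cn domains.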