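Python: count URL occurrences and sort them
Given a file, count how many times each URL appears and sort the results by count.
The input file, url.text, looks like this: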
http://www.baidu.com/1.html
http://www.aqiyi.com/2.html
http://www.cssd.com/1.html
http://www.baidu.com/1.html
http://www.baidu.com/1.html
http://www.baidu.com/1.html
http://www.cssd.com/1.html
asdsasad
asdasdasdasdasd
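Approach: use a regular expression to pick out the lines that are URLs, take the host part (netloc) of each with urlparse, count the hosts with Counter from the collections module, and finally order the counts with sorted. Note that the script counts per host, so different pages on the same host are counted together.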
# @Time    : 19-5-7 16:46
# @Author  : xueminchao
from urllib.parse import urlparse
from collections import Counter
import re


def read_file(file_name):
    url_list = []
    with open(file_name) as f:
        for line in f:
            # Keep only lines that look like an http:// or https:// URL
            if re.match(r'^https?:/{2}\w.+$', line):
                url = urlparse(line.strip())
                url_list.append(url.netloc)  # count by host name
    result = Counter(url_list)
    print(result.items())
    # Sort the (host, count) pairs by count, smallest first
    d = sorted(result.items(), key=lambda x: x[1], reverse=False)
    print(d)


if __name__ == '__main__':
    file_name = 'url.text'
    read_file(file_name)
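If you want to count full URLs rather than hosts, and list the most frequent first, a variation along the following lines might work. This is only a sketch: the function name count_urls and the in-memory sample list are assumptions made for illustration (the sample mirrors the contents of url.text), and Counter.most_common() is swapped in for sorted with reverse=True.

```python
import re
from collections import Counter

# Same pattern as the script above: a line starting with http:// or https://
URL_PATTERN = re.compile(r'^https?:/{2}\w.+$')


def count_urls(lines):
    """Count full URLs (not just hosts) in an iterable of text lines.

    Hypothetical helper, not part of the original script.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()  # drop the trailing newline before counting
        if URL_PATTERN.match(line):
            counts[line] += 1
    return counts


# Sample data mirroring url.text; with a real file, pass the open file object instead.
sample = [
    "http://www.baidu.com/1.html",
    "http://www.aqiyi.com/2.html",
    "http://www.cssd.com/1.html",
    "http://www.baidu.com/1.html",
    "http://www.baidu.com/1.html",
    "http://www.baidu.com/1.html",
    "http://www.cssd.com/1.html",
    "asdsasad",
    "asdasdasdasdasd",
]

counts = count_urls(sample)

# most_common() yields (url, count) pairs ordered from highest count to lowest
for url, n in counts.most_common():
    print(n, url)
```

With the sample data this prints the baidu URL first (4 occurrences), then cssd (2), then aqiyi (1); the two non-URL lines are skipped.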
