Python: counting URL occurrences and sorting them

Given a file, count how many times each URL appears and sort the results.

The input file, url.text, looks like this:

http://www.baidu.com/1.html
http://www.aqiyi.com/2.html
http://www.cssd.com/1.html
http://www.baidu.com/1.html
http://www.baidu.com/1.html
http://www.baidu.com/1.html
http://www.cssd.com/1.html
asdsasad
asdasdasdasdasd

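Approach: pick out the URL lines with a regular expression, count them with Counter from the collections module, and finally sort the counts with sorted. Note that the script counts by host (the netloc returned by urlparse), so different paths on the same site are counted together.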
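To see what the first step keeps and what it drops, here is a small sketch that applies the script's pattern to a few of the sample lines held in memory. The names `URL_PATTERN`, `samples` and `hosts` are made up for this illustration and are not part of the original script.

```python
import re
from urllib.parse import urlparse

# Same pattern the script uses: "http" or "https", then "://",
# one word character, and at least one more character.
URL_PATTERN = re.compile(r'^https?:/{2}\w.+$')

# A few lines taken from url.text (hypothetical in-memory copy)
samples = [
    "http://www.baidu.com/1.html",
    "http://www.cssd.com/1.html",
    "asdsasad",
]

# Keep only the lines that match, and reduce each one to its host
hosts = [urlparse(s).netloc for s in samples if URL_PATTERN.match(s)]
print(hosts)  # ['www.baidu.com', 'www.cssd.com']
```

The line `asdsasad` is skipped because it does not start with `http://` or `https://`.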

# @Time    : 19-5-7 16:46
# @Author  : xueminchao
import re
from collections import Counter
from urllib.parse import urlparse


def read_file(file_name):
    """Count how often each host appears in file_name and print the sorted counts."""
    url_list = []
    with open(file_name) as f:
        for line in f:
            line = line.strip()  # drop the trailing newline
            # Keep only lines that look like an http/https URL
            if re.match(r'^https?:/{2}\w.+$', line):
                # netloc is the host part, e.g. www.baidu.com
                url_list.append(urlparse(line).netloc)
    result = Counter(url_list)
    print(result.items())
    # Sort the (host, count) pairs by count, smallest first
    d = sorted(result.items(), key=lambda x: x[1], reverse=False)
    print(d)


if __name__ == '__main__':
    file_name = 'url.text'
    read_file(file_name)
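With reverse=False the script prints the least frequent host first. If you would rather see the most frequent host first, Counter.most_common() does the counting and the descending sort in one call. The following is only a sketch under that assumption: `count_hosts` and `sample_lines` are hypothetical names, and the data is copied in memory from url.text instead of being read from disk.

```python
import re
from collections import Counter
from urllib.parse import urlparse

URL_PATTERN = re.compile(r'^https?:/{2}\w.+$')


def count_hosts(lines):
    """Hypothetical helper: return (host, count) pairs, most frequent first."""
    counts = Counter(
        urlparse(line.strip()).netloc
        for line in lines
        if URL_PATTERN.match(line.strip())
    )
    # most_common() with no argument returns every pair, highest count first
    return counts.most_common()


# In-memory copy of url.text, including the two non-URL lines
sample_lines = [
    "http://www.baidu.com/1.html\n",
    "http://www.aqiyi.com/2.html\n",
    "http://www.cssd.com/1.html\n",
    "http://www.baidu.com/1.html\n",
    "http://www.baidu.com/1.html\n",
    "http://www.baidu.com/1.html\n",
    "http://www.cssd.com/1.html\n",
    "asdsasad\n",
    "asdasdasdasdasd\n",
]

ranking = count_hosts(sample_lines)
print(ranking)
# [('www.baidu.com', 4), ('www.cssd.com', 2), ('www.aqiyi.com', 1)]
```

Passing an open file object instead of `sample_lines` should work the same way, since the helper only iterates over lines.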

 

posted @ 2019-05-07 17:32  xmc_2022