Scraping website content

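Study the requests library on your own using its official documentation (http://cn.python-requests.org/zh_CN/latest/). Then use its methods to download the contents of the two URLs below and, with Python file I/O, write them into a text file named filetest.txt. Note: write the URL-fetching logic as a function so it can be called repeatedly.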

 

URL 1: http://mirrors.aliyun.com/centos/7/isos/x86_64/0_README.txt

URL 2: http://mirrors.aliyun.com/centos/7/isos/x86_64/sha256sum.txt
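The assignment asks for the download step to live in a function that can be called repeatedly. Below is a minimal sketch, assuming you also want a timeout and an HTTP status check (neither is required by the assignment). The name fetch_text and the 10-second default are illustrative choices; requests.get with its timeout argument and Response.raise_for_status() are part of the requests API.

```python
import requests


def fetch_text(url, timeout=10.0):
    """Download url and return the response body decoded as text."""
    # timeout keeps the call from hanging forever on an unresponsive server
    response = requests.get(url, timeout=timeout)
    # raise requests.HTTPError for 4xx/5xx responses instead of
    # silently returning the text of an error page
    response.raise_for_status()
    return response.text
```

For example, calling fetch_text on URL 1 returns the README as a string, and the same function can be reused for URL 2.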

 

Multithreaded version:
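import threading

import requests  # import the requests library

urls = ['http://mirrors.aliyun.com/centos/7/isos/x86_64/0_README.txt',
        'http://mirrors.aliyun.com/centos/7/isos/x86_64/sha256sum.txt']
text_list = [None] * len(urls)  # one slot per URL so the file keeps the URL order


def get_text(index, url):
    r = requests.get(url, timeout=10)
    text_list[index] = r.text


threads = []
for index, url in enumerate(urls):  # iterate over the list, creating one child thread per URL
    t = threading.Thread(target=get_text, args=(index, url))
    t.start()  # start the child thread
    threads.append(t)
# join only after every thread has started; calling join() inside the loop
# above would make the downloads run one after another
for t in threads:
    t.join()  # the main thread continues only after this child thread finishes

# 'w' overwrites the file on each run ('a+' would append duplicate copies)
with open('filetest.txt', 'w', encoding='utf-8') as a:
    for one in text_list:
        a.write(one)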
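The multithreaded version manages threading.Thread objects and a shared list by hand. As an alternative technique (not the one the original post uses), the standard library's concurrent.futures.ThreadPoolExecutor can do the same job: executor.map returns results in the order of the input URLs, so the file contents stay in URL order even though the downloads overlap. The names fetch_all and save_texts and the max_workers=4 default are hypothetical choices for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor

import requests


def get_text(url):
    """Download one URL and return its body as text."""
    r = requests.get(url, timeout=10)
    r.raise_for_status()
    return r.text


def fetch_all(urls, fetch=get_text, max_workers=4):
    """Fetch every URL concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the order of urls, not completion order;
        # an exception raised in a worker is re-raised here
        return list(pool.map(fetch, urls))


def save_texts(texts, path):
    """Overwrite path with the concatenated texts."""
    with open(path, 'w', encoding='utf-8') as f:
        for text in texts:
            f.write(text)
```

With the urls list from the post, save_texts(fetch_all(urls), 'filetest.txt') downloads both files concurrently and writes them in order. The fetch parameter exists so a different download function can be passed in without changing fetch_all.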


Function version:
import requests

urls = ['http://mirrors.aliyun.com/centos/7/isos/x86_64/0_README.txt',
        'http://mirrors.aliyun.com/centos/7/isos/x86_64/sha256sum.txt']


def get_text(url):
    # download one URL and return its body as text
    r = requests.get(url, timeout=10)
    return r.text


text_list = []
for url in urls:
    text = get_text(url)
    text_list.append(text)

# 'w' overwrites the file on each run ('a+' would append duplicate copies)
with open('filetest.txt', 'w', encoding='utf-8') as a:
    for one in text_list:
        a.write(one)
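Both URLs are on the same host (mirrors.aliyun.com). A sketch, assuming the fetch function is called repeatedly against that host: passing in a shared requests.Session lets requests reuse pooled connections instead of opening a new one for every call. The helper name download_to_file is hypothetical.

```python
import requests


def get_text(session, url):
    """Download one URL through the shared session and return its text."""
    r = session.get(url, timeout=10)
    r.raise_for_status()
    return r.text


def download_to_file(urls, path):
    """Fetch urls in order and overwrite path with their concatenated text."""
    # the with-block closes the session and its pooled connections at the end
    with requests.Session() as session:
        texts = [get_text(session, url) for url in urls]
    # 'w' replaces the file on every run; 'a+' would append duplicates
    with open(path, 'w', encoding='utf-8') as f:
        f.writelines(texts)
```

Using the urls list above, download_to_file(urls, 'filetest.txt') produces the same file as the function version.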
 

 

posted @ 2020-06-29 11:25  时倾lzl