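A First Try at Fetching Web Page Content with requests in Python
I recently started learning Python, mainly for web scraping and data analysis. Today's task is to fetch a web page's source code with the requests library.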
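import requests


def getHtmlText(url):
    """Fetch url and return the page source as text, or an error message on failure."""
    try:
        r = requests.get(url, timeout=30)  # connect/read timeout of 30 seconds
        r.raise_for_status()               # raise HTTPError for 4xx/5xx status codes
        r.encoding = r.apparent_encoding   # use the encoding detected from the body
        print(r.headers)                   # show the response headers
        return r.text
    except requests.RequestException:      # connection errors, timeouts, bad status codes
        return "An exception occurred"


if __name__ == "__main__":
    url = "https://www.baidu.com"
    print(getHtmlText(url))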
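Why set r.encoding = r.apparent_encoding? When a text/* response's Content-Type header carries no charset, requests falls back to ISO-8859-1, so a Chinese page that is actually UTF-8 comes out garbled in r.text. The offline sketch below uses only the standard library (the sample HTML string is made up for illustration) and decodes the same UTF-8 bytes both ways to reproduce that mismatch:

```python
# Bytes as a server would send them: UTF-8 encoded HTML (sample string, made up)
page_bytes = "<title>百度一下</title>".encode("utf-8")

# What r.text gives if requests falls back to ISO-8859-1 (no charset in headers)
garbled = page_bytes.decode("iso-8859-1")

# What r.text gives once r.encoding is set to the real encoding (here UTF-8)
correct = page_bytes.decode("utf-8")

print(repr(garbled))
print(correct)
```

ISO-8859-1 maps every byte to some character, so the wrong decode never raises an error; it silently produces mojibake, which is why the encoding has to be fixed before reading r.text.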
Checking performance by measuring how long the program takes to run:
import time as t

import requests


def getHtmlText(url):
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        return "An exception occurred"


def main():
    url = 'http://www.baidu.com'
    # time.clock() was removed in Python 3.8 (and returned CPU time on Unix);
    # perf_counter() measures elapsed wall-clock time
    t0 = t.perf_counter()
    for _ in range(100):
        getHtmlText(url)
    t1 = t.perf_counter()
    print('100 fetches took %s seconds' % (t1 - t0))


if __name__ == '__main__':
    main()
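The same start/stop timing pattern works for any function. A minimal sketch, using a hypothetical helper time_calls (not part of requests or the standard library) and a stand-in fake_fetch that sleeps instead of doing network I/O, so it runs offline:

```python
import time


def time_calls(func, n):
    """Call func() n times; return (total_seconds, average_seconds).

    time.perf_counter() is a monotonic, high-resolution clock intended
    for measuring elapsed intervals.
    """
    start = time.perf_counter()
    for _ in range(n):
        func()
    total = time.perf_counter() - start
    return total, total / n


def fake_fetch():
    # Hypothetical stand-in for getHtmlText(url): sleep 10 ms instead of fetching
    time.sleep(0.01)


if __name__ == "__main__":
    total, avg = time_calls(fake_fetch, 20)
    print("20 calls took %.3f seconds (%.1f ms each)" % (total, avg * 1000))
```

To time the real fetch, pass lambda: getHtmlText(url) as func; the measured time will then depend on the network.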
Output:
/Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6 /Users/dahe/PycharmProjects/dahe/niubi100.py
100 fetches took 2.126808 seconds
Process finished with exit code 0
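Dividing the reported total by the 100 requests gives a rough per-request figure. One caveat, assuming the script ran as written under the Python 3.6 shown in the path: there, time.clock() on macOS (as on other Unix systems) returned processor time, so this is CPU time per fetch rather than network latency; the elapsed time per request, which includes waiting on the network, would normally be larger.

```latex
\bar{t} = \frac{2.126808\,\text{s}}{100} \approx 0.0213\,\text{s} \approx 21.3\,\text{ms}
```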
