A First Try at Fetching Web Page Content with Python's requests

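I recently started learning Python, mainly for data scraping and data analysis. Today's task is to fetch the source of a web page with the requests library.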

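import requests

def getHtmlText(url):
    """Fetch url and return its HTML as text, or an error message on failure."""
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()  # raise HTTPError for 4xx/5xx responses
        r.encoding = r.apparent_encoding  # use the encoding detected from the body
        print(r.headers)
        return r.text
    except requests.RequestException:
        # Catch only request-related errors instead of using a bare except
        return "An exception occurred"

if __name__ == "__main__":
    url = "https://www.baidu.com"
    print(getHtmlText(url))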
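The line `r.encoding = r.apparent_encoding` matters because requests falls back to ISO-8859-1 for text responses whose Content-Type header names no charset, and that fallback garbles Chinese pages. The sketch below reproduces the effect with the standard library only. The sample string and variable names are made up for illustration, and no request is sent.

```python
# Bytes as a server might send them: UTF-8 encoded Chinese text (sample string is made up).
raw = "百度一下,你就知道".encode("utf-8")

# Decoding with the ISO-8859-1 fallback never fails, but yields mojibake.
wrong = raw.decode("iso-8859-1")

# Decoding with the encoding the body actually uses recovers the text.
right = raw.decode("utf-8")

print(repr(wrong))
print(right)
```

Setting `r.encoding` before reading `r.text` is what makes requests decode the body the second way rather than the first.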

 

To check performance, measure how long the program takes to run:

import requests
import time as t

def getHtmlText(url):
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        return "An exception occurred"

def main():
    url = 'http://www.baidu.com'
    # time.clock() was removed in Python 3.8; perf_counter() measures elapsed time
    t0 = t.perf_counter()
    for i in range(100):
        r = getHtmlText(url)
    t1 = t.perf_counter()
    print('Time taken for 100 fetches: %s seconds' % (t1 - t0))

if __name__ == '__main__':
    main()

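Output from a run under Python 3.6 with time.clock(), which on macOS reports processor time rather than elapsed time: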

/Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6 /Users/dahe/PycharmProjects/dahe/niubi100.py
Time taken for 100 fetches: 2.126808 seconds

Process finished with exit code 0
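A network fetch spends most of its time waiting, so elapsed time and processor time can differ a lot. The sketch below measures both for a repeated call. `time_calls` and `fake_fetch` are hypothetical names introduced here, and `fake_fetch` sleeps instead of touching the network, so it only stands in for `getHtmlText(url)`.

```python
import time

def time_calls(func, repeat=100):
    """Call func() `repeat` times; return (elapsed seconds, CPU seconds)."""
    wall_start = time.perf_counter()   # wall-clock timer
    cpu_start = time.process_time()    # processor time used by this process
    for _ in range(repeat):
        func()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu

def fake_fetch():
    # Hypothetical stand-in for getHtmlText(url): waits without using the CPU.
    time.sleep(0.01)

if __name__ == "__main__":
    wall, cpu = time_calls(fake_fetch, repeat=10)
    print("elapsed: %.6f s, CPU: %.6f s" % (wall, cpu))
```

To time the real function, pass something like `lambda: getHtmlText(url)` as `func`. The elapsed figure is the one that reflects how long the 100 fetches actually take.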

 

posted @ 2017-09-22 23:49  你好我是大河