2016/5/9 14:17:40 Python bs4 BeautifulSoup: scraping text posts from 糗事百科 (Qiushibaike)

# -*- coding: utf-8 -*-
import urllib.request

from bs4 import BeautifulSoup


class QiuShi:
    def __init__(self):
        # Send a browser-like User-Agent instead of urllib's default one.
        user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
        self.headers = {'User-Agent': user_agent}

    def query(self, page=1):
        # Build the URL of the requested text-only listing page.
        self.url = 'http://www.qiushibaike.com/text/page/' + str(page)
        print(self.url)
        req = urllib.request.Request(self.url, headers=self.headers)
        html = urllib.request.urlopen(req)
        bsoup = BeautifulSoup(html, 'html.parser')
        # Print the text of every <div class="content"> element on the page.
        for content in bsoup.find_all('div', {'class': 'content'}):
            print(content.get_text())


if __name__ == '__main__':
    qiushi = QiuShi()
    # Assumes page numbering starts at 1 (the default for `page`): pages 1-35.
    for i in range(1, 36):
        qiushi.query(i)
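
The parsing step can also be tried offline, without requesting the site at all. Below is a minimal sketch, assuming the page markup looks roughly like the sample string; the sample HTML and the helper name `extract_texts` are made up for illustration and are not taken from the real site.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that imitates the structure the scraper looks for:
# each post's text lives inside a <div class="content"> element.
SAMPLE_HTML = """
<div class="article">
  <div class="content"><span>first joke</span></div>
</div>
<div class="article">
  <div class="content"><span>second joke</span></div>
</div>
"""


def extract_texts(html):
    """Return the stripped text of every <div class="content"> in html."""
    soup = BeautifulSoup(html, 'html.parser')
    return [div.get_text(strip=True)
            for div in soup.find_all('div', class_='content')]


for text in extract_texts(SAMPLE_HTML):
    print(text)
```

Returning a list instead of printing inside the loop makes the extraction easy to check on saved HTML before running it against 35 live pages.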





Posted @ 2016-09-06 10:43 by 乾坤颠倒