Web scraper 5: a single-page scraper for Zhuhai historical weather

  I spent a few hours writing a Python scraper that fetches Zhuhai's historical weather, and I'm recording it here.

  1 Import the requests and bs4 modules

import requests
from bs4 import BeautifulSoup

  2 The target URL

url = 'http://lishi.tianqi.com/zhuhai/201512.html'
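The URL encodes the city (`zhuhai`) and the month (`201512`, i.e. December 2015). Assuming the site uses the same pattern for every month (an assumption, not checked here), a hypothetical `month_urls` helper can build a year of page URLs to feed into the same script:

```python
# Hypothetical helper: builds monthly page URLs, ASSUMING every month follows
# the same /<city>/YYYYMM.html pattern as the 201512 page used in this post.
BASE = 'http://lishi.tianqi.com/{city}/{year}{month:02d}.html'


def month_urls(city, year):
    """Return the 12 monthly history-page URLs for one city and year."""
    return [BASE.format(city=city, year=year, month=m) for m in range(1, 13)]


urls = month_urls('zhuhai', 2015)
print(urls[0])   # http://lishi.tianqi.com/zhuhai/201501.html
print(urls[-1])  # http://lishi.tianqi.com/zhuhai/201512.html
```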

  3 Define the request headers so the request looks like it comes from a browser

headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3',
    'Accept-Encoding': 'gzip, deflate',
    'Connection': 'keep-alive',
    'Host': 'lishi.tianqi.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0',
}
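For comparison, here is the same browser disguise using only the standard library's `urllib.request.Request` instead of requests (a swapped-in technique, not what this post uses). Building the request object sends nothing, so it can be inspected offline:

```python
import urllib.request

url = 'http://lishi.tianqi.com/zhuhai/201512.html'

# Same browser-like headers as above, minus Accept-Encoding, Host and Connection
# (see the note below for why).
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0',
}

# Constructing the Request sends nothing; urllib.request.urlopen(req) would.
req = urllib.request.Request(url, headers=headers)

# urllib stores header names via str.capitalize(), e.g. 'User-Agent' -> 'User-agent'.
print(req.get_header('User-agent'))
```

Accept-Encoding, Host and Connection are left out on purpose: urllib does not transparently decompress gzip or deflate bodies the way requests does, it fills in Host from the URL, and it always sends `Connection: close`.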

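  4 Fetch the target response and set its encoding. r.apparent_encoding detects the encoding from the downloaded content; without that line the encoding defaults to gbk, whereas the detected encoding is GB2312. (The target site's declared encoding can be checked in its <meta> tag.)

r = requests.get(url, headers=headers)
r.encoding = r.apparent_encoding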
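Step 4 mentions checking the `<meta>` tag. A minimal stdlib sketch of what that check does, and of why the codec matters: a regex pulls the declared charset out of the raw bytes (a simplification; pages can declare it in other ways), and decoding the same GB2312 bytes as latin-1, which is what requests falls back to for `text/*` responses whose headers carry no charset, loses the Chinese text. `meta_charset` and the sample bytes are hypothetical.

```python
import re

# Matches both <meta charset="gb2312"> and
# <meta http-equiv="Content-Type" content="text/html; charset=gb2312">.
META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([A-Za-z0-9_-]+)', re.IGNORECASE)


def meta_charset(raw):
    """Return the charset declared in a page's <meta> tag (lowercased), or None."""
    m = META_CHARSET.search(raw)
    return m.group(1).decode('ascii').lower() if m else None


# Hypothetical page fragment, encoded the way a GB2312 site would send it.
raw = ('<meta http-equiv="Content-Type" content="text/html; charset=gb2312">'
       '<td>珠海</td>').encode('gb2312')

charset = meta_charset(raw)
print(charset)                         # gb2312
print('珠海' in raw.decode(charset))    # True: correct codec
print('珠海' in raw.decode('latin-1'))  # False: wrong codec gives mojibake
```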

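  5 Open the output file in w+ mode (create or truncate it, then read/write), using UTF-8 so the Chinese text is written correctly

fd = open('w201512.txt', 'w+', encoding='utf-8')

  6 Use bs4 to select the ul tags that are direct children of the div with class tqtongji2 (the "." in the selector matches a class, not an id):

soup = BeautifulSoup(r.text, "html.parser")
res_div = soup.select("div.tqtongji2 > ul")

  7 Loop over the li tags inside each ul and write their text, one ul per line with a comma after each value. get_text() returns a tag's text. Mind the encoding: because the file was opened with encoding='utf-8' in step 5, each str is written directly and Python encodes it as UTF-8

for item in res_div:
    res_li = item.select("li")
    for item_li in res_li:
        item_li = item_li.get_text()
        fd.write(item_li)
        fd.write(',')
        print(item_li)
    fd.write('\n')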
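To check the selector logic from steps 6 and 7 offline, here is a sketch that runs the same `div.tqtongji2 > ul` / `li` selection on a small invented HTML fragment (the markup and cell values are placeholders, not the real page). `parse_rows` is a hypothetical helper name, and `get_text(strip=True)` additionally trims surrounding whitespace, which the original loop does not do.

```python
from bs4 import BeautifulSoup

# Invented placeholder markup shaped like the structure the selector expects.
SAMPLE = """
<div class="tqtongji2">
  <ul><li>a1</li><li>b1</li><li>c1</li></ul>
  <ul><li>a2</li><li> b2 </li><li>c2</li></ul>
</div>
<div class="other"><ul><li>ignored</li></ul></div>
"""


def parse_rows(html):
    """Return one list of cell texts per <ul> directly under div.tqtongji2."""
    soup = BeautifulSoup(html, 'html.parser')
    # 'div.tqtongji2' matches class="tqtongji2"; an id would need 'div#tqtongji2'.
    return [
        [li.get_text(strip=True) for li in ul.select('li')]
        for ul in soup.select('div.tqtongji2 > ul')
    ]


rows = parse_rows(SAMPLE)
print(rows)  # [['a1', 'b1', 'c1'], ['a2', 'b2', 'c2']]
```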

  8 Close the output file

fd.close()
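Step 8 can be made automatic with a `with` block, and the loop's manual `fd.write(',')` calls can be replaced by the standard `csv` module, which avoids the trailing comma on every line and quotes cells that themselves contain commas. A minimal sketch, with a hypothetical `write_rows` helper; note that its output is standard CSV rather than the trailing-comma format the script above produces:

```python
import csv
import io


def write_rows(rows, fd):
    """Write each row as one CSV line, quoting cells only when needed."""
    writer = csv.writer(fd)
    writer.writerows(rows)


# In the real script this would be:
#   with open('w201512.txt', 'w', encoding='utf-8', newline='') as fd:
#       write_rows(rows, fd)
# The with-block closes the file even if an exception is raised, replacing
# the explicit fd.close(). io.StringIO stands in for the file here.
buf = io.StringIO()
write_rows([['a1', 'b1'], ['a2', 'b,2']], buf)
print(repr(buf.getvalue()))  # 'a1,b1\r\na2,"b,2"\r\n'
```

The `newline=''` argument follows the csv module's recommendation for files it writes to, so its `\r\n` line endings are not translated a second time.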

 

  Source code:

import requests
from bs4 import BeautifulSoup

url = 'http://lishi.tianqi.com/zhuhai/201512.html'
# Browser-like headers so the server treats the request as a normal page visit
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3',
    'Accept-Encoding': 'gzip, deflate',
    'Connection': 'keep-alive',
    'Host': 'lishi.tianqi.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0',
}
r = requests.get(url, headers=headers)
# Decode the body with the encoding detected from its content
r.encoding = r.apparent_encoding

fd = open('w201512.txt', 'w+', encoding='utf-8')

soup = BeautifulSoup(r.text, "html.parser")
# Every <ul> directly under <div class="tqtongji2"> becomes one line of output
res_div = soup.select("div.tqtongji2 > ul")
for item in res_div:
    res_li = item.select("li")
    for item_li in res_li:
        # One comma-terminated cell per <li>
        item_li = item_li.get_text()
        fd.write(item_li)
        fd.write(',')
        print(item_li)
    fd.write('\n')

fd.close()

 

  Result: the scraped weather for one month in Zhuhai:
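The saved file has one line per ul, with every cell followed by a comma, so each non-empty line ends in a trailing comma. A hypothetical `read_rows` helper for loading it back might look like this, assuming no cell text itself contains a comma:

```python
import io


def read_rows(fd):
    """Parse lines written by the scraper: each cell followed by ','."""
    rows = []
    for line in fd:
        line = line.rstrip('\r\n')
        if line:
            # Every cell ends with ',', so the last split field is always empty.
            rows.append(line.split(',')[:-1])
    return rows


# io.StringIO stands in for open('w201512.txt', encoding='utf-8').
sample = io.StringIO('a1,b1,c1,\na2,b2,c2,\n\n')
print(read_rows(sample))  # [['a1', 'b1', 'c1'], ['a2', 'b2', 'c2']]
```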

 

posted @ 2016-08-04 15:17  rongyux