Python crawler notes (6): fetching product pages, submitting search keywords, downloading and saving images

 

1. Example: fetching a product page

import requests
url = "https://item.jd.com/4461939.html"
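try:
    r = requests.get(url)
    r.raise_for_status()  # raise an HTTPError for 4xx/5xx responses
    r.encoding = r.apparent_encoding  # use the encoding guessed from the response body
    print(r.text[:1000])
except requests.RequestException:
    print("Failed")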

 

2. Example: fetching an Amazon product page (with custom headers)

import requests
url = "https://www.amazon.cn/dp/B00BMK4GKW/ref=cngwdyfloorv2_recs_0?pf_rd_p=3aeea79d-b33f-46f8-8020-d2edee624402&pf_rd_s=desktop-2&pf_rd_t=36701&pf_rd_i=desktop&pf_rd_m=A1AJ19PSB66TGU&pf_rd_r=JJ1T3FF75EEVCFSPF5J0&pf_rd_r=JJ1T3FF75EEVCFSPF5J0&pf_rd_p=3aeea79d-b33f-46f8-8020-d2edee624402"
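kv = {'user-agent': 'Mozilla/5.0'}  # send a browser-like User-Agent instead of the requests default

try:
    r = requests.get(url, headers=kv)
    r.raise_for_status()  # treat 4xx/5xx responses as failures
    r.encoding = r.apparent_encoding
    print(r.text)
except requests.RequestException:
    print("Failed.")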

 

3. Keywords: search engine keyword submission interfaces

      Baidu keyword interface:

            http://www.baidu.com/s?wd=keyword

      360 keyword interface:

            http://www.so.com/s?q=keyword

Example:

import requests
kv = {'wd':'Python'}
r = requests.get("http://www.baidu.com/s", params=kv)
print(r.request.url)  # the URL that was requested, with the keyword in the query string
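
To see what `params=kv` turns into, the query string can also be built by hand. The following is a minimal sketch using only the standard library; the helper name `build_search_url` is made up for illustration, and it assumes the two interface formats listed above.

```python
from urllib.parse import urlencode


def build_search_url(base, param, keyword):
    """Return base + '?' + the URL-encoded key/value pair (hypothetical helper)."""
    return base + "?" + urlencode({param: keyword})


# Baidu expects the keyword in `wd`, 360 expects it in `q`.
baidu_url = build_search_url("http://www.baidu.com/s", "wd", "Python")
so_url = build_search_url("http://www.so.com/s", "q", "Python")
print(baidu_url)
print(so_url)
```

Passing the same dictionary through `params=` lets requests do this encoding for you, which is usually the simpler choice.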

 

4. Downloading images from the web

     Format of a web image link:

         http://www.example.com/picture.jpg

      National Geographic (Chinese site):

          http://www.nationalgeographic.com.cn/

          Pick the address of one image:

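import requests
url = "http://image.nationalgeographic.com.cn/2015/0121/20150121033625957.jpg"
path = url.split('/')[-1]  # use the last URL segment as the file name
try:
    r = requests.get(url)
    r.raise_for_status()
    with open(path, 'wb') as f:  # the with block closes the file automatically
        f.write(r.content)  # r.content is the response body as bytes
except (requests.RequestException, OSError):
    print('Failed.')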

Full version

import requests
import os
url = "http://image.nationalgeographic.com.cn/2015/0121/20150121033625957.jpg"
root = 'pics'  # example target directory; an empty string would make os.mkdir fail
path = os.path.join(root, url.split('/')[-1])
try:
    if not os.path.exists(root):
        os.mkdir(root)
    if not os.path.exists(path):
        r = requests.get(url)  # download only when the file is not saved yet
        r.raise_for_status()
        with open(path, 'wb') as f:
            f.write(r.content)
    else:
        print('file exist')
except (requests.RequestException, OSError):
    print('Failed.')
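
The file name above comes from the last segment of the URL. A slightly more defensive way to derive the local path might look like the sketch below; `local_path_for` and `ensure_dir` are hypothetical helper names, and the example URL with a query string is only an assumption about what real image links can look like.

```python
import os
from urllib.parse import urlsplit


def local_path_for(url, root):
    """Map an image URL to a file path under root (hypothetical helper)."""
    # urlsplit(...).path drops any '?query' or '#fragment' part of the URL.
    filename = os.path.basename(urlsplit(url).path)
    if not filename:
        raise ValueError("URL does not end in a file name: " + url)
    return os.path.join(root, filename)


def ensure_dir(root):
    """Create root (and any missing parents); do nothing if it already exists."""
    os.makedirs(root, exist_ok=True)


print(local_path_for("http://www.example.com/picture.jpg?size=large", "pics"))
```

Using `os.path.join` avoids having to remember a trailing slash on the directory name, and `os.makedirs(..., exist_ok=True)` replaces the separate existence check before creating the directory.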

 

5. Looking up the location of an IP address:

       You can use www.ip138.com

import requests
url = "http://m.ip138.com/ip.asp?ip="
r = requests.get(url + '202.204.80.112')
print(r.text[-500:])
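
Because the address is simply appended to the URL, it can help to check that the string really is an IP address before sending the request. This is a minimal sketch using the standard `ipaddress` module; `build_lookup_url` is a hypothetical helper, and note that it accepts IPv6 addresses as well, which the site may or may not support.

```python
import ipaddress

LOOKUP_URL = "http://m.ip138.com/ip.asp?ip="  # same endpoint as in the example above


def build_lookup_url(ip):
    """Validate ip and append it to the lookup URL (hypothetical helper)."""
    # ipaddress.ip_address raises ValueError for strings that are not valid IPs.
    address = ipaddress.ip_address(ip)
    return LOOKUP_URL + str(address)


print(build_lookup_url("202.204.80.112"))
```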

 

posted @ 2017-12-14 11:07  抽象Java