Python Web Scraping Notes (6): Scraping Product Pages, Submitting Search Keywords, and Downloading Images
1: Example: scraping a product page
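    import requests

    url = "https://item.jd.com/4461939.html"
    try:
        r = requests.get(url)
        r.raise_for_status()               # raise an exception for 4xx/5xx status codes
        r.encoding = r.apparent_encoding   # detect the encoding from the page content
        print(r.text[:1000])               # show the first 1000 characters
    except requests.RequestException:
        print("Failed")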
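The same fetch pattern can be wrapped in a reusable function. This is a minimal sketch: the name get_html_text and the 10-second default timeout are assumptions made for illustration, not part of the original notes.

```python
import requests


def get_html_text(url, timeout=10):
    """Return the decoded page text, or None if the request fails."""
    try:
        r = requests.get(url, timeout=timeout)
        r.raise_for_status()              # raise HTTPError for 4xx/5xx responses
        r.encoding = r.apparent_encoding  # guess the encoding from the body
        return r.text
    except requests.RequestException:
        # Covers connection errors, timeouts, malformed URLs, and HTTP error statuses.
        return None
```

Calling get_html_text("https://item.jd.com/4461939.html") returns the page text on success and None on failure, so the caller can check the result instead of repeating the try/except block.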
2: Example: scraping an Amazon product page (with request headers added)
    import requests

    url = "https://www.amazon.cn/dp/B00BMK4GKW/ref=cngwdyfloorv2_recs_0?pf_rd_p=3aeea79d-b33f-46f8-8020-d2edee624402&pf_rd_s=desktop-2&pf_rd_t=36701&pf_rd_i=desktop&pf_rd_m=A1AJ19PSB66TGU&pf_rd_r=JJ1T3FF75EEVCFSPF5J0&pf_rd_r=JJ1T3FF75EEVCFSPF5J0&pf_rd_p=3aeea79d-b33f-46f8-8020-d2edee624402"
    kv = {'user-agent': 'Mozilla/5.0'}   # present the request as coming from a browser
    try:
        r = requests.get(url, headers=kv)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        print(r.text)
    except requests.RequestException:
        print("Failed.")
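To see what the headers argument changes, the request can be prepared without being sent and its User-Agent header inspected. This is a rough sketch; the shortened URL is used only for illustration. Presumably the site rejects the library's default user agent, which is why the example above overrides it.

```python
import requests

# Build the requests without sending them, so the headers can be inspected offline.
session = requests.Session()
url = "https://www.amazon.cn/dp/B00BMK4GKW"   # shortened URL, for illustration only

default_req = session.prepare_request(requests.Request("GET", url))
custom_req = session.prepare_request(
    requests.Request("GET", url, headers={"user-agent": "Mozilla/5.0"})
)

print(default_req.headers["User-Agent"])  # starts with "python-requests/"
print(custom_req.headers["User-Agent"])   # Mozilla/5.0
```

Header names are matched case-insensitively, so the lowercase 'user-agent' key replaces the default User-Agent value.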
3: Keywords: search engine keyword submission interfaces
Baidu's keyword interface:
http://www.baidu.com/s?wd=keyword
360's keyword interface:
http://www.so.com/s?q=keyword
Example:
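    import requests

    kv = {'wd': 'Python'}   # becomes ?wd=Python in the request URL
    r = requests.get("http://www.baidu.com/s", params=kv)
    print(r.request.url)    # the URL that was actually requested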
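The two interfaces differ only in the base URL and the name of the query parameter. The sketch below builds the search URL for either engine with the standard library, which is roughly what the params argument does. The names SEARCH_ENGINES and build_search_url are hypothetical helpers, not part of requests.

```python
from urllib.parse import urlencode

# Base URL and query-parameter name for each engine, taken from the interfaces above.
SEARCH_ENGINES = {
    "baidu": ("http://www.baidu.com/s", "wd"),
    "360": ("http://www.so.com/s", "q"),
}


def build_search_url(engine, keyword):
    """Return the search URL for `keyword` on the given engine."""
    base, param = SEARCH_ENGINES[engine]
    return base + "?" + urlencode({param: keyword})


print(build_search_url("baidu", "Python"))  # http://www.baidu.com/s?wd=Python
print(build_search_url("360", "Python"))    # http://www.so.com/s?q=Python
```

urlencode also escapes spaces and non-ASCII characters, so keywords do not need to be encoded by hand.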
4: Downloading images from the web
Format of a web image link:
http://www.example.com/picture.jpg
National Geographic (Chinese site):
http://www.nationalgeographic.com.cn/
Pick the address of one image:
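    import requests

    url = "http://image.nationalgeographic.com.cn/2015/0121/20150121033625957.jpg"
    path = url.split('/')[-1]   # use the last URL segment as the file name
    try:
        r = requests.get(url)
        r.raise_for_status()
        with open(path, 'wb') as f:   # the with block closes the file automatically
            f.write(r.content)        # r.content holds the raw response bytes
    except (requests.RequestException, OSError):
        print('Failed.')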
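Taking the last segment with url.split('/')[-1] works for a plain link like the one above, but it would keep any query string in the file name. A slightly more defensive sketch follows. The function name filename_from_url and the fallback name download.bin are assumptions chosen for illustration.

```python
import posixpath
from urllib.parse import urlsplit


def filename_from_url(url, default="download.bin"):
    """Return the last path segment of `url`, ignoring any query string."""
    name = posixpath.basename(urlsplit(url).path)
    return name or default   # fall back when the URL path ends with "/"


print(filename_from_url(
    "http://image.nationalgeographic.com.cn/2015/0121/20150121033625957.jpg"
))  # 20150121033625957.jpg
```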
Full version:
    import os
    import requests

    url = "http://image.nationalgeographic.com.cn/2015/0121/20150121033625957.jpg"
    root = ''   # target directory; an empty string means the current directory
    path = os.path.join(root, url.split('/')[-1])
    try:
        if root:
            os.makedirs(root, exist_ok=True)   # create the directory if it is missing
        if not os.path.exists(path):
            r = requests.get(url)
            r.raise_for_status()
            with open(path, 'wb') as f:
                f.write(r.content)
        else:
            print('File already exists.')
    except (requests.RequestException, OSError):
        print('Failed.')
5: Looking up the location of an IP address
You can use www.ip138.com:
    import requests

    url = "http://m.ip138.com/ip.asp?ip="
    r = requests.get(url + '202.204.80.112')
    print(r.text[-500:])   # print the last 500 characters of the response
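Because the address is appended directly to the URL, it may be worth checking that it is well formed before sending the request. The sketch below uses the standard ipaddress module. build_lookup_url is a hypothetical helper, and the endpoint is simply the one used above; whether it still responds in this form is not verified here.

```python
import ipaddress

LOOKUP_URL = "http://m.ip138.com/ip.asp?ip="  # endpoint taken from the example above


def build_lookup_url(ip):
    """Return the lookup URL for `ip`; raise ValueError if it is not a valid address."""
    ipaddress.ip_address(ip)   # raises ValueError for malformed input
    return LOOKUP_URL + ip


print(build_lookup_url("202.204.80.112"))
```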
