Scraping an Amazon Product Page with Python

As before, we use the Requests library.

import requests
r = requests.get('https://www.amazon.cn/gp/product/B01M8L5Z3Y')
r.status_code
r.encoding = r.apparent_encoding
r.text

The result is not the product page but an error page (its heading, "意外错误", means "unexpected error"):

'<!--\n        To discuss automated access to Amazon data please contact api-services-support@amazon.com.\n        For information about migrating to our APIs refer to our Marketplace APIs at https://developer.amazonservices.com.cn/index.html/ref=rm_5_sv, or our Product Advertising API at https://associates.amazon.cn/gp/advertising/api/detail/main.html/ref=rm_5_ac for advertising use cases.\n-->\n<html>\n   <head>\n      <meta http-equiv="Content-Type" content="text/html;charset=utf-8">\n      <title>亚马逊</title>\n   <body style="text-align:center;">\n      <br>\n      <div style="width:600px;margin:0 auto;text-align:left;">\n         <h2>意外错误</h2>\n      </div>\n      <br>\n      <div style="width:500px;margin:0 auto;text-align:left;"><font color="red">抱歉,由于程序执行时,遇到意外错误,您刚刚操作没有执行成功,请稍后重试。或将此错误报告给我们的客服中心:<a href="mailto:service_bj@cs.amazon.cn">service_bj@cs.amazon.cn</a></font><br><br>推荐您<a href="javascript:history.back(1)">返回上一页</a>,确认您的操作无误后,再继续其他操作。<br>您可以通过亚马逊<a href="https://www.amazon.cn/help/ref=cs_503_link/" target="_blank" rel="noopener noreferrer">帮助中心</a>,获得更多的帮助。<br></div>\n   </body>\n</html>'
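The comment at the top of that response is a handy signal that the request was rejected. Since r.raise_for_status() only reacts to the HTTP status code, a check on the body can be useful as well. Below is a minimal sketch; looks_blocked and BLOCK_MARKER are hypothetical names, and it assumes the notice shown above appears in every rejected response, which has not been verified here.

```python
# Phrase copied from the comment at the top of the error page above.
# Assumption: every rejected response carries this same notice.
BLOCK_MARKER = 'To discuss automated access to Amazon data'


def looks_blocked(html: str) -> bool:
    """Return True if the page text contains the automated-access notice."""
    return BLOCK_MARKER in html


# A shortened copy of the error page, used here instead of a live request.
sample = (
    '<!--\n        To discuss automated access to Amazon data please contact '
    'api-services-support@amazon.com.\n-->\n<html></html>'
)

print(looks_blocked(sample))
print(looks_blocked('<html><title>Some product</title></html>'))
```

With a live response this would be called as looks_blocked(r.text).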

So we inspect the headers of the request we sent:

r.request.headers

This returns:
{'User-Agent': 'python-requests/2.26.0', 'Accept-Encoding': 'gzip, deflate, br', 'Accept': '*/*', 'Connection': 'keep-alive'}

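Notice the header

'User-Agent': 'python-requests/2.26.0'

It identifies the client as a script, so the site treats the request as coming from a bot and refuses to serve the page. We therefore replace it with a browser-like value: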
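url = 'https://www.amazon.cn/gp/product/B01M8L5Z3Y'
try:
    kv = {'user-agent': 'Mozilla/5.0'}  # pretend to be a browser
    r = requests.get(url, headers=kv)
    r.raise_for_status()  # raise an exception on 4xx/5xx responses
    r.encoding = r.apparent_encoding
    print(r.text[1000:2000])
except requests.RequestException:
    print('Scraping failed')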
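To see which User-Agent would go out without actually contacting the server, the request can be prepared but not sent. This is only a sketch of the idea: it uses Session.prepare_request from Requests, and the variable names default and custom are illustrative.

```python
import requests

url = 'https://www.amazon.cn/gp/product/B01M8L5Z3Y'

with requests.Session() as session:
    # No headers given: the session's defaults are used.
    default = session.prepare_request(requests.Request('GET', url))
    # Headers given: they override the session defaults.
    custom = session.prepare_request(
        requests.Request('GET', url, headers={'User-Agent': 'Mozilla/5.0'})
    )

# Nothing has been sent over the network at this point.
print(default.headers['User-Agent'])  # python-requests/<installed version>
print(custom.headers['User-Agent'])   # Mozilla/5.0
```

Header names are matched case-insensitively, which is why the lowercase 'user-agent' key in the code above works too.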
    

-----------------------------------------------------------------

Baidu's keyword search interface:

https://www.baidu.com/s?wd=    # found by looking at the URL of a search

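import requests
kv = {'wd': 'python'}  # search keyword
try:
    # The search endpoint is /s, matching the interface above
    r = requests.get('https://www.baidu.com/s', params=kv)
    r.raise_for_status()
    print(r.request.url)  # the URL that was actually requested
    print(len(r.text))
except requests.RequestException:
    print('Scraping failed')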
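The params argument is URL-encoded and appended to the address as a query string. Here is a rough sketch of that step using only the standard library, so it runs offline; build_search_url and BAIDU_SEARCH are hypothetical names, and the /s path is taken from the interface noted above.

```python
from urllib.parse import urlencode

# Search path as observed in the interface above.
BAIDU_SEARCH = 'https://www.baidu.com/s'


def build_search_url(keyword: str) -> str:
    """Return the search URL for a keyword (illustrative helper)."""
    return BAIDU_SEARCH + '?' + urlencode({'wd': keyword})


print(build_search_url('python'))  # https://www.baidu.com/s?wd=python
# Non-ASCII keywords are percent-encoded as UTF-8.
print(build_search_url('爬虫'))
```

Comparing this with r.request.url is a quick way to confirm the request went to the address you intended.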

 



posted @ 2023-02-24 17:39  小小派下士