• What is the requests module

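requests is a third-party Python library for making HTTP requests (it is not part of the standard library and has to be installed). Its main use is to send requests the way a browser would, which is why it is so widely used in web scraping.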

  • Why use the requests module

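requests URL-encodes query parameters automatically, encodes POST form data for you, and simplifies working with cookies and proxies. The standard-library urllib module can do all of this as well, but only with noticeably more manual code, so requests is the more convenient choice.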
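
To make the URL-encoding point concrete, here is a minimal sketch of the manual step that requests takes care of when it is given a params dictionary. It uses only the standard library's urllib.parse; the base URL and the search term are illustrative values borrowed from the Sogou example later in this post.

```python
from urllib.parse import urlencode

# Illustrative values: the Sogou search URL and a sample search term
base_url = "https://www.sogou.com/web"
params = {"query": "哈哈"}

# With urllib you percent-encode the query string yourself
# and then glue it onto the base URL
query_string = urlencode(params)
full_url = base_url + "?" + query_string

print(full_url)
```

With requests, calling requests.get(base_url, params=params) performs this encoding internally, so the step disappears from your own code.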

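  • How to use the requests module
    • Installation  
pip install requests
    • Workflow
      • Specify the URL
      • Send the request with requests
      • Read the data from the response object
      • Persist the data
  • requests usage examples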

A GET request with requests: scrape the Sogou home page

import requests
# Specify the URL
url = "https://www.sogou.com/"
# Send the request and read the response body as text
page_text = requests.get(url=url).text
# Persist the data
with open("sougoushouye.html","w",encoding="utf-8") as f:
    f.write(page_text)

A GET request with requests: scrape the Sogou results page for a given search term

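import requests

# Base search URL; the query string is supplied through params below
url = "https://www.sogou.com/web"

# Spoof the User-Agent header so the request looks like it comes from a browser
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36"
}
query = input("Enter a search term: ")

# Pass the dynamic parameter; requests URL-encodes it automatically
params = {
"query": query
}

# Send both the query parameters and the spoofed headers
page_text = requests.get(url=url,params=params,headers=headers).text
with open("sougouzhidingcitiao.html","w",encoding="utf-8") as f1:
    f1.write(page_text)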

 

Scrape Baidu Translate results

import requests

url = "https://fanyi.baidu.com/sug"
kw = input("word :")

data = {
    "kw":kw
}

response = requests.post(url=url,data=data)
print(response.json())

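# response.text : the body decoded to a string
# response.content : the raw body as bytes
# response.json() : the body parsed from JSON into Python objects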
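
The relationship between these three views of a response body can be sketched without touching the network. The snippet below is an approximation that uses only the standard library, with a made-up JSON payload standing in for a real response body. It is not how requests is implemented internally (requests also deals with details such as encoding detection), and UTF-8 is assumed here.

```python
import json

# Made-up payload standing in for the raw bytes of an HTTP response body
content = b'{"word": "dog", "ok": true}'

# Roughly what response.text gives you: the bytes decoded to str
# (UTF-8 is an assumption in this sketch)
text = content.decode("utf-8")

# Roughly what response.json() gives you: the text parsed into Python objects
obj = json.loads(text)

print(type(content).__name__, type(text).__name__, type(obj).__name__)
```

In practice, .content suits binary data such as images, .text suits HTML pages, and .json() suits API endpoints that return JSON.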

Scrape movie details from the Douban movie category chart at https://movie.douban.com/

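import requests

# Endpoint that the chart page queries for its data; the response is JSON
url = 'https://movie.douban.com/j/chart/top_list'
# Query parameters sent along with the GET request
param = {
    "type": "5",
    "interval_id": "100:90",
    "action": '',
    "start": "60",
    "limit": "100",
    }
# Parse the JSON response body into Python objects
movie_data = requests.get(url=url,params=param).json()

print(movie_data)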
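
The start and limit parameters look like an offset and a chunk size, so a larger crawl could walk through the chart in fixed-size chunks. The helper below is a hypothetical sketch under that assumption: page_params is an illustrative name, not part of requests or of Douban's API. It only builds the parameter dictionaries and makes no network calls.

```python
def page_params(page, page_size=20, type_id="5", interval_id="100:90"):
    """Build query parameters for one chunk of the chart (hypothetical helper).

    Assumes "start" is a zero-based offset and "limit" is the chunk size.
    """
    return {
        "type": type_id,
        "interval_id": interval_id,
        "action": "",
        "start": str(page * page_size),
        "limit": str(page_size),
    }


# Parameters for the first three chunks of 20 entries each
chunks = [page_params(page) for page in range(3)]
print([c["start"] for c in chunks])
```

Each dictionary could then be passed as params= to requests.get together with the chart URL.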

 

Scrape data on cosmetics production licenses of the People's Republic of China from the National Medical Products Administration site

import requests

# Endpoint that returns one page of the license list
first_url = "http://125.35.6.84:81/xk/itownet/portalAction.do?method=getXkzsList"
headers = {
    "User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36"
}
# Step 1: collect the ID of every license on list pages 1 to 3
id_list = []
for page in range(1,4):
    first_data = {
        "on": "true",
        "page": str(page),
        "pageSize": "15",
        "productName":"",
        "conditionType":"1",
        "applyname":"",
        "applysn":""
    }
    response = requests.post(url=first_url,data=first_data,headers=headers)
    res = response.json()
    res_list = res["list"]
    for msg in res_list:
        id_list.append(msg["ID"])

# Step 2: request the detail record for each collected ID
next_url = "http://125.35.6.84:81/xk/itownet/portalAction.do?method=getXkzsById"
for ID in id_list:
    data = {
        "id":ID
    }
    response1 = requests.post(url=next_url,data=data,headers=headers)

    print(response1.json())
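
The two-stage pattern used here (first collect IDs from the list pages, then fetch one detail record per ID) can be factored into small functions. The sketch below is one possible arrangement rather than the only one: collect_ids and fetch_details are hypothetical helper names, and the HTTP call is passed in as a post_json function so the logic can be exercised with a stand-in instead of the real site. It assumes, as the code above does, that each list response has a "list" key whose items carry an "ID".

```python
def collect_ids(post_json, list_url, pages, page_size=15):
    """Stage 1: gather the ID of every entry on the given list pages."""
    ids = []
    for page in pages:
        form = {
            "on": "true",
            "page": str(page),
            "pageSize": str(page_size),
            "productName": "",
            "conditionType": "1",
            "applyname": "",
            "applysn": "",
        }
        result = post_json(list_url, form)
        ids.extend(item["ID"] for item in result["list"])
    return ids


def fetch_details(post_json, detail_url, ids):
    """Stage 2: fetch one detail record per collected ID."""
    return [post_json(detail_url, {"id": entry_id}) for entry_id in ids]


def fake_post_json(url, form):
    """Stand-in for the real HTTP call; returns made-up data for the demo."""
    if "id" in form:
        return {"id": form["id"], "detail": "placeholder"}
    page = form["page"]
    return {"list": [{"ID": "p" + page + "-a"}, {"ID": "p" + page + "-b"}]}


ids = collect_ids(fake_post_json, "list-endpoint", range(1, 4))
details = fetch_details(fake_post_json, "detail-endpoint", ids)
print(ids)
print(len(details))
```

Against the real site, post_json could be something like lambda url, form: requests.post(url, data=form, headers=headers).json(), with the two URLs from the script above.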