Python Web Scraping, Chapter 1 (requests)

The requests module

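What is the requests module
- requests is a third-party Python library for sending HTTP requests (it is installed with pip and is not part of the standard library). In a scraper its main job is to send the same kind of requests a browser would. It is powerful, its API is concise, and it is one of the most widely used libraries for web scraping.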
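Why use the requests module
- Working with the urllib module is inconvenient in several ways:
  - URL encoding has to be handled manually
  - POST request parameters have to be encoded manually
  - Handling cookies and proxies is tedious
  - ......
- With the requests module:
  - URL encoding is handled automatically
  - POST request parameters are encoded automatically
  - Cookie and proxy handling is simplified
  - ......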
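
To make the comparison above concrete, here is a minimal sketch of the manual steps urllib requires, using only the standard library. Nothing is sent over the network. The search term and the value 'dog' are made-up inputs; the two URLs are the ones used later in this article.

```python
from urllib.parse import urlencode
from urllib.request import Request

# GET: the query string has to be percent-encoded and appended by hand.
query = urlencode({'query': '北京时间'})
get_url = 'https://www.sogou.com/web?' + query
print(get_url)

# POST: the form fields have to be encoded to bytes by hand as well.
form_body = urlencode({'kw': 'dog'}).encode('utf-8')
post_request = Request('https://fanyi.baidu.com/sug', data=form_body)
print(post_request.get_method(), post_request.data)
```

With requests, the same dictionaries are passed directly as params= and data=, and the library does this encoding itself.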
How to use the requests module
- Install:
  - pip install requests
- Workflow
  - Specify the URL
  - Send the request with requests
  - Read the data from the response object
  - Persist the data
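
The four workflow steps can be sketched as a single function. This is only an illustration: fetch_and_save is a hypothetical helper name, and the timeout and raise_for_status() calls are additions that the original steps do not mention.

```python
import requests


def fetch_and_save(url, path, timeout=10):
    """Fetch url and write the decoded page text to path (hypothetical helper)."""
    # Steps 1 and 2: the caller specifies the URL; send a GET request to it.
    response = requests.get(url, timeout=timeout)
    # Assumption: fail on 4xx/5xx responses instead of saving an error page.
    response.raise_for_status()
    # Step 3: read the data from the response object.
    page_text = response.text
    # Step 4: persist the data.
    with open(path, 'w', encoding='utf-8') as f:
        f.write(page_text)
    return len(page_text)
```

Calling fetch_and_save('https://www.sogou.com/', 'sogou.html') performs a real network request and returns the number of characters written.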

Fetching the Sogou home page

import requests
# Specify the URL
url = 'https://www.sogou.com/'
# Send the request
response = requests.get(url=url)
# Read the page data
# page_test = response.text   # response body decoded to a str
# response.content   # response body as bytes
# response.headers # response headers
# response.json()   # parsed JSON; only works when the body contains JSON
# response.url  # URL of the request
# response.status_code # HTTP status code
page_test = response.text
print(page_test)

Custom request headers

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36',
    }
response = requests.get(url=url, headers=headers)
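
As a rough illustration of why a custom User-Agent matters, the sketch below prints the default value requests would send and then builds (without sending) a request that carries the custom header. It uses Session.prepare_request so that nothing touches the network.

```python
import requests

# Without custom headers, requests identifies itself as "python-requests/<version>".
print(requests.utils.default_user_agent())

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36',
}

# Build the request the way a Session would, without sending it.
session = requests.Session()
prepared = session.prepare_request(
    requests.Request('GET', 'https://www.sogou.com/', headers=headers)
)
# The custom value replaces the default User-Agent.
print(prepared.headers['User-Agent'])
```

Some sites reject or alter responses for the default python-requests value, which is the usual reason for overriding it.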

Passing URL parameters (put the parameters in a dict and pass it via params)

url = 'https://www.baidu.com/s?'
param = {'wd': '美女'}

response = requests.get(url=url, params=param)
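
To see what params does to the URL, a request can be prepared without being sent. This is a small sketch; the search term is a made-up example.

```python
import requests

# Hypothetical search term; nothing is sent over the network here.
request = requests.Request(
    'GET',
    'https://www.baidu.com/s',
    params={'wd': '北京时间'},
)
prepared = request.prepare()
# The dict is percent-encoded and appended as the query string.
print(prepared.url)
```

After a real call, response.url holds the same kind of fully encoded URL.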

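Changing the page encoding

# Get or set the encoding used to decode response.text (encoding is an attribute, not a method)
response.encoding = 'utf-8'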
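
The encoding matters because response.text is the raw bytes in response.content decoded with response.encoding. The standard-library sketch below shows the effect of decoding the same bytes with the wrong and the right codec; the sample text is made up, and ISO-8859-1 is used as the wrong codec because requests falls back to it for text responses that declare no charset.

```python
# Bytes as they might arrive over the network: UTF-8 encoded Chinese text.
raw = '搜狗搜索'.encode('utf-8')

# Decoding with the wrong codec does not raise, but produces garbled text.
wrong = raw.decode('iso-8859-1')
# Decoding with the codec the page actually uses recovers the text.
right = raw.decode('utf-8')

print(repr(wrong))
print(right)
```

When the page's real encoding is unknown, setting response.encoding = response.apparent_encoding before reading response.text is a common approach, although the detection is a guess.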

Example

Requirement: scrape the Sogou results page for a given search term

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Scrape the Sogou results page for a given search term
import requests

seek = input("Enter a search term: ")
url = 'https://www.sogou.com/web?'
# https://www.sogou.com/web?query=%E5%8C%97%E4%BA%AC%E6%97%B6%E9%97%B4_
# Put the request parameters in a dict
param = {
    'query': seek
}

# User-Agent spoofing
# Custom request headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36',
    }
response = requests.get(url=url, params=param, headers=headers)
print(response.text)
# The with block closes the file automatically, so no explicit close() is needed
with open('tes.html','w',encoding='utf-8') as f1:
    f1.write(response.text)

POST requests

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import requests

# POST request
content = input("Enter text: ")
url = 'https://fanyi.baidu.com/sug'
headers={
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36',
    }
data = {
    'kw' : content
}
response = requests.post(url=url,data=data, headers=headers)
# The response body is JSON
print(response.json())
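
The data argument sends the dict as an HTML form. The sketch below prepares (but does not send) the same payload with data= and with json= to show the difference in the request body. The value 'dog' is a made-up input, and whether this endpoint accepts a JSON body is not something the article states; the example above uses data=.

```python
import json

import requests

url = 'https://fanyi.baidu.com/sug'
payload = {'kw': 'dog'}  # hypothetical input

# data=: sent as an HTML form (application/x-www-form-urlencoded).
form = requests.Request('POST', url, data=payload).prepare()
print(form.headers['Content-Type'], form.body)

# json=: serialized to a JSON document (application/json).
as_json = requests.Request('POST', url, json=payload).prepare()
print(as_json.headers['Content-Type'], json.loads(as_json.body))
```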

That covers the basics of requests. To learn more, search for it on Baidu.

Posted 2020-02-07 16:44 by 杨灏
