Python web scraping: requests


from: http://blog.csdn.net/shanzhizi/article/details/50903748

1. The Python third-party library requests in detail

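Requests is an HTTP library written in Python. It is built on urllib3 and released under the Apache 2 license. It is more convenient than the standard library's urllib, saves a great deal of work, and covers typical HTTP testing needs. Its design follows the idioms of PEP 20 (The Zen of Python), which makes it more Pythonic than urllib. Just as important, it supports Python 3!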

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Readability counts.

I. Installing Requests

pip install requests

Or install from source:

$ git clone https://github.com/kennethreitz/requests.git
$ cd requests
$ pip install .

II. Sending requests and passing parameters

Let's start with a simple example:

import requests
 
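r = requests.get(url='http://www.itwhy.org')    # the most basic GET request
print(r.status_code)    # HTTP status code of the response
r = requests.get(url='http://dict.baidu.com/s', params={'wd': 'python'})   # GET request with query parameters
print(r.url)    # final URL, including the encoded query string
print(r.text)   # response body decoded as text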
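You can check exactly which URL `params` produces without sending anything. Build a `requests.Request` and call `.prepare()` on it locally. The sketch below is only an illustration: the extra keys `tag` and `skip` are hypothetical and not part of the original example. They show that a list value becomes repeated keys and that a `None` value is dropped.

```python
import requests

# Build the same GET request as above, but only prepare it: no network I/O happens.
req = requests.Request(
    'GET',
    'http://dict.baidu.com/s',
    params={'wd': 'python', 'tag': ['a', 'b'], 'skip': None},  # 'tag'/'skip' are hypothetical
)
prepared = req.prepare()

# requests URL-encodes params: a list becomes repeated keys, None values are dropped.
print(prepared.url)  # http://dict.baidu.com/s?wd=python&tag=a&tag=b
```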

Simple, isn't it? GET is not the only easy one: every other HTTP method uses the same interface.

requests.get('https://github.com/timeline.json')    # GET request
requests.post('http://httpbin.org/post')            # POST request
requests.put('http://httpbin.org/put')              # PUT request
requests.delete('http://httpbin.org/delete')        # DELETE request
requests.head('http://httpbin.org/get')             # HEAD request
requests.options('http://httpbin.org/get')          # OPTIONS request

Note: of these HTTP methods, most websites only handle GET and POST, and some also support HEAD.
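The helpers listed above are thin wrappers around `requests.request(method, url, **kwargs)`, so `requests.request('GET', url)` behaves like `requests.get(url)`. The sketch below shows the shared interface without any network traffic. It prepares one `requests.Request` per method and checks the normalised method name. The URL is only a placeholder.

```python
import requests

URL = 'http://httpbin.org/anything'  # placeholder endpoint; nothing is sent below

# Every method goes through the same Request/prepare machinery;
# the method name is normalised to upper case during preparation.
methods = ['get', 'post', 'put', 'delete', 'head', 'options']
prepared = [requests.Request(m, URL).prepare() for m in methods]

for p in prepared:
    print(p.method, p.url)
```

One difference worth knowing: `requests.head()` does not follow redirects by default (`allow_redirects=False`), while `requests.get()` does.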

Examples of requests with parameters:

import requests
requests.get('http://www.dict.baidu.com/s', params={'wd': 'python'})    # GET with query parameters
requests.post('http://www.itwhy.org/wp-comments-post.php', data={'comment': '测试POST'})    # POST with form data ('测试POST' = "test POST")
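When `data=` is a dict, requests sends it as an HTML form body and sets the matching `Content-Type`. The sketch below prepares the POST from the example locally, so nothing is sent, and decodes the body again with the standard library's `parse_qs` to show that the round trip is lossless.

```python
from urllib.parse import parse_qs

import requests

req = requests.Request(
    'POST',
    'http://www.itwhy.org/wp-comments-post.php',
    data={'comment': '测试POST'},  # '测试POST' = "test POST"
)
prepared = req.prepare()

# A dict passed as data= becomes a form-encoded body (UTF-8, percent-encoded).
print(prepared.headers['Content-Type'])  # application/x-www-form-urlencoded
print(prepared.body)                     # comment=%E6%B5%8B%E8%AF%95POST
print(parse_qs(prepared.body))           # {'comment': ['测试POST']}
```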

Sending JSON data with POST:

import requests
import json
 
r = requests.post('https://api.github.com/some/endpoint', data=json.dumps({'some': 'data'}))
print(r.json())
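Passing a pre-serialised string through `data=` sends the JSON text, but it does not add a `Content-Type` header. The `json=` parameter serialises the object and sets `Content-Type: application/json` for you. The sketch below prepares both variants locally, so nothing is sent, and compares them. The endpoint is the placeholder URL from the example above.

```python
import json

import requests

URL = 'https://api.github.com/some/endpoint'  # placeholder from the example above
payload = {'some': 'data'}

# Variant 1 (as above): a pre-serialised string in data=; no Content-Type is added.
manual = requests.Request('POST', URL, data=json.dumps(payload)).prepare()

# Variant 2: json= serialises the dict and sets Content-Type: application/json.
auto = requests.Request('POST', URL, json=payload).prepare()

print(manual.headers.get('Content-Type'))  # None
print(auto.headers.get('Content-Type'))    # application/json
print(json.loads(auto.body))               # {'some': 'data'}
```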

Custom headers:

import requests

# 1. First visit any page to obtain cookies, sending custom headers
first_page = requests.get(
    url='http://dig.chouti.com/',
    headers={
        'Host': 'dig.chouti.com',
        'Referer': "http://dig.chouti.com/",
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36",
    }
)
first_page_cookie_dic = first_page.cookies.get_dict()  # cookies as a plain dict
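Without custom headers, requests identifies itself with a `python-requests/<version>` User-Agent. Some sites treat that default differently from a browser, which is a common reason to override it. The sketch below prints the default value and then prepares a request locally with custom headers, so nothing is sent. `requests.get()` applies the default through an internal `Session`. Header lookup on the prepared request is case-insensitive.

```python
import requests

UA = ('Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
      '(KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36')

# The User-Agent requests sends when you do not set one yourself.
print(requests.utils.default_user_agent())  # e.g. python-requests/2.x.y

# Custom headers passed via headers= end up on the prepared request.
prepared = requests.Request(
    'GET',
    'http://dig.chouti.com/',
    headers={'User-Agent': UA, 'Referer': 'http://dig.chouti.com/'},
).prepare()

# Header names are case-insensitive when reading them back.
print(prepared.headers['user-agent'] == UA)  # True
```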

III. The Response object

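Every requests call returns a Response object that holds the server's reply, such as r.text and r.status_code used in the examples above.
Reading the body as text: when you access r.text, requests decodes the body using the response's text encoding (r.encoding). You can change r.encoding to make r.text decode with an encoding of your choice.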

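# -*- coding: utf-8 -*-
__author__ = 'ShengLeQi'

import requests

r = requests.get('http://blog.csdn.net/shanzhizi/article/details/50903748')
print(r.encoding)      # encoding taken from the response headers
print(r.text[:200])    # start of the body, decoded with r.encoding
print('*' * 79)

r.encoding = 'GBK'     # override the encoding; r.text is decoded again with it (use the page's real charset)
print(r.encoding)
print(r.text[:200])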
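Conceptually, `r.text` is `str(r.content, r.encoding, errors='replace')`. When a `text/*` response declares no charset, requests falls back to ISO-8859-1, which garbles Chinese pages. The sketch below reproduces that effect with plain bytes and no network. The GBK-encoded byte string is a hypothetical stand-in for `r.content`. In real code, `r.apparent_encoding` (a guess made from the body) can help you choose the value for `r.encoding`.

```python
# Hypothetical response body: a GBK-encoded Chinese page title (stands in for r.content).
raw = '抽屉新热榜'.encode('gbk')

# What r.text would produce with the ISO-8859-1 fallback...
wrong = raw.decode('iso-8859-1', errors='replace')
# ...and after setting r.encoding = 'GBK'.
right = raw.decode('gbk', errors='replace')

print(wrong)  # mojibake
print(right)  # 抽屉新热榜
```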

Other response attributes:

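r.status_code          # HTTP status code of the response
r.raw                  # raw response (a urllib3 HTTPResponse); read it with r.raw.read(), which needs stream=True on the request
r.content              # response body as bytes; gzip and deflate compression are decoded automatically
r.text                 # response body as str, decoded with r.encoding (usually taken from the Content-Type header)
r.headers              # response headers in a dict-like object with case-insensitive keys; r.headers.get(key) returns None for a missing key, r.headers[key] raises KeyError
# special methods
r.json()               # built-in JSON decoder; raises an exception (a ValueError subclass) if the body is not valid JSON
r.raise_for_status()   # raises requests.HTTPError for 4xx and 5xx responses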
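Two of these behaviours are easy to check offline. `r.headers` is a `requests.structures.CaseInsensitiveDict`, and `raise_for_status()` only raises for 4xx and 5xx codes, so a 302 or 204 passes silently. The sketch below builds both objects by hand rather than from a real response. The header value and URL are illustrative.

```python
import requests
from requests.structures import CaseInsensitiveDict

# r.headers is a CaseInsensitiveDict; build one by hand to show its behaviour.
headers = CaseInsensitiveDict({'Content-Type': 'text/html; charset=utf-8'})
print(headers['content-type'])   # any capitalisation works
print(headers.get('X-Missing'))  # None (headers['X-Missing'] would raise KeyError)

# An empty Response filled in by hand, only to exercise raise_for_status().
r = requests.Response()
r.url = 'http://dig.chouti.com/link/vote'  # illustrative URL

raised = {}
for code in (200, 302, 404, 500):
    r.status_code = code
    try:
        r.raise_for_status()
        raised[code] = False
    except requests.HTTPError:
        raised[code] = True

print(raised)  # {200: False, 302: False, 404: True, 500: True}
```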

Example 1: logging in to dig.chouti.com and upvoting a post:

# -*- coding: utf-8 -*-
__author__ = 'ShengLeQi'

import requests

# Headers shared by every request below
HEADERS = {
    'Host': 'dig.chouti.com',
    'Referer': 'http://dig.chouti.com/',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
}

# 1. First visit any page to obtain cookies
first_page = requests.get(url='http://dig.chouti.com/', headers=HEADERS)
first_page_cookie_dic = first_page.cookies.get_dict()

# 2. Log in with a POST request, sending the cookies from step 1
login_159 = requests.post(
    url='http://dig.chouti.com/login',
    data={
        'phone': '8612345678912',  # user name (phone number)
        'password': 'woshiniba',   # password
        'oneMonth': 1,
    },
    headers=HEADERS,
    cookies=first_page_cookie_dic,
)
# print(login_159)

# 3. Upvote a post, reusing the same cookies
dianzan = requests.post(  # dianzan = "upvote"
    url='http://dig.chouti.com/link/vote?linksId=17184926',
    headers=HEADERS,
    cookies=first_page_cookie_dic,
)
if dianzan.status_code == 200:  # HTTP status code
    print("Upvoted!")
else:
    print("Upvote failed!")
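The example above repeats the same headers in every call and passes the first page's cookies by hand. The original author does not use it, but a `requests.Session` keeps headers and cookies across calls. Any `Set-Cookie` from `session.get()`/`session.post()` is stored in `session.cookies` automatically. The sketch below only prepares the vote request locally with `Session.prepare_request`, so nothing is sent. The cookie name `session_id` and its value are hypothetical. The manual `Host` header is omitted because the HTTP client normally derives it from the URL.

```python
import requests

UA = ('Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
      '(KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36')

session = requests.Session()
# Set the shared headers once instead of repeating them on every call.
session.headers.update({'Referer': 'http://dig.chouti.com/', 'User-Agent': UA})

# In real use, cookies from responses land here automatically.
# This one is set by hand; its name and value are hypothetical.
session.cookies.set('session_id', 'example-value')

# prepare_request() merges the session's headers and cookies without sending anything.
vote = session.prepare_request(
    requests.Request('POST', 'http://dig.chouti.com/link/vote?linksId=17184926')
)
print(vote.headers['User-Agent'] == UA)  # True
print(vote.headers['Cookie'])            # session_id=example-value
```

In a real script you would call `session.get(...)`, `session.post(...)` in order, ideally inside `with requests.Session() as session:` so the connection pool is closed afterwards.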

IV. Uploading files

import requests

url = 'http://127.0.0.1:5000/upload'
# Open the file in binary mode; the with block closes it after the upload
with open('/home/lyb/sjzl.mpg', 'rb') as f:
    files = {'file': f}
    # files = {'file': ('report.jpg', f)}     # set the uploaded file name explicitly
    r = requests.post(url, files=files)
print(r.text)

Even more conveniently, you can upload a string as if it were a file:

import requests
 
url = 'http://127.0.0.1:5000/upload'
files = {'file': ('test.txt', b'Hello Requests.')}     # give the file name explicitly; otherwise requests uses the field name ('file')
 
r = requests.post(url, files=files)
print(r.text)
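The value in `files=` can also be a 3-tuple `(filename, content, content_type)`, which sets the part's `Content-Type`. The sketch below prepares a multipart upload locally, so nothing is sent, and inspects the encoded body. The second field `report` and its CSV content are hypothetical additions used only for illustration.

```python
import requests

URL = 'http://127.0.0.1:5000/upload'  # local test server from the example above

files = {
    # (filename, content): bytes or str are sent as if they were a file
    'file': ('test.txt', b'Hello Requests.'),
    # (filename, content, content_type): also sets the part's Content-Type (hypothetical field)
    'report': ('report.csv', 'a,b\n1,2\n', 'text/csv'),
}
prepared = requests.Request('POST', URL, files=files).prepare()

print(prepared.headers['Content-Type'])  # multipart/form-data; boundary=...
print(prepared.body.decode('utf-8'))     # the encoded multipart body
```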

Example: posting a comment on a dig.chouti.com (抽屉) post:

# ===================== Comment =====================

import requests

# Headers shared by every request below
HEADERS = {
    'Host': 'dig.chouti.com',
    'Referer': 'http://dig.chouti.com/',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
}

# 1. First visit any page to obtain cookies
first_page = requests.get(url='http://dig.chouti.com/', headers=HEADERS)
first_page_cookie_dic = first_page.cookies.get_dict()

# 2. Log in with a POST request, sending the cookies from step 1
login_159 = requests.post(
    url='http://dig.chouti.com/login',
    data={
        'phone': '8615968854799',
        'password': 'woshiniba',
        'oneMonth': 1,
    },
    headers=HEADERS,
    cookies=first_page_cookie_dic,
)
# print(login_159)

# 3. Post a comment, reusing the same cookies
pinglun = requests.post(  # pinglun = "comment"
    url='http://dig.chouti.com/comments/create',
    headers=HEADERS,
    data={
        'jid': 'cdu_51753662715',
        'linkId': 17186886,
        'content': '美剧又调侃中国了,在啧啧啧啊啊啊啊啊啊啊啊',  # comment text
        'sortType': 'score',
    },
    cookies=first_page_cookie_dic,
)

print("pinglun:", pinglun.text)


posted @ 2018-02-02 09:40 ShengLeQi
