Using httpx


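The requests library can already scrape data from most websites. However, it only speaks HTTP/1.1, so it cannot access sites that require HTTP/2. For those sites you need the httpx library.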

import requests

url = 'https://spa16.scrape.center/'
# requests sends an HTTP/1.1 request; this HTTP/2-only server closes the connection
response = requests.get(url)
print(response.text)
--------------------------
Output:
http.client.RemoteDisconnected: Remote end closed connection without response
raise RemoteDisconnected("Remote end closed connection without"
······

The site https://spa16.scrape.center/ only accepts HTTP/2, so accessing it with requests fails.

1. Basic usage of httpx

Installation: install it with pip3

pip3 install httpx

You also need to install HTTP/2 support:

pip3 install 'httpx[http2]'

This installs httpx together with its HTTP/2 support package (h2).
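Creating a client with http2=True requires the optional h2 package, and recent httpx versions raise an ImportError if it is missing. The minimal sketch below, which uses only the standard library, checks whether that package is installed. The helper name http2_available is hypothetical and not part of httpx.

```python
import importlib.util


def http2_available() -> bool:
    """Return True if the optional h2 package (HTTP/2 support) is importable.

    Hypothetical helper: httpx itself does not provide this function.
    """
    return importlib.util.find_spec("h2") is not None


if __name__ == "__main__":
    # httpx.Client(http2=True) fails when h2 is missing, so checking first
    # lets a script fall back to plain HTTP/1.1 instead of crashing.
    print("HTTP/2 support installed:", http2_available())
```

A script could then create its client with httpx.Client(http2=http2_available()). Keep in mind that an HTTP/2-only site such as spa16 will still refuse the HTTP/1.1 fallback.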

httpx is used in much the same way as requests:

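import httpx

url = 'https://spa16.scrape.center/'
# http2=True enables HTTP/2; without it httpx uses HTTP/1.1
client = httpx.Client(http2=True)
response = client.get(url)
print(response.text)
client.close()  # release the client's connections when done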

-----------------------
Output:
<!DOCTYPE html><html lang=en><head><meta charset=utf-8><meta http-equiv=X-UA-Compatible content="IE=edge"><meta name=viewport content="width=device-width,initial-scale=1"><meta name=referrer content=no-referrer><link rel=icon href=/favicon.ico><title>Scrape | Book</title><link href=/css/chunk-50522e84.e4e1dae6.css rel=prefetch><link href=/css/chunk-f52d396c.4f574d24.css rel=prefetch><link href=/js/chunk-50522e84.6b3e24aa.js rel=prefetch><link href=/js/chunk-f52d396c.f8f41620.js rel=prefetch><link href=/css/app.ea9d802a.css rel=preload as=style><link href=/js/app.b93891e2.js rel=preload as=script><link href=/js/chunk-vendors.a02ff921.js rel=preload as=script><link href=/css/app.ea9d802a.css rel=stylesheet></head><body><noscript><strong>We're sorry but portal doesn't work properly without JavaScript enabled. Please enable it to continue.</strong></noscript><div id=app></div><script src=/js/chunk-vendors.a02ff921.js></script><script src=/js/app.b93891e2.js></script></body></html>
  • httpx uses HTTP/1.1 by default. To access a site that requires HTTP/2, enable HTTP/2 explicitly: client = httpx.Client(http2=True)
  • httpx and requests share many similar APIs, so methods such as get and post work in much the same way in both.

2. The Client object

import httpx

with httpx.Client(http2=True) as client:
    response = client.get('https://spa16.scrape.center/')
    print(response)               # e.g. <Response [200 OK]>
    print(response.http_version)  # negotiated protocol, e.g. "HTTP/2"
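  • The httpx documentation recommends using a Client in a with ... as block, which closes the client's connections automatically when the block exits.
  • httpx.Client() accepts parameters such as http2 and headers, which apply to every request made through that client.
  • The response's http_version attribute does not exist in requests. It gives the HTTP protocol version that was actually used, for example "HTTP/1.1" or "HTTP/2".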
posted @ 2021-12-14 09:37  写代码的小灰