【1】curl

 

Web crawler:

Data fetching

Libraries:
  requests
  urllib
  pycurl
Tools:
  curl
  wget
  httpie


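curl http://www.baidu.com

Option  Description                                       Example
-A      Set the User-Agent header                         curl -A "python" http://www.baidu.com
-X      Send the request with the given method            curl -X POST https://www.httpbin.org/post
-I      Show only the response headers (HEAD request)     curl -I https://www.httpbin.org/get
-d      Send the given data in a POST request             curl -d test=123 http://httpbin.org/post
-O      Download and save under the remote file name      curl -O http://httpbin.org/image/jpeg
-o      Download and save under the given file name       curl -o filename.jpeg http://httpbin.org/image/jpeg
-L      Follow redirects                                  curl -IL https://baidu.com
-H      Set a request header                              curl -o h.webp -H "accept:image/webp" http://httpbin.org/image    (saves the image in WebP format)
-k      Allow insecure SSL (skip certificate checks)
-b      Send cookies                                      curl -b a=test http://httpbin.org/cookies
-s      Silent mode: hide the progress meter and errors
-v      Verbose mode: show all details of the connection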

curl -A "ABC" -X POST https://www.httpbin.org/post
{
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept": "*/*", 
    "Host": "www.httpbin.org", 
    "User-Agent": "ABC", 
    "X-Amzn-Trace-Id": "Root=1-5e6c3e6d-45f068a9f49e1d0818f664ea"
  }, 
  "json": null, 
  "origin": "120.84.9.4", 
  "url": "https://www.httpbin.org/post"
}
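
For comparison with the Python libraries listed at the top, here is a minimal sketch of the same request built with the standard-library urllib. It only constructs the request and inspects it; the sending step is left commented out because it needs network access. The variable name req is purely illustrative.

```python
import urllib.request

# Build (but do not send) a request equivalent to:
#   curl -A "ABC" -X POST https://www.httpbin.org/post
req = urllib.request.Request(
    "https://www.httpbin.org/post",
    method="POST",                  # plays the role of -X POST
    headers={"User-Agent": "ABC"},  # plays the role of -A "ABC"
)

# urllib stores header names capitalized as "User-agent".
print(req.get_method())
print(req.get_header("User-agent"))

# To actually send it (requires network access):
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```
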

 

curl -d test=123 http://httpbin.org/post    // to send several parameters: -d a=1 -d b=2 -d c=3, or -d "a=1&b=2&c=3"
                            // alternatively, write a=1&b=2&c=3 to a file and pass it as @filename:
                            // curl -d @/tmp/post.data http://httpbin.org/post

{
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {
    "test": "123"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Content-Length": "8", 
    "Content-Type": "application/x-www-form-urlencoded", 
    "Host": "httpbin.org", 
    "User-Agent": "curl/7.58.0", 
    "X-Amzn-Trace-Id": "Root=1-5e6d7fb1-d0fb699dad21e7492c20362a"
  }, 
  "json": null, 
  "origin": "120.84.9.4", 
  "url": "http://httpbin.org/post"
}
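
The Content-Length of 8 in the response above is simply the byte length of the form body test=123. A small sketch, assuming Python's urllib.parse.urlencode as a stand-in for how the body is assembled (curl -d itself sends the string as given, without encoding it), shows the same body and the multi-parameter form:

```python
from urllib.parse import urlencode

# Form body for: curl -d test=123 http://httpbin.org/post
single = urlencode({"test": "123"}).encode()

# Form body for: curl -d "a=1&b=2&c=3" http://httpbin.org/post
multi = urlencode({"a": 1, "b": 2, "c": 3}).encode()

print(single, len(single))  # b'test=123' 8
print(multi)                # b'a=1&b=2&c=3'
```

For plain values like these the encoded and raw strings are identical; they differ only when a value contains characters that need escaping, such as spaces or &.
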
curl -IL https://baidu.com
HTTP/1.1 302 Moved Temporarily
Server: bfe/1.0.8.18
Date: Sun, 15 Mar 2020 01:37:35 GMT
Content-Type: text/html
Content-Length: 161
Connection: keep-alive
Location: http://www.baidu.com/

HTTP/1.1 200 OK
Accept-Ranges: bytes
Cache-Control: private, no-cache, no-store, proxy-revalidate, no-transform
Connection: keep-alive
Content-Length: 277
Content-Type: text/html
Date: Sun, 15 Mar 2020 01:37:35 GMT
Etag: "575e1f60-115"
Last-Modified: Mon, 13 Jun 2016 02:50:08 GMT
Pragma: no-cache
Server: bfe/1.0.8.18

 

curl -b a=test http://httpbin.org/cookies
{
  "cookies": {
    "a": "test"
  }
}
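
What -b does here is add a Cookie request header. As a rough sketch with the standard-library urllib, the header can be assembled by hand; cookie_header is a hypothetical helper written for this example, not part of any library, and the request is only built, not sent.

```python
import urllib.request


def cookie_header(cookies):
    """Join name/value pairs into a Cookie header value (hypothetical helper)."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


# Equivalent in spirit to: curl -b a=test http://httpbin.org/cookies
req = urllib.request.Request(
    "http://httpbin.org/cookies",
    headers={"Cookie": cookie_header({"a": "test"})},
)

print(req.get_header("Cookie"))
```

Several cookies are separated by "; ", which is also the format curl accepts, for example -b "a=1; b=2".
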

 



httpbin.org

Data parsing

Data storage

posted @ 2020-03-14 10:25  狂奔~