A Brief Look at Converting Between (and Using) Several Cookie Formats in Crawler Development

Introduction

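When writing crawlers, cookies are usually the thing we deal with most often. Different stages of a crawl vary in difficulty, so we often mix approaches, for example driving a real browser for the hard parts and using requests for the rest, to balance crawling efficiency against development effort. As a result, we frequently need to convert cookies from one form to another.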

Converting cookies

  • Converting a cookie string into a dict
from http.cookies import SimpleCookie
import json

cookie_str = """PHPSESSID=ufl7bh3adse15vvks0kusmgt92; ezoadgid_55920=-1; ezoref_55920=google.com"""

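# SimpleCookie parses "name=value; name2=value2" into Morsel objects
cookie = SimpleCookie()
cookie.load(cookie_str)

# Each Morsel's .value attribute holds the cookie's value
cookie_dict = {k: v.value for k, v in cookie.items()}

print(json.dumps(cookie_dict, indent=2))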

Output:

{
  "PHPSESSID": "ufl7bh3adse15vvks0kusmgt92",
  "ezoadgid_55920": "-1",
  "ezoref_55920": "google.com"
}
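SimpleCookie is built around the cookie spec's rules. As I understand its parser, it can silently skip cookies (or load nothing at all) when a value contains characters the spec does not allow, even though browsers accept them. A minimal alternative, assuming the input looks like a browser's Cookie request header (name=value pairs separated by semicolons), is to split the string by hand. The sketch below also covers the reverse conversion. The names cookie_str_to_dict and cookie_dict_to_str are hypothetical helpers, not part of any library.

```python
def cookie_str_to_dict(cookie_str: str) -> dict:
    """Parse a 'name1=value1; name2=value2' Cookie header into a dict."""
    cookie_dict = {}
    for pair in cookie_str.split(";"):
        pair = pair.strip()
        # Skip empty fragments (e.g. a trailing ';') and fragments without '='
        if not pair or "=" not in pair:
            continue
        # Split on the first '=' only, because values may themselves contain '='
        name, value = pair.split("=", 1)
        cookie_dict[name.strip()] = value.strip()
    return cookie_dict


def cookie_dict_to_str(cookie_dict: dict) -> str:
    """Join a dict back into a Cookie header string."""
    return "; ".join(f"{name}={value}" for name, value in cookie_dict.items())


cookie_str = "PHPSESSID=ufl7bh3adse15vvks0kusmgt92; ezoadgid_55920=-1; ezoref_55920=google.com"
cookie_dict = cookie_str_to_dict(cookie_str)
print(cookie_dict)
print(cookie_dict_to_str(cookie_dict) == cookie_str)  # True
```

This version performs no unquoting or URL-decoding, so values come back exactly as they appeared in the string. That is usually what you want when the string is pasted from the browser's developer tools and later sent back unchanged.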
  • Converting Selenium's name/value cookies into a dict
from selenium import webdriver
import json

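browser = webdriver.Chrome()

browser.get("https://www.baidu.com")

# get_cookies() returns a list of dicts, one per cookie,
# with keys such as name, value, domain, path, secure, ...
browser_cookie = browser.get_cookies()

# Keep only the name -> value mapping
dict_cookie = {}

for c in browser_cookie:
    dict_cookie[c['name']] = c['value']

browser.quit()

print(json.dumps(dict_cookie, indent=2))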

Output:

{
  "ZFY": "g3:AuUuEFuAF1L0Zpt9:B3:BSIIYEoAucw7cHt4QJFly9s:C",
  "BAIDUID_BFESS": "64BC7B5CD127208685D64C19DB5A01FA:FG=1",
  "BA_HECTOR": "2pak8l2k000g2gck80000lj81htjm6j1l",
  "H_PS_PSSID": "36549_38105_38094_37907_37989_37800_37925_38086_26350_38101_38008_37881",
  "BAIDUID": "64BC7B5CD127208685D64C19DB5A01FA:FG=1",
  "BIDUPSID": "64BC7B5CD1272086BBA474D012BB5674",
  "PSTM": "1675221202",
  "BD_UPN": "12314753",
  "BD_HOME": "1"
}
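Flattening into a dict discards each cookie's domain, path, secure flag and expiry. If you log in with the browser and then continue with requests, those fields can be kept by copying each entry into a RequestsCookieJar with its set() method. The sketch below makes a few assumptions. The helper name selenium_cookies_to_jar is hypothetical. Each entry is assumed to use the usual get_cookies() keys. httpOnly and sameSite are dropped because set() has no matching keyword. The sample list uses made-up example.com cookies in place of real browser output, so the code runs without Chrome.

```python
from requests.cookies import RequestsCookieJar


def selenium_cookies_to_jar(selenium_cookies):
    """Copy Selenium get_cookies() entries into a RequestsCookieJar,
    keeping domain, path, secure flag and expiry, not just name/value."""
    jar = RequestsCookieJar()
    for c in selenium_cookies:
        jar.set(
            c["name"],
            c["value"],
            domain=c.get("domain", ""),
            path=c.get("path", "/"),
            secure=c.get("secure", False),
            # Selenium names this field 'expiry'; http.cookiejar calls it 'expires'
            expires=c.get("expiry"),
        )
    return jar


# Hypothetical stand-in for browser.get_cookies(); real entries also
# carry keys such as httpOnly and sameSite, which are ignored here
sample_cookies = [
    {"name": "sessionid", "value": "abc123", "domain": ".example.com",
     "path": "/", "secure": True, "expiry": 1893456000},
    {"name": "theme", "value": "dark", "domain": "www.example.com",
     "path": "/", "secure": False},
]

jar = selenium_cookies_to_jar(sample_cookies)
print(jar.get("sessionid", domain=".example.com"))  # abc123
```

You can then pass the jar to requests with session.cookies.update(jar) or requests.get(url, cookies=jar). Because the domains are preserved, each cookie should only be sent to hosts that match it.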
  • Converting a dict cookie into a requests CookieJar
from requests.cookies import cookiejar_from_dict

dict_cookie = {
    "ZFY": "g3:AuUuEFuAF1L0Zpt9:B3:BSIIYEoAucw7cHt4QJFly9s:C",
    "BAIDUID_BFESS": "64BC7B5CD127208685D64C19DB5A01FA:FG=1",
    "BA_HECTOR": "2pak8l2k000g2gck80000lj81htjm6j1l",
    "H_PS_PSSID": "36549_38105_38094_37907_37989_37800_37925_38086_26350_38101_38008_37881",
    "BAIDUID": "64BC7B5CD127208685D64C19DB5A01FA:FG=1",
    "BIDUPSID": "64BC7B5CD1272086BBA474D012BB5674",
    "PSTM": "1675221202",
    "BD_UPN": "12314753",
    "BD_HOME": "1"
}

# Each key/value pair becomes an http.cookiejar.Cookie inside a RequestsCookieJar
cookie_jar = cookiejar_from_dict(dict_cookie)
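Once you have a jar, here is a short sketch of putting it to use. It is attached to a requests.Session so that every request made through the session sends it. It is then turned back into a plain dict with requests.utils.dict_from_cookiejar. The dict is a trimmed copy of the one above, and no network request is made. One caveat: cookiejar_from_dict creates cookies with an empty domain. As far as I can tell from requests' default cookie policy, the session will therefore send them to whatever host you request. If that matters, set each cookie with jar.set(name, value, domain=...) instead.

```python
import requests
from requests.cookies import cookiejar_from_dict
from requests.utils import dict_from_cookiejar

dict_cookie = {
    "BAIDUID": "64BC7B5CD127208685D64C19DB5A01FA:FG=1",
    "BD_HOME": "1",
}

cookie_jar = cookiejar_from_dict(dict_cookie)

session = requests.Session()
# Merge the jar into the session's own cookie jar; every request sent
# through this session will now carry these cookies
session.cookies.update(cookie_jar)

# Reverse direction: CookieJar -> dict
round_trip = dict_from_cookiejar(session.cookies)
print(round_trip == dict_cookie)  # True
```

For a one-off request, passing cookies=cookie_jar (or even the plain dict) to requests.get also works. The session approach is handier when the site sets further cookies along the way, since the session keeps collecting them.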
Posted @ 2023-02-01 16:27 by 写python的叮叮叮