Weekly Summary 14

Posted: 23.5.17

This week I studied two problems that come up when scraping: cookie-based session validation, and handling captchas. The example below logs in to gushiwen.cn, which requires both.
import requests
from bs4 import BeautifulSoup

# Login page; the 'from' parameter is the page to land on after logging in
url = 'https://so.gushiwen.cn/user/login.aspx?from=http%3a%2f%2fso.gushiwen.cn%2fuser%2fcollect.aspx'

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36'
}

# A session keeps the cookies set by the login page, so the captcha request
# and the final login POST are all tied to the same server-side session
session = requests.session()
response = session.get(url=url, headers=headers)
content = response.text

# The login form is ASP.NET: the hidden __VIEWSTATE / __VIEWSTATEGENERATOR
# fields must be read out of the page and posted back with the form
soup = BeautifulSoup(content, 'lxml')
value1 = soup.select('#__VIEWSTATE')[0].attrs['value']
value2 = soup.select('#__VIEWSTATEGENERATOR')[0].attrs['value']
# The captcha image URL can also be read from the page itself:
# codeurl = 'https://so.gushiwen.cn/' + soup.select('#imgCode')[0].attrs['src']

data = {
    '__VIEWSTATE': value1,
    '__VIEWSTATEGENERATOR': value2,
    'from': 'http://so.gushiwen.cn/user/collect.aspx',
    'email': '账号',    # your account
    'pwd': '密码',      # your password
    'code': '',         # captcha, filled in below
    'denglu': '登录',   # the submit button's value, required by the form
}

# Download the captcha image with the SAME session; fetching it with
# urllib.request.urlretrieve would open a new connection without the
# session cookie and get a different captcha than the one the server expects
code_url = 'https://so.gushiwen.cn/RandCode.ashx'
response = session.get(code_url)
with open('code.jpg', 'wb') as fp:
    fp.write(response.content)

# Open code.jpg, read the captcha, and type it in by hand
code = input('请输入验证码')  # "please enter the captcha"
data['code'] = code

# Submit the login form and save the resulting page to check the login worked
response = session.post(url=url, data=data, headers=headers)
content = response.text
with open('古诗文网.html', 'w', encoding='utf-8') as fp:
    fp.write(content)
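The crucial point in the script above is that the captcha image and the login POST must go through one shared session, because the server ties the expected captcha answer to the session cookie. That cookie-persistence behaviour can be demonstrated offline with a minimal sketch (using only the standard library and a throwaway local HTTP server, since hitting the live site isn't needed to show the mechanism; `requests.Session` does the same cookie handling automatically):

```python
import threading
import urllib.request
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == '/login':
            # First visit: hand out a session cookie, like the login page does
            self.send_response(200)
            self.send_header('Set-Cookie', 'session=abc123')
            self.end_headers()
            self.wfile.write(b'ok')
        else:
            # Later visits: echo back whatever Cookie header the client sent
            cookie = self.headers.get('Cookie', '')
            self.send_response(200)
            self.end_headers()
            self.wfile.write(cookie.encode())

    def log_message(self, *args):
        pass  # silence request logging

server = HTTPServer(('127.0.0.1', 0), Handler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# An opener with a CookieJar plays the role of requests.session():
# cookies received on one request are sent back on the next one
jar = CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

opener.open(f'http://127.0.0.1:{port}/login').read()
echoed = opener.open(f'http://127.0.0.1:{port}/captcha').read().decode()
print(echoed)  # the cookie from /login came back automatically

server.shutdown()
```

A plain `urllib.request.urlopen` (or `urlretrieve`) for the second request would carry no cookie, which is exactly why downloading the captcha outside the session breaks the login.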


posted @ 2023-03-21 21:10  樱花开到我身边