Fixing garbled text when scraping web pages with BeautifulSoup and requests
import bs4
import requests

# Fetch the page; the URL must include the scheme ('http://').
res = requests.get('http://www.bjmb.gov.cn')
# Replace the encoding that requests inferred from the HTTP headers with the
# one detected from the response body, so that res.text is not garbled.
res.encoding = res.apparent_encoding
# Parse the decoded text with BeautifulSoup (lxml parser); the resulting
# object supports CSS selectors.
soup = bs4.BeautifulSoup(res.text, 'lxml')
# print(soup) shows the whole parsed page.
# Select every element with class "ri_div"; elems is a list.
elems = soup.select('.ri_div')
# elems[0] is a bs4.element.Tag; str() returns its full HTML as a string.
str(elems[0])
# ref https://www.zhihu.com/question/46047841
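
Why the garbling happens can be reproduced without any network access. One common cause is that the server sends a text/html response without a charset in its Content-Type header, in which case requests falls back to ISO-8859-1 when building res.text. The sketch below uses only the standard library; the HTML snippet and the choice of GBK are assumptions made for illustration and are not taken from the site above.

```python
# Hypothetical page content; in real code the bytes would come from res.content.
html = '<div class="ri_div">天气预报</div>'

# Bytes as a server using GBK would send them (the encoding is an assumption).
raw = html.encode('gbk')

# Decoding with the wrong encoding yields garbled text, which is roughly what
# res.text looks like when the encoding was guessed as ISO-8859-1.
garbled = raw.decode('iso-8859-1')

# Decoding with the encoding the page really uses restores the text. Setting
# res.encoding = res.apparent_encoding aims for this by detecting the
# encoding from the body.
fixed = raw.decode('gbk')

# ISO-8859-1 maps every byte to exactly one character, so the garbled string
# can still be converted back to the original bytes and decoded again.
recovered = garbled.encode('iso-8859-1').decode('gbk')

print(garbled == html)
print(fixed == html)
print(recovered == html)
```

Note that apparent_encoding is a statistical guess from the body, so when the page declares its encoding reliably (for example in a meta tag), setting res.encoding to that value explicitly may be the safer choice.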