Fixing garbled characters when scraping web pages with BeautifulSoup and requests

import bs4
import requests

# Fetch the page; the URL must include the scheme ('http://').
res = requests.get('http://www.bjmb.gov.cn')

# Use the encoding detected from the response body so that res.text
# is not garbled.
res.encoding = res.apparent_encoding

# Pass res.text to bs4.BeautifulSoup; the resulting object supports
# CSS selectors.
soup = bs4.BeautifulSoup(res.text, 'lxml')

# print(soup) shows the whole parsed page.

# Select elements with class "ri_div"; select() returns a list.
elems = soup.select('.ri_div')

# elems[0] is a bs4.element.Tag; str() returns its full HTML as a string.
print(str(elems[0]))

# ref https://www.zhihu.com/question/46047841
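
Why the garbling happens can be sketched offline, without touching the network. When a text/* response declares no charset in its Content-Type header, requests falls back to ISO-8859-1 for res.text, so bytes that are really in a Chinese encoding come out as mojibake. The sample string and the choice of GBK below are assumptions for illustration only; the actual encoding of the page above is whatever res.apparent_encoding detects.

```python
# Hypothetical sample text, encoded the way a GBK server would send it.
text = '北京市气象局'
raw = text.encode('gbk')

# Decoding with the wrong charset (the ISO-8859-1 fallback) gives mojibake.
garbled = raw.decode('iso-8859-1')

# Decoding with the correct charset restores the original text.
fixed = raw.decode('gbk')

# The garbled string still holds the original bytes, so it can be repaired
# by re-encoding with the wrong charset and decoding with the right one.
repaired = garbled.encode('iso-8859-1').decode('gbk')

print(garbled == text)
print(fixed == text, repaired == text)
```

Setting res.encoding before reading res.text has the same effect as choosing the right charset in the decode() call here.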

posted on 2017-02-07 15:04 身寸周佳
