Python Web Scraping: Handling HTML Entity Encoding

Source: https://blog.csdn.net/mouday/article/details/80016731

Handling HTML entity encoding in Python

Python 2

import HTMLParser

# A decimal character reference; unescape() decodes it to the character U+3039
char = r"&#12345;"
http_parser = HTMLParser.HTMLParser()
uChar = http_parser.unescape(char)
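When the same scraper has to run on both interpreters, the two spellings can be hidden behind one name. The following is a minimal sketch, assuming a try/except import is acceptable in your codebase; the helper name `decode_entities` is hypothetical and not part of the original post. `html.unescape` is available from Python 3.4, and the `HTMLParser.unescape` method was removed in Python 3.9, so the fallback branch is only reached on Python 2.

```python
try:
    # Python 3.4+: unescape lives in the html module
    from html import unescape
except ImportError:
    # Python 2 fallback: use the method on an HTMLParser instance
    from HTMLParser import HTMLParser
    unescape = HTMLParser().unescape


def decode_entities(text):
    """Hypothetical wrapper: decode HTML entities in text on either Python version."""
    return unescape(text)


print(decode_entities(u"position.php?&amp;start=10"))
```

Call sites then use `decode_entities` only, so nothing else in the crawler needs to know which interpreter it runs on.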

 

Python 3

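from html import unescape

# &amp; is the HTML entity for "&"; the text 下一页 means "next page"
s = 'position.php?&amp;start=10#a" id="next">下一页</a>'

# Raw string, entity still encoded
print(s)

# Entity decoded back to "&"
print(unescape(s))

"""
Output:
position.php?&amp;start=10#a" id="next">下一页</a>
position.php?&start=10#a" id="next">下一页</a>
"""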

--------------------- This article comes from 彭世瑜's CSDN blog. Full text: https://blog.csdn.net/mouday/article/details/80016731?utm_source=copy

posted @ 2018-09-29 13:42  猪啊美