这是一个简单的html页面,请保持为字符串，完成后面的计算要求。

from bs4 import BeautifulSoup
import re
soup=BeautifulSoup('''<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>菜鸟教程(runoob.com)</title>
</head>
<body>
<h1>我的第一个标题</h1>
<p　id="first">我的第一个段落。</p>
</body>
<table border="1">
<tr>
<td>row 1, cell 1</td>
<td>row 1, cell 2</td>
</tr>
<tr>
<td>row 2, cell 1</td>
<td>row 2, cell 2</td>
</tr>
</table>
</html>''')
print("head标签：\n",soup.head,"\n学号后两位：21")
print("body标签：\n",soup.body)
print("id为first的标签对象：\n",soup.find_all(id="first"))
st=soup.text
pp = re.findall(u'[\u1100-\uFFFDh]+?',st)
print("html页面中的中文字符")
print(pp)

posted on 2020-12-13 22:49 lannn_l 阅读(533) 评论(0) 收藏举报

刷新页面返回顶部

lannn_l

这是一个简单的html页面,请保持为字符串，完成后面的计算要求。

导航

公告