Getting Started 03: Common BeautifulSoup Parsing Methods

Parsing page source: besides regular expressions, you can also use BeautifulSoup.

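1) select("tag name"): look up elements by tag name

2) By class name
# Match on the value of the class attribute: .class-value

3) By id, matching on the value of the id attribute: #id-value

4) Combined lookup
# Separate the individual selectors with a space; "p #link1" matches the element with id link1 nested inside a <p>

5) By attribute
# Syntax: tag[attribute="value"]

6) The find and find_all methods
# find("tag name", {"attribute name": "attribute value"})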

from bs4 import BeautifulSoup as BS
html="""
<html>
    <head>
        <title>The Dormouse's story</title>
    </head>
    <body>
        <p class="title" name="dromouse">
            <b>The Dormouse's story</b>
        </p>
        <p class="story">
            Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
            and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...
        </p>
    </body>
</html>
"""
bs = BS(html, "html.parser")
# First argument: the string to parse (the page source)
# Second argument: the parser to use, here Python's built-in html.parser

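# 1) select("tag name"): look up elements by tag name
# select returns a list; if the tag occurs several times in the source,
# every occurrence is returned as an element of that list
# print(bs.select('title'))
# print(bs.select('a'))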

# 2) By class name
# Match on the value of the class attribute: .class-value
# print(bs.select(".sister"))

# 3) By id, matching on the value of the id attribute: #id-value
# print(bs.select("#link1"))

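# 4) Combined lookup
# Separate the individual selectors with a space; the later selector is
# searched for inside elements matched by the earlier one
# print(bs.select("p #link1"))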

# 5) By attribute
# Syntax: tag[attribute="value"]
print(bs.select('a[href="http://example.com/elsie"]'))

# 6) The find and find_all methods
# find("tag name", {"attribute name": "attribute value"}) returns the first match
print(bs.find("a", {"href": "http://example.com/elsie"}))
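The script above only calls find, although item 6 also names find_all. The following is a minimal sketch of how the two differ, assuming a shortened fragment (the `snippet` string is a hypothetical stand-in that mirrors the three links in the sample document, not the author's original HTML).

```python
from bs4 import BeautifulSoup

# Hypothetical fragment mirroring the three links in the sample document.
snippet = """
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
"""
soup = BeautifulSoup(snippet, "html.parser")

# find returns only the first matching tag, or None when nothing matches.
first_link = soup.find("a", {"class": "sister"})
missing = soup.find("a", {"id": "link9"})

# find_all takes the same arguments but returns every match as a list.
all_links = soup.find_all("a", {"class": "sister"})
link_ids = [a["id"] for a in all_links]

print(first_link["id"])  # link1
print(missing)           # None
print(link_ids)          # ['link1', 'link2', 'link3']
```

Because find can return None, it is usually worth checking the result before indexing into it, whereas find_all simply yields an empty list when nothing matches.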

posted @ 2021-06-24 19:31 啊呀啊呀静
