Checking whether a tag attribute exists with BeautifulSoup

Method 1: check the tag's attrs dictionary

Python code

from bs4 import BeautifulSoup

with open('hote.html', encoding='utf8', mode='r') as f:
    html = f.read()

soup = BeautifulSoup(html, 'html.parser')
# find_all is the current name; findAll is a deprecated alias
for item in soup.find_all("a"):
    if 'title' in item.attrs:
        # The title attribute exists, so use its value
        title = item['title']
    else:
        # No title attribute: fall back to the link text, with
        # surrounding whitespace and newlines stripped
        title = item.get_text(strip=True)
    print({'title': title, 'href': item['href']})

Output

{'title': 'UltraHD 4k分辨率壁纸', 'href': '/by_resolution.php?w=3840&h=2160&lang=Chinese'}
{'title': 'UltraHD 8k分辨率壁纸', 'href': '/by_resolution.php?w=7680&h=4320&lang=Chinese'}
{'title': '热门合集', 'href': 'https://alphacoders.com/collections'}
{'title': 'iPhone 13', 'href': 'https://mobile.alphacoders.com/by-device/720/iPhone-13-Wallpapers?ref=wa'}
{'title': 'iPhone 12', 'href': 'https://mobile.alphacoders.com/by-device/634/iPhone-12-Wallpapers?ref=wa'}
{'title': '百度', 'href': 'https://www.baidu.com'}
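
The same check can also be written with Tag.has_attr(), and Tag.get() avoids a KeyError when an attribute such as href is missing. The following is only a minimal sketch: the inline HTML string and the link_row helper are hypothetical stand-ins for illustration, not part of the original post, which reads hote.html from disk.

```python
from bs4 import BeautifulSoup

# Hypothetical inline sample; the post itself reads a local file instead.
html = '''
<a href="/a" title="Has title"><b>ignored text</b></a>
<a href="/b">
  <b>Text only</b>
</a>
'''

soup = BeautifulSoup(html, 'html.parser')


def link_row(tag):
    """Build a row dict, falling back to the link text when title is missing."""
    if tag.has_attr('title'):
        title = tag['title']
    else:
        # strip=True removes the newlines and indentation around the text
        title = tag.get_text(strip=True)
    # .get() returns None instead of raising KeyError when href is absent
    return {'title': title, 'href': tag.get('href')}


rows = [link_row(a) for a in soup.find_all('a')]
for row in rows:
    print(row)
```

Whether to use 'title' in item.attrs or item.has_attr('title') is mostly a matter of taste; both test for the presence of the attribute rather than its value.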

Method 2: filter on attribute presence in find_all

Python code

from bs4 import BeautifulSoup

with open('hote.html', encoding='utf8', mode='r') as f:
    html = f.read()

soup = BeautifulSoup(html, 'html.parser')

# True matches tags that have the attribute, whatever its value
for item in soup.find_all("a", attrs={"href": True, "title": True}):
    print({'title': item['title'], 'href': item['href']})

print()  # a single blank line between the two groups

# False matches tags that do not have the attribute
for item in soup.find_all("a", attrs={"href": True, "title": False}):
    print({'title': item.get_text(strip=True), 'href': item['href']})

Output

{'title': 'UltraHD 4k分辨率壁纸', 'href': '/by_resolution.php?w=3840&h=2160&lang=Chinese'}
{'title': 'UltraHD 8k分辨率壁纸', 'href': '/by_resolution.php?w=7680&h=4320&lang=Chinese'}
{'title': '热门合集', 'href': 'https://alphacoders.com/collections'}

{'title': 'iPhone 13', 'href': 'https://mobile.alphacoders.com/by-device/720/iPhone-13-Wallpapers?ref=wa'}
{'title': 'iPhone 12', 'href': 'https://mobile.alphacoders.com/by-device/634/iPhone-12-Wallpapers?ref=wa'}
{'title': '百度', 'href': 'https://www.baidu.com'}
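
The same split between links with and without a title can also be expressed with CSS attribute selectors through soup.select(). This is a rough sketch rather than the method used in this post: it assumes the soupsieve package (normally installed together with bs4) is available, and the inline HTML is a hypothetical sample.

```python
from bs4 import BeautifulSoup

# Hypothetical inline sample, not the hote.html file used in the post.
html = '''
<a href="/a" title="Has title"><b>A</b></a>
<a href="/b"><b>B</b></a>
<a title="No href">C</a>
'''

soup = BeautifulSoup(html, 'html.parser')

# [attr] matches elements that carry the attribute, whatever its value
with_title = soup.select('a[href][title]')
# :not([attr]) matches elements that do not carry the attribute
without_title = soup.select('a[href]:not([title])')

titled = [a['title'] for a in with_title]
untitled = [a.get_text(strip=True) for a in without_title]
print(titled)
print(untitled)
```

The third link has no href, so neither selector picks it up, which mirrors the "href": True condition in the find_all calls.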

HTML used in the examples (hote.html)

<div class="tab-pane active" id="home">
  <a href="/by_resolution.php?w=3840&amp;h=2160&amp;lang=Chinese" class="list-group-item" title="UltraHD 4k分辨率壁纸">
    <b>UltraHD 4k分辨率壁纸</b>
  </a>
  <a href="/by_resolution.php?w=7680&amp;h=4320&amp;lang=Chinese" class="list-group-item" title="UltraHD 8k分辨率壁纸">
    <b>UltraHD 8k分辨率壁纸</b>
  </a>
  <span class="list-group-item">
  </span>
  <a href="https://alphacoders.com/collections" class="list-group-item" title="热门合集">
    <b>热门合集</b>
  </a>
  <span class="list-group-item">
  </span>
  <a class="list-group-item" href="https://mobile.alphacoders.com/by-device/720/iPhone-13-Wallpapers?ref=wa">
    <b>iPhone 13</b>
  </a>
  <a class="list-group-item" href="https://mobile.alphacoders.com/by-device/634/iPhone-12-Wallpapers?ref=wa">
    <b>iPhone 12</b>
  </a>
  <a class="list-group-item" href="https://www.baidu.com">
      <span>百度</span>
  </a>
</div>
posted @ 2023-02-28 14:34  LittleDuo