Checking whether a tag attribute exists with BeautifulSoup
Method 1
Python code
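from bs4 import BeautifulSoup

# Read the saved page (the HTML is listed at the end of this post).
with open('hote.html', encoding='utf8', mode='r') as f:
    html = f.read()

soup = BeautifulSoup(html, 'html.parser')
tag_a_list = soup.find_all("a")  # find_all is the current name for findAll
for item in tag_a_list:
    # item.attrs is a dict, so `in` tests whether the attribute exists.
    if 'title' in item.attrs:
        row = {'title': item['title'], 'href': item['href']}
    else:
        # No title attribute: fall back to the link text.
        row = {'title': item.text.replace('\n', ''), 'href': item['href']}
    print(row)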
Output
{'title': 'UltraHD 4k分辨率壁纸', 'href': '/by_resolution.php?w=3840&h=2160&lang=Chinese'}
{'title': 'UltraHD 8k分辨率壁纸', 'href': '/by_resolution.php?w=7680&h=4320&lang=Chinese'}
{'title': '热门合集', 'href': 'https://alphacoders.com/collections'}
{'title': 'iPhone 13', 'href': 'https://mobile.alphacoders.com/by-device/720/iPhone-13-Wallpapers?ref=wa'}
{'title': 'iPhone 12', 'href': 'https://mobile.alphacoders.com/by-device/634/iPhone-12-Wallpapers?ref=wa'}
{'title': '百度', 'href': 'https://www.baidu.com'}
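As a side note, `Tag.has_attr()` and `Tag.get()` are two other ways to run the same check. The sketch below is only an illustration: the inline `SAMPLE_HTML` string and the helper name `link_row` are assumptions made so the snippet runs without `hote.html`. It also shows one way to avoid a `KeyError` when an `<a>` tag has no `href`.

```python
from bs4 import BeautifulSoup

# Inline sample markup (an assumption, so the sketch runs without hote.html).
SAMPLE_HTML = """
<a href="/collections" title="Popular collections"><b>Collections</b></a>
<a href="https://www.baidu.com"><span>Baidu</span></a>
<a name="anchor-only">No link here</a>
"""


def link_row(tag):
    """Build a row for an <a> tag, tolerating missing attributes (hypothetical helper)."""
    # has_attr() gives the same answer as `'title' in tag.attrs`.
    if tag.has_attr("title"):
        title = tag["title"]
    else:
        title = tag.get_text(strip=True)
    # get() returns None instead of raising KeyError when href is absent.
    return {"title": title, "href": tag.get("href")}


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
rows = [link_row(a) for a in soup.find_all("a")]
for row in rows:
    print(row)
```

Using `tag.get("href")` trades an exception for a `None` value, so the caller decides whether to skip such rows or report them.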
Method 2
Python code
from bs4 import BeautifulSoup

with open('hote.html', encoding='utf8', mode='r') as f:
    html = f.read()

soup = BeautifulSoup(html, 'html.parser')

# True matches tags that have the attribute, whatever its value.
tag_a_list = soup.find_all("a", attrs={"href": True, "title": True})
for item in tag_a_list:
    row = {'title': item['title'], 'href': item['href']}
    print(row)

print('\n')

# False matches tags that lack the attribute, as the output shows.
tag_a_list = soup.find_all("a", attrs={"href": True, "title": False})
for item in tag_a_list:
    row = {'title': item.text.replace('\n', ''), 'href': item['href']}
    print(row)
Output (the blank lines printed between the two groups are omitted here)
{'title': 'UltraHD 4k分辨率壁纸', 'href': '/by_resolution.php?w=3840&h=2160&lang=Chinese'}
{'title': 'UltraHD 8k分辨率壁纸', 'href': '/by_resolution.php?w=7680&h=4320&lang=Chinese'}
{'title': '热门合集', 'href': 'https://alphacoders.com/collections'}
{'title': 'iPhone 13', 'href': 'https://mobile.alphacoders.com/by-device/720/iPhone-13-Wallpapers?ref=wa'}
{'title': 'iPhone 12', 'href': 'https://mobile.alphacoders.com/by-device/634/iPhone-12-Wallpapers?ref=wa'}
{'title': '百度', 'href': 'https://www.baidu.com'}
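The same split can also be written with a filter function passed to `find_all()`, which some readers may find more explicit than `True`/`False` attribute values. This is a minimal sketch, not the method used above: the inline `SAMPLE_HTML` string and the function name `has_href_but_no_title` are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Inline sample markup (an assumption, so the sketch runs without hote.html).
SAMPLE_HTML = """
<a href="/collections" title="Popular collections"><b>Collections</b></a>
<a href="https://www.baidu.com"><span>Baidu</span></a>
<a name="anchor-only">No link here</a>
"""


def has_href_but_no_title(tag):
    """Match <a> tags that have href but no title (hypothetical filter)."""
    return tag.name == "a" and tag.has_attr("href") and not tag.has_attr("title")


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Tags with both attributes: the attrs form from Method 2.
with_title = soup.find_all("a", attrs={"href": True, "title": True})
# Tags with href only: find_all() calls the function once per tag.
without_title = soup.find_all(has_href_but_no_title)

titled_rows = [{"title": a["title"], "href": a["href"]} for a in with_title]
untitled_rows = [
    {"title": a.get_text(strip=True), "href": a["href"]} for a in without_title
]
print(titled_rows)
print(untitled_rows)
```

The third `<a>` tag has no `href`, so neither query returns it.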
HTML used in the examples
<div class="tab-pane active" id="home">
<a href="/by_resolution.php?w=3840&h=2160&lang=Chinese" class="list-group-item" title="UltraHD 4k分辨率壁纸">
<b>UltraHD 4k分辨率壁纸</b>
</a>
<a href="/by_resolution.php?w=7680&h=4320&lang=Chinese" class="list-group-item" title="UltraHD 8k分辨率壁纸">
<b>UltraHD 8k分辨率壁纸</b>
</a>
<span class="list-group-item">
</span>
<a href="https://alphacoders.com/collections" class="list-group-item" title="热门合集">
<b>热门合集</b>
</a>
<span class="list-group-item">
</span>
<a class="list-group-item" href="https://mobile.alphacoders.com/by-device/720/iPhone-13-Wallpapers?ref=wa">
<b>iPhone 13</b>
</a>
<a class="list-group-item" href="https://mobile.alphacoders.com/by-device/634/iPhone-12-Wallpapers?ref=wa">
<b>iPhone 12</b>
</a>
<a class="list-group-item" href="https://www.baidu.com">
<span>百度</span>
</a>
</div>